CosmicVo t1_ix2q8f1 wrote

Reply to comment by michael_mullet in 2023 predictions by ryusan8989

Scale is indeed not all we need. In fact GPT-4 reportedly has fewer parameters than GPT-3, or about the same; the exact number isn't public. Either way, the focus is shifting from sheer size toward training data and hyperparameters (e.g. learning rate, batch size, sequence length): finding optimal models instead of just bigger ones. Exhaustively tuning hyperparameters is infeasible at the largest scales, but good tuning can yield a performance gain equivalent to doubling the parameter count.
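For anyone unfamiliar, "hyperparameter tuning" here just means searching over settings like learning rate and batch size for the best validation score. A minimal sketch of a random search on a toy objective (the `toy_score` function is made up for illustration, not how any real LLM is tuned):

```python
import math
import random

def toy_score(lr, batch_size):
    # Toy stand-in for (negated) validation loss:
    # peaks near lr = 1e-3 and batch_size = 64.
    return -((math.log10(lr) + 3) ** 2) - ((batch_size - 64) / 64) ** 2

def random_search(trials=50, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        lr = 10 ** rng.uniform(-5, -1)           # log-uniform learning rate
        bs = rng.choice([16, 32, 64, 128, 256])  # candidate batch sizes
        score = toy_score(lr, bs)
        if best is None or score > best[0]:
            best = (score, lr, bs)
    return best

best_score, best_lr, best_bs = random_search()
```

Doing even this naive loop on a frontier-scale model would mean dozens of full training runs, which is why it's only practical on smaller proxy models.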