alrunan t1_jdmbv4k wrote
Reply to comment by harharveryfunny in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
The Chinchilla scaling laws are just used to calculate the optimal dataset and model size for a particular training compute budget.
You should read the LLaMA paper.
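For a rough sense of what "compute-optimal" means, here is a minimal sketch assuming the common approximations C ≈ 6·N·D for training FLOPs and the ~20-tokens-per-parameter rule of thumb from the Chinchilla paper (the fitted exponents in the paper differ slightly; this is illustrative only):

```python
# Rough sketch of the Chinchilla compute-optimal trade-off, assuming
# C ~= 6 * N * D (training FLOPs) and ~20 tokens per parameter.
# Illustrative only; not the paper's exact fitted scaling law.

def chinchilla_optimal(flops_budget: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that roughly balance model size and data
    for a given training-compute budget in FLOPs."""
    # Solve C = 6 * N * D with D = tokens_per_param * N:
    #   C = 6 * tokens_per_param * N^2  ->  N = sqrt(C / (6 * tokens_per_param))
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    # e.g. a ~5.76e23 FLOP budget (roughly Chinchilla-70B scale)
    n, d = chinchilla_optimal(5.76e23)
    print(f"~{n / 1e9:.0f}B params, ~{d / 1e9:.0f}B tokens")
```

With that budget the sketch lands near 70B parameters and 1.4T tokens, which is roughly the Chinchilla configuration.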
alrunan t1_jdmm3lw wrote
Reply to comment by harharveryfunny in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
The LLaMA 7B model is trained on 1T tokens and performs really well for its parameter count.
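As a back-of-the-envelope comparison (again assuming the ~20-tokens-per-parameter heuristic and C ≈ 6·N·D, which are simplifications), the Chinchilla-optimal dataset for a 7B model would be around 140B tokens, so LLaMA-7B's 1T tokens is roughly 7x past that point, trading extra training compute for a stronger model at a fixed size:

```python
# Back-of-the-envelope comparison of LLaMA-7B's training run against the
# Chinchilla-optimal token count, using ~20 tokens/param and C ~= 6 * N * D.
# Numbers are approximate and for illustration only.

n_params = 7e9           # LLaMA-7B parameters (approximate)
tokens_trained = 1e12    # 1T training tokens reported in the LLaMA paper

chinchilla_tokens = 20 * n_params              # ~140B tokens
extra_factor = tokens_trained / chinchilla_tokens

flops_used = 6 * n_params * tokens_trained         # ~4.2e22 FLOPs
flops_optimal = 6 * n_params * chinchilla_tokens   # ~5.9e21 FLOPs

print(f"Chinchilla-optimal tokens for 7B: ~{chinchilla_tokens / 1e9:.0f}B")
print(f"LLaMA-7B trained on ~{extra_factor:.0f}x that many tokens")
print(f"Training FLOPs: ~{flops_used:.1e} vs ~{flops_optimal:.1e} compute-optimal")
```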