Submitted by Vegetable-Skill-9700 t3_121a8p4 in MachineLearning
PilotThen t1_jdppmpl wrote
Reply to comment by currentscurrents in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
There's also the point that they optimise for compute at training time.
In mass deployment, compute at inference time starts to matter.
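
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the commonly quoted rules of thumb for a dense transformer (about 6 FLOPs per parameter per training token, about 2 FLOPs per parameter per token at inference), and the function names and model sizes are hypothetical, picked only for illustration.

```python
import math

# Rule-of-thumb FLOP estimates for a dense transformer.
# These are approximations (an assumption of this sketch), not measurements.

def training_flops(params: float, train_tokens: float) -> float:
    """Approximate total training cost: ~6 FLOPs per parameter per token."""
    return 6.0 * params * train_tokens

def inference_flops(params: float, served_tokens: float) -> float:
    """Approximate serving cost: ~2 FLOPs per parameter per token."""
    return 2.0 * params * served_tokens

def breakeven_served_tokens(train_tokens: float) -> float:
    """Served tokens at which cumulative inference cost equals training cost.

    6 * N * D == 2 * N * T  =>  T == 3 * D, independent of model size N.
    """
    return 3.0 * train_tokens

# Hypothetical model: 100B parameters trained on 1T tokens.
params = 1e11
train_tokens = 1e12
served = breakeven_served_tokens(train_tokens)
print(f"inference matches training cost after ~{served:.1e} served tokens")

# Hypothetical smaller model: every served token is proportionally cheaper.
small_params = 2.5e10
ratio = inference_flops(params, 1.0) / inference_flops(small_params, 1.0)
print(f"per-token inference cost ratio, large vs small: {ratio:.1f}x")
```

Under these assumptions, once a deployment has served about three times the training token count, inference has cost as much compute as training did, and from then on the per-token cost of the bigger model dominates.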