Submitted by Vegetable-Skill-9700 t3_121a8p4 in MachineLearning
Databricks's open-source LLM, Dolly, performs reasonably well on many instruction-based tasks while being ~25x smaller than GPT-3, challenging the notion that bigger is always better.
From my personal experience, the quality of the model depends a lot on the fine-tuning data, not just sheer size. If you choose your fine-tuning data carefully, you can fine-tune a smaller model to perform better than the state-of-the-art GPT-X on your task. The future of LLMs might look more open-source than we imagined three months ago.
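For concreteness, here's a minimal sketch of what that kind of instruction fine-tuning can look like with Hugging Face transformers. The base model, the dataset file, and the prompt format are just placeholder assumptions, not a specific recipe:

```python
# Minimal sketch: instruction fine-tuning a small open-source causal LM.
# Model name, dataset file, and prompt template are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "EleutherAI/pythia-1.4b"      # any small open-source base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assume a curated instruction dataset with "instruction" and "response" fields
data = load_dataset("json", data_files="curated_instructions.jsonl")["train"]

def to_features(example):
    # Format each example as a prompt/response pair and tokenize it
    text = (f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['response']}")
    return tokenizer(text, truncation=True, max_length=512)

tokenized = data.map(to_features, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="small-instruct-ft",
                           per_device_train_batch_size=4,
                           num_train_epochs=2,
                           learning_rate=1e-5),
    train_dataset=tokenized,
    # mlm=False -> standard next-token (causal LM) objective with padding
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The training loop itself is boring; the part that actually moves the needle is how carefully that curated_instructions.jsonl is put together.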
I'd love to hear everyone's opinions on how they see the future of LLMs evolving. Will it be a few players (OpenAI) cracking AGI and conquering the whole world, or a lot of smaller open-source models that ML engineers fine-tune for their own use cases?
P.S. I am kinda betting on the latter and building UpTrain, an open-source project that helps you collect that high-quality fine-tuning dataset.
soggy_mattress t1_jdl4zkg wrote
I think of the 100B-parameter models as analogous to the first room-sized computers built back in the '40s and '50s. The pattern seems to be: first prove the concept, no matter how inefficiently, and then optimize it as much as possible.