Submitted by sinavski t3_10uh62c in MachineLearning
Hello! I'm trying to figure out which available LLMs one can "relatively easily" play with. My goal is to get a feel for the landscape since I haven't worked in this field before, so I'm trying to run them "from the largest to the smallest".
By "relatively easy", I mean it doesn't require setting up a GPU cluster or cost more than $20 :)
Here are some examples I have found so far:
- ChatGPT (obviously) - 175B params
- OpenAI API to access the GPT-3 models (from ada (0.5B) to davinci (175B)); also Codex. A quick API sketch is below the list.
- BLOOM (176B) - the text window on its demo page seems to work reliably; you just need to keep pressing "generate"
- OPT-175B (Facebook's LLM) - the hosted demo runs surprisingly fast, though still slower than ChatGPT
- Several models on Hugging Face that I managed to run with a Colab Pro subscription: GPT-NeoX 20B, Flan-T5-XXL 11B, XLM-RoBERTa-XXL 10.7B, GPT-J 6B. I spent about $20 total on running these. None of the Hugging Face API interfaces/Spaces worked for me :(. Here is an example notebook I made for NeoX, and a rough loading sketch is below the list.
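For the OpenAI API route, this is roughly all it takes (a minimal sketch assuming the pre-1.0 `openai` Python client and an API key with some credit; the model name and parameters are just examples):

```python
# Minimal sketch: query GPT-3 through the OpenAI API.
# Assumes: pip install openai (pre-1.0 client) and OPENAI_API_KEY set in the environment.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    model="text-davinci-003",  # largest GPT-3 completion model; ada/babbage/curie are cheaper
    prompt="Explain what a language model is in one sentence.",
    max_tokens=64,
    temperature=0.7,
)
print(response.choices[0].text.strip())
```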
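And here's roughly what the Colab loading code looks like for the causal LMs in that list, using GPT-J 6B as the example (a minimal sketch assuming `transformers`, `accelerate` and `bitsandbytes` are installed; not the exact code from my notebook):

```python
# Minimal sketch: run GPT-J 6B on a Colab GPU with 8-bit weights.
# Assumes: pip install transformers accelerate bitsandbytes
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place layers on GPU/CPU
    load_in_8bit=True,   # bitsandbytes int8 so the weights fit in ~8 GB of VRAM
)

inputs = tokenizer("Large language models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```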
Does anyone know more models that are easily accessible?
P.S. Some large models I couldn't figure out how to run easily (yet): Galactica (120B) and OPT-30B.
gopher9 t1_j7cbdlg wrote
RWKV 14B, trained on The Pile.
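If it helps, a minimal loading sketch, assuming a recent `transformers` version with RWKV support and the Hugging Face-converted checkpoint `RWKV/rwkv-4-14b-pile` (the original BlinkDL checkpoints use their own runner instead):

```python
# Minimal sketch: run the Pile-trained RWKV via transformers.
# Assumes: pip install transformers accelerate, and enough GPU memory for fp16 (~28 GB for 14B);
# swap in a smaller checkpoint such as RWKV/rwkv-4-430m-pile to try it on a free Colab GPU.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "RWKV/rwkv-4-14b-pile"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("The Pile is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```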