Submitted by sinavski t3_10uh62c in MachineLearning

Hello! I'm trying to understand what available LLMs one can "relatively easily" play with. My goal is to understand the landscape since I haven't worked in this field before. I'm trying to run them "from the largest to the smallest".

By "relatively easy", I mean it doesn't require setting up a GPU cluster or cost more than $20 :)

Here are some examples I have found so far:

  1. ChatGPT (obviously) - 175B params
  2. OpenAI API to access GPT-3 models (from ada (0.5B) to davinci (175B)). Also Codex
  3. Bloom (176B) - the text window on that page seems to work reliably, you just need to keep pressing "generate"
  4. OPT-175B (Facebook's LLM) - the hosted demo is surprisingly fast, though slower than ChatGPT
  5. Several models on Hugging Face that I managed to run with a Colab Pro subscription: GPT-NeoX 20B, Flan-T5-xxl 11B, Xlm-roberta-xxl 10.7B, GPT-J 6B. I spent about $20 total running these models. None of the Hugging Face API interfaces/Spaces worked for me :(. Here is an example notebook I made for NeoX.
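
Loading any of these checkpoints from Hugging Face follows the same `transformers` pattern regardless of size. A minimal sketch (distilgpt2 is used here as a small stand-in so the download is quick; for GPT-J or GPT-NeoX on a Colab GPU you would swap in the model name, e.g. "EleutherAI/gpt-neox-20b", and pass `torch_dtype=torch.float16` and `device_map="auto"` to `from_pretrained` so the weights fit in memory):

```python
# Minimal sketch of running a causal LM from the Hugging Face hub.
# distilgpt2 keeps the download small; swap in a larger checkpoint
# (with float16 weights and device_map="auto") on a Colab GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Large language models are", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,  # greedy decoding, deterministic output
    pad_token_id=tokenizer.eos_token_id,
)
text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(text)
```

The $20 mostly goes toward Colab GPU time for the larger checkpoints; the code itself is the same few lines.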

Does anyone know more models that are easily accessible?

P.S. Some large models I couldn't figure out (yet) how to run easily: Galactica (120B), OPT-30B

86

Comments


mrpogiface t1_j7g03gj wrote

Do we actually know that ChatGPT is the full 175B? With Codex being 13B and still enormously powerful, and previous instruction-tuned models (in the paper) being 6.7B, it seems likely that they have it working at a much smaller parameter count

7

Cheap_Meeting t1_j7chivx wrote

In terms of consumer apps, the Poe app from Quora has access to two models from OpenAI and one from Anthropic.

Perplexity.ai, YouChat and Neeva are search engines that have integrated LLMs.

Google has an AI + Search Event on Wednesday where they are likely to announce something as well.

In terms of APIs and getting a feel for these models, I would use OpenAI's APIs. Their models are the best publicly available models. Open-source models are still far behind.

6

MysteryInc152 t1_j7g83pw wrote

GLM-130B is really really good. https://crfm.stanford.edu/helm/latest/?group=core_scenarios

I think some instruction tuning is all it needs to match the text-davinci models

1

Cheap_Meeting t1_j7j70tj wrote

That's not my takeaway. GLM-130B is even behind OPT according to the mean win rate, and the instruction-tuned version of OPT is in turn worse than FLAN-T5, which is a 10x smaller model (https://arxiv.org/pdf/2212.12017.pdf Table 14)

1

MysteryInc152 t1_j7ja39c wrote

I believe the fine-tuning dataset matters as well as the model, but I guess we'll see. I think they plan on fine-tuning.

The set used to tune OPT doesn't contain any chain-of-thought data.

1

NoLifeGamer2 t1_j7geyw5 wrote

I love how Bloom was just like "F*ck it, let's one-up OpenAI"

3

sinavski OP t1_j7gi1xs wrote

Yeah, I think it's just like a 1B MLP with random weights not connected to any outputs :)

2

yaosio t1_j7gtm5q wrote

I've been trying out you.com's chatbot and it seems to work well, sometimes. It has the same problem ChatGPT has with just making stuff up, but it provides sources (real and imagined), so if it lies you can actually check. I asked it what Todd Howard's favorite cake is and it gave me an authoritative answer without a source, and when I asked for a source it gave me a Gamerant link that didn't exist. When it does provide a source, it notates it like Wikipedia. It can also access the Internet, as it was able to tell me about events that happened in the last 24 hours.

It's able to produce code, and you can have a conversation with it, but it really prefers to give information from the web whenever possible. It won't tell me what model they use; it could be their own proprietary model. They also have Stable Diffusion and a text generator, but I don't know what model that uses.

Chatbot: https://you.com/search?q=who+are+you&tbm=youchat&cfr=chat

Stable Diffusion: https://you.com/search?q=python&fromSearchBar=true&tbm=imagine

Text generator: https://you.com/search?q=python&fromSearchBar=true&tbm=youwrite

3

xeneks t1_j7ki4qg wrote

I am looking at parametric search, where I can highlight mistakes in the results in a graph-database style way, by reassigning weights or links, and redo the search until I get answers that are more correct, based on things like "water isn't useful for cleaning dried paint; acetone or paint thinners may be more useful". Is it possible to build such features into any of the open-source tools here, or do they lack any GUI for feedback beyond text and a thumbs up or down, as one sees in the commercial packages?

1

lostmsu t1_j7mia4m wrote

I would love to see a comparison of these models on some common tasks.

1