Taenk t1_jckzuxm wrote

Sorry, I am not an expert, just an enthusiast, so this is a stupid question: Where can I see a list of these few hundred tests and is there some page where I can see comparisons between different models?

3

Taenk t1_jbdidpy wrote

Can you rephrase that a little bit? Does it mean that Chinchilla answers „assuming a fixed compute budget, use about 20 tokens of data per parameter of the model; past that you hit diminishing returns, in the sense that you would be better off training a larger model from scratch“ and LLaMA answers „assuming you want optimal performance at inference time, regardless of training compute budget, even small models can benefit from larger datasets“?
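For intuition, here is a back-of-the-envelope sketch of the Chinchilla rule of thumb; the ~20 tokens per parameter and the C ≈ 6·N·D FLOPs estimate are rough approximations from the paper, not exact constants:

```python
# Back-of-the-envelope Chinchilla arithmetic (illustrative only).
# Assumptions: ~20 training tokens per parameter is roughly compute-optimal,
# and training compute is roughly C ≈ 6 * N * D FLOPs
# (N = parameters, D = training tokens). Both are approximations,
# not exact constants.

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens for n_params parameters."""
    return 20.0 * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6.0 * n_params * n_tokens

if __name__ == "__main__":
    for n_params in (7e9, 70e9):  # e.g. a 7B and a 70B model
        d = chinchilla_optimal_tokens(n_params)
        c = training_flops(n_params, d)
        print(f"{n_params:.0e} params -> ~{d:.1e} tokens, ~{c:.1e} FLOPs")
```

For comparison, LLaMA-7B was trained on roughly 1T tokens, far beyond the ~140B tokens this rule suggests, which is exactly the "spend more training compute to get a model that is cheaper at inference" trade-off.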

1

Taenk t1_j99bo8q wrote

Maybe ask over at /r/stablediffusion and check out aesthetic gradients over there. You might be able to replicate your art style and scale it to the thousands of images you'll need to generate.

5

Taenk t1_j68a468 wrote

> Whenever I see music generation models, I immediately go to the "classical" examples (or as close to classical as are provided). The reason for this is that while some genres such as techno, drum 'n' bass, 8-bit, and hip hop are "simple" (from a music theory perspective), and other genres such as ambient, relaxing jazz, swing, and dream pop are vague enough that the model can get by just from spitting out the right general timbre, generating classical music requires understanding of structure, style, and form.

> Frankly, I'm not particularly impressed. […]

> […]

> This is not to say that the model is not impressive in other ways. Its ability to mimic the styles of different genres is quite good (although the "swing" example in the Long Generation section loses focus halfway through), and the style transfer elements are quite interesting as well. However, music generation models have a long way to go when it comes to idiomatic understanding of the structural elements of music.

It feels similar to earlier LLMs: by today's standards it is extremely easy to train a model that generates vaguely correct-looking text, in the sense that the words have reasonable lengths and the characters have a reasonable distribution. Only at later stages do models manage to output vaguely correct words with minor spelling mistakes; at that point both the grammar and the semantics are still complete nonsense. Only very recently did LLMs manage to stay coherent over larger blocks of text.

Relatedly, diffusion-based image generation has a similar thing going on: textures are frighteningly great, image composition and logic not so much.

I think music generation models are at the stage where they get the texture and the syllables right, that is, the overall sound, but not yet at the stage where the equivalent of image composition and grammar is there, that is, chord progression, melody, themes and overall structure.

4

Taenk t1_j688cev wrote

> I’m not sure a 2.8 trillion token dataset actually exists

DeepMind's MassiveText is estimated to be about 10 TB; the largest publicly available dataset, The Pile, weighs in at about 820 GB.

A 2.8 trillion token dataset would need to be well over 10 TB, which could be possible by including more of Common Crawl (weighing in at around 380 TiB) or non-English resources. I suspect that training LLMs on more languages, especially ones outside the Indo-European family, will improve performance within the Indo-European family as well.
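As a rough sanity check on that scale, assuming ~4-5 bytes of raw UTF-8 text per token (a typical figure for BPE-tokenized English web text, not a measured value for any particular corpus):

```python
# Rough size estimate for a 2.8T-token text dataset (illustrative only).
# Assumption: ~4-5 bytes of raw UTF-8 text per token, which is a typical
# figure for BPE-tokenized English web text, not a property of any
# particular corpus.

TOKENS = 2.8e12

for bytes_per_token in (4.0, 4.5, 5.0):
    size_tb = TOKENS * bytes_per_token / 1e12  # decimal terabytes
    print(f"{bytes_per_token} bytes/token -> ~{size_tb:.1f} TB")
```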

2

Taenk t1_j2sgndx wrote

Compared to what? I have been playing with it for a little bit via Petals and it performs decently, although ChatGPT certainly sets a very high bar. Personally I think it is a shame that OpenAI gets exclusive access to the absolutely massive dataset of interactions with actual humans; models like BLOOM could certainly benefit from having publicly accessible interactions.

3