Taenk
Taenk t1_jdw3pn3 wrote
https://open-assistant.io / /r/openassistant
Taenk t1_jctdmvi wrote
Reply to comment by starstruckmon in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
I haven’t tried the larger models, unfortunately. However, I wonder how the model could be "shockingly bad" despite having almost three times the parameter count.
Taenk t1_jcs5eon wrote
Reply to comment by legendofbrando in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
A proper port to the Neural Engine would be especially interesting. There was one by Apple for Stable Diffusion.
Taenk t1_jcs53iw wrote
Reply to comment by starstruckmon in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
The results for LLaMA-33B quantised to 3-bit are rather interesting. That would be an extremely potent LLM capable of running on consumer hardware. Pity that there are no test results for the 2-bit version.
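Rough weight-only arithmetic (my own assumption that memory is dominated by the packed weights; activations, KV cache and quantisation metadata add overhead on top):

```python
# Weight-memory estimate for LLaMA-33B at different precisions (weights only).
PARAMS = 33e9

for bits in (16, 8, 4, 3, 2):
    gb = PARAMS * bits / 8 / 1e9
    print(f"{bits:>2}-bit: ~{gb:.1f} GB of weights")

# ~12.4 GB at 3-bit fits on a 16 GB consumer GPU; ~8.3 GB at 2-bit would even
# fit on common 12 GB cards, if the quality held up.
```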
Taenk t1_jckzuxm wrote
Reply to comment by londons_explorer in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Sorry, I am not an expert, just an enthusiast, so this is a stupid question: Where can I see a list of these few hundred tests, and is there some page where I can see comparisons between different models?
Taenk t1_jc33k5h wrote
Reply to comment by cyvr_com in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Can you please link a source?
Taenk t1_jc02fzb wrote
Excellent demo on your page; I just used it on a YouTube video featuring a non-native English speaker. There was only a slight error in punctuation, due to an ambiguously long pause in the speech.
Is this a purely commercial product or will there be an open source release?
Taenk t1_jbzaeau wrote
Reply to comment by kkg_scorpio in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Isn't 1-bit quantisation qualitatively different, since you can apply optimizations that are only available when the parameters are fully binary?
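As a toy sketch of what I mean (my own illustration, not something from the linked bot): with fully binary {-1, +1} weights, a dot product reduces to XOR plus popcount on bit-packed words instead of multiply-accumulate.

```python
import numpy as np

def binary_dot(w_bits: int, x_bits: int, n: int) -> int:
    """Dot product of two {-1, +1} vectors packed into n-bit integers.

    Bit 1 encodes +1, bit 0 encodes -1. Positions where the bits agree
    contribute +1 and positions where they differ contribute -1, so
    dot = n - 2 * popcount(w XOR x).
    """
    disagreements = bin((w_bits ^ x_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * disagreements

# Sanity check against the floating-point dot product.
rng = np.random.default_rng(0)
w = rng.choice([-1, 1], size=16)
x = rng.choice([-1, 1], size=16)
pack = lambda v: sum(1 << i for i, b in enumerate(v) if b == 1)
assert binary_dot(pack(w), pack(x), 16) == int(np.dot(w, x))
```

That kind of trick does not carry over to 2-, 3- or 4-bit weights, where you still need actual multiplications or lookup tables.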
Taenk t1_jbdidpy wrote
Reply to comment by CKtalon in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Can you rephrase that a little bit? Does it mean that Chinchilla answers "assuming you have a fixed compute budget, use about 20 tokens of data per model parameter; beyond that you hit diminishing returns, in the sense that you could train another model from scratch faster", while LLaMA answers "assuming you want optimal performance at inference time, regardless of training compute budget, even small models can benefit from larger datasets"?
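Rough numbers under the usual approximations from the scaling-law papers (compute-optimal tokens D ≈ 20·N and training compute C ≈ 6·N·D FLOPs, both rules of thumb rather than exact figures):

```python
# Back-of-the-envelope Chinchilla arithmetic: compute-optimal token count
# D ~= 20 * N and training compute C ~= 6 * N * D FLOPs (rough rules of thumb).

def chinchilla_optimal_tokens(n_params: float) -> float:
    return 20.0 * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

for n_params in (7e9, 13e9, 33e9, 65e9):  # the LLaMA model sizes
    d_opt = chinchilla_optimal_tokens(n_params)
    print(f"{n_params / 1e9:>4.0f}B params: compute-optimal at ~{d_opt / 1e12:.2f}T tokens, "
          f"~{training_flops(n_params, d_opt):.1e} FLOPs")

# LLaMA instead trains every size on ~1.0-1.4T tokens, i.e. far past the
# compute-optimal point for the smaller models, spending extra training
# compute to get better quality per parameter at inference time.
```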
Taenk t1_ja4jjxn wrote
Reply to comment by currentscurrents in [P] [N] Democratizing the chatGPT technology through a Q&A game by coconautico
Having conversation trees in multiple languages is especially valuable.
Taenk t1_ja4jcxm wrote
Subreddit: /r/openassistant
Taenk t1_j99bo8q wrote
Reply to [P] I've been commissioned to make 1000+ variations of my unique geometric art, while retaining its essential characteristics. It's been suggested that I use GAN to create permutations of my art. Any advice/directions? by eternalvisions
Maybe ask over at /r/stablediffusion and check out aesthetic gradients over there. You might be able to replicate your art style and scale it to the thousands of images you'll need to generate.
Taenk t1_j95rfg2 wrote
Can you please link the demo without going through twitter? It won’t load for me.
Taenk t1_j8nfwkh wrote
Taenk t1_j8ckvh2 wrote
Reply to [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
Now what if the tool the LLM uses is the training API for itself …
Taenk t1_j68a468 wrote
Reply to comment by picardythird in [D] MusicLM: Generating Music From Text by carlthome
> Whenever I see music generation models, I immediately go to the "classical" examples (or as close to classical as are provided). The reason for this is that while some genres such as techno, drum 'n' bass, 8-bit, and hip hop are "simple" (from a music theory perspective), and other genres such as ambient, relaxing jazz, swing, and dream pop are vague enough that the model can get by just from spitting out the right general timbre, generating classical music requires understanding of structure, style, and form.
> Frankly, I'm not particularly impressed. […]
> […]
> This is not to say that the model is not impressive in other ways. Its ability to mimic the styles of different genres is quite good (although the "swing" example in the Long Generation section loses focus halfway through), and the style transfer elements are quite interesting as well. However, music generation models have a long way to go when it comes to idiomatic understanding of the structural elements of music.
It feels similar to earlier LLMs: by today's standards it is extremely easy to build a model that generates vaguely correct-looking text, in the sense that the words have reasonable lengths and the characters have a reasonable distribution. Only at later stages do models manage to output vaguely correct words with minor spelling mistakes; at that point the grammar is still complete nonsense, as are the semantics. Only very recently did LLMs manage to stay coherent over larger blocks of text.
Relatedly, diffusion-based image generation has a similar thing going on: textures are frighteningly good, while image composition and logic are not.
I think music-generation models are at the stage where they get the texture and the "syllables" right, that is, the overall sound, but not yet at the stage where the equivalent of image composition and grammar is there, that is, chord progression, melody, themes and overall composition.
Taenk t1_j688cev wrote
Reply to comment by maizeq in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
> I’m not sure a 2.8 trillion token dataset actually exists
DeepMind's MassiveText is assumed to be about 10 TB; the largest publicly available dataset is The Pile, which weighs in at about 820 GB.
A 2.8 trillion token dataset would need to be more than 20 TB, which could be possible by including more of Common Crawl (weighing in at around 380 TiB) or non-English resources. I have a suspicion that training LLMs on more languages, especially outside of the Indo-European family, will improve performance within the Indo-European family.
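As a rough sanity check on that size (my own assumed bytes-per-token ratios; the real figure depends heavily on the tokenizer and the language mix):

```python
# Rough raw-text size for a 2.8 trillion token dataset under different
# assumed bytes-per-token ratios (these are assumptions, not measurements).

TOKENS = 2.8e12

for bytes_per_token in (3.0, 4.0, 6.0, 8.0):
    size_tb = TOKENS * bytes_per_token / 1e12
    print(f"{bytes_per_token:.0f} bytes/token -> ~{size_tb:.1f} TB of raw text")

# Roughly 8 to 22 TB; non-English text and markup push the ratio towards the
# upper end of that range.
```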
Taenk t1_j60gdbl wrote
Reply to comment by cdsmith in Few questions about scalability of chatGPT [D] by besabestin
Do these also increase inference speed? How much work is it to switch from CUDA-based software to one of these?
Taenk t1_j4zcu0e wrote
Reply to [P] RWKV 14B Language Model & ChatRWKV : pure RNN (attention-free), scalable and parallelizable like Transformers by bo_peng
Do I understand correctly that I could run this model at home on a graphics card with 8GB VRAM?
Taenk t1_j2sgndx wrote
Reply to comment by Purplekeyboard in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
Compared to what? I have been playing with it for a little bit via Petals and it performs decently, although ChatGPT certainly sets a very high bar. Personally, I think it is a shame that OpenAI gets exclusive access to the absolutely massive dataset of interactions with actual humans; models like BLOOM could certainly profit from having publicly accessible interactions.
Taenk t1_j2sc1a2 wrote
So you need 5 RTX 3090s to run BLOOM-176B at home instead of 8.
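Back-of-the-envelope, under my own assumptions (8-bit weights, 24 GB per card, weights only; activations and sparse-index storage add overhead):

```python
import math

# Rough card count for BLOOM-176B on RTX 3090s (24 GB each), weights only.
PARAMS = 176e9
VRAM_PER_CARD_GB = 24

def cards_needed(bytes_per_param: float, sparsity: float = 0.0) -> int:
    weight_gb = PARAMS * bytes_per_param * (1.0 - sparsity) / 1e9
    return math.ceil(weight_gb / VRAM_PER_CARD_GB)

print(cards_needed(1.0))                # 8-bit, dense:       8 cards
print(cards_needed(1.0, sparsity=0.5))  # 8-bit, 50% pruned:  4 cards
# In practice the index storage for unstructured sparsity and the activations
# push the pruned setup towards 5 cards rather than 4.
```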
Taenk t1_j27vxn1 wrote
I think you could port this to the M-chip MacBooks as well.
Taenk t1_j16lvo8 wrote
Reply to comment by [deleted] in [R] Nonparametric Masked Language Modeling - MetaAi 2022 - NPM - 500x fewer parameters than GPT-3 while outperforming it on zero-shot tasks by Singularian2501
Anyone got a demo running?
Taenk t1_j12vbzy wrote
Reply to [D] What GPT-esque model/platform returns peer-reviewed sources with outputs? by EntireInflation8663
There is this project presented in this sub.
Taenk t1_jdwlejh wrote
Reply to comment by JohnyWalkerRed in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
The Open Assistant project is working on that as well.