Taenk t1_jckzuxm wrote

Sorry, I am not an expert, just an enthusiast, so this is a stupid question: Where can I see a list of these few hundred tests and is there some page where I can see comparisons between different models?

3

Taenk t1_jbdidpy wrote

Can you rephrase that a little bit? Does it mean that Chinchilla answers „assuming a fixed compute budget, use about 20 tokens of data per parameter of the model; past that you hit diminishing returns, in the sense that you would be better off training a larger model from scratch“ and LLaMA answers „assuming you want optimal performance at inference time, regardless of training compute budget, even small models can benefit from larger datasets“?
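For intuition, here is a back-of-the-envelope sketch of the Chinchilla rule of thumb; the ~20 tokens per parameter and the C ≈ 6·N·D FLOPs estimate are rough approximations from the paper, not exact constants:

```python
# Back-of-the-envelope Chinchilla arithmetic (illustrative only).
# Assumptions: ~20 training tokens per parameter is roughly compute-optimal,
# and training compute is roughly C ≈ 6 * N * D FLOPs
# (N = parameters, D = training tokens). Both are approximations,
# not exact constants.

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens for n_params parameters."""
    return 20.0 * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6.0 * n_params * n_tokens

if __name__ == "__main__":
    for n_params in (7e9, 70e9):  # e.g. a 7B and a 70B model
        d = chinchilla_optimal_tokens(n_params)
        c = training_flops(n_params, d)
        print(f"{n_params:.0e} params -> ~{d:.1e} tokens, ~{c:.1e} FLOPs")
```

For comparison, LLaMA-7B was trained on roughly 1T tokens, far beyond the ~140B tokens this rule suggests, which is exactly the "spend more training compute to get a model that is cheaper at inference" trade-off.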

1

Taenk t1_j99bo8q wrote

Maybe ask over at /r/stablediffusion and check out aesthetic gradients over there. You might be able to replicate your art style and scale it to the thousands of images you'll need to generate.

5

Taenk t1_j68a468 wrote

> Whenever I see music generation models, I immediately go to the "classical" examples (or as close to classical as are provided). The reason for this is that while some genres such as techno, drum 'n' bass, 8-bit, and hip hop are "simple" (from a music theory perspective), and other genres such as ambient, relaxing jazz, swing, and dream pop are vague enough that the model can get by just from spitting out the right general timbre, generating classical music requires understanding of structure, style, and form.

> Frankly, I'm not particularly impressed. […]

> […]

> This is not to say that the model is not impressive in other ways. Its ability to mimic the styles of different genres is quite good (although the "swing" example in the Long Generation section loses focus halfway through), and the style transfer elements are quite interesting as well. However, music generation models have a long way to go when it comes to idiomatic understanding of the structural elements of music.

It feels similar to earlier LLMs: by today's standards it is extremely easy to train a model that generates vaguely correct-looking text, in the sense that the words have reasonable lengths and the characters have a reasonable distribution. Only at later stages do models manage to output vaguely correct words with minor spelling mistakes; at that point both the grammar and the semantics are still complete nonsense. Only very recently did LLMs manage to stay coherent over larger blocks of text.

Relatedly, diffusion-based image generation has a similar thing going on: textures are frighteningly great, image composition and logic not so much.

I think music generation models are at the stage where they get the texture and the syllables right, that is, the overall sound, but not yet at the stage where the equivalent of image composition and grammar is there, that is, chord progression, melody, themes and overall structure.

4

Taenk t1_j688cev wrote

> I’m not sure a 2.8 trillion token dataset actually exists

DeepMind's MassiveText is estimated to be about 10 TB; the largest publicly available dataset, The Pile, weighs in at about 820 GB.

A 2.8 trillion token dataset would need to be well over 10 TB, which could be possible by including more of Common Crawl (weighing in at around 380 TiB) or non-English resources. I suspect that training LLMs on more languages, especially ones outside the Indo-European family, will improve performance within the Indo-European family as well.
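As a rough sanity check on that scale, assuming ~4-5 bytes of raw UTF-8 text per token (a typical figure for BPE-tokenized English web text, not a measured value for any particular corpus):

```python
# Rough size estimate for a 2.8T-token text dataset (illustrative only).
# Assumption: ~4-5 bytes of raw UTF-8 text per token, which is a typical
# figure for BPE-tokenized English web text, not a property of any
# particular corpus.

TOKENS = 2.8e12

for bytes_per_token in (4.0, 4.5, 5.0):
    size_tb = TOKENS * bytes_per_token / 1e12  # decimal terabytes
    print(f"{bytes_per_token} bytes/token -> ~{size_tb:.1f} TB")
```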

2

Taenk t1_j2sgndx wrote

Compared to what? I have been playing with it for a little bit via Petals and it performs decently, although ChatGPT certainly sets a very high bar. Personally I think it is a shame that OpenAI gets exclusive access to the absolutely massive dataset of interactions with actual humans; models like BLOOM could certainly benefit from having publicly accessible interactions.

3