TemperatureAmazing67
TemperatureAmazing67 t1_jbzc8cc wrote
Reply to comment by Username912773 in [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
'require input to generate an output and do not have initiative' - you could feed it random input or another network's output.

Also, the argument about next-token prediction is flawed. For a lot of tasks, a perfectly predicted next token is all you need.
TemperatureAmazing67 t1_jbzcn6a wrote
Reply to comment by hebekec256 in [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
> extensions of LLMs (like PALM-E) are a heck of a lot more than an abacus. I wonder what would happen if Google just said, "screw it", and scaled it from 500B to 50T parameters. I'm guessing there are reasons in the architecture that it would
The problem is that we have scaling laws for NNs. We simply do not have the data to train 50T parameters. We'd somehow have to get that data, and the answer to that question is worth a lot.
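To put rough numbers on it, here's a back-of-envelope sketch assuming the Chinchilla-style heuristic of ~20 training tokens per parameter and an order-of-magnitude guess at available high-quality text; both figures are assumptions for illustration, not measured values.

```python
# Back-of-envelope check of the data requirement implied by
# Chinchilla-style scaling (~20 training tokens per parameter).
# Token counts are ballpark assumptions, not measured figures.

params = 50e12            # 50T parameters, as in the comment above
tokens_per_param = 20     # assumed compute-optimal ratio (Chinchilla-style)
tokens_needed = params * tokens_per_param

# Assumed order of magnitude for usable high-quality text on the web.
tokens_available = 1e13

print(f"Tokens needed for compute-optimal training: {tokens_needed:.1e}")
print(f"Rough estimate of available text tokens:    {tokens_available:.1e}")
print(f"Shortfall factor: ~{tokens_needed / tokens_available:.0f}x")
```

Under those assumptions you'd need on the order of 10^15 tokens, roughly 100x more than the text we plausibly have, which is the data wall the scaling laws run into.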