
currentscurrents t1_j3eo4uc wrote

There's plenty of work to be done in researching language models that train more efficiently or run on smaller machines.

ChatGPT is great, but it needed 600GB of training data and megawatts of power. It must be possible to do better; the average human brain runs on 12W and has seen maybe a million words tops.

2

singularpanda OP t1_j3eohh7 wrote

Yes, it is quite costly. However, it doesn't seem easy to modify in our research, since it isn't open.

1

KBM_KBM t1_j3g7swj wrote

https://github.com/lucidrains/PaLM-rlhf-pytorch

This has an architecture similar to ChatGPT's; you can play with it.
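
For a rough feel of the API, the sketch below sets up a small model and runs one forward/backward pass in the spirit of the repo's README. The class and argument names here are my reading of that README and may not match the current version exactly, so double-check against the repo:

```python
# Rough sketch only -- class and argument names follow the repo's README
# (lucidrains/PaLM-rlhf-pytorch) as I understand it; verify against the repo.
import torch
from palm_rlhf_pytorch import PaLM

# a small PaLM for experimentation; sizes are purely illustrative
palm = PaLM(
    num_tokens = 20000,  # vocabulary size
    dim = 512,           # model width
    depth = 12,          # number of transformer layers
)

# dummy token ids just to exercise a training step
seq = torch.randint(0, 20000, (1, 1024))
loss = palm(seq, return_loss = True)
loss.backward()
```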

2

singularpanda OP t1_j3gdv9p wrote

Thanks! Yes, there are many similar projects, but ChatGPT seems to have the most impressive performance.

1

KBM_KBM t1_j3gere2 wrote

True, but in practice training a GPT model is not computationally cheap. I think that instead of building such generalized language models, we need to focus more on subject-specific language models.

1

f_max t1_j3frhxs wrote

Megawatts sounds right for training, but inference is more like kilowatts. Take a look at Tim Dettmers' work (he's at UW) on int8 quantization to see some of this kind of efficiency work. There's definitely significant work happening in the open.
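
As a rough sketch of what 8-bit inference looks like through the Hugging Face integration of that work (the checkpoint name is just an example, and it assumes `transformers`, `accelerate`, and `bitsandbytes` are installed):

```python
# Illustrative sketch of LLM.int8()-style 8-bit inference via bitsandbytes;
# the model name is an arbitrary example, swap in whatever checkpoint you use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # example checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map = "auto",   # place layers across available GPUs/CPU
    load_in_8bit = True,   # quantize weights to int8 at load time
)

inputs = tokenizer("Efficient inference is", return_tensors = "pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens = 20)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))
```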

1