Submitted by bo_peng t3_1135aew in MachineLearning
Hi everyone. I am an independent researcher working on my pure RNN language model, RWKV. I have finished training RWKV-4 14B (FLOPs sponsored by Stability and EleutherAI - thank you!) and it is indeed very scalable. Note that RWKV can also be trained in parallel like a transformer, so it combines the best of RNNs and transformers.
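To make the "best of RNN and transformer" claim concrete: at inference time RWKV runs token by token with a small constant-size state, while at training time the same math unrolls in parallel across the sequence. Below is a heavily simplified NumPy sketch of one recurrent time-mixing step (illustrative variable names, not the ones in the code; LayerNorm, channel-mixing and the numerical-stability max-trick are all omitted - see the RWKV-LM repo for the real implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def time_mixing_step(x, x_prev, a, b, p):
    """One recurrent step of simplified RWKV time-mixing.

    x, x_prev : current / previous layer input (token shift)
    a, b      : running numerator / denominator of the WKV state
    p         : dict of per-layer parameters (illustrative names):
                mu_r, mu_k, mu_v (token-shift mixing factors),
                w (per-channel decay > 0), u (current-token bonus),
                W_r, W_k, W_v, W_o (projection matrices)
    """
    r = sigmoid(p["W_r"] @ (p["mu_r"] * x + (1 - p["mu_r"]) * x_prev))
    k = p["W_k"] @ (p["mu_k"] * x + (1 - p["mu_k"]) * x_prev)
    v = p["W_v"] @ (p["mu_v"] * x + (1 - p["mu_v"]) * x_prev)

    # WKV: exponentially-decayed weighted average of past values,
    # with a bonus u for the current token.
    wkv = (a + np.exp(p["u"] + k) * v) / (b + np.exp(p["u"] + k))

    # Constant-size state update -> O(1) memory/compute per token at inference.
    a = np.exp(-p["w"]) * a + np.exp(k) * v
    b = np.exp(-p["w"]) * b + np.exp(k)

    out = p["W_o"] @ (r * wkv)   # receptance r gates the output
    return out, x, a, b          # x becomes x_prev for the next token
```

During training, the same wkv values can be computed for all positions at once (a custom CUDA kernel handles this), which is what makes RWKV trainable in parallel like a GPT.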
The ChatRWKV project (let's build together):
https://github.com/BlinkDL/ChatRWKV
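For reference, here is a rough loading-and-generation sketch using the rwkv pip package that ChatRWKV v2 builds on. The checkpoint path and strategy string are placeholders, and the repo README has the authoritative, up-to-date interface:

```python
import os
os.environ["RWKV_JIT_ON"] = "1"   # optional JIT speed-up read by the rwkv package

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# Placeholder path/strategy - point these at your downloaded checkpoint.
model = RWKV(model="/path/to/RWKV-4-Pile-14B-ctx4096", strategy="cuda fp16")
pipeline = PIPELINE(model, "20B_tokenizer.json")   # tokenizer file ships with ChatRWKV

args = PIPELINE_ARGS(temperature=1.0, top_p=0.85)  # same topP as the samples below
print(pipeline.generate("\nIn a shocking finding,", token_count=100, args=args))
```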
Zero-shot comparison with NeoX / Pythia (trained on the same dataset, the Pile) at the same parameter count (14.2B):
Generation results (simply topP=0.85, no repetition penalty) - they look great with my magic prompt (sometimes even better than NeoX 20B):
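For readers who haven't used nucleus sampling: topP=0.85 means sampling only from the smallest set of tokens whose probabilities sum to 0.85, with no penalty applied to repeated tokens. A tiny illustrative sketch (not the actual ChatRWKV sampling code):

```python
import numpy as np

def sample_top_p(logits, top_p=0.85, temperature=1.0):
    """Nucleus (top-p) sampling: sample from the smallest set of tokens
    whose cumulative probability reaches top_p. No repetition penalty."""
    z = logits / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = np.argsort(-probs)                     # tokens, most likely first
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, top_p)) + 1  # keep tokens up to the top_p mass
    keep = order[:cutoff]
    p = probs[keep] / probs[keep].sum()            # renormalize and sample
    return int(np.random.choice(keep, p=p))
```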
Explanation, fine-tuning, training and more:
https://github.com/BlinkDL/RWKV-LM
mz_gt t1_j8ofbiq wrote
This is really awesome! I've been following the progress of your work on RWKV and I have to ask: I know you've mentioned that a lot of RWKV uses tricks from here and there, along with plenty of your own tweaks, but have you considered writing a paper? There are plenty of highly renowned published works with less to say than RWKV.
I think a renewed discussion about RNNs is more than warranted right now given the current direction with transformers, and the highly complicated nature of HiPPO-based models is not something I personally see replacing transformers anytime soon.