Submitted by ThePerson654321 t3_11lq5j4 in MachineLearning
The machine learning (ML) community moves at a remarkable pace and tends to embrace new techniques quickly. Based on my understanding of this model (RWKV), it appears to offer a distinct set of advantages over transformers while lacking any real drawbacks. Despite these benefits, it remains unclear to me why individuals and organizations in the field haven't adopted the approach more widely.
Why is this the case? I really can't wrap my head around it. The RWKV principle has existed for more than a year now and has more than 2k stars on GitHub! I feel like we should have seen wider adoption.
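For context, the core appeal is that RWKV's attention-like "WKV" mixing can be computed as a recurrence, so inference carries a constant-size state per channel instead of a transformer's ever-growing KV cache. Below is a minimal NumPy sketch of an RWKV-4-style WKV recurrence as described in public write-ups; the names `w` (decay), `u` (current-token bonus), `ks`, and `vs` are my own labeling, and this is an illustration, not the official implementation.

```python
import numpy as np

def wkv_recurrent(w, u, ks, vs):
    """Sketch of per-step WKV outputs with an O(1) running state.

    w:  channel-wise positive decay rate
    u:  bonus applied to the current token only
    ks, vs: per-token keys and values, shape [T, C]
    """
    T, C = ks.shape
    num = np.zeros(C)   # running exp-weighted sum of past values
    den = np.zeros(C)   # running sum of the corresponding weights
    out = np.empty((T, C))
    for t in range(T):
        cur = np.exp(u + ks[t])
        # output: weighted average over past tokens plus the current one
        out[t] = (num + cur * vs[t]) / (den + cur)
        # decay the old state, then absorb the current token (without the bonus)
        ek = np.exp(ks[t])
        num = np.exp(-w) * num + ek * vs[t]
        den = np.exp(-w) * den + ek
    return out

rng = np.random.default_rng(0)
T, C = 5, 4
ks, vs = rng.normal(size=(T, C)), rng.normal(size=(T, C))
w, u = np.ones(C) * 0.5, np.zeros(C)
y = wkv_recurrent(w, u, ks, vs)
print(y.shape)  # prints (5, 4)
```

Note that only `num` and `den` persist between steps, so generating each new token costs O(1) memory and time per channel, which is the drawcard relative to full attention.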
Any thoughts?
Just to sum things up:
/u/LetterRip explains this by saying that the larger organizations basically just haven't noticed/understood its potential yet.
My explanation is that there's actually something problematic with the RWKV architecture. Still wondering what it is, though.
LetterRip t1_jbjfiyg wrote
The larger models (3B, 7B, 14B) have only been released quite recently.
Information about the design has been fairly scarce and hard to track down, because no paper has been written and submitted on it.
People want to know that it actually scales before investing work into it.
Mostly, people are learning about it from release links posted to Reddit, and those posts haven't been written in a way that attracts interest.