Submitted by benanne t3_107g3yf in MachineLearning
[deleted] t1_j3opz0l wrote
Reply to comment by rodeowrong in [R] Diffusion language models by benanne
I think it's worth looking at for sure. The math behind it isn't "that" complex and the idea is pretty intuitive in my opinion. Take that from someone who took months to wrap their head around attention as a concept lol.
thecodethinker t1_j3pichs wrote
Attention is still pretty confusing for me. I find diffusion much more intuitive fwiw.
DigThatData t1_j3v2gjs wrote
attention is essentially a dynamically weighted dot product: each output is a weighted average of value vectors, with the weights computed on the fly from query–key dot products. if you haven't already seen this blog post, it's one of the more popular explanations: https://jalammar.github.io/illustrated-transformer/
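To make the "dynamically weighted" intuition concrete, here's a minimal NumPy sketch of scaled dot-product attention (the core operation the linked post illustrates). All names are illustrative, and projections (W_Q, W_K, W_V) are omitted for brevity:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax: weights per row sum to 1
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # pairwise query-key dot products, scaled by sqrt(d)
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)
    # each output row is a weighted average of the value vectors
    return weights @ V

# toy example: 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
out = attention(x, x, x)  # self-attention with identity projections
print(out.shape)  # (3, 4)
```

The "dynamic" part is that `weights` depends on the input itself, unlike a fixed convolution kernel.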
benanne OP t1_j3qy47x wrote
I have an earlier blog post which is intended precisely to build intuition about diffusion :) https://benanne.github.io/2022/01/31/diffusion.html
DigThatData t1_j3v26zy wrote
i think you read that comment backwards :)