Submitted by benanne t3_107g3yf in MachineLearning
thecodethinker t1_j3pichs wrote
Reply to comment by [deleted] in [R] Diffusion language models by benanne
Attention is still pretty confusing for me. I find diffusion much more intuitive fwiw.
DigThatData t1_j3v2gjs wrote
attention is essentially a dynamically weighted dot product: each query is compared against every key, and the resulting softmax weights decide how much each value contributes to the output. if you haven't already seen this blog post, it's one of the more popular explanations: https://jalammar.github.io/illustrated-transformer/
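a minimal NumPy sketch of that weighted-sum idea (scaled dot-product attention; the toy shapes and random inputs are just for illustration, not from the thread):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # similarity of each query to each key, scaled by sqrt(key dim)
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted sum of the values

# toy example: 2 queries, 3 key/value pairs, dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

the "dynamic" part is that the weights are recomputed from the data itself for every query, rather than being fixed learned parameters.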
benanne OP t1_j3qy47x wrote
I have an earlier blog post which is intended precisely to build intuition about diffusion :) https://benanne.github.io/2022/01/31/diffusion.html
DigThatData t1_j3v26zy wrote
i think you read that comment backwards :)