
Cogwheel OP t1_j8o29pz wrote

>1. Distributed models would have to be updated. How do we update weights from two sources? (There might be options for this, I haven't looked.)

This strikes me as more of a software/hardware engineering challenge than one of network and training architecture. Definitely a challenge, though.
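
For what it's worth, there are existing options in this direction, e.g. federated-averaging-style merging: each source ships a weight delta relative to a shared base checkpoint, and a coordinator applies the (weighted) mean. A minimal sketch, assuming a state-dict-of-tensors representation (the names `base`, `delta_a`, `delta_b` are hypothetical):

```python
import torch

def merge_updates(base_state, deltas, weights=None):
    """Federated-averaging-style merge: apply the (optionally weighted)
    mean of several weight deltas to a shared base checkpoint."""
    if weights is None:
        weights = [1.0 / len(deltas)] * len(deltas)
    return {
        name: param + sum(w * d[name] for w, d in zip(weights, deltas))
        for name, param in base_state.items()
    }

# e.g., two sources each send {param_name: updated_param - base_param}
base = {"w": torch.zeros(3)}
delta_a = {"w": torch.tensor([0.2, 0.0, 0.0])}
delta_b = {"w": torch.tensor([0.0, 0.4, 0.0])}
print(merge_updates(base, [delta_a, delta_b]))  # {'w': tensor([0.1, 0.2, 0.0])}
```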

>2. Potential for undesirable and unstable predictions/generations.

I think the same is true for humans: given enough "perverse" inputs, we can all go crazy. So it's definitely something to think about and mitigate; there would need to be components built to work against these "forces."

>3. I think you'd have to allow the weights to update pretty dramatically at each inference to get any real variation. I think this would lead to #2

Interesting point... The time between acts of inference in an ML model is comparatively long (milliseconds for realtime perception systems, seconds to minutes for things like ChatGPT), whereas animals experience essentially continuous input. Our eyes alone present us with many Mbps of data, as it were.

So without these vast swathes of data constantly being fed in, the alternative is to make bigger changes based on the limited data.
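
To make that concrete, the "bigger changes" in #3 would amount to something like one gradient step per act of inference, with the learning rate as the knob trading variation against the instability in #2. A rough sketch of what that per-inference update might look like (all names here are illustrative, not a real API):

```python
import torch
import torch.nn as nn

def infer_and_adapt(model, loss_fn, optimizer, x, target):
    """Run inference, then take one gradient step on that same example:
    an online weight update after every act of inference."""
    pred = model(x)
    loss = loss_fn(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # larger lr -> more "real variation", more risk of #2
    return pred.detach()

model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-4)  # tiny lr for stability
y = infer_and_adapt(model, nn.MSELoss(), opt, torch.randn(1, 4), torch.randn(1, 1))
```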

>4. Attention components probably do what you're looking for more accurately and efficiently.

Attention had crossed my mind when I posted this. I agree its intent is to accomplish a kind of weight redistribution based on previous input. But I still think it's more superficial/ephemeral than what I'm asking about: humans certainly have attention mechanisms in our brains, but those mechanisms are subject to the same kinds of changes over time as everything else.
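
The ephemerality is easy to see in standard scaled dot-product attention: the attention weights are recomputed from scratch on every forward pass and vanish afterwards, and only the learned projection matrices (which ordinary training updates) persist across inputs. A bare-bones sketch:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """The softmax weights here are a function of the current input
    alone; nothing about them persists after the forward pass."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)  # ephemeral, per-input
    return weights @ v
```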
