Submitted by Desi___Gigachad t3_126rgih in MachineLearning
mattsverstaps t1_jedlch4 wrote
Reply to comment by FermiAnyon in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
So is that saying that there is a kind of linear transformation happening between some space (the reality? Our personal model?) and the embedding space? I don't know what an embedding space is and I shouldn't be here, but you are saying interesting things.
FermiAnyon t1_jee34lx wrote
Glad you're here. This would be a really interesting chat for like a bar or a meetup or something ;)
But yeah, I'm just giving my impressions. I don't want to make any claims of authority or anything, as I'm self-taught with this stuff...
But yeah, I have no idea how our brains do it, but when you're building a model, whether it's a neural net or you're just factoring a matrix, you end up with a high-dimensional representation that gets used as an input to another layer or that just gets used straight away for classification. It may be overly broad, but I think of all of those high-dimensional representations as embeddings, and the dimensionality available for encoding an embedding as the embedding space.
Like if you were into sports and you wanted to organize your room so that distance represents the relationships between equipment: maybe the baseball is right next to the softball, and the tennis racket is close to the table tennis paddle but a little farther away from the baseball stuff. Then you've got some golf clubs, and they're kind of in one area of the room because they all involve hitting things with another thing. Then your kite flying stuff and your fishing stuff and your street luge stuff are kind of as far apart as possible from the other stuff, because it's not obvious (to me anyway) that they're related. Your room is a two-dimensional embedding space.
When models do it, they just do it with more dimensions and more concepts: they learn where to put things so that the relationships are properly represented, and they learn all of that from lots of cleverly crafted examples.
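To make the room analogy concrete, here's a rough numpy sketch (the items and coordinates are made up, purely illustrative): distance in the 2D "embedding space" stands in for how related two pieces of equipment are.

```python
import numpy as np

# Made-up 2D "room" coordinates for each piece of equipment
room = {
    "baseball":      np.array([0.0, 0.0]),
    "softball":      np.array([0.2, 0.1]),
    "tennis_racket": np.array([1.0, 0.4]),
    "tt_paddle":     np.array([1.1, 0.5]),
    "golf_clubs":    np.array([2.0, 1.5]),
    "kite":          np.array([5.0, 5.0]),
}

def dist(a, b):
    """Euclidean distance between two items in the room/embedding."""
    return np.linalg.norm(room[a] - room[b])

print(dist("baseball", "softball"))   # small: closely related
print(dist("baseball", "kite"))       # large: basically unrelated
```

A real model does the same thing, just with many more dimensions and with the coordinates learned from data rather than placed by hand.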
monks-cat t1_jefqotb wrote
Context radically changes the "distance" between concepts, so in your example isotropy isn't necessarily a desired property of an LLM. In poetry, for example, we combine two concepts that would seemingly be very far apart in the original space but that should be mapped rather close together in the embedding.
The problem I see with this whole idea, though, is that a "concept" doesn't inherently seem to be represented by a list of features. Two concepts interacting aren't necessarily the intersection of their features.
I'll try to see if I can come up with concrete examples in language.
FermiAnyon t1_jegh3hd wrote
In this case, I'm using "concept" as a fuzzy word for anything that's differentiable from another thing. That includes things like context and semantics and whether a word is polysemantic, and even whether things fit a rhyme scheme. Basically anything observable.
But again, I'm shooting from the hip
Pas7alavista t1_jefcp4e wrote
Embedding is a way to map the high-dimensional vectors in your input space to a lower-dimensional space.
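A minimal sketch of that idea (using PyTorch's nn.Embedding as just one common example; the sizes are made up): each token id, which you can think of as a huge one-hot vector, gets mapped to a small dense vector.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 10_000, 64       # input space size vs. embedding space size
embedding = nn.Embedding(vocab_size, embed_dim)

token_ids = torch.tensor([42, 7, 1337])  # three tokens from a 10,000-word vocabulary
vectors = embedding(token_ids)           # each id -> a learned 64-dimensional vector
print(vectors.shape)                     # torch.Size([3, 64])
```

The 64 numbers per token are learned during training so that tokens used in similar ways end up with nearby vectors.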
mattsverstaps t1_jefvu45 wrote
So the extra dimensions are unnecessary? I just realised that there could be some situations in which non-orthogonal dimensions are preferable, though I can't exactly think of them. Doesn't it suggest a pattern in the data if a mapping is found that reduces the dimension? Like I picture, from linear algebra 101, finding a line that everything is a multiple of, so one dimension would do, and that line is a 'pattern'? Sorry I'm high.
Pas7alavista t1_jeg5dhh wrote
>so the extra dimensions are unnecessary
Yes, one reason for embedding is to extract relevant features.
Also, any finite-dimensional inner product space has an orthonormal basis, and the math is easiest that way, so there's not much of a reason to describe a space using non-orthogonal dimensions. There is nothing stopping you from doing so, though.
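For instance (a rough sketch, nothing deep): numpy's QR factorization will turn a non-orthogonal spanning set into an orthonormal basis for the same space, which is essentially Gram-Schmidt.

```python
import numpy as np

# Columns are a non-orthogonal basis of the plane: (2, 0) and (1, 1)
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])

Q, R = np.linalg.qr(A)   # QR factorization, i.e. Gram-Schmidt plus scaling
print(Q)                 # columns are an orthonormal basis of the same plane
print(Q.T @ Q)           # ~ identity matrix, confirming orthonormality
```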
>Doesn't it suggest a pattern in data if a mapping is found that reduces dimension
Yeah, generally you wouldn't attempt to use ML methods on data where you think there is no pattern.
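A toy illustration of the line-as-pattern idea (synthetic data, just a sketch): if the points mostly lie along one line plus a little noise, PCA recovers that line, and the fact that a single component explains nearly all of the variance is exactly the "pattern".

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.normal(size=200)
# Toy data: points that are (almost) all multiples of the direction (1, 2)
X = np.outer(t, [1.0, 2.0]) + 0.01 * rng.normal(size=(200, 2))

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # ~ [0.9999, 0.0001]: one dimension is enough
print(pca.components_[0])             # ~ the direction of the line, i.e. the "pattern"
```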
>Something something Linear algebra
I think you might be thinking about the span and/or the basis, but it's hard for me to interpret your question.
mattsverstaps t1_jegdsq4 wrote
Yes, the span: if we discover that a set of points is actually all in the span of a line, that line is a kind of fact or pattern about the points, so presumably there is an equivalent in higher dimensions. I am seeing that there is a problem whereby we introduce our own bias in creating our model.
Pas7alavista t1_jegu8de wrote
A spanning set is a set of vectors that you can combine using addition and scalar multiplication to obtain any other vector in the space; its span is everything you can reach that way, i.e. the entire space. For example, a spanning set for the real plane would be {(1,0), (0,1)}. This particular set is also an orthonormal basis, and you can think of each vector as representing one of two orthogonal dimensions, because their dot product is 0 (and each has length 1).
However, any set of two vectors that are not on the same line will span the real number plane. For example, {(1,1), (0,1)} spans the real number plane, but they are not orthogonal.
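A quick numerical check of that (just a sketch): the two vectors aren't orthogonal, but solving a small linear system shows an arbitrary target vector can still be written as a combination of them.

```python
import numpy as np

v1, v2 = np.array([1.0, 1.0]), np.array([0.0, 1.0])
print(v1 @ v2)                  # 1.0, not 0: the pair is not orthogonal

B = np.column_stack([v1, v2])   # put the vectors in as columns
target = np.array([3.0, -2.0])  # an arbitrary point in the plane
coeffs = np.linalg.solve(B, target)
print(coeffs)                   # [ 3. -5.]: target = 3*v1 + (-5)*v2
```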
Overall, though, it is always important to be aware of your input space and the features/dimensions that you use to represent it. You can easily introduce bias or just noise in a number of ways if you aren't thorough. One example would be not normalizing your data.
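As one last sketch (plain z-score normalization, nothing specific to this thread): without it, a feature measured in big raw units dominates any distance or embedding simply because of its scale.

```python
import numpy as np

# Toy features on very different scales: height in cm, income in dollars
X = np.array([[170.0,  40_000.0],
              [165.0,  52_000.0],
              [180.0, 120_000.0]])

X_norm = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each column
print(X_norm.round(2))                         # both columns now have zero mean, unit variance
```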