olmec-akeru OP t1_iy2zjoi wrote on November 28, 2022 at 10:26 AM

Reply to comment by NonOptimized in [D] What method is state of the art dimensionality reduction by olmec-akeru

https://arxiv.org/pdf/2204.04273.pdf

https://arxiv.org/pdf/2203.09347.pdf

https://arxiv.org/pdf/2206.06513.pdf

and the one speaking to categorical variables: https://arxiv.org/pdf/2112.00362.pdf

BrisklyBrusque t1_iy3s0ha wrote on November 28, 2022 at 3:14 PM

Cool links. I’ll add “entity embeddings” to the mix. Entity embeddings represent each level of a categorical variable as a continuous-valued vector, which lets us skip one-hot encoding.
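To make that concrete, here is a minimal sketch of why an embedding replaces one-hot encoding: looking up a row in an embedding table gives the same result as multiplying the one-hot vector by that table. The sizes, the names `W` and `ids`, and the random table are assumptions for illustration; in a real model the table would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

n_categories, dim = 5, 3  # hypothetical sizes, chosen for illustration
# Embedding table: one row per category. Random here; learned in practice.
W = rng.standard_normal((n_categories, dim))

ids = np.array([2, 0, 4])  # three example category ids

# Entity-embedding style: index straight into the table.
lookup = W[ids]

# One-hot style: build the sparse vectors, then multiply by the same table.
one_hot = np.eye(n_categories)[ids]
via_matmul = one_hot @ W

# Both routes produce the same dense vectors.
same = np.allclose(lookup, via_matmul)
print(same)
```

The lookup never materializes the one-hot matrix, which is the practical saving when a variable has many levels.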

olmec-akeru OP t1_iy7a1yc wrote on November 29, 2022 at 7:11 AM

My concern is that a category's position on the continuous scale implies a false relationship with the categories encoded near it.

E.g., if you encode at 0.1, 0.2, …, 0.9, you're saying that the category encoded to 0.2 is more similar to 0.1 and 0.3 than it is to 0.9. This may not be true.
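A quick sketch of the worry, assuming nine categories placed by hand on an evenly spaced scale (the variable names are just for illustration). The scalar codes make some pairs look closer than others, while one-hot vectors keep every pair equally far apart:

```python
import numpy as np

# Nine categories hand-placed at 0.1, 0.2, ..., 0.9 (an arbitrary assignment).
scalar = np.linspace(0.1, 0.9, 9)

# Distance from the category at 0.2 to its neighbour at 0.1, and to 0.9.
d_near = abs(scalar[1] - scalar[0])
d_far = abs(scalar[1] - scalar[8])

# The same three categories under one-hot encoding.
one_hot = np.eye(9)
oh_near = np.linalg.norm(one_hot[1] - one_hot[0])
oh_far = np.linalg.norm(one_hot[1] - one_hot[8])

print(d_near < d_far)               # scalar codes impose an ordering
print(np.isclose(oh_near, oh_far))  # one-hot treats all pairs alike
```

So any fixed scalar assignment bakes in a similarity structure that the data never asked for.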

BrisklyBrusque t1_iy8wfoa wrote on November 29, 2022 at 5:04 PM

I freely admit I haven’t looked into the math. But my understanding is that the embeddings are a learned representation. They are not arbitrary; instead, they aim to place categories close to one another on a continuous scale only where the data justifies it.
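Here is a toy sketch of what I mean, not taken from any of the linked papers. It assumes four categories, a one-dimensional embedding that is used directly as the prediction, and made-up targets in which ids 0 and 3 behave alike and ids 1 and 2 behave alike. Plain gradient descent on squared error then decides where each category lands:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: ids 0 and 3 share one effect, ids 1 and 2 share another.
cats = np.array([0, 1, 2, 3] * 25)
true_effect = np.array([1.0, -1.0, -1.0, 1.0])
y = true_effect[cats] + 0.01 * rng.standard_normal(cats.size)

# One learnable scalar per category, starting near zero.
emb = 0.1 * rng.standard_normal(4)

lr = 0.5
for _ in range(200):
    err = emb[cats] - y  # the prediction is the embedding value itself
    grad = np.zeros_like(emb)
    # Accumulate the mean-squared-error gradient for each category.
    np.add.at(grad, cats, 2 * err / cats.size)
    emb -= lr * grad

# Categories that behave alike end up close, regardless of their id order.
print(abs(emb[0] - emb[3]) < abs(emb[0] - emb[1]))
```

Ids 0 and 3 end up next to each other even though they sit at opposite ends of the id range, because the placement comes from the targets rather than from the labels.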

NonOptimized t1_iy32aw7 wrote on November 28, 2022 at 11:05 AM

Neat, thanks, will give them a read!

Honest-Debate-6863 t1_iy4xe0x wrote on November 28, 2022 at 7:54 PM

Where do y’all work at? How are you guys so knowledgeable?