If I understood you correctly, you can handle dates and date differences as diffs of the Unix timestamp representations of those dates. Any programming language should have a time manipulation library offering APIs for converting dates to their Unix timestamp values.
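For instance, a minimal sketch in Python using only the standard library (the dates here are made up for illustration):

```python
from datetime import datetime, timezone

# Two made-up dates for illustration
diagnosis = datetime(2017, 6, 1, tzinfo=timezone.utc)
treatment = datetime(2019, 3, 15, tzinfo=timezone.utc)

# .timestamp() gives seconds since the Unix epoch (1970-01-01 UTC)
diff_seconds = treatment.timestamp() - diagnosis.timestamp()
diff_days = diff_seconds / 86400  # 86400 seconds per day
print(diff_days)  # 652.0
```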
It is one approach, but it has its limitations. Using months, days, and years as separate features is also possible. Cyclical encoding of dates is another option, but I usually only see that done for the hours, minutes, and seconds of a date.
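To illustrate what I mean by cyclical encoding of time-of-day, a rough sketch (the hour value is just an example):

```python
import numpy as np

hour = 23  # hour of day, in [0, 24)
# The sin/cos pair makes 23:00 and 00:00 land close together
hour_sin = np.sin(2 * np.pi * hour / 24)
hour_cos = np.cos(2 * np.pi * hour / 24)
```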
Embedding these dates, if you mean using an ML algorithm to generate the representations, seems like a really bad idea to me: it adds work without adding any benefit to your solution. If that's not what you're talking about, then sorry, but I couldn't understand what you meant by these "embeddings of dates" :)
Yes, by embedding I meant transforming each number of months into a vector, like nn.Embedding in PyTorch (knowing that the difference between dates can't be more than 5 years, so 60 months).
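Something like this sketch (the embedding dimension of 8 is an arbitrary choice on my side):

```python
import torch
import torch.nn as nn

# 60 possible month differences (0..59), each mapped to a learnable vector
embed = nn.Embedding(num_embeddings=60, embedding_dim=8)

month_diff = torch.tensor([3, 59])  # e.g. 3 months and 59 months apart
vectors = embed(month_diff)         # shape (2, 8), trained with the model
```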
Thanks for the answer!
So maybe you could turn every date into a Unix timestamp, which is an integer, take the difference between those integers, and then use a standard or min-max scaler to put it within a certain interval.
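A minimal sketch of that pipeline, assuming scikit-learn is available (the timestamps are made-up examples):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Each row: two dates as Unix timestamps (seconds since the epoch)
date_pairs = np.array([
    [1420070400, 1475280000],
    [1496275200, 1552608000],
])
diffs = (date_pairs[:, 1] - date_pairs[:, 0]).reshape(-1, 1)

# Squash the differences into [0, 1]
scaled = MinMaxScaler().fit_transform(diffs.astype(float))
```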
I don't think anyone has ever encoded dates as embeddings the way you're proposing, simply because you can already get this kind of representation by using Unix timestamps.
In some applications, one way to represent dates with periodic cycles is by encoding the yearly, monthly, and weekly periods each as a sin/cos pair. For example, if you think the yearly period (seasons) has meaning, create two features: cos(day_of_year / 365 * 2*pi) and sin(day_of_year / 365 * 2*pi). (In financial applications the day of the month makes sense; in consumer traffic, the day of the week.)
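A quick sketch of that yearly pair, assuming pandas for extracting the day of year (the dates are arbitrary examples):

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(pd.Series(["2020-01-01", "2020-06-21", "2020-12-31"]))
day = dates.dt.dayofyear

# Two features per date; Dec 31 wraps around close to Jan 1
year_sin = np.sin(day / 365 * 2 * np.pi)
year_cos = np.cos(day / 365 * 2 * np.pi)
```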
What is your ground truth? How does the data available for prediction differ from your GT? Depending on your answers, dates may add noise to your predictions.
I have pairs of dates and procedures/tests and their results, so having the date is important (for example, a patient had cancer 5 years ago and was treated using conization).
Gotcha. In that case I'd use a sinusoidal encoding like others have suggested. Another alternative is normalizing all of the dates onto some small range, e.g. [0, 1].
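For the normalization option, a minimal sketch (assuming the dates are already Unix timestamps; the values are made up):

```python
import numpy as np

timestamps = np.array([1420070400, 1496275200, 1552608000], dtype=float)
# Map the earliest date to 0 and the latest to 1
normalized = (timestamps - timestamps.min()) / (timestamps.max() - timestamps.min())
```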