Submitted by DreamyPen t3_zsbivc in MachineLearning
DreamyPen
DreamyPen OP t1_ivvm1k0 wrote
Reply to comment by Ulfgardleo in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
- Yes, I did mean outputs/targets. The features are always known; they correspond to testing conditions (a certain temperature, a certain processing speed, etc.). Given these testing conditions (inputs/features), can we predict the material properties (outputs/targets)? The experimental measurements are very reliable.
- The physics-based model can always output a prediction for any given inputs (testing conditions), but it is not always reliable. We would still like to include its predictions because they allow us to augment the small experimental data set and, oftentimes, they are quite a good approximation of the ground truth. This will also answer 4. Indeed, since the physics-based model can always make predictions, we will in some instances have both reliable and unreliable data.
- Correct! :)
- We do indeed.
- Hopefully my response to 1. clarified it.
Let me know if the goal is clearer, and thank you for your help.
DreamyPen OP t1_ivvhmu0 wrote
Reply to comment by Ulfgardleo in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
- There are two sources of data. The first is experimental measurements with a small amount of scatter, so it is considered highly reliable. The second is data predicted using physics-based models, which are sometimes quite accurate and sometimes a bit off. So it is indeed a supervised problem, with unreliable outputs, not labels.
- I'm learning material properties. Ideally, the model would learn from the experimental data (ground truth) while capturing the trends from the synthetic, model-based data.
- The experimental data is always considered highly reliable. The model-based data can be accurate or not, so a fixed reliability score should be suitable, given that we cannot know with certainty whether the model's prediction is reliable for a given input.
- Answered previously.
- We are mainly interested in predicting material properties that are close to the experimental (reliable) data, while still picking up some useful signal from the less accurate physics-based data.
I hope this helps clarify my objectives. Thank you.
DreamyPen OP t1_ivvgkqs wrote
Reply to comment by _Arsenie_Boca_ in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
Thank you for the clarification. I'm dealing with a regression problem, however, so I'm not sure it's applicable in my case.
DreamyPen OP t1_ivveamf wrote
Reply to comment by _Arsenie_Boca_ in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
Can I ask you what you mean by "smoothen labels"?
DreamyPen OP t1_ivve7nt wrote
Reply to comment by DarwinianThunk in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
Could this be done by adding a feature column "weight" with a value ranging from 0 to 1, where a value closer to 1 means more reliable?
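To make the question concrete: instead of feeding the weight to the model as an extra input feature, it can act as a per-sample weight in the loss, so less reliable points pull on the fit less. Below is a minimal sketch in plain Python, assuming a one-dimensional linear model. The toy numbers, the 0.2 weight, and the function name `weighted_linear_fit` are all made up for illustration and are not from the thread.

```python
def weighted_linear_fit(x, y, w):
    """Fit y ~ a*x + b by minimising sum_i w[i] * (y[i] - a*x[i] - b)**2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    a = sxy / sxx
    b = my - a * mx
    return a, b


# Hypothetical toy data: one testing condition (x) and one property (y).
x_exp = [0.0, 1.0, 2.0, 3.0]
y_exp = [1.0, 3.0, 5.0, 7.0]  # "experimental": slope 2.0
x_sim = [0.0, 1.0, 2.0, 3.0]
y_sim = [1.0, 3.5, 6.0, 8.5]  # "physics-based": biased slope 2.5

x = x_exp + x_sim
y = y_exp + y_sim

w_equal = [1.0] * 4 + [1.0] * 4  # trust both sources equally
w_trust = [1.0] * 4 + [0.2] * 4  # assumed fixed reliability of 0.2 for the model data

slope_equal, intercept_equal = weighted_linear_fit(x, y, w_equal)
slope_trust, intercept_trust = weighted_linear_fit(x, y, w_trust)

print("equal weights:      slope =", slope_equal)
print("down-weighted data: slope =", slope_trust)
```

In this toy case the down-weighted fit lands closer to the experimental slope of 2.0 than the equally weighted fit does, while the model-based points still contribute.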
DreamyPen OP t1_ivvdy75 wrote
Reply to comment by LurkAroundLurkAround in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
I like this idea! Definitely worth trying, thank you.
DreamyPen OP t1_ivvdvqs wrote
Reply to comment by RSchaeffer in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
Beautiful! Thank you
DreamyPen OP t1_ivu0v9o wrote
Reply to comment by Erosis in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
Thank you for your comment. I am not sure what that custom loop would look like for an ensemble method (trees/gradient boosted), or how to proceed with down-weighting. Is it a documented technique I can read more about, or more of a workaround you are thinking of?
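For reference, one common way down-weighting enters a tree-based model is through the split criterion and the leaf values, both computed with per-sample weights. The sketch below is a plain-Python regression stump (a depth-1 tree) written only to illustrate that idea. It is not the approach proposed in the thread, and the data, the 0.25 weight, and the helper names are hypothetical. In scikit-learn, the tree and gradient boosting regressors accept a `sample_weight` argument in `fit`, which serves the same purpose without a custom loop.

```python
def weighted_mean(ys, ws):
    return sum(w * y for y, w in zip(ys, ws)) / sum(ws)


def weighted_sse(ys, ws):
    """Weighted sum of squared errors around the weighted mean."""
    m = weighted_mean(ys, ws)
    return sum(w * (y - m) ** 2 for y, w in zip(ys, ws))


def fit_stump(xs, ys, ws):
    """Return (threshold, left_value, right_value) minimising the weighted SSE."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = order[:k], order[k:]
        if xs[lo[-1]] == xs[hi[0]]:
            continue  # no valid threshold between identical x values
        y_lo, w_lo = [ys[i] for i in lo], [ws[i] for i in lo]
        y_hi, w_hi = [ys[i] for i in hi], [ws[i] for i in hi]
        err = weighted_sse(y_lo, w_lo) + weighted_sse(y_hi, w_hi)
        if best is None or err < best[0]:
            thr = 0.5 * (xs[lo[-1]] + xs[hi[0]])
            best = (err, thr, weighted_mean(y_lo, w_lo), weighted_mean(y_hi, w_hi))
    return best[1:]


# Hypothetical toy data: first four points "experimental", last four "physics-based".
xs = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
ys = [10.0, 10.0, 20.0, 20.0, 12.0, 12.0, 26.0, 26.0]
ws = [1.0] * 4 + [0.25] * 4  # assumed reliability of 0.25 for the model-based points

threshold, left_value, right_value = fit_stump(xs, ys, ws)
print(threshold, left_value, right_value)
```

With these weights the leaf values sit much nearer the experimental targets (10 and 20) than the unweighted averages of 11 and 23 would. A boosted ensemble would repeat this kind of weighted fit on the residuals at each round.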
DreamyPen OP t1_ivty1ow wrote
Reply to comment by Erosis in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
Unfortunately not, I'm predicting material properties on the continuous scale.
DreamyPen t1_iznlauj wrote
Reply to [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
I think it's brilliant!