__lawless t1_j4r9ebs wrote
Reply to comment by monkeysingmonkeynew in [D] Is it possible to update random forest parameters with new data instead of retraining on all data? by monkeysingmonkeynew
OK, let me elaborate a bit. Imagine the old model is called m_0. Your newly obtained training data is X, y (features and labels, respectively). Now calculate the residual error, which is the difference between y and the prediction of m_0: dy = y - m_0(X).

Now train a new model m_1 on this residual; its features and labels are X and dy. Finally, at inference time the prediction is the sum of the two models: y_pred = m_0(X_new) + m_1(X_new).
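A minimal sketch of this residual-correction idea, assuming scikit-learn's RandomForestRegressor; the synthetic data and variable names (X_hist, X_new, etc.) are purely illustrative, not from the original question:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
coef = np.array([1.0, -2.0, 0.5, 0.0, 3.0])

# Historical data that the old model m_0 was trained on (illustrative).
X_hist = rng.normal(size=(1000, 5))
y_hist = X_hist @ coef + rng.normal(scale=0.1, size=1000)
m_0 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_hist, y_hist)

# Newly obtained data X, y.
X = rng.normal(size=(200, 5))
y = X @ coef + rng.normal(scale=0.1, size=200)

# Residual error of the old model on the new data: dy = y - m_0(X).
dy = y - m_0.predict(X)

# Train m_1 to predict those residuals (features X, labels dy).
m_1 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, dy)

# Inference: the prediction is the sum of the two models.
X_new = rng.normal(size=(10, 5))
y_pred = m_0.predict(X_new) + m_1.predict(X_new)
```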
monkeysingmonkeynew OP t1_j4un2xm wrote
OK, I can almost see this working, thanks for the suggestion. The only thing that would prevent me from implementing this solution is that by taking the sum of the two models, m_1 would contribute just as much to the result as m_0. However, I expect a single day's data to be noisy, so I would need the contribution of the new day's data to be down-weighted somehow.
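One way to get that down-weighting (an illustrative sketch, not something proposed in the thread) is to scale m_1's contribution by a shrinkage factor, analogous to the learning rate in gradient boosting; alpha here is a hypothetical value you would tune on held-out data:

```python
# Down-weight the residual model's contribution with a shrinkage factor.
alpha = 0.1  # illustrative; tune on validation data
y_pred = m_0.predict(X_new) + alpha * m_1.predict(X_new)
```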