Submitted by jesusfbes in MachineLearning
TheLionKing2020 wrote
Well, you don't need to train on all of this data.
First, take samples of 10k, 50k, and 100k points and see whether the results differ. Do you get a different number of clusters?
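A minimal sketch of that subsampling check, using only the Python standard library. The thread doesn't say which clustering algorithm is in use, so `count_clusters` is a hypothetical single-linkage stand-in (points closer than `eps` end up in the same group). The blob data, the function names, and the `eps` value are made up for illustration, and the sample sizes are scaled down from 10k/50k/100k (keeping the 1:5:10 ratio) so it runs quickly in pure Python.

```python
import random


def count_clusters(points, eps):
    """Count groups of 2-D points linked by chains of neighbours within eps.

    Hypothetical single-linkage stand-in for the real clustering algorithm.
    """
    # Bucket points into eps-sized grid cells so only neighbouring cells
    # need to be compared.
    grid = {}
    for idx, (x, y) in enumerate(points):
        grid.setdefault((int(x // eps), int(y // eps)), []).append(idx)

    parent = list(range(len(points)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    eps_sq = eps * eps
    for (gx, gy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((gx + dx, gy + dy), ()):
                    for i in members:
                        if i < j:  # check each pair only once
                            (xi, yi), (xj, yj) = points[i], points[j]
                            if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps_sq:
                                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def cluster_counts_by_sample_size(data, sizes, eps, seed=0):
    """Cluster a random subsample of each size and report its cluster count."""
    rng = random.Random(seed)
    return {n: count_clusters(rng.sample(data, n), eps) for n in sizes}


# Hypothetical data: three well-separated square blobs, 10,000 points each.
rng = random.Random(42)
corners = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(cx + rng.random(), cy + rng.random())
        for cx, cy in corners for _ in range(10_000)]
counts = cluster_counts_by_sample_size(data, [300, 1_500, 3_000], eps=0.3)
print(counts)
```

If the count stays the same as the sample grows, the smaller sample is probably enough; if it keeps changing, the small samples are not representative and you need more data (or a different `eps`).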
jesusfbes OP wrote
That was my initial idea, and it is probably what I will do. However, it is good to know about efficient approaches.
TheLionKing2020 wrote
Also, before running tests on 100k samples, check whether you can reduce the dimensionality: feature selection, dropping low-variance features, PCA, etc.
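A rough sketch of the low-variance filter and PCA steps, again in standard-library Python with hypothetical function names (`drop_low_variance`, `pca`) and toy data. PCA here is computed by power iteration with deflation on the covariance matrix, which is fine for a handful of features but is not how a library would do it; in practice scikit-learn's `VarianceThreshold` and `PCA` cover both steps.

```python
import math
import random
from statistics import pvariance


def drop_low_variance(rows, threshold):
    """Drop columns whose population variance is <= threshold.

    Returns the indices of the kept columns and the reduced rows.
    """
    keep = [j for j in range(len(rows[0]))
            if pvariance([r[j] for r in rows]) > threshold]
    return keep, [[r[j] for j in keep] for r in rows]


def pca(rows, k, iters=200, seed=0):
    """Project rows onto their top-k principal components.

    Components come from power iteration on the covariance matrix,
    deflating after each one; fine for a few features, slow for many.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Random unit start vector.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [x / norm for x in w]
        # Eigenvalue estimate (Rayleigh quotient), then deflate.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]

    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


# Hypothetical data: column 2 is constant, columns 0 and 1 are collinear.
rows = [[float(t), 2.0 * t, 7.0] for t in range(10)]
kept, reduced = drop_low_variance(rows, threshold=1e-9)
projected = pca(reduced, k=1)
```

You would then run the clustering on `projected` (or at least on `reduced`) instead of the raw features; with fewer dimensions each distance computation is cheaper, so larger samples become affordable.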