TheLionKing2020 t1_iu1bcw8 wrote on October 27, 2022 at 8:33 PM

Well, you don't need to train on all of this data.

First take samples of 10k, 50k and 100k and see whether the results differ. Do you get a different number of clusters?
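
A rough sketch of that check, in case it helps. The thread doesn't say which clustering algorithm is in use, so this assumes scikit-learn's `MiniBatchKMeans` and picks the number of clusters by silhouette score. The data is a synthetic stand-in, the sample sizes are scaled down so it runs quickly, and `best_k` and `cluster_counts_by_sample_size` are made-up helper names.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k(X, k_values, seed=0):
    """Return the k in k_values with the highest silhouette score."""
    scores = {}
    for k in k_values:
        model = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=seed)
        labels = model.fit_predict(X)
        # Score on a capped subsample so this stays cheap on larger samples.
        scores[k] = silhouette_score(
            X, labels, sample_size=min(2000, len(X)), random_state=seed
        )
    return max(scores, key=scores.get)


def cluster_counts_by_sample_size(X, sizes, k_values, seed=0):
    """Draw one random sample per size and report the best k for each."""
    rng = np.random.default_rng(seed)
    result = {}
    for n in sizes:
        n = min(n, len(X))
        idx = rng.choice(len(X), size=n, replace=False)
        result[n] = best_k(X[idx], k_values, seed)
    return result


# Synthetic stand-in for the real dataset (assumption: a few separated blobs).
X, _ = make_blobs(n_samples=20_000, centers=4, n_features=10, random_state=0)
counts = cluster_counts_by_sample_size(
    X, sizes=[1_000, 5_000, 10_000], k_values=range(2, 8)
)
print(counts)
```

If the chosen k is the same across sample sizes, that's a decent hint the smaller sample is enough. If it keeps changing, you probably need more data or a different criterion.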

jesusfbes OP t1_iu3915z wrote on October 28, 2022 at 5:58 AM

That was my initial idea, and it's probably what I'll do. However, it's good to know about efficient approaches.

Also, before running tests on 100k samples, check whether you can reduce the dimensionality: feature selection, dropping low-variance features, PCA, etc.
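
For what it's worth, here is a minimal sketch of that kind of reduction, assuming scikit-learn. The data is synthetic (5 latent factors spread over 40 noisy features, plus 10 constant columns), and the variance threshold and the 95% explained-variance target are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: 40 informative features driven by 5 latent factors,
# plus 10 constant columns that carry no information.
latent = rng.normal(size=(5_000, 5))
informative = latent @ rng.normal(size=(5, 40)) + 0.1 * rng.normal(size=(5_000, 40))
constant = np.ones((5_000, 10))
X = np.hstack([informative, constant])

reducer = make_pipeline(
    VarianceThreshold(threshold=1e-8),          # drop (near-)constant features
    StandardScaler(),                           # put features on a common scale
    PCA(n_components=0.95, svd_solver="full"),  # keep enough components for 95% of variance
)
X_reduced = reducer.fit_transform(X)
print(X.shape, "->", X_reduced.shape)
```

Clustering on `X_reduced` instead of `X` is usually much cheaper, and distance-based methods tend to behave better in fewer dimensions.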