killver
killver t1_jdgt1sn wrote
Reply to comment by devzaya in [N] ChatGPT plugins by Singularian2501
How exactly are you using the vector database there? It looks more like you are querying the web for this info, and the first example is about the docs.
killver t1_jcbpq7c wrote
Reply to [D] Is there an expectation that epochs/learning rates should be kept the same between benchmark experiments? by TheWittyScreenName
You have actually found an issue present in many research papers: they make unfair comparisons between different methods based on un-tuned hyperparameters. If you run an EfficientNet vs. a ViT model with the same learning rate, you will get vastly different results.
killver t1_j2123h0 wrote
I was so waiting for someone to do this, hope it works well.
killver t1_j0txqyl wrote
Reply to comment by iamgianluca in [D] Will there be a replacement for Machine Learning Twitter? by MrAcurite
Yeah, no thanks.
We need something better :/
Or a Twitter turnaround, which is possible.
killver t1_j069atv wrote
Reply to [D] What would happen if you normalize each sample on its on before sending it to the neural net? by xylont
This is already done in computer vision most of the time by just dividing the pixel values by 255. You can also do actual per-sample normalization by, let's say, dividing by the maximum value of the sample.
But as always there is no free lunch. Just try all options and see what works better for your problem.
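A minimal sketch of both variants (assuming numpy and a batch of uint8 images; the shapes are just an example):

```python
import numpy as np

# Hypothetical batch of uint8 images, shape (batch, height, width, channels)
images = np.random.randint(0, 256, size=(4, 224, 224, 3), dtype=np.uint8)

# Global scaling, the common computer vision default: divide by 255
scaled = images.astype(np.float32) / 255.0

# Per-sample normalization: divide each sample by its own maximum value
max_per_sample = images.reshape(len(images), -1).max(axis=1).astype(np.float32)
per_sample = images.astype(np.float32) / max_per_sample[:, None, None, None]
```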
killver t1_iz0bw96 wrote
Reply to comment by rahuldave in [D] Model comparison (train/test vs cross-validation) by Visual-Arm-7375
> But because of the hyperparameter optimization on them, the actual errors (like MSE) you calculate will be too optimistic.
This is the only argument for me to have a separate test dataset: that you can make a more unbiased statement regarding accuracy. But I can promise you that no practitioner or researcher will set this test dataset apart and not make a decision on it, even if only subconsciously, which again biases it.
I think the better strategy is to focus on not making too optimistic statements on k-fold validation scores, such as not doing automatic early stopping, not using automatic learning rate schedulers, etc. The goal is to always select hyperparameters that are optimal across all folds, vs. only optimal separately per fold.
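To make that last point concrete, a minimal sketch (sklearn on toy data; the candidate values are just placeholders): each fixed hyperparameter setting is scored on all folds, and you pick the one with the best mean score, rather than tuning separately per fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Score each fixed setting on all 5 folds and keep the best mean score,
# instead of picking a different "best" setting per fold.
candidate_C = [0.01, 0.1, 1.0, 10.0]
scores = {
    C: cross_val_score(LogisticRegression(C=C, max_iter=1000), X, y, cv=5).mean()
    for C in candidate_C
}
best_C = max(scores, key=scores.get)
print(scores, best_C)
```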
killver t1_iz0a6xk wrote
Reply to comment by [deleted] in [D] Model comparison (train/test vs cross-validation) by Visual-Arm-7375
No, the opposite. So why would you need a test set?
I am arguing that the test data is basically useless, because if you make a decision on it based on performance it is just another validation dataset, and if not, you are better off using the data for training.
killver t1_iz06mz6 wrote
Reply to comment by Visual-Arm-7375 in [D] Model comparison (train/test vs cross-validation) by Visual-Arm-7375
Maybe that's your confusion: getting a raw accuracy score to communicate vs. finding and selecting hyperparameters/models. Your original post asked about model comparison.
Anyways, I suggest you take a look at how research papers do it, and also browse through Kaggle solutions. People usually do local cross-validation, and the actual production data is the test set (e.g. ImageNet, the Kaggle leaderboard, business production data, etc.).
killver t1_iz04uyh wrote
Reply to comment by Visual-Arm-7375 in [D] Model comparison (train/test vs cross-validation) by Visual-Arm-7375
Look, I will not read through a random blog now; either you believe me and try to think it through critically, or you have already made up your mind anyway, in which case you should not have asked.
I will add a final remark.
If you make another decision (whether it generalizes well or not) on your holdout test dataset, you are basically just making another decision on it. If it does not generalize, what do you do next? You change your hyperparameters so that it works better on this test set?
How is that different from making this decision on your validation data?
The terms validation and test data are mixed up a lot in the literature. In principle, the test dataset as you define it is just another validation dataset. And you can be more robust by using multiple validation datasets, which is exactly what k-fold is doing. You do not need this extra test dataset.
If you feel better doing it, go ahead. It is not "wrong" - it is just not necessary, and you lose training data.
killver t1_iz03u95 wrote
Reply to comment by Visual-Arm-7375 in [D] Model comparison (train/test vs cross-validation) by Visual-Arm-7375
And then what?
killver t1_iz03hvr wrote
Reply to comment by Visual-Arm-7375 in [D] Model comparison (train/test vs cross-validation) by Visual-Arm-7375
Do a 5-fold cross-validation, train both models 5 times, and compare the OOF (out-of-fold) scores.
And of course optimize hyperparameters for each model type.
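Rough sketch of what I mean (sklearn on toy data; the two model choices are just placeholders for whatever you are comparing):

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "gbm": GradientBoostingClassifier(),
}

kf = KFold(n_splits=5, shuffle=True, random_state=0)

for name, model in models.items():
    oof_preds = np.zeros(len(y), dtype=int)
    # Train on 4 folds, predict the held-out fold, repeat for all 5 folds
    for train_idx, val_idx in kf.split(X):
        fold_model = clone(model)
        fold_model.fit(X[train_idx], y[train_idx])
        oof_preds[val_idx] = fold_model.predict(X[val_idx])
    # The OOF score uses every sample exactly once as a holdout prediction
    print(name, accuracy_score(y, oof_preds))
```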
killver t1_iz02xc3 wrote
Reply to comment by Visual-Arm-7375 in [D] Model comparison (train/test vs cross-validation) by Visual-Arm-7375
Other question: how can hyperparameters overfit on validation data, if it is a correct holdout set?
In your definition, if you make the decision on another local test holdout, the setting is exactly the same, no difference. And if you do not make a decision on this test dataset, why do you need it?
The important thing is that your split is not leaky and represents the unseen test data well.
killver t1_iz02ql9 wrote
Reply to comment by Visual-Arm-7375 in [D] Model comparison (train/test vs cross-validation) by Visual-Arm-7375
I think you are misunderstanding it. Each validation fold is always a separate holdout dataset, so when you evaluate your model on it, you are not training on it. Why would it be a problem to train on that fold for another validation holdout?
Actually, your point 5 is also what you can do in the end for the production model, to make use of all the data.
The main goal of cross-validation is to find hyperparameters that make your model generalize well.
If you take a look at papers or Kaggle, you will never find someone having both validation and test data locally. The test data usually is the real production data, or data you compare the models on. But you make decisions on your local cross validation to find a model that can generalize well on unseen test data (that is not in your current possession).
killver t1_iz023cj wrote
Reply to comment by Visual-Arm-7375 in [D] Model comparison (train/test vs cross-validation) by Visual-Arm-7375
The validation data is new data. You are not training on it obviously.
Test data, in your definition, would just be another validation dataset.
killver t1_iz01lwc wrote
Reply to comment by Visual-Arm-7375 in [D] Model comparison (train/test vs cross-validation) by Visual-Arm-7375
Well, you already answered it yourself. Why would you need a separate test dataset? It is just another validation dataset, and you already have five of those in the case of 5-fold cross-validation.
The only important thing is that you optimize your hyperparameters so that they are best across all folds.
The real test data is your future production data, where you apply your predictions.
killver t1_iz013vj wrote
Reply to comment by MUSEy69 in [D] Model comparison (train/test vs cross-validation) by Visual-Arm-7375
> you should always have an independent test split
nope, this is not true
killver t1_iz00wp1 wrote
Having a separate test dataset is useless, and you just waste available data. Just do proper cross-validation, evaluate on all folds, and you are good to go.
killver t1_iyv1zfo wrote
Reply to comment by somebodyenjoy in [D] Best object detection architecture out there in terms of accuracy alone by somebodyenjoy
EfficientDet if you care about license.
killver t1_ixiah49 wrote
Reply to comment by fxmarty in [P] BetterTransformer: PyTorch-native free-lunch speedups for Transformer-based models by fxmarty
Thanks a lot for all these replies. I have one more question if you do not mind: sometimes I have huggingface models as a backbone in my model definitions, how would I go about applying the transformer-based quantization to only the backbone? Usually these calls operate on the full model, but if my full model is already in ONNX format it gets complicated.
killver t1_ixi5dns wrote
Reply to comment by fxmarty in [P] BetterTransformer: PyTorch-native free-lunch speedups for Transformer-based models by fxmarty
I actually only tried dynamic quantization by using onnxruntime.quantization.quantize_dynamic - is there anything better?
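For reference, this is roughly how that call looks (file paths are just placeholders):

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Quantize the weights of an already exported ONNX model to int8.
# "model.onnx" / "model_int8.onnx" are placeholder paths.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model_int8.onnx",
    weight_type=QuantType.QInt8,
)
```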
killver t1_ixhqi87 wrote
Reply to comment by fxmarty in [P] BetterTransformer: PyTorch-native free-lunch speedups for Transformer-based models by fxmarty
Thanks for the reply. Yeah, ONNX and OpenVINO are already promising, but quantization on top makes the accuracy awful and actually even makes it slower; maybe I am doing something wrong. I also had no luck with the optimum library, which honestly has very bad documentation and an API that is a bit too tailored to using the transformers library out of the box.
killver t1_ixgqjha wrote
Reply to [P] BetterTransformer: PyTorch-native free-lunch speedups for Transformer-based models by fxmarty
Sorry for hijacking the topic, but I recently started researching how to improve transformer inference speed on CPUs and am a bit overwhelmed by all the different methods out there.
The only thing that has helped me so far is converting to ONNX. Are there any other low-hanging fruit to apply?
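For context, the ONNX route I mean looks roughly like this (a hedged sketch, not a tuned setup; the checkpoint name, opset version, and axis names are just examples):

```python
import onnxruntime as ort
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

inputs = tokenizer("example sentence", return_tensors="pt")

# Export the PyTorch model to ONNX with dynamic batch/sequence axes
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)

# Run the exported model on CPU with onnxruntime
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
logits = session.run(
    None,
    {
        "input_ids": inputs["input_ids"].numpy(),
        "attention_mask": inputs["attention_mask"].numpy(),
    },
)[0]
```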
killver t1_iwv0pg0 wrote
Reply to comment by spruce5637 in [D] NLP folks who have used AllenNLP, how do you migrate your projects to other framework(s)? by spruce5637
You didn't really answer my question about which parts of your pipeline you want to move. But in general, AllenNLP has been irrelevant in the space for quite some time now; I'd suggest moving to Huggingface asap.
killver t1_iwuucud wrote
Reply to [D] NLP folks who have used AllenNLP, how do you migrate your projects to other framework(s)? by spruce5637
What exactly do you want to migrate? If you have models in production, I am sure you can keep them in production. And for training you can switch to newer frameworks like Huggingface.
killver t1_jdzbsaz wrote
Reply to [P] ChatGPT Survey: Performance on NLP datasets by matus_pikuliak
I think the tricky thing about actually validating zero-shot capabilities is again a question of in-sample vs. out-of-sample. Which of these samples has ChatGPT actually already seen?