jrkirby
jrkirby t1_jdzx1ef wrote
Reply to comment by hadaev in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
I'm guessing the hard part is that you can't "untrain" a model. They hadn't thought "I want to benchmark on these problems later" when they started. Then they spent $20K+ of compute on training, and only then wanted to test it. Sure, you can easily find the stuff you want to test on in your training dataset. But you can't so easily remove it and train everything again from scratch.
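For what it's worth, the "easy to find" half really is simple: an n-gram overlap scan, similar in spirit to the decontamination checks OpenAI described for GPT-3. A minimal sketch, assuming whitespace tokenization; the function names and the 13-gram window are illustrative choices, not OpenAI's actual method:

```python
def ngrams(text: str, n: int = 13) -> set:
    # All contiguous n-grams of whitespace-separated tokens.
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contaminated(problem: str, corpus_docs, n: int = 13) -> bool:
    # Flag a benchmark problem if any of its n-grams appears
    # verbatim in any training document.
    probe = ngrams(problem, n)
    return any(probe & ngrams(doc, n) for doc in corpus_docs)
```

Detecting the overlap is the cheap part. Acting on it means scrubbing the corpus and repeating the whole training run, which is exactly what nobody wants to pay for after the fact.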
jrkirby t1_j1bnhkx wrote
Reply to comment by sanman in [D] When chatGPT stops being free: Run SOTA LLM in cloud by _underlines_
Why do you think they'll make us pay, when they could instead use that treasure trove of personal information to sell to advertisers, and train the AI to subliminally (or explicitly) advertise to us?
jrkirby t1_j1603hk wrote
Reply to comment by m_nemo_syne in [D] Different types of pooling in Neural Nets by Difficult-Race-1188
That's what I would have imagined Rank Based Average Pooling referred to, but apparently it's some complicated mess.
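As I understand the paper, rank-based average pooling sorts the activations within each pooling window and averages only the top-t ranked values. A minimal PyTorch sketch, assuming non-overlapping 2D windows; the function name and defaults are mine:

```python
import torch
import torch.nn.functional as F

def rank_based_avg_pool2d(x: torch.Tensor, k: int = 2, t: int = 2) -> torch.Tensor:
    # x: (N, C, H, W) with H and W divisible by k. For each k-by-k window,
    # rank the activations in descending order and average the top t.
    n, c, h, w = x.shape
    cols = F.unfold(x, kernel_size=k, stride=k)        # (N, C*k*k, L)
    cols = cols.view(n, c, k * k, -1)                  # (N, C, k*k, L)
    top_t = cols.sort(dim=2, descending=True).values[:, :, :t, :]
    return top_t.mean(dim=2).view(n, c, h // k, w // k)

x = torch.randn(1, 3, 4, 4)
print(rank_based_avg_pool2d(x).shape)  # torch.Size([1, 3, 2, 2])
```

With t = 1 this reduces to max pooling, and with t = k*k it reduces to plain average pooling, which is why it's pitched as an interpolation between the two.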
jrkirby t1_ivx9xjl wrote
What happens when all the weights into a ReLU neuron are 0? ReLU isn't differentiable at zero; its derivative jumps from 0 to 1 there. I figure in most practical situations this doesn't matter, because the odds of many floating point numbers adding up to exactly 0.0 are negligible. But this paper raises the question of what that would do. Is the derivative of ReLU at 0.0 taken to be NaN, 0, or 1?
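In practice the major frameworks just pick a subgradient rather than returning NaN. A quick check, assuming PyTorch (TensorFlow makes the same choice, as far as I know):

```python
import torch

# What is d(relu)/dx at exactly x = 0.0?
x = torch.tensor([0.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0.]) -- PyTorch picks the subgradient 0, not NaN or 1
```

So a neuron whose pre-activation lands exactly on 0.0 simply gets a zero gradient for that step, the same as if it were slightly negative.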
jrkirby t1_je2f63r wrote
Reply to comment by Thorusss in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Two million or twenty million dollars is a lot more than twenty thousand. And it makes the main thesis more salient: the more money you've spent on training, the less willing you'll be to retrain the entire model from scratch just to run some benchmarks the "proper" way.