hcarlens OP t1_jbnrg6z wrote on March 10, 2023 at 10:53 AM

Reply to comment by scaldingpotato in [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Good point! Edited to clarify.

hcarlens OP t1_jbnreef wrote on March 10, 2023 at 10:52 AM

Reply to comment by ilovekungfuu in [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Hi! I'm not sure I fully understand your question, but if you're asking whether the rate of progress in competitive ML is slowing down, I think probably not. A lot of the key areas of debate (GBDTs vs NNs on tabular data, CNNs vs transformers in vision) are still seeing a lot of research, and I expect the competitive ML community to adopt new advances when they happen. Also, in NLP there's a move towards more efficient models, which would be very useful too.

hcarlens OP t1_jbdvb7c wrote on March 8, 2023 at 9:50 AM

Reply to comment by NitroXSC in [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Thanks! It depends on exactly what you're planning, but CodaLab (https://codalab.lisn.upsaclay.fr/competitions/) or their new platform CodaBench (https://www.codabench.org/) will probably work well.

They both allow you to run competitions for free, and people do use them for teaching purposes (you'll see some examples if you browse through the list of competitions).

I'm planning on writing a shorter blog post on running competitions soon.

hcarlens OP t1_jbdv5cu wrote on March 8, 2023 at 9:48 AM

Reply to comment by Alchera_QQ in [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Another, older post on this: https://thegradient.pub/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry/

hcarlens OP t1_jbdv2vl wrote on March 8, 2023 at 9:47 AM

Reply to comment by XGDragon in [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Thanks! Yes, but I didn't manage to get as much data as I wanted for the competitions on there. I emailed some of the competition organisers but didn't get a response.

hcarlens OP t1_jbdv11j wrote on March 8, 2023 at 9:46 AM

Reply to comment by senacchrib in [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Thanks! I didn't create a separate category for learning-to-rank problems because they often overlap with other domains.

For example, some of the conservation competitions (https://mlcontests.com/state-of-competitive-machine-learning-2022/#conservation-competitions) are L2R problems on image data.

Or the Amazon/AIcrowd competitions (https://mlcontests.com/state-of-competitive-machine-learning-2022/#nlp--search) which were L2R with NLP.

In reality the mapping of competition:(competition type) is almost always one:many, and I'm planning on updating the ML Contests website to reflect that!

If I'd had more time and better data I would have sliced the data in multiple different ways to also look into e.g. L2R problems specifically in more depth.
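
To make the one:many idea concrete, here is a minimal sketch of tagging each competition with several types and then slicing by a single type such as L2R. The competition names and tags are made up for illustration and are not taken from the report's dataset.

```python
from collections import defaultdict

# Hypothetical competition names and tags, for illustration only.
competition_types = {
    "conservation_example": {"vision", "learning_to_rank"},
    "product_search_example": {"nlp", "learning_to_rank"},
    "default_prediction_example": {"tabular", "time_series"},
}


def competitions_by_type(mapping):
    """Invert competition -> types into type -> sorted competition names."""
    by_type = defaultdict(list)
    for name, types in mapping.items():
        for competition_type in types:
            by_type[competition_type].append(name)
    # Sort so the result does not depend on set iteration order.
    return {t: sorted(names) for t, names in by_type.items()}


by_type = competitions_by_type(competition_types)
l2r_competitions = by_type["learning_to_rank"]
```

With tags stored as sets, a competition shows up under every type it belongs to, so an L2R slice picks up both the vision and the NLP examples.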

hcarlens OP t1_jb9woj6 wrote on March 7, 2023 at 3:04 PM

Reply to comment by WirrryWoo in [R] Analysis of 200+ ML competitions in 2022 by hcarlens

I found that for a lot of time-series problems, people often treated them as if they were standard tabular/supervised learning problems. There's a separate page of the report which goes into these in detail: https://mlcontests.com/tabular-data?ref=mlc_reddit

For example, for the Kaggle Amex default prediction competition, the data is time-series in the sense that you're given a sequence of customer statements, and then have to predict the probability of them defaulting within a set time period after that. The winner's solution mostly seemed to flatten the features and use LightGBM, but they did use a GRU for part of their final ensemble: https://www.kaggle.com/competitions/amex-default-prediction/discussion/348111
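
As a rough illustration of what "flattening" can mean here, the sketch below collapses each customer's sequence of statements into a single row of aggregate features. The `balance` field, the customer IDs, and the choice of aggregates are assumptions for the example; they are not the Amex schema or the winner's actual features.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical statement rows: (customer_id, statement_index, balance).
statements = [
    ("c1", 0, 100.0),
    ("c1", 1, 150.0),
    ("c1", 2, 125.0),
    ("c2", 0, 40.0),
    ("c2", 1, 20.0),
]


def flatten_sequences(rows):
    """Collapse each customer's ordered statements into one feature row."""
    per_customer = defaultdict(list)
    # Sort by customer, then by statement order, so "last" is well defined.
    for customer_id, _, balance in sorted(rows, key=lambda r: (r[0], r[1])):
        per_customer[customer_id].append(balance)

    features = {}
    for customer_id, values in per_customer.items():
        features[customer_id] = {
            "balance_last": values[-1],
            "balance_mean": mean(values),
            "balance_min": min(values),
            "balance_max": max(values),
            "n_statements": len(values),
        }
    return features


flat = flatten_sequences(statements)
```

Each customer ends up as one fixed-width row, which is the shape a standard tabular model such as a gradient-boosted tree expects.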

The M6 forecasting competition finished recently, and I'm looking forward to seeing what the winners did: https://m6competition.com/

hcarlens OP t1_jb9vu1f wrote on March 7, 2023 at 2:58 PM

Reply to comment by jamesmundy in [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Yeah that one is really cool! They had an initial competition stage, open to everyone, where evaluation was done in a simulation environment (in software) as opposed to real robots. Competitors were given data from dozens of hours of actual robot interaction which they could use to train their policies.

The teams that qualified there made it through to the real robot stage. At that point they could submit their policies for weekly evaluation on actual robots, so they got a few practice runs before the final leaderboard run.

hcarlens OP t1_jb9q3cm wrote on March 7, 2023 at 2:16 PM

Reply to comment by backhanderer in [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Yeah, not just competitive ML but the research community as a whole seem to have almost entirely switched to PyTorch now (based on the Papers With Code data). I was expecting to see some people using JAX though!