Zealousideal_Low1287 t1_jdmcuti wrote
Reply to [D] Do you use a website or program to organise and annotate your papers? by who_here_condemns_me
I print them out and put them in piles. I write on them with pens.
Zealousideal_Low1287 t1_jdlm2c0 wrote
It seems that, contrary to conventional wisdom, models with more parameters learn more efficiently. My personal ‘hunch’ is that training large models and then applying some form of distillation may become the standard thing to do.
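For anyone unfamiliar, a minimal sketch of what a distillation loss might look like (names and numbers are mine, purely illustrative): you soften the teacher's logits with a temperature and train the student to match that distribution via KL divergence.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T**2 factor keeps gradient magnitudes comparable across
    temperatures (as in the standard distillation recipe).
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T ** 2)
```

In practice you'd usually mix this with the ordinary cross-entropy on hard labels.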
Zealousideal_Low1287 t1_jb1ce8q wrote
Reply to comment by szidahou in [R] [N] Dropout Reduces Underfitting - Liu et al. by radi-cho
IDK, did you read it?
Zealousideal_Low1287 t1_j7mjppj wrote
Reply to Wouldn’t it be a good idea to bring a more energy efficient language into the ML world to reduce the insane costs a bit?[D] by thedarklord176
‘Hey, I know nothing about what I’m saying but…’
Zealousideal_Low1287 t1_j6191sq wrote
Reply to comment by HateRedditCantQuitit in [D] Why are GANs worse than (Latent) Diffusion Models for text2img generation? by TheCockatoo
I guess for it to really count as a variational autoencoder you need to be reconstructing the input
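To spell that out: the VAE objective has a reconstruction term tied to the input plus a KL term on the latent posterior. A rough sketch of the two ELBO terms (my own illustrative helper, assuming a diagonal Gaussian posterior and Gaussian likelihood up to constants):

```python
import numpy as np

def elbo_terms(x, x_recon, mu, logvar):
    """Return (reconstruction, kl) terms of the negative ELBO.

    Reconstruction is squared error between input and its reconstruction;
    kl is KL(q(z|x) || N(0, I)) for a diagonal Gaussian q.
    """
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=-1))
    kl = np.mean(0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1))
    return recon, kl
```

Drop the reconstruction-of-the-input part and you no longer have an autoencoder in any meaningful sense.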
Zealousideal_Low1287 t1_j4neybb wrote
Reply to comment by nateharada in [P] A small tool that shuts down your machine when GPU utilization drops too low. by nateharada
Yeah, that’s something which would be useful indeed. Don’t worry yourself though, I can put in a PR.
Zealousideal_Low1287 t1_j4n2ahm wrote
Reply to [P] A small tool that shuts down your machine when GPU utilization drops too low. by nateharada
Looks nice. I probably wouldn’t use it for shutting down or anything, but a notification on failure might be useful!
Zealousideal_Low1287 t1_j498b5d wrote
Kaggle would like to know your location
Zealousideal_Low1287 t1_j24at04 wrote
Reply to comment by arcxtriy in [D] SOTA Multiclass Model Calibration by arcxtriy
Classes
Zealousideal_Low1287 t1_j240c9r wrote
Reply to [D] SOTA Multiclass Model Calibration by arcxtriy
I would just try isotonic regression or Platt scaling.
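For the binary (or one-vs-rest) case, Platt scaling is just fitting a sigmoid over the raw scores. A tiny self-contained sketch by gradient descent on log loss (parameters and learning rate are illustrative; in practice you'd reach for a library implementation):

```python
import numpy as np

def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit p = sigmoid(a * s + b) to binary labels by gradient descent."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        g = p - labels                    # d(log loss)/d(logit)
        a -= lr * np.mean(g * scores)
        b -= lr * np.mean(g)
    return a, b
```

Isotonic regression is the non-parametric alternative: it fits a monotone step function instead of a sigmoid, which helps when the miscalibration isn't sigmoid-shaped but needs more data to avoid overfitting.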
Zealousideal_Low1287 t1_j17zk4j wrote
Reply to comment by farmingvillein in [D] Hype around LLMs by Ayicikio
Reads more like an HMM
Zealousideal_Low1287 t1_j1111zo wrote
Reply to [D] Techniques to optimize a model when the loss over the training dataset has a Power Law type curve. by Dartagnjan
Perhaps you can do some data augmentation and resampling proportional to the difficulty.
Also perhaps a scheme like this could be appropriate:
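The resampling idea could look something like this (my own illustrative sketch, not any particular published scheme): treat per-example loss as a difficulty score and sample training indices with probability proportional to it.

```python
import numpy as np

def difficulty_sampler(losses, n, rng=None):
    """Sample n example indices with probability proportional to loss."""
    rng = np.random.default_rng(rng)
    p = np.asarray(losses, dtype=float)
    p = p / p.sum()  # normalise difficulties into a distribution
    return rng.choice(len(p), size=n, replace=True, p=p)
```

You'd typically recompute the losses periodically, and maybe temper the distribution (e.g. raise it to a power < 1) so easy examples aren't starved entirely.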
Zealousideal_Low1287 t1_j10wgd9 wrote
Reply to comment by vprokopev in [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
Cool just write a deep learning framework
Zealousideal_Low1287 t1_j0zycf1 wrote
Reply to comment by vprokopev in [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
And you feel if you wrote raw C++ it would be as fast as the PyTorch ops, or you seek to replace the Python part, or something else?
Zealousideal_Low1287 t1_j0zutkt wrote
Reply to comment by vprokopev in [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
Yeah, no
Zealousideal_Low1287 t1_j0zuq39 wrote
Reply to comment by vprokopev in [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
I have no idea what you’re suggesting. Use C++ instead of vectorising properly and using PyTorch? Do you currently do much compute ‘outside’ PyTorch?
Zealousideal_Low1287 t1_j0zr6ix wrote
Reply to comment by vprokopev in [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
The GIL has nothing to do with ‘flexibility’
Zealousideal_Low1287 t1_j0zqyr2 wrote
Reply to [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
Even if you wrote in a language like Rust, Go, C, C++ you wouldn’t avoid ‘calling into a framework’ syndrome.
There are several reasons why compute kernels like those in PyTorch are fast and optimised, and it's not simply that they're compiled. The operations are written with hardware and memory-access patterns in mind, and they also have to correctly implement the API for reverse-mode autodiff.
If you swapped over to writing all of this yourself in bare C++, it would undoubtedly be much slower than using PyTorch. We use Python because it's a convenient language for calling into the framework, and the overhead from this usually isn't significant. If you have a use-case where it is, then sure, use C++.
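A crude illustration of why the interpreter overhead is dwarfed by the kernel work (numbers here are whatever your machine gives, the point is the ordering): a Python-level loop versus a single call that dispatches to a compiled kernel.

```python
import time
import numpy as np

x = np.random.default_rng(0).standard_normal(1_000_000)

# Python-level loop: the interpreter executes a million iterations.
t0 = time.perf_counter()
s_loop = 0.0
for v in x:
    s_loop += v
t_loop = time.perf_counter() - t0

# One call crossing into a compiled, vectorised kernel.
t0 = time.perf_counter()
s_vec = x.sum()
t_vec = time.perf_counter() - t0
```

Same answer, wildly different cost: the slow part is running interpreted code per element, not the single framework call. That's exactly the pattern PyTorch relies on.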
Zealousideal_Low1287 t1_izsiwrs wrote
Reply to What makes you choose standard iphone 14 and 14 plus over an android phone within the same price range? by xucai
I only buy iPhones and have never had an Android. You could argue it’s habit / inertia, but nothing has ever made me feel swayed towards an Android.
Zealousideal_Low1287 t1_iysq8w6 wrote
Perhaps of relevance:
Zealousideal_Low1287 t1_iy7v0ly wrote
Reply to comment by [deleted] in Is coding from scratch a requirement to be able to do research? [D] by [deleted]
The merit in a paper isn’t in how hard it was or how long it took. It’s the novelty / value of the contribution.
Zealousideal_Low1287 t1_ivuxa37 wrote
Zealousideal_Low1287 t1_iutmq2q wrote
Reply to [R] Is there any work being done on reduction of training weight vector size but not reducing computational overhead (eg pruning)? by Moose_a_Lini
As in, you care about compression for the sake of communication and not computation?
Zealousideal_Low1287 t1_irn8pa7 wrote
Reply to comment by redditnit21 in [D] CSV File to training and testing split by redditnit21
I must say, I find it really weird that someone who asks people online to write trivially simple code for them would be this defensive. Can you not look at yourself and think: huh, maybe something is wrong with my attitude?
Zealousideal_Low1287 t1_jefx466 wrote
Reply to comment by shn29 in [D] [R] On what kind of machine does Midjorney, the art generating AI, runs on? by shn29
Somewhere between a high-end gaming card and 8 A100s, probably. So £1k to £160k-ish worth of GPUs?