artsybashev t1_je65qs7 wrote
A 13B model is quite small. Given that the company is focused on AI hardware, the dataset and other parts of the model might be lagging a bit. The lack of comparisons to other models also suggests that the performance is not that good.
artsybashev t1_jdu2hjs wrote
Reply to comment by super_deap in [D] GPT4 and coding problems by enryu42
The physical world that we know is very different from the virtual twin that we see. The human mind lives in a virtual existence created by the material human brain. This virtual world creates nonexistent things like pain, colors, feelings, and also the feeling of existence.
The virtual world that each of our brains creates is the wonderful place where a soul can emerge. Virtual worlds can also be created by computers. There is no third, magical place besides these two in my view.
artsybashev t1_jds4ekt wrote
Reply to comment by addition in [D] GPT4 and coding problems by enryu42
And soon people will understand that this feedback loop is what creates the thing we call consciousness.
artsybashev t1_jdmpwwd wrote
Reply to comment by danielbln in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
The fluffy, overly complex writing around your main message has worked as a barrier or prefilter that screens out bad job candidates or unqualified contributions to scientific discussion. LLMs are destroying this part. It will be interesting to see what this leads to.
artsybashev t1_jdlml1f wrote
Reply to comment by nekize in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Sounds like we need an LLM to generate padding for academia and an LLM to write the TL;DR for the readers. The world is dumb.
artsybashev t1_jd6l85h wrote
Reply to comment by maizeq in [R] SPDF - Sparse Pre-training and Dense Fine-tuning for Large Language Models by CS-fan-101
Nvidia has structured sparsity.
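For a rough illustration, here is a minimal sketch (my own, in PyTorch; not Nvidia's API) of the 2:4 pattern their sparse tensor cores accelerate: in every group of four weights, the two smallest-magnitude entries are zeroed.

```python
import torch

def prune_2_4(weight: torch.Tensor) -> torch.Tensor:
    # Zero the 2 smallest-magnitude values in every group of 4,
    # yielding the 2:4 structured-sparsity pattern.
    w = weight.reshape(-1, 4)
    drop = w.abs().topk(2, dim=1, largest=False).indices
    mask = torch.ones_like(w, dtype=torch.bool)
    mask.scatter_(1, drop, False)
    return (w * mask).reshape(weight.shape)

w = torch.randn(4, 8)
print(prune_2_4(w))  # exactly two nonzeros per group of four
```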
artsybashev t1_j9puq9o wrote
Reply to comment by levand in Why bigger transformer models are better learners? by begooboi
It is in a way the same phenomenon. If you think about the information in images, overfitting means starting to learn even the noise patterns in the images. If your training data does not contain enough real information to fill the model's capacity, the model will start to learn noise and overfit to your data.
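A toy sketch of that failure mode (my own illustration, assuming PyTorch): an overparameterized model drives training loss to near zero on pure-noise labels, i.e. it spends its capacity memorizing noise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 1)
y = torch.randn(32, 1)  # labels are pure noise: there is no real signal

# Far more capacity than the 32 points contain information for.
model = nn.Sequential(nn.Linear(1, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(2000):
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(loss.item())  # near zero: the model has memorized the noise
```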
artsybashev t1_j8i33cp wrote
Reply to comment by Appropriate_Ant_4629 in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
Yeah, might be. I've only seen companies do machine learning in two ways. One is to rent a cluster of GPUs and train something big for a week or two to explore something interesting. The other pattern is to retrain a model every week with fresh data. Maybe that is the case for OP: retraining a model each week and serving it with some cloud platform. It makes sense to build a dedicated instance for a recurring task if you know there will be a need for it for more than a year. I guess it is also cheaper than using the upfront payment option in AWS.
artsybashev t1_j8e2dmj wrote
Reply to comment by N3urAlgorithm in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
I understand that you have given up hope for the cloud. Just so you understand the options: $50k gives you about 1000 days of 4x A100 from vast.ai at today's pricing. Since there will be at least one new hardware generation within 3 years, you will probably get more like 6 years of 4x A100, or one year of 4x A100 plus one year of 4x H100. Keeping your rig at 100% utilization for 3 years might be hard if you plan to have holidays.
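Back-of-the-envelope check of that number (my own sketch; the ~$0.52/GPU-hour spot price is an assumption chosen to match the estimate, not a current quote):

```python
budget = 50_000              # USD
gpus = 4
price_per_gpu_hour = 0.52    # USD, assumed vast.ai spot price per A100
daily_cost = gpus * price_per_gpu_hour * 24
print(f"${daily_cost:.2f}/day -> {budget / daily_cost:.0f} days")
# ~$49.92/day -> ~1002 days, i.e. roughly 1000 days of continuous 4x A100
```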
artsybashev t1_j7k04qr wrote
Reply to comment by ginger_beer_m in [N] Google: An Important Next Step On Our AI Journey by EducationalCicada
If Xi Jinping, Putin, and Trump have taught you anything, it is that being correct is absolutely useless. Just having some sort of a plan, coming up with a good story, and making some fact-sounding arguments is a lot more valuable than the average person thinks. Nothing more is required to be one of the most influential people alive.
artsybashev t1_j5jj7fi wrote
Reply to comment by hiptobecubic in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
no
artsybashev t1_j5fhyex wrote
Reply to comment by EmmyNoetherRing in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
Infowars gets a new meaning in 10 years
artsybashev t1_j5fhioy wrote
Reply to comment by conchoso in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
Yeah, it is easier to modify the color of a pixel than characters in a text in a way that humans do not detect. Something can be done through typos, weird word choices, or calculating a checksum of word choices, but those methods can easily sound unnatural to human readers.
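A toy version of the "checksum of word choices" idea (my own sketch in the spirit of green-list watermarking; not any vendor's real scheme): hash each adjacent word pair into a "green" or "red" bucket, have the generator prefer green choices, and detect by counting the green fraction.

```python
import hashlib

def is_green(prev_word: str, word: str) -> bool:
    # Deterministic "checksum" of a word choice, keyed on the previous word.
    h = hashlib.sha256(f"{prev_word}|{word}".encode()).digest()
    return h[0] % 2 == 0  # roughly half of all possible choices are green

def green_fraction(text: str) -> float:
    words = text.lower().split()
    hits = [is_green(a, b) for a, b in zip(words, words[1:])]
    return sum(hits) / max(len(hits), 1)

# Unwatermarked text hovers near 0.5; a generator that prefers green
# words would push this fraction well above 0.5, which is detectable.
print(green_fraction("the quick brown fox jumps over the lazy dog"))
```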
artsybashev t1_j5fh1m8 wrote
Reply to comment by EmmyNoetherRing in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
I wonder if in 50 years LLMs will be able to produce "viruses" that cause problems in competing models. Like one AI hacking another AI by injecting disruptive training data into the enemy's training procedure.
artsybashev t1_j5fgnjm wrote
Reply to comment by EmmyNoetherRing in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
So they are effectively polluting the public space with their AI output, which only they have the tool to detect. Smells like anti-competitive behavior to me. This potentially makes competing teams' models worse, since they will be eating the shit that GPT-3 pushes out.
artsybashev t1_j2v9lx2 wrote
Reply to comment by currentscurrents in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
If you believe in the singularity, at some point we reach a loop where "AI" creates better methods to run calculations, which it then uses to build better "AI". In a way that is already happening, but once that loop gets faster and more autonomous, it can reach a balance where the development is "optimally" fast.
artsybashev t1_j2suada wrote
Reply to comment by yahma in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
An A100 can hold about 75B parameters in 8-bit. With pruning that is doable, but it won't be quite the same perplexity.
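The memory math behind that figure (my own sketch; the ~7% overhead allowance is an assumption, and activations/KV cache are ignored):

```python
vram_gb = 80            # A100 80GB
bytes_per_param = 1     # int8 quantization: one byte per parameter
overhead = 0.93         # leave ~7% headroom for buffers (assumption)
max_params_b = vram_gb * overhead / bytes_per_param
print(f"~{max_params_b:.0f}B parameters fit in weights alone")  # ~74B
```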
artsybashev t1_j1ph7f3 wrote
Reply to comment by AltruisticNight8314 in [D] Running large language models on a home PC? by Zondartul
One A100 80GB will get you started with models in the 500M-15B range. You can rent that for about $50 per day. See where that takes you in a week.
artsybashev t1_j1c7pzh wrote
Reply to comment by step21 in [D] When chatGPT stops being free: Run SOTA LLM in cloud by _underlines_
A lot of stuff can be run locally with `git clone ...` and `docker compose up`.
artsybashev t1_j154fhy wrote
Reply to comment by caedin8 in [D] Running large language models on a home PC? by Zondartul
That is just the inference. Training requires more like 100x A100 and a cluster to train on. Just a million dollars to get started.
artsybashev t1_izoujfv wrote
Reply to comment by new_name_who_dis_ in [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
Yeah. A lot of the time I get a better answer from ChatGPT, but you really need to take its responses with a grain of salt.
artsybashev t1_izorw1t wrote
Reply to comment by _poisonedrationality in [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
Yeah, it is annoyingly, confidently wrong. Even when you point out its mistake, it might try to explain as if no mistakes were made. Sometimes it admits that there was a mistake. From a coworker this would be really annoying behaviour.
artsybashev t1_iw29zh1 wrote
A lot of deep learning has been the modern equivalent of witchcraft: just some ideas that might make sense, squashed together.
Hyperparameter tuning is one of the most obscure and hardest-to-learn parts of neural network training, since it is hard to do multiple runs for models that take more than a few weeks or thousands of dollars to train. Most researchers have just learned some good initial guesses and might run the model with a few sets of hyperparameters, from which the best result is chosen.
Some of the hyperparameter tuning can also be done with a smaller model, and the amount of tuning can be reduced while growing the model to the target size, as in the sketch below.
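A minimal sketch of that proxy-model idea (my own illustration in PyTorch; whether the chosen hyperparameters transfer to the large model is exactly the open question):

```python
import torch
import torch.nn as nn

def train_proxy(lr: float, hidden: int = 32, steps: int = 200) -> float:
    # Train a small stand-in model and report its final loss.
    torch.manual_seed(0)
    x = torch.randn(256, 8)
    y = x.sum(dim=1, keepdim=True)  # simple learnable target
    model = nn.Sequential(nn.Linear(8, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

# Sweep cheaply on the proxy, then reuse the winner for the expensive run.
best_lr = min([1e-4, 1e-3, 1e-2], key=train_proxy)
print("best lr on the small proxy:", best_lr)
```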
artsybashev t1_iv5vgsp wrote
Reply to comment by PicnicBasketPirate in New member! Been lurking for a while! Just bought my daughter her first small gumbo pot! It’s a vintage Magnalite 5qt! These puppies last forever! by cindy_lou_who_1982
The discussion section is definitely worth reading. Apparently aluminum is OK as long as the levels stay below the limits of your kidneys' ability to get rid of it.
That is around 0.1 mg per day of aluminum getting into your bloodstream. The bioavailability from different sources is a bit complicated to figure out. The main sources are probably deodorants, vaccines, cookware, processed food, drinking water, drugs, sun lotions, and exposure in some occupations.
artsybashev t1_jefp0o2 wrote
Reply to comment by Sopel97 in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Yes, the only thing they can do is ban you from their service.