Submitted by samobon t3_1040w4q in MachineLearning
rantana t1_j32bo5x wrote
128GB of HBM would fit some serious models on a single device. But I have yet to see any real progress from AMD (something I can actually buy) that would make me consider moving my workflow away from Nvidia hardware.
AlmightySnoo t1_j32iljg wrote
PyTorch 2.0 moving away from directly depending on CUDA and instead using Triton is good news for AMD. The Triton GitHub repo says AMD GPU support is under development. AMD needs to invest some resources to help there.
Nhabls t1_j32qdjm wrote
AMD solutions have been "in development" for as long as I've been in contact with the space. The approaches rise and fall but never fully deliver. Maybe it'll be different in the future, who knows.
AlmightySnoo t1_j32qxge wrote
Because AMD never goes all-in on software. Hopefully that will change with Victor Peng on board and $AMD starting to throw billions into software.
allenout t1_j33v08p wrote
Weirdly enough, Xilinx is a huge investor in software and has absolutely amazing software support and customer service. I hope that translates over to AMD.
geeky_username t1_j32l1cl wrote
>AMD needs to invest some resources to help there.
That's where it will fail
ZaZaMood t1_j32zcfc wrote
Tired of Reddit cynics saying something will fail before it even starts.
geeky_username t1_j331xz0 wrote
"Those that fail to learn from history are doomed to repeat it."
Especially on the software side, AMD has a habit of releasing something and then not doing much for continued support, expecting the community to pick up the labor.
HippoLover85 t1_j35en7z wrote
Previously AMD didn't have the budget for it. They do now, and have really only had it for the last two-ish years.

Will they now put resources towards it? I hope so. But it also appears AMD is trying to get products into mega datacenter/supercomputer applications and spread adoption that way.
zeyus t1_j338cuu wrote
Isn't continued support one of the selling points for AM5? They supported the previous gen for ages and plan to again.
geeky_username t1_j33cic6 wrote
Software.
Having AI compute hardware is rather pointless without the supporting software.
Nvidia has an entire CUDA ecosystem for developers to use
zeyus t1_j33nspg wrote
Absolutely agree. It's been a while since I've had AMD hardware, but I'd consider it again (especially CPU). I just haven't been aware of specific issues with their software either; Intel, AMD, and Nvidia have all had bug fixes and patching with drivers and firmware. Is there something I've missed about AMD and software?

BTW, I haven't had enough disposable income to upgrade, so I've been stuck on a 4590K for about 6 years. I hate my motherboard software (that's Asus bloatware) and had so much trouble getting the NVMe and RAID to work, but once I did it's been OK. The 1070 I have is getting a bit too small for working with ML/AI, but what can you do... it still runs most newish games too.
geeky_username t1_j358k76 wrote
>Is there something I've missed about AMD and software?
They have this https://gpuopen.com/
Which seems great in theory, but some of that hasn't been touched in a long time.
Radeon Rays: last updated May 2021.
They'll release something, do a bunch of initial work on it, and then it fades.
zeyus t1_j363mxu wrote
Well, that is a genuine shame; Nvidia really needs some competition in this space. I'm sure plenty of researchers and enthusiasts would happily use different hardware (as long as porting was easy). I've written some CUDA C++ and it's not bad. Manufacturer-specific code always feels a bit gross, but the GPU agent-based modeling framework I was using was strictly CUDA.
ZaZaMood t1_j39c7ot wrote
Nvidia needs some competition fr fr. I can't even consider buying AMD because the entire data science community is pinned to CUDA.
rlvsdlvsml t1_j334v18 wrote
ROCm users have been failed for the past 3 years tho.
kanink007 t1_j638gwv wrote
Any info about AMD APUs? By now I've given up hoping for AMD to make ROCm available for APUs. I don't know much about Triton: does it support APUs like the 5600G?
ApprehensiveNature69 t1_j337qjg wrote
ROCm works pretty well these days on my 6900 XT?
AlmightySnoo t1_j33dyf3 wrote
But there is no official support for your card.
ApprehensiveNature69 t1_j33hkaz wrote
For an individual that's pretty much true of any card: Nvidia will probably ignore your random CUDA error and redirect you to the forums to figure it out, whether it's a K80 or an H100.
learn-deeply t1_j342462 wrote
What models have you tried? Wonder what the gaps between CUDA and ROCm are.
ApprehensiveNature69 t1_j342gcc wrote
So far a lot of them. I haven't had any issues with various Stable Diffusion models, DeOldify, BLOOM-3B, and basically anything else I've tried.
lostmsu t1_j4j95e9 wrote
Can you benchmark training with https://github.com/karpathy/nanoGPT and a 100M+ parameter GPT model?