allanmeter

allanmeter t1_j7ytp7i wrote

This is really good advice! Preprocessing input data for both training and inference is the best route to efficiency. Don’t feed the model a huge multidimensional dataset in one go; try to break it up, and see whether old-fashioned methods like windowing and downsampling will do the job.
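
As a rough illustration of windowing and downsampling, here is a minimal sketch in plain Python. The function names, the averaging approach and the toy input are my own assumptions for illustration; real pipelines would more likely use NumPy or a framework data loader.

```python
from statistics import fmean


def downsample(series, factor):
    """Average each consecutive group of `factor` samples into one value."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(series) - len(series) % factor  # drop the ragged tail
    return [fmean(series[i:i + factor]) for i in range(0, usable, factor)]


def sliding_windows(series, size, step):
    """Yield fixed-size windows, advancing `step` samples each time."""
    if size < 1 or step < 1:
        raise ValueError("size and step must be >= 1")
    for start in range(0, len(series) - size + 1, step):
        yield series[start:start + size]


raw = list(range(12))                 # toy stand-in for a long signal
coarse = downsample(raw, factor=2)    # half as many samples
windows = list(sliding_windows(coarse, size=3, step=2))
print(coarse)
print(windows)
```

Each window can then be fed to the model on its own instead of pushing the whole series through at once.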

The model’s parameter type is important too. If you’re running fp64 you will struggle compared with a model that’s just int8. If you have mixed-precision weights then you really need to think about looking at AWS SageMaker and getting a pipeline going.
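
To put a number on the parameter type point, here is a back-of-the-envelope sketch. It only counts the weights themselves (no activations, optimiser state or framework overhead), and the 7 billion parameter count is a hypothetical figure, not something from this thread.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp64": 8, "fp32": 4, "fp16": 2, "int8": 1}


def weights_gib(num_params, dtype):
    """Memory for the raw weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


params = 7_000_000_000  # hypothetical model size
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weights_gib(params, dtype):.1f} GiB")
```

The fp64 figure is eight times the int8 figure for the same model, which is the gap being described.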

To OP, maybe you can share a little context on what models you’re looking to run? Or input data context.

1

allanmeter t1_j7yt39v wrote

Threadripper and Epyc are also worth it purely to maximise your access to L3 cache. Yes, lanes and cores are important too. TR and Epyc really are well-engineered chips for sustained compute or memory-optimised workloads.

Some models use multiple GPUs with a strategy that copies the data to each GPU, while others segment layers across GPUs and minimise copies of data. So have a look at the distribution strategies being used, and how the models support them. Some models even use the CPU to collate and merge split datasets and weights. I’ve rarely seen these models perform well; they’re usually highly optimised with deep layers.

Lastly, there’s no real golden ratio of RAM to VRAM to swap. Let the OS handle it, provide as much as you can, and bias towards random IOPS as the measure.

Also please keep an eye on nvidia-smi: run watch -n 1 nvidia-smi to monitor power draw, utilisation and temperature. You might go the exotic route and explore water cooling; otherwise make sure there is ample room for cool air to flow through.
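
If you would rather log these readings than watch them, something like the sketch below may help. It assumes nvidia-smi is on your PATH and uses its --query-gpu CSV output; the choice of fields and the helper names are mine.

```python
import subprocess

# Fields passed to nvidia-smi's --query-gpu option.
QUERY_FIELDS = ["utilization.gpu", "temperature.gpu", "power.draw", "memory.used"]


def parse_gpu_line(line):
    """Turn one CSV line of nvidia-smi output into a field -> value dict."""
    values = [value.strip() for value in line.split(",")]
    return dict(zip(QUERY_FIELDS, values))


def read_gpus():
    """Query every visible GPU once; returns one dict per GPU."""
    cmd = [
        "nvidia-smi",
        f"--query-gpu={','.join(QUERY_FIELDS)}",
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling read_gpus() in a loop with a one-second sleep gives roughly the same view as the watch command, in a form you can write to a file. Values come back as strings, so convert them before doing any arithmetic.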

Best of luck, keep at it.

2

allanmeter t1_j7u6v50 wrote

Yes, the RAM to VRAM transfer is not as crazy important as you think. We hit this issue previously on the 3000 series as well; we supplemented with a full TB of RAM and it still was not enough. Some models are incredibly greedy.

If you are on Linux, which is highly encouraged, also look to optimise your storage tier for swap, which is similar to the page file in Windows. You can define and mount extra swap disks, which you can trick out with multi-TB NVMe drives. It’s not the same performance as RAM, but it’s a last-step optimisation before you need to consider going to Quadro.
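
Once extra swap is mounted, it is worth confirming the kernel actually sees it. Below is a small sketch that parses the text format of /proc/meminfo on Linux; the sample text is made up so the example runs anywhere, and the helper names are my own.

```python
def parse_meminfo(text):
    """Parse 'Name:  12345 kB' lines into {name: number}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_summary(info):
    """Total and used swap in GiB, from SwapTotal and SwapFree (kB)."""
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    return {"total_gib": total / 2**20, "used_gib": (total - free) / 2**20}


# Made-up sample in the /proc/meminfo format.
sample = """MemTotal:        8388608 kB
SwapTotal:       2097152 kB
SwapFree:        1048576 kB
"""
print(swap_summary(parse_meminfo(sample)))
```

On a real machine, read the text with open("/proc/meminfo").read() instead of using the sample string.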

2

allanmeter t1_j7u6egg wrote

You will need to be looking at Epyc or at a minimum Threadripper. I would highly encourage ECC memory if possible.

Assuming you have a handle on data vs model distribution strategy, you will need fast and ample RAM to help with data loading/offloading as you have correctly pointed out.

If you’re in North America, plenty of choices are available to you; elsewhere in the world you will have to seek combinations out selectively, as stock is always an issue.

1

allanmeter t1_iyz8yy4 wrote

Reply to 4080 vs 3090 by simorgh12

One more consideration: if you’re using CUDA, check CUDA version compatibility for the 4000 series, and check cuDNN compatibility as well. Sometimes the newest cards are more of a pain than their simple incremental value is worth.
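
One quick way to see what your software stack was built against is sketched below. It assumes PyTorch, which is my choice for illustration (the thread does not say which framework is in use), and it falls back gracefully when PyTorch is not installed.

```python
def cuda_stack_report():
    """Collect CUDA/cuDNN build info from PyTorch, if it is installed."""
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,                 # None on CPU-only builds
        "cudnn_version": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
        "gpu_visible": torch.cuda.is_available(),
    }
    if report["gpu_visible"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


print(cuda_stack_report())
```

Compare the reported CUDA build and compute capability against the card vendor's and framework's support tables before buying; the report only tells you what is installed, not what is supported.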

2