whata_wonderful_day
whata_wonderful_day t1_jbcxdwf wrote
Reply to comment by adt in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Nice! How did you get access to Megatron-11B? I can't find it online anywhere.
whata_wonderful_day t1_ja3kh4d wrote
Reply to comment by CKtalon in [P] What are the latest "out of the box solutions" for deploying the very large LLMs as API endpoints? by johnhopiler
Yeah, this is what the big bois use. It'll give you max performance, but it isn't exactly user-friendly.
whata_wonderful_day t1_j7ubutx wrote
Reply to comment by blackkettle in [P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl by pommedeterresautee
His point is that it's identical. They didn't use quantization or anything else that would hurt accuracy. The Whisper paper has a lot of the details you're asking for.
whata_wonderful_day t1_ivjbsv1 wrote
Reply to comment by chuanli11 in [D] NVIDIA RTX 4090 vs RTX 3090 Deep Learning Benchmarks by mippie_moe
Thanks! Good to see a 78% bump in performance with 1 GPU at least
whata_wonderful_day t1_iv57znb wrote
Reply to comment by learn-deeply in [D] NVIDIA RTX 4090 vs RTX 3090 Deep Learning Benchmarks by mippie_moe
Performance will definitely get better as time goes on, but fp8 is going to be extra work to use, just like fp16 was.
whata_wonderful_day t1_iv20f1u wrote
Awesome, I much appreciate the detailed benchmarks! The dual-GPU scaling in particular was of interest to me. I was wondering how the lack of NVLink would affect things.
BERT large benchmarks would also be great, if you could do them?
whata_wonderful_day t1_iu81vzp wrote
Reply to comment by sobagood in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
I tried OpenVINO ~1.5 years back and it didn't match ONNX Runtime on transformers. For CNNs it's the fastest, though. I also found OpenVINO pretty buggy and not user-friendly; I needed to fix their internal transformer conversion script.
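For what it's worth, a runtime comparison like that can be sketched with a tiny stdlib timing harness. The `backends` dict below uses hypothetical dummy workloads as stand-ins for real inference calls (e.g. an `onnxruntime.InferenceSession.run(...)` or an OpenVINO compiled model), since the actual models aren't part of this thread:

```python
import statistics
import time

def benchmark(fn, warmup=5, iters=50):
    """Return the median latency of calling fn, in milliseconds.

    Warmup iterations let JITs/caches settle before timing;
    the median is more robust to scheduler noise than the mean.
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times)

# Hypothetical stand-ins for real runtime sessions; swap in actual
# ONNX Runtime / OpenVINO inference calls to reproduce a comparison.
backends = {
    "runtime_a": lambda: sum(i * i for i in range(10_000)),
    "runtime_b": lambda: sum(i * i for i in range(20_000)),
}

for name, fn in backends.items():
    print(f"{name}: {benchmark(fn):.3f} ms")
```

Median-of-many-runs with a warmup phase is the usual way to keep one-off cache misses or frequency scaling from skewing single-measurement results.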
whata_wonderful_day t1_istvc0k wrote
Yeah, that sucks. On that note, I'm hiring! Feel free to DM me. Roles are remote.
whata_wonderful_day t1_jbhp4gb wrote
Reply to comment by Jepacor in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Thanks, alas, I thought it was an encoder model. I've been on the lookout for a big one; the largest I've seen is DeBERTa V2 with 1.5B params.