whata_wonderful_day
whata_wonderful_day t1_jbcxdwf wrote
Reply to comment by adt in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Nice! How did you get access to Megatron-11B? I can't find it online anywhere.
whata_wonderful_day t1_ja3kh4d wrote
Reply to comment by CKtalon in [P] What are the latest "out of the box solutions" for deploying the very large LLMs as API endpoints? by johnhopiler
Yeah, this is what the big bois use. It'll give you max performance, but it isn't exactly user-friendly.
whata_wonderful_day t1_j7ubutx wrote
Reply to comment by blackkettle in [P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl by pommedeterresautee
His point is that it's identical. They didn't use quantization or anything else that would hurt accuracy. The Whisper paper has a lot of the details you're asking for.
whata_wonderful_day t1_ivjbsv1 wrote
Reply to comment by chuanli11 in [D] NVIDIA RTX 4090 vs RTX 3090 Deep Learning Benchmarks by mippie_moe
Thanks! Good to see a 78% bump in performance with 1 GPU at least
whata_wonderful_day t1_iv57znb wrote
Reply to comment by learn-deeply in [D] NVIDIA RTX 4090 vs RTX 3090 Deep Learning Benchmarks by mippie_moe
Performance will definitely get better as time goes on, but fp8 is going to be extra work to use, just like fp16 was.
whata_wonderful_day t1_iv20f1u wrote
Awesome, I much appreciate the detailed benchmarks! The dual-GPU scaling in particular was of interest to me. I was wondering how the lack of NVLink would affect things.
BERT large benchmarks would also be great, if you could do them?
whata_wonderful_day t1_iu81vzp wrote
Reply to comment by sobagood in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
I tried OpenVINO ~1.5 years back and it didn't match ONNX Runtime on transformers. For CNNs it's the fastest, though. I also found OpenVINO pretty buggy and not user-friendly; I needed to fix their internal transformer conversion script.
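For what it's worth, a runtime comparison like that can be sketched with a tiny stdlib timing harness. The `backends` dict below uses hypothetical dummy workloads as stand-ins for real inference calls (e.g. an `onnxruntime.InferenceSession.run(...)` or an OpenVINO compiled model), since the actual models aren't part of this thread:

```python
import statistics
import time

def benchmark(fn, warmup=5, iters=50):
    """Return the median latency of calling fn, in milliseconds.

    Warmup iterations let JITs/caches settle before timing;
    the median is more robust to scheduler noise than the mean.
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times)

# Hypothetical stand-ins for real runtime sessions; swap in actual
# ONNX Runtime / OpenVINO inference calls to reproduce a comparison.
backends = {
    "runtime_a": lambda: sum(i * i for i in range(10_000)),
    "runtime_b": lambda: sum(i * i for i in range(20_000)),
}

for name, fn in backends.items():
    print(f"{name}: {benchmark(fn):.3f} ms")
```

Median-of-many-runs with a warmup phase is the usual way to keep one-off cache misses or frequency scaling from skewing single-measurement results.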
whata_wonderful_day t1_istvc0k wrote
Yeah, that sucks. On that note, I'm hiring! Feel free to DM me. Roles are remote.
whata_wonderful_day t1_jbhp4gb wrote
Reply to comment by Jepacor in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Thanks, alas, I thought it was an encoder model. I've been on the lookout for a big one; the largest I've seen is DeBERTa V2 with 1.5B params.