stefanof93 t1_jbzeots wrote on March 12, 2023 at 10:19 PM
Reply to [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Has anyone evaluated all the quantized versions and compared them against smaller models yet? How many bits can you throw away before you're better off picking a smaller version?