currentscurrents t1_jc03yjr wrote
Reply to comment by Dendriform1491 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
You could pack more bits into your bit width with in-memory compression. You'd need hardware support for decompression inside the processor core.
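A software analogue of the packing idea (the hardware proposal would just do the unpacking inside the core instead): the sketch below, a hypothetical illustration not tied to any particular library, packs two 4-bit quantized weights into each byte and unpacks them on the fly.

```python
def pack_4bit(values):
    """Pack a list of 4-bit ints (0..15) into bytes, two values per byte."""
    if len(values) % 2:
        values = values + [0]  # pad to an even count
    return bytes((hi << 4) | lo for hi, lo in zip(values[::2], values[1::2]))

def unpack_4bit(packed, count):
    """Unpack `count` 4-bit ints from packed bytes ("decompress")."""
    out = []
    for b in packed:
        out.append(b >> 4)      # high nibble
        out.append(b & 0x0F)    # low nibble
    return out[:count]

weights = [3, 15, 0, 7, 9]
packed = pack_4bit(weights)          # 3 bytes instead of 5
restored = unpack_4bit(packed, len(weights))
assert restored == weights
```

This is what 4-bit quantized model loaders do in software today; the comment's point is that doing the decompression in dedicated hardware would make the packed form usable directly from memory.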
Dendriform1491 t1_jc0bgxd wrote
Or make it data free altogether