light24bulbs t1_jc5e0zk wrote
Reply to comment by Lajamerr_Mittesdine in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Yeah, there's definitely a threshold in there where it's fast enough for human interaction. It's only an order of magnitude off, which isn't too bad.