
stefanof93 t1_jbzeots wrote

Has anyone evaluated all the quantized versions and compared them against smaller models yet? How many bits can you throw away before you're better off picking a smaller model?

26

LetterRip t1_jc4rifv wrote

Depends on the model. Some have difficulty even with full 8-bit quantization; others you can take down to 4-bit relatively easily. There is some research suggesting 3-bit might be the useful limit, with 2-bit working only rarely, for certain models.
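To see why fewer bits hurts, here's a minimal sketch of symmetric round-to-nearest quantization (a toy illustration, not the scheme any particular model release uses; `quantize_dequantize` is a hypothetical helper): reconstruction error grows as the bit width shrinks.

```python
import numpy as np

def quantize_dequantize(w, bits):
    # Symmetric per-tensor quantization: scale so the largest
    # magnitude maps to the top of the signed integer range.
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                       # dequantize back to float

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)  # stand-in "weights"

for bits in (8, 4, 3, 2):
    err = np.abs(quantize_dequantize(w, bits) - w).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

Real low-bit schemes (e.g. GPTQ-style methods) do better than this naive rounding by correcting for the error layer by layer, which is why 4-bit can stay close to full precision while 2-bit usually falls apart.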

3