maizeq t1_j69vuec wrote on January 28, 2023 at 8:18 PM

Reply to comment by Taenk in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78

Nice. How are you converting between dataset size and number of tokens?

Doesn’t Common Crawl get deduplicated, and isn’t that why the number of usable tokens decreases? Or is it also curation? How much of that 380 TiB is actually utilisable?

Given the ostensibly impressive performance of the bilingual (Chinese+English) GLM-130B model that came out of Tsinghua University, that might very well be the case.