Submitted by Secure-Technology-78 t3_10mdhxb in MachineLearning
maizeq t1_j69vuec wrote
Reply to comment by Taenk in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Nice. How are you converting between dataset size and number of tokens?
Doesn’t Common Crawl get deduplicated, and isn’t that why the number of usable tokens decreases? Or is curation also a factor? How much of that 380 TiB is actually utilisable?
Given the ostensibly impressive performance of the bilingual (Chinese and English) GLM-130B model from Tsinghua University, that might very well be the case.
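For reference, the usual back-of-the-envelope conversion divides raw bytes by an assumed bytes-per-token ratio, then discounts whatever is lost to deduplication and curation. The sketch below assumes roughly 4 bytes of text per token, which is a common rule of thumb for English with BPE tokenizers and not a figure from this thread. The function `estimate_tokens` and its `keep_fraction` parameter (the share of data that survives filtering) are hypothetical names used only for illustration.

```python
# Back-of-the-envelope conversion from dataset size in bytes to token count.
# ASSUMPTION: ~4 bytes of text per token (rough rule of thumb for English
# with a BPE tokenizer); the real ratio depends on the tokenizer and language.
# ASSUMPTION: `keep_fraction` is a hypothetical stand-in for the share of data
# surviving deduplication/curation; the thread gives no value for it.

TIB = 2 ** 40  # bytes in one tebibyte


def estimate_tokens(size_bytes: float, bytes_per_token: float = 4.0,
                    keep_fraction: float = 1.0) -> float:
    """Estimate the token count left after filtering, given a raw size in bytes."""
    if bytes_per_token <= 0:
        raise ValueError("bytes_per_token must be positive")
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    return size_bytes * keep_fraction / bytes_per_token


raw = 380 * TIB
# Upper bound: treats all 380 TiB as usable text, which it almost certainly is not.
print(f"{estimate_tokens(raw):.3e} tokens (no filtering)")
# Illustrative only: the 0.1 keep fraction is made up, not a measured figure.
print(f"{estimate_tokens(raw, keep_fraction=0.1):.3e} tokens (10% kept)")
```

The `keep_fraction` knob is where the deduplication vs. curation question actually lives: the token estimate scales linearly with it, so the answer to "how much of the 380 TiB is utilisable" dominates the result far more than the exact bytes-per-token ratio does.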