ICLR 2025

Improving Data Efficiency via Curating LLM-Driven Rating Systems

Jinlong Pang, Jiaheng Wei, Ankit Shah, Zhaowei Zhu, Yaxuan Wang, Chen Qian, Yang Liu, Yujia Bao, Wei Wei

Abstract

Recent studies challenge the general data scaling law, indicating that most of a model's knowledge is acquired during pre-training. The new consensus is that data quality matters far more than quantity.