ICML 2025
MASS: Mathematical Data Selection via Skill Graphs for Pretraining Large Language Models
Jiazheng Li, Lu Yu, Qing Cui, Zhiqiang Zhang, Jun Zhou, Yanfang Ye, Chuxu Zhang
Abstract
Figure 1. The pipeline of MASS. (1) We first employ prompt engineering to extract mathematical skills from a reference dataset, thereby constructing a skill graph. (2) We then score and rank the target dataset using the constructed skill graph. The top-ranked subset can then be selected and used to train LLMs.
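The two stages in Figure 1 can be illustrated with a minimal sketch. This is a toy stand-in, not the MASS implementation: it assumes skills have already been extracted as string tags for both reference and target documents (the prompting step is omitted), it assumes the graph is a simple co-occurrence graph, and the scoring rule (sum of matched node and edge weights) is a hypothetical choice made for illustration. The function names `build_skill_graph`, `score_document`, and `select_top` are likewise illustrative.

```python
from collections import Counter
from itertools import combinations


def build_skill_graph(reference_skills):
    """Build a toy co-occurrence skill graph (assumed structure).

    Node weight: number of reference documents that contain the skill.
    Edge weight: number of reference documents that contain both skills.
    """
    node_weight = Counter()
    edge_weight = Counter()
    for skills in reference_skills:
        unique = sorted(set(skills))  # sorted so edge keys are canonical
        node_weight.update(unique)
        edge_weight.update(combinations(unique, 2))
    return node_weight, edge_weight


def score_document(skills, node_weight, edge_weight):
    """Score a document by the graph weights its skills match (assumed rule)."""
    unique = sorted(set(skills))
    node_score = sum(node_weight[s] for s in unique)
    edge_score = sum(edge_weight[p] for p in combinations(unique, 2))
    return node_score + edge_score


def select_top(target_skills, node_weight, edge_weight, ratio):
    """Return indices of the top-scoring fraction of target documents."""
    scores = [score_document(s, node_weight, edge_weight) for s in target_skills]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = max(1, int(len(ranked) * ratio))
    return ranked[:keep]


# Hypothetical skill tags, invented for this example.
reference = [
    ["algebra", "factoring"],
    ["algebra", "factoring", "quadratics"],
    ["geometry"],
]
target = [["geometry"], ["algebra", "factoring"], ["cooking"]]

nodes, edges = build_skill_graph(reference)
selected = select_top(target, nodes, edges, ratio=0.5)
print(selected)
```

In this toy setup, a target document is ranked higher when its skills are frequent in the reference set and tend to co-occur there, so a document with no matching skills scores zero.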