NeurIPS 2022

Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems

Guanghu Yuan, Fajie Yuan, Yudong Li, Beibei Kong, Shujie Li, Lei Chen, Min Yang, Chenyun Yu, Bo Hu, Zang Li, Yu Xu, Xiaohu Qie

Cited by 106

Abstract

Existing benchmark datasets for recommender systems (RS) are either small in scale or cover very limited forms of user feedback. RS models evaluated on such datasets often have little practical value for large-scale, real-world applications. In this paper, we describe Tenrec*, a novel, publicly available data collection for RS that records various types of user feedback from four different recommendation scenarios. Specifically, Tenrec has the following five characteristics: (1) it is large-scale, containing around 5 million users and 140 million interactions; (2) it contains not only positive user feedback but also true negative feedback (as opposed to one-class recommendation, where only positive feedback is observed); (3) it contains overlapping users and items across the four scenarios; (4) it contains various types of positive user feedback, such as clicks, likes, shares, and follows; (5) it contains additional features beyond user IDs and item IDs. We validate Tenrec on ten diverse recommendation tasks by running several classical baseline models per task. Tenrec has the potential to become a useful benchmark dataset for most popular recommendation tasks. Our source code, datasets, and leaderboards are available at https://github.com/yuangh-x/2022-NIPS-Tenrec.

* Equal contribution. Fajie designed the research and Guanghu performed it; Fajie, Guanghu, Lei, and Min wrote the paper; Fajie, Min, Yudong, and Beibei launched the research project; Guanghu, Beibei, and Yudong collected the data; Shujie assisted with some of the experiments. The experiments in this work were mainly performed while Guanghu was interning at Tencent.
* Tenrec (a hedgehog-like mammal) here indicates that the dataset is collected from Tencent's recommendation platforms and that it can be used to benchmark ten diverse recommendation tasks.
* Email Fajie and Guanghu if you want to launch a new leaderboard for an important RS task using Tenrec.
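To make characteristics (2), (3), and (4) of the abstract more concrete, here is a minimal sketch on made-up toy records. It contrasts labels built from true negatives (items that were shown but not clicked) with a one-class view that keeps only positives. It also lists the positive feedback types recorded for one exposure and finds users who appear in more than one scenario. The `Interaction` schema, its field names, the scenario labels "A"/"B", and all helper functions are hypothetical illustrations. They do not reflect Tenrec's actual file format or the authors' code.

```python
from dataclasses import dataclass

FEEDBACK_TYPES = ("click", "like", "share", "follow")


@dataclass(frozen=True)
class Interaction:
    """One exposure record. Hypothetical schema, not Tenrec's actual file layout."""
    user_id: int
    item_id: int
    scenario: str  # hypothetical scenario label, e.g. "A" or "B"
    click: int     # 1 if the exposed item was clicked, 0 if it was shown but ignored
    like: int
    share: int
    follow: int


def ctr_label(r: Interaction) -> int:
    # With true negatives, an exposed-but-not-clicked item is an observed
    # negative example, so no negatives have to be sampled.
    return r.click


def one_class_positives(records):
    # One-class view: only positive (clicked) pairs are kept; any negatives
    # a model trains on would have to be sampled from unobserved items.
    return [(r.user_id, r.item_id) for r in records if r.click == 1]


def feedback_types(r: Interaction):
    # Names of all positive feedback types recorded for this exposure.
    return [name for name in FEEDBACK_TYPES if getattr(r, name) == 1]


def overlapping_users(records):
    # Users who appear in more than one scenario (cross-scenario overlap).
    scenarios_by_user = {}
    for r in records:
        scenarios_by_user.setdefault(r.user_id, set()).add(r.scenario)
    return {u for u, s in scenarios_by_user.items() if len(s) > 1}


# Made-up toy records for illustration only.
records = [
    Interaction(1, 10, "A", click=1, like=1, share=0, follow=0),
    Interaction(1, 11, "A", click=0, like=0, share=0, follow=0),
    Interaction(1, 20, "B", click=1, like=0, share=1, follow=0),
    Interaction(2, 10, "A", click=0, like=0, share=0, follow=0),
    Interaction(3, 21, "B", click=1, like=0, share=0, follow=1),
]

print([ctr_label(r) for r in records])   # [1, 0, 1, 0, 1]
print(one_class_positives(records))      # [(1, 10), (1, 20), (3, 21)]
print(feedback_types(records[0]))        # ['click', 'like']
print(overlapping_users(records))        # {1}
```

In the one-class view, the two ignored exposures (user 1 on item 11 and user 2 on item 10) disappear entirely. That is exactly the information that true negative feedback preserves.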
36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks.