KDD 2026

Scaling Recommender Transformers to One Billion Parameters

Kirill Khrylchenko, Artem Matveev, Sergei S. Makeev, Vladimir Baikalov

Cited 12 times

Abstract

While large transformer models have been applied successfully in many real-world domains, such as natural language processing, computer vision, and speech processing, scaling transformers for recommender systems remains a challenging problem. Recently, the Generative Recommenders framework was proposed as a way to scale beyond typical Deep Learning Recommendation Models (DLRMs). By reformulating recommendation as a sequential transduction task, it improves scaling properties in terms of compute. Nevertheless, the largest encoder configuration reported by the HSTU authors has only 176 million parameters, far fewer than the hundreds of billions (or even trillions) that are now common in language models. In this work, we present a recipe for training large transformer recommenders with up to one billion parameters. We show that autoregressive learning on user histories naturally decomposes into two subtasks, feedback prediction and next-item prediction, and demonstrate that this decomposition scales effectively across a wide range of transformer sizes. Furthermore, we report a successful deployment on a large-scale music platform serving millions of users. In online A/B tests, the proposed model increases total listening time by 2.26% and raises the likelihood of user likes by 6.37%, which is, to our knowledge, the largest improvement in recommendation quality reported for any deep learning-based system in the platform's history.