ICLR 2026

Catalog-Native LLM: Speaking Item-ID dialect with Less Entanglement for Recommendation

Reza Shirkavand, Xiaokai Wei, Chen Wang, Zheng Hui, Heng Huang, Michelle Gong

Cited by 4

Abstract

While collaborative filtering delivers predictive accuracy and efficiency, and Large Language Models (LLMs) enable expressive and generalizable reasoning, modern recommendation systems must bring these strengths together. Growing user expectations, such as natural-language queries and transparent explanations, further highlight the need for a unified approach. However, doing so is nontrivial. Collaborative signals are often token-efficient but semantically opaque, while LLMs are semantically rich but struggle to model implicit user preferences when trained only on textual inputs. This paper introduces Item-ID + Natural-language Mixture-of-Experts Language Model (IDIOMoE), which treats item interaction histories as a native dialect within the language space, enabling collaborative signals to be understood in the same way as natural language. By splitting the Feed Forward Network of each block of a pretrained LLM into a separate text expert and an item expert with token-type gating, our method avoids destructive interference between text and catalog modalities. IDIOMoE demonstrates strong recommendation performance across both public and proprietary datasets, while preserving the text understanding of the pretrained model.

INTRODUCTION

Recommendation systems shape what people read, watch, buy, learn, and play. As AI shifts from static predictors to reasoning agents capable of following instructions, recommendation is also evolving from ranking fixed lists to assisting users in exploring, planning, and deciding. This trend is visible in practice: Amazon's Rufus provides LLM-powered conversational shopping (Amazon, 2024); Meta's Llama-3 assistant is embedded in WhatsApp, Instagram, and Facebook for task planning (Meta, 2024); and Netflix is adopting foundation-model approaches for personalization and LLM-based conversational retrieval (Netflix, 2025; Zhu et al., 2025).
These examples motivate bringing LLM knowledge and instruction-following into recommenders while preserving the collaborative patterns that make them accurate at scale. Conventional recommenders such as collaborative filtering (CF) (Koren et al., 2009), content-based (CB) methods (Lops et al., 2011), and sequential models (Kang & McAuley, 2018; Sun et al., 2019; Zhai et al., 2024) perform well within their scope when data are abundant, but they depend heavily on the quality of logs and item attributes. They remain vulnerable to popularity bias (Abdollahpouri et al., 2019), struggle to integrate heterogeneous signals (text, behavior, and context), and cannot support natural-language queries. Pre-trained LLMs offer complementary strengths: they bring broad world knowledge, can follow natural-language instructions, and can reason about multi-objective trade-offs. Yet a fundamental gap remains. LLM pretraining centers on semantic understanding, whereas recommendation requires modeling collaborative preference patterns. The key challenge is leveraging LLMs for preference understanding without disrupting their semantic competence. Recent work has tried to bridge this gap by extending LLM vocabularies with item IDs (