WWW2026

Reinforcement Learning-Constrained Segmented User Modeling with Large Language Models for Recommendation

Yu Xia, Qing Tan, He Chen, Jingyu Chen, Qian Dong

Abstract

Large Language Models (LLMs), with their powerful reasoning capabilities, offer new opportunities for recommendation systems (RS), especially in understanding user preferences. However, directly applying LLMs to this task faces two major challenges: (1) limited context windows struggle to accommodate the extensive user behavior sequences found in real-world scenarios; (2) inherent hallucination can lead LLMs to infer spurious preferences that contradict true user intent, thereby degrading recommendation quality. To address these challenges, we propose Rec2, a Reinforcement learning-constrained segmented user modeling framework for recommendation. The framework divides lengthy behavior sequences into multiple segments and generates user preferences in a cascaded fashion. To suppress hallucinations and align with user intent, we introduce reinforcement learning, which uses the subsequent behavior segment as a supervisory signal to constrain and reward preference inference on the current segment, thereby producing more predictive, higher-fidelity user preferences. Finally, the aligned LLMs infer preferences for each behavior segment; these are then processed by a specially designed dynamic preference learning module that models preference evolution and aggregates them into a unified, dynamic long-term user preference embedding. This representation can be integrated into any recommendation model to boost its performance. Extensive experiments demonstrate that Rec2 significantly enhances recommendation performance by effectively capturing dynamic and authentic user preferences.
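To make the segmentation, cascaded inference, and next-segment supervision concrete, the following is a minimal illustrative sketch and not the paper's implementation. The abstract does not specify the segment length, the prompt, the preference format, or the reward function, so every name here (`segment_behaviors`, `cascaded_preferences`, `next_segment_reward`, `toy_infer`) is hypothetical. The LLM is replaced by a trivial stand-in, and the reward is assumed to be a simple overlap score.

```python
from typing import Callable, List, Sequence


def segment_behaviors(behaviors: Sequence[str], segment_len: int) -> List[List[str]]:
    """Split a long behavior sequence into consecutive fixed-length segments."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [list(behaviors[i:i + segment_len])
            for i in range(0, len(behaviors), segment_len)]


def cascaded_preferences(
    segments: Sequence[Sequence[str]],
    infer: Callable[[Sequence[str], Sequence[str]], List[str]],
) -> List[List[str]]:
    """Infer one preference per segment, feeding each result into the next step."""
    preferences: List[List[str]] = []
    previous: List[str] = []
    for segment in segments:
        previous = infer(previous, segment)
        preferences.append(previous)
    return preferences


def next_segment_reward(predicted: Sequence[str], next_segment: Sequence[str]) -> float:
    """Assumed toy reward: fraction of next-segment behaviors covered by the
    preference inferred from the current segment."""
    if not next_segment:
        return 0.0
    predicted_set = set(predicted)
    hits = sum(1 for item in next_segment if item in predicted_set)
    return hits / len(next_segment)


def toy_infer(previous: Sequence[str], segment: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for the LLM: merge prior preferences with the
    distinct behavior tags seen in the current segment."""
    return sorted(set(previous) | set(segment))


# Behaviors are represented as category tags purely for illustration.
behaviors = ["sci-fi", "sci-fi", "drama", "sci-fi", "comedy", "drama"]
segments = segment_behaviors(behaviors, segment_len=2)
preferences = cascaded_preferences(segments, toy_infer)

# Score each inferred preference against the segment that follows it.
rewards = [
    next_segment_reward(preferences[i], segments[i + 1])
    for i in range(len(segments) - 1)
]
```

In an actual training loop, a score of this kind would presumably serve as the reward signal for a policy-gradient update of the LLM. The specific reward design and optimization algorithm are not stated in the abstract and are left open here.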