WWW2026

Optimizing Multi-Turn Interactive Recommendation Agents via Generative Intrinsic Motivation

Xueyang Feng, Jiakai Tang, Xu Chen, Quanyu Dai, Zhenhua Dong

Abstract

Large language models have given rise to interactive recommendation agents (IRAs). Through proactive clarification, tool invocation, and dynamic dialogue, IRAs shift recommender systems from passive prediction to interactive, proactive intelligence. For training IRAs, agentic reinforcement learning offers a natural pathway, as it enables models to learn interactive capabilities directly from environmental feedback without requiring costly annotated data. However, this process faces three key challenges: credit assignment in multi-turn interactions, efficient exploration in large action spaces, and coordinated learning of multiple interactive skills. To tackle these challenges, we present a new preference optimization paradigm, GIMO, which treats the interaction and learning process of IRAs in sparse-reward environments as the continuous fulfillment and stimulation of three intrinsic drives: Autonomy, Competence, and Relatedness. Instead of relying solely on external rewards, GIMO estimates the potential between adjacent states in a generative manner to construct intrinsic rewards. This generative motivation structure not only enables fine-grained credit assignment but also naturally inspires a hint-guided action proposal mechanism that facilitates efficient exploration. Furthermore, during the multi-skill coordination training stage, we introduce explicit KL regularization into policy orchestration to prevent global policy collapse. Theoretically, we prove that GIMO guarantees global policy consistency. Empirically, experiments in interactive recommendation environments built from three datasets demonstrate that GIMO consistently outperforms existing methods. Our code is anonymously available at https://github.com/XueyangFeng/GIMO.