KDD2025

Supervised Learning-enhanced Multi-Group Actor Critic for Live Stream Allocation in Feed

Jingxin Liu, Xiang Gao, YiSha Li, Xin Li, Haiyang Lu, Ben Wang

Abstract

In a mixed short-video and live-stream recommendation scenario, the live stream recommendation system (RS) decides, for each user request, whether to allocate a live stream (at most one) to the video feed. An inappropriate allocation policy that ignores the long-term negative impact of live stream allocation can significantly reduce app usage duration and user retention. To maximize long-term user engagement, it is therefore crucial to find an optimal policy for accurate live stream allocation. Recently, reinforcement learning (RL) has been widely applied in recommendation systems to capture long-term user engagement. However, traditional RL algorithms often suffer from divergence and instability, which restricts their application and deployment in large-scale industrial recommendation systems, especially in the challenging scenario described above. To address these challenges, we propose a novel Supervised Learning-enhanced Multi-Group Actor Critic algorithm (SL-MGAC). Specifically, we introduce a supervised learning-enhanced actor-critic framework that incorporates variance reduction techniques, in which multi-task supervised reward learning helps limit the accumulation of bootstrapping error during critic learning. Additionally, we design a multi-group state decomposition module for both the actor and critic networks to reduce prediction variance and improve model stability. We also propose a novel reward function that prevents overly greedy live stream allocation. We evaluate SL-MGAC empirically using offline policy evaluation (OPE) and online A/B testing. Experimental results demonstrate that the proposed method not only outperforms baseline methods under platform-level constraints but also exhibits improved stability in online recommendation scenarios.
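To make the decision setup concrete, the sketch below shows a generic one-step actor-critic for the binary "allocate a live stream or not" choice described above. It is not SL-MGAC: it omits the supervised reward heads, the multi-group state decomposition, and the paper's reward function, none of which are specified in this abstract. The class name `BinaryActorCritic`, the linear critic, the logistic policy, the feature vectors, and the reward value are all hypothetical choices made for illustration. The TD error plays the role of the advantage signal, and the bootstrapped term `gamma * V(x_next)` is the quantity whose error accumulation the abstract says supervised reward learning is meant to restrict.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class BinaryActorCritic:
    """Hypothetical generic one-step actor-critic for an allocate/skip decision.

    Not the paper's SL-MGAC: no supervised reward learning and no
    multi-group state decomposition; linear models are an assumption.
    """

    def __init__(self, dim, gamma=0.9, lr_actor=0.1, lr_critic=0.1):
        self.w_actor = np.zeros(dim)   # logistic policy weights
        self.w_critic = np.zeros(dim)  # linear state-value weights
        self.gamma = gamma
        self.lr_actor = lr_actor
        self.lr_critic = lr_critic

    def allocate_prob(self, x):
        # Probability of inserting one live stream for this request.
        return sigmoid(self.w_actor @ x)

    def value(self, x):
        return self.w_critic @ x

    def update(self, x, action, reward, x_next, done):
        # One-step TD target; bootstrapping is cut at the end of an episode.
        target = reward + (0.0 if done else self.gamma * self.value(x_next))
        td_error = target - self.value(x)
        # Critic: semi-gradient TD(0) step on the squared TD error.
        self.w_critic += self.lr_critic * td_error * x
        # Actor: gradient of log pi(a|x) for a Bernoulli-logistic policy is (a - p) * x.
        p = self.allocate_prob(x)
        self.w_actor += self.lr_actor * td_error * (action - p) * x
        return td_error


# Usage with made-up request features and reward (illustrative only).
agent = BinaryActorCritic(dim=3)
x = np.array([1.0, 0.5, 0.0])
x_next = np.array([0.0, 1.0, 0.5])
before = agent.allocate_prob(x)  # zero weights give probability 0.5
delta = agent.update(x, action=1, reward=1.0, x_next=x_next, done=False)
after = agent.allocate_prob(x)   # rises because the TD error was positive
```

Because the critic starts at zero, the first TD error equals the reward, so a rewarded allocation nudges the policy toward allocating again for similar requests. In a real system the high variance of this single-sample signal is exactly what motivates variance reduction techniques such as those the abstract describes.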