KDD 2022
Multi-objective Optimization of Notifications Using Offline Reinforcement Learning
Prakruthi Prabhakar, Yiping Yuan, Guangyu Yang, Wensheng Sun, Ajith Muralidharan
Cited by 6
Abstract
Mobile notification systems play a major role in many applications: they deliver alerts and reminders that inform users about news, events, or messages. In this paper, we formulate the near-real-time notification decision problem as a Markov Decision Process in which the reward combines multiple objectives. We propose an end-to-end offline reinforcement learning framework to optimize sequential notification decisions. We address the challenge of offline learning with a Double Deep Q-network method based on Conservative Q-learning, which mitigates the distributional shift problem and Q-value overestimation. We illustrate our fully deployed system and demonstrate the performance and benefits of the proposed approach through both offline and online experiments.
CCS CONCEPTS
• Theory of computation → Markov decision processes; Reinforcement learning; • Computing methodologies → Q-learning; Neural networks.
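The abstract names three ingredients: a reward that combines multiple objectives, a Double Deep Q-network target, and a Conservative Q-learning regularizer. The sketch below shows one common way these pieces fit together for a discrete action space, using plain NumPy arrays in place of neural networks. It is an illustration under stated assumptions, not the authors' implementation: the function names, the weighted-sum scalarization, the two example objectives, the two-action space (0 = hold, 1 = send), and the values of `gamma` and `alpha` are all hypothetical.

```python
import numpy as np


def scalarize_reward(objectives, weights):
    """Collapse per-objective rewards of shape (batch, k) into one scalar per transition.

    Assumption: a fixed weighted sum; the abstract does not say how objectives are combined.
    """
    return np.asarray(objectives, dtype=float) @ np.asarray(weights, dtype=float)


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network picks the next action,
    the target network supplies that action's value."""
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    best = np.argmax(q_online_next, axis=1)
    next_value = q_target_next[np.arange(len(best)), best]
    return reward + gamma * (1.0 - done) * next_value


def cql_penalty(q_values, actions):
    """Conservative Q-learning term: logsumexp over all actions minus the
    Q-value of the logged action, averaged over the batch."""
    q_values = np.asarray(q_values, dtype=float)
    actions = np.asarray(actions, dtype=int)
    m = q_values.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    lse = m[:, 0] + np.log(np.exp(q_values - m).sum(axis=1))
    q_data = q_values[np.arange(len(actions)), actions]
    return float(np.mean(lse - q_data))


def cql_double_dqn_loss(q_values, actions, target, alpha=1.0):
    """Half mean squared TD error plus alpha times the CQL penalty."""
    q_values = np.asarray(q_values, dtype=float)
    actions = np.asarray(actions, dtype=int)
    target = np.asarray(target, dtype=float)
    q_data = q_values[np.arange(len(actions)), actions]
    td_loss = 0.5 * float(np.mean((q_data - target) ** 2))
    return td_loss + alpha * cql_penalty(q_values, actions)


if __name__ == "__main__":
    # Hypothetical logged batch: columns are (engagement gain, notification cost).
    objectives = [[1.0, -0.2], [0.0, 0.0]]
    reward = scalarize_reward(objectives, weights=[1.0, 0.5])
    target = double_dqn_target(
        reward,
        done=[0.0, 1.0],
        q_online_next=[[0.1, 0.4], [0.3, 0.2]],
        q_target_next=[[0.2, 0.3], [0.1, 0.5]],
    )
    loss = cql_double_dqn_loss(
        q_values=[[0.0, 0.5], [0.2, 0.1]], actions=[1, 0], target=target
    )
    print(loss)
```

The penalty is never negative, because the logsumexp over actions is at least as large as any single Q-value. Minimizing it pushes down Q-values for actions absent from the logged data relative to the logged action, which is how this style of regularizer counters overestimation under distributional shift.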