KDD2022

Multi-objective Optimization of Notifications Using Offline Reinforcement Learning

Prakruthi Prabhakar, Yiping Yuan, Guangyu Yang, Wensheng Sun, Ajith Muralidharan

被引用 6 次

摘要

Mobile notification systems play a major role in a variety of applications to communicate, send alerts and reminders to the users to inform them about news, events or messages. In this paper, we formulate the near-real-time notification decision problem as a Markov Decision Process where we optimize for multiple objectives in the rewards. We propose an end-to-end offline reinforcement learning framework to optimize sequential notification decisions. We address the challenge of offline learning using a Double Deep Qnetwork method based on Conservative Q-learning that mitigates the distributional shift problem and Q-value overestimation. We illustrate our fully-deployed system and demonstrate the performance and benefits of the proposed approach through both offline and online experiments. CCS CONCEPTS • Theory of computation → Markov decision processes; Reinforcement learning; • Computing methodologies → Q-learning; Neural networks.