EMNLP 2025
One Planner To Guide Them All! Learning Adaptive Conversational Planners for Goal-oriented Dialogues
Huy Quang Dao, Lizi Liao
Abstract
Goal-oriented dialogues, such as recommendation and negotiation, often require balancing multiple conflicting objectives. Conventional approaches typically train separate policies for each predefined objective trade-off, which is computationally costly and scales poorly. In this work, we pursue a single dialogue policy that can dynamically adapt to varying objective preferences at inference time without retraining. This raises several challenges in terms of both (1) optimization strategy and (2) knowledge utilization. To address these, we propose a novel policy learning framework, Preference Adaptive Dialogue Policy Planner (PADPP), for multi-objective goal-oriented dialogues. Specifically, to tackle the former, we introduce a novel optimization scheme, which leverages information gained from training the model on previously updated objective weights, accelerating the learning capability on new weight settings. To address the latter, we utilize Generalized Policy Improvement (GPI) to ensure the effectiveness of leveraged knowledge. Experimental results demonstrate that PADPP achieves superior adaptability and performance compared to state-of-the-art approaches, offering a scalable and flexible solution for multi-objective, goal-oriented dialogues.¹
[Figure: labels recovered from the original: Inference; RoBERTa; DQN; LM Planner; Action 1; Action 2; Recommendation Dialogue: DuRecDial 2.0]
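To make the role of Generalized Policy Improvement more concrete, the following is a minimal sketch of the generic GPI action-selection rule under linear scalarization of vector-valued Q-estimates. It is not the PADPP implementation: the function names (`scalarize`, `gpi_action`), the toy Q-values, and the two-objective setup are all illustrative assumptions, and the abstract does not specify how PADPP represents or combines its value estimates.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper names and toy numbers; a generic GPI sketch,
# not the method described in the paper.


def scalarize(q_vec: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: weighted sum of per-objective Q-values."""
    return sum(q * w for q, w in zip(q_vec, weights))


def gpi_action(
    q_tables: List[List[Sequence[float]]],
    weights: Sequence[float],
) -> Tuple[int, float]:
    """Pick the action maximizing the scalarized value over all stored policies.

    q_tables[i][a] is the vector-valued Q-estimate of previously learned
    policy i for action a in the current state.
    """
    best_action, best_value = -1, float("-inf")
    for q_table in q_tables:
        for action, q_vec in enumerate(q_table):
            value = scalarize(q_vec, weights)
            if value > best_value:
                best_action, best_value = action, value
    return best_action, best_value


# Two previously learned policies, three actions, two objectives (toy values).
policy_a = [[1.0, 0.0], [0.5, 0.5], [0.0, 0.2]]
policy_b = [[0.2, 0.1], [0.1, 0.3], [0.0, 1.0]]

# The same stored estimates yield different actions for different preferences.
action_first, value_first = gpi_action([policy_a, policy_b], [0.9, 0.1])
action_second, value_second = gpi_action([policy_a, policy_b], [0.1, 0.9])
```

In this toy setting, a preference weighted toward the first objective selects action 0 (backed by the first policy's estimates), while a preference weighted toward the second objective selects action 2 (backed by the second policy's estimates), with no retraining in between.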