ICML 2023

Hierarchical Diffusion for Offline Decision Making

Wenhao Li, Xiangfeng Wang, Bo Jin, Hongyuan Zha

Cited 80 times

Abstract

Offline reinforcement learning typically introduces a hierarchical structure to tackle long-horizon problems and thereby mitigate the thorny issue of variance accumulation. However, the deadly triad, limited data, and reward sparsity remain, making the design of effective hierarchical offline RL algorithms for general-purpose policy learning a formidable challenge. In this paper, we first formulate offline long-horizon decision-making from the perspective of conditional generative modeling by incorporating goals into control-as-inference graphical models. We then propose HDMI (Hierarchical Diffusion for offline decision MakIng), a hierarchical trajectory-level diffusion probabilistic model with classifier-free guidance. HDMI employs a cascade framework: a reward-conditional goal diffuser discovers subgoals, and a goal-conditional trajectory diffuser generates the action sequence corresponding to those subgoals. Planning-based subgoal extraction and transformer-based diffusion are employed in the goal diffuser to deal with contamination from sub-optimal data and long-range subgoal dependencies, respectively. Numerical experiments verify the advantages of HDMI on long-horizon decision-making over state-of-the-art offline RL methods and conditional generative models.
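As a rough illustration of the classifier-free guidance mentioned in the abstract, the sketch below blends an unconditional and a conditional noise prediction with a guidance weight. It is a generic form of the rule, not the paper's exact parameterization: the names `guided_noise`, `toy_denoiser`, and the weight `w` are hypothetical, and the toy denoiser merely stands in for a trained diffusion network.

```python
from typing import Callable, List, Optional

Vector = List[float]
# A denoiser maps (noisy sample, condition or None) to a predicted noise vector.
Denoiser = Callable[[Vector, Optional[float]], Vector]


def guided_noise(denoiser: Denoiser, x: Vector, cond: float, w: float) -> Vector:
    """Classifier-free guidance: move from the unconditional prediction
    toward the conditional one by a factor w (assumed convention)."""
    eps_uncond = denoiser(x, None)   # condition dropped
    eps_cond = denoiser(x, cond)     # condition supplied (e.g. a return or a goal)
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def toy_denoiser(x: Vector, cond: Optional[float]) -> Vector:
    """Hypothetical stand-in for a trained network: the predicted noise is the
    offset of x from the condition (or from 0.0 when no condition is given)."""
    target = 0.0 if cond is None else cond
    return [xi - target for xi in x]


eps = guided_noise(toy_denoiser, [1.0, 2.0], cond=0.5, w=2.0)
print(eps)
```

Under this convention, `w = 0` recovers the unconditional prediction and `w = 1` the purely conditional one; larger values push samples further toward the condition. In a cascade like the one the abstract describes, the same blending could in principle be applied at both stages, with the return as the condition for the goal diffuser and the subgoal as the condition for the trajectory diffuser.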