NeurIPS2023

Fast Bellman Updates for Wasserstein Distributionally Robust MDPs

Zhuodong Yu, Ling Dai, Shaohang Xu, Siyang Gao, Chin Pang Ho

12 citations

Abstract

Markov decision processes (MDPs) often suffer from the sensitivity issue under model ambiguity. In recent years, robust MDPs have emerged as an effective framework to overcome this challenge. Distributionally robust MDPs extend the robust MDP framework by incorporating distributional information of the uncertain model parameters to alleviate the conservative nature of robust MDPs. This paper proposes a computationally efficient solution framework for solving distributionally robust MDPs with Wasserstein ambiguity sets. By exploiting the specific problem structure, the proposed framework decomposes the optimization problems associated with distributionally robust Bellman updates into smaller subproblems, which can be solved efficiently. The overall complexity of the proposed algorithm is quasi-linear in both the numbers of states and actions when the distance metric of the Wasserstein distance is chosen to be L 1 , L 2 , or L ∞ norm, and so the computational cost of distributional robustness is substantially reduced. Our numerical experiments demonstrate that the proposed algorithms outperform other state-of-the-art solution methods. Kuhn, 2018) . In this paper, we focus on the Wasserstein ambiguity sets, which have been a popular choice for distributionally robust data-driven optimization in recent years because of their outstanding empirical performance as well as nice theoretical properties, such as consistency in optimality and finite-sample bounds (Mohajerin Esfahani and Kuhn, 2018; Gao and Kleywegt, 2022) . By using Wasserstein distance to formulate the ambiguity set, Yang (2017) shows that there exists an optimal policy that is stationary and Markovian for the corresponding Wasserstein distributionally robust MDP. While (distributionally) robust MDPs can be solved by extending standard solution methods in classical MDPs to their (distributionally) robust counterparts, these solution methods become much more computationally demanding. For example, each Bellman update for (distributionally) robust MDPs can be formulated as a convex optimization problem. Without making use of any specific problem structure, one would need to use generic convex optimization solvers to compute these Bellman updates, which have to be evaluated numerous times for computing the (distributionally) robust value function. This computational challenge restricted the application of (distributionally) robust MDPs to small or medium size of problems. In recent years, many efficient algorithms are proposed for solving robust MDPs to address this issue (Iyengar