ICML2024

Efficient Value Iteration for s-rectangular Robust Markov Decision Processes

Navdeep Kumar, Kaixin Wang, Kfir Yehuda Levy, Shie Mannor

被引用 4 次

摘要

We focus on s-rectangular robust Markov decision processes (MDPs), which capture interconnected uncertainties across different actions within each state. This framework is more general compared to sa-rectangular robust MDPs, where uncertainties in each action are independent. However, the introduced interdependence significantly amplifies the complexity of the problem. Existing methods either have slow performance guarantees or are inapplicable to even moderately large state spaces. In this work, we derive optimal robust Bellman operators in explicit forms. This leads to robust value iteration methods with significantly faster time complexities than existing approaches, which can be used in large state spaces. Further, our findings reveal that the optimal policies demonstrate a novel threshold behavior, selectively favoring a limited set of actions based on their respective advantage functions. Additionally, our study uncovers a noteworthy connection between the robustness of a policy and the variance in its value function, highlighting that policies with lower variance exhibit greater resilience. Method LP SDP Bisection Gradient Close form Algorithm Binary Search Solution Approx Approx Exact Approx Exact Exact Approx Complexity Õ S 11 2 A 9 2 S 11 2 + S 3 A S 3 A NA S 2 A S 2 A S 2 A Contraction factor γ γ γ γ(1+ √ Sβ * ) γ γ γ Valid Kernels Yes Yes Yes No Yes Yes Yes Optimal Policy Characterization No No No No Yes Yes Yes SDP and LP stand for semi-definite programming and linear programming, respectively, β * = max s∈S β s , Õ hides logarithmic factors in S, A, ϵ.