ICLR 2025

Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo

Abstract

Erratum. We draw the reader's attention to a technical error in the proof of Theorem 4, which claims global optimality of stationary points of the maximum violation function $\Delta_{b_0}(\pi)$. Specifically, Equation (37) is false in general. The proof requires the identity $\min_{y \in Y} \max_{x \in X} \langle x, y \rangle = \min_{y \in \mathrm{conv}(Y)} \max_{x \in X} \langle x, y \rangle$ for compact sets $X, Y \subseteq \mathbb{R}^d$ with $X$ convex, but this identity does not hold in general; see Kitamura et al. (2026) for details. Unfortunately, Kitamura et al. (2026) further show that general RMDPs may admit multiple local minima. Moreover, even under (s, a)-rectangularity, finding an ε-optimal policy in RCMDPs is NP-hard. As a result, the main result of this paper, Corollary 1, is also incorrect.
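To illustrate why taking the minimum over Y and over conv(Y) can give different values, the following is a minimal one-dimensional sketch. The sets X = [-1, 1] and Y = {-1, 1} are an illustrative choice made here and are not necessarily the counterexample used by Kitamura et al. (2026). The names `support`, `xs`, and `conv_Y` are hypothetical helpers, and both intervals are approximated by finite grids.

```python
# Illustrative check (assumed example, not taken from the paper):
# X = [-1, 1] is convex and compact, Y = {-1, 1} is compact but not convex.

def support(y, xs):
    """Return max over x in X of <x, y>, with X discretized as the grid xs."""
    return max(x * y for x in xs)

# Grid over [-1, 1] that contains -1, 0, and 1 exactly.
xs = [i / 100 - 1 for i in range(201)]

Y = [-1.0, 1.0]      # the original, non-convex set
conv_Y = list(xs)    # conv(Y) = [-1, 1], approximated by the same grid

lhs = min(support(y, xs) for y in Y)       # min over Y
rhs = min(support(y, xs) for y in conv_Y)  # min over conv(Y)

print("min over Y:      ", lhs)
print("min over conv(Y):", rhs)
```

In this example the inner maximum equals |y|, so the minimum over Y is 1 while the minimum over conv(Y) is attained at y = 0 and equals 0. The two sides of the identity therefore differ even though X is convex and both sets are compact.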