ICLR 2025
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo
Abstract
Erratum. We draw the reader's attention to a technical error in the proof of Theorem 4, which claims the global optimality of stationary points of the maximum violation function $\Delta_{b_0}(\pi)$. Specifically, Equation (37) is false in general. The proof relies on the identity $\min_{y \in \mathcal{Y}} \max_{x \in \mathcal{X}} \langle x, y \rangle = \min_{y \in \operatorname{conv}(\mathcal{Y})} \max_{x \in \mathcal{X}} \langle x, y \rangle$ for compact sets $\mathcal{X}, \mathcal{Y} \subseteq \mathbb{R}^d$ with $\mathcal{X}$ convex, but this identity does not hold in general; see, e.g., Kitamura et al. (2026) for details. Unfortunately, Kitamura et al. (2026) further show that general RMDPs may admit multiple local minima. Moreover, even under $(s, a)$-rectangularity, finding an $\varepsilon$-optimal policy in RCMDPs is NP-hard. As a result, the main result of this paper, Corollary 1, is also incorrect.
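The failure of the min-max identity can be checked on a small one-dimensional instance. The following is a minimal sketch, offered as an illustration and not as the construction used in Kitamura et al. (2026): it takes d = 1, X = [-1, 1] (compact and convex) and Y = {-1, 1} (compact but not convex), so that conv(Y) = [-1, 1]. The helper names `support_box`, `min_over_points`, and `min_over_segment` are hypothetical, and the right-hand side is approximated by an exact rational grid search over the segment conv(Y), which is sufficient here because the grid contains the minimizer y = 0.

```python
from fractions import Fraction
from itertools import product


def support_box(y):
    """Return max_{x in [-1, 1]^d} <x, y> by enumerating the 2^d box vertices.

    A linear function on a polytope attains its maximum at a vertex, so this
    enumeration is exact; the value equals sum_i |y_i|.
    """
    return max(sum(xi * yi for xi, yi in zip(x, y))
               for x in product((-1, 1), repeat=len(y)))


def min_over_points(Y):
    """Left-hand side: minimize the inner max over the finite set Y."""
    return min(support_box(y) for y in Y)


def min_over_segment(y0, y1, num=1000):
    """Right-hand side when Y = {y0, y1}, so conv(Y) is the segment [y0, y1].

    Scans lam = k / num exactly with Fractions. In general a grid search only
    gives an upper bound on the true minimum over the segment.
    """
    best = None
    for k in range(num + 1):
        lam = Fraction(k, num)
        y = [(1 - lam) * a + lam * b for a, b in zip(y0, y1)]
        val = support_box(y)
        if best is None or val < best:
            best = val
    return best


# Hypothetical counterexample: X = [-1, 1], Y = {-1, 1}, conv(Y) = [-1, 1].
Y = [(-1,), (1,)]
print(min_over_points(Y))    # 1: max_x x*y = |y| = 1 for both points of Y
print(min_over_segment(*Y))  # 0: attained at y = 0, which lies in conv(Y) but not in Y
```

Since $\mathcal{Y} \subseteq \operatorname{conv}(\mathcal{Y})$, the right-hand side can never exceed the left-hand side; the gap appears because $y \mapsto \max_{x \in \mathcal{X}} \langle x, y \rangle$ is convex, and the minimum of a convex function over $\operatorname{conv}(\mathcal{Y})$ may be attained at a point outside $\mathcal{Y}$.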