AAAI 2025

Scalable Solutions for Decision-Making Systems Using Explainable Policy Representations

Muqsit Azeem

Abstract

Despite significant advances in solving Markov Decision Processes (MDPs) and Simple Stochastic Games (SGs), scalability remains a challenge due to the exponential growth of their state spaces. This thesis aims to advance state-of-the-art methods by tackling this issue through 1) explainability and 2) exploitation of the model structure. First, we introduce the 1-2-3-Go approach, which learns explainable policies from small MDP models and generalizes them to larger instances, improving scalability in MDPs. We then extend Optimistic Value Iteration (OVI) and Sound Value Iteration (SVI), both originally designed for MDPs, to SGs, improving efficiency in adversarial settings. Finally, we aim to exploit explainable policy representations and the model structure to enhance both scalability and interpretability in SGs. This thesis contributes both theoretical advances and practical solutions for decision-making systems under uncertainty.