ICLR2026

Operator Theory-Driven Autoformulation of MDPs for Control of Queueing Systems

Victor Baillet, Yuanzhang Xiao, Nicolás Astorga, Mihaela van der Schaar

Abstract

Autoformulation is an emerging field that uses large language models (LLMs) to translate natural-language descriptions of decision-making problems into formal mathematical formulations. Existing works have focused on autoformulating mathematical optimization problems for one-shot decision-making. However, many real-world decision-making problems are sequential, best modeled as Markov decision processes (MDPs). MDPs introduce unique challenges for autoformulation, including a significantly larger formulation search space, and for computing and interpreting the optimal policy. In this work, we address these challenges in the context of queueing problems-central to domains such as healthcare and logistics-which often require substantial technical expertise to formulate correctly. We propose a novel operator-theoretic autoformulation framework using LLMs. Our approach captures the underlying decision structure of queueing problems through constructing the Bellman equation as a graph of operators, where each operator is an interpretable transformation of the value function corresponding to certain event (e.g., arrival, departure, routing). Theoretically, we prove a universal three-level operator-graph topology covering a broad class of MDPs, significantly shrinking the formulation search space. Algorithmically, we propose customized Monte Carlo tree search to build operator graphs while incorporating self-evaluation, solver feedback, and intermediate syntax checking for early assessment, and present a provably low-complexity algorithm that automatically identifies structures of the optimal policy (e.g., threshold-based), accelerating downstream solving. Numerical results demonstrate the effectiveness of our approach in formulating queueing problems and identifying structural results. Problem Description by Domain Expert Consider a hospital with two wards: one for Critical patients and one for General patients. Both wards share beds. On average they receive critical patients and general patients per day, and discharge critical patients and general patients per day. On each arrival, we decide to admit the patient (joining service or the queue) or redirect/deny; denying incurs a one-time cost for critical patients and for general patients. Each admitted patient generates a holding cost of per unit time. Obj.: minimize long-term -discounted cost. Event Operators Formulation Challenge 1: Vast search space of operator graphs Set of Feasible Policies Monotone Policies Threshold Policies Uniformization Operators Cost Operators State Spaces Contribution: (Theorem 4.1) -Existence of universal graph topology greatly reduces search space.