ICLR 2025

Neural Stochastic Differential Equations for Uncertainty-Aware Offline RL

Cevahir Köprülü, Franck Djeumou, Ufuk Topcu

Abstract

Model exploitation: Evaluation in rollouts from learned dynamics models in the (a) random and (b) medium-replay tasks. We report the average score per step with (pessimistic, Pess) and without (ground truth, GT) uncertainty penalization.

Model analysis: We illustrate the evolution of model prediction error across datasets for D4RL Walker2d. (a) In-distribution: evaluation on the datasets on which the models were trained. (b) Out-of-distribution: evaluation of models trained on the random dataset, on trajectories from other datasets.

TLDR 3: NUNO constructs pessimistic learned MDPs that are less conservative.

TLDR 4: Neural SDEs are more accurate than Gaussian ensembles over longer horizons.
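The model exploitation caption contrasts average per-step scores with and without uncertainty penalization. As a rough illustration of what such a comparison computes, the sketch below assumes a generic penalty of the form r - lambda * u, in the style of common model-based offline RL methods. The function names, the coefficient `penalty_coef`, and the toy numbers are all hypothetical; this is not the penalty or uncertainty estimate used by NUNO.

```python
from statistics import mean


def penalized_rewards(rewards, uncertainties, penalty_coef=1.0):
    """Subtract a scaled uncertainty estimate from each per-step reward.

    Hypothetical penalty form (r - penalty_coef * u); the paper's actual
    penalty may differ.
    """
    if len(rewards) != len(uncertainties):
        raise ValueError("rewards and uncertainties must have the same length")
    return [r - penalty_coef * u for r, u in zip(rewards, uncertainties)]


def average_score_per_step(rewards):
    """Average reward over the steps of a single rollout."""
    return mean(rewards)


# Toy rollout: unpenalized rewards and a made-up uncertainty signal that
# grows with the rollout horizon.
gt_rewards = [1.0, 1.0, 1.0, 1.0]
uncertainties = [0.0, 0.5, 1.0, 2.0]

pess_rewards = penalized_rewards(gt_rewards, uncertainties, penalty_coef=0.5)

gt_score = average_score_per_step(gt_rewards)      # without penalization
pess_score = average_score_per_step(pess_rewards)  # with penalization
print(f"GT: {gt_score:.4f}  Pess: {pess_score:.4f}")
```

In this toy setting the penalized score falls further below the unpenalized one as the uncertainty estimate grows, which is the kind of gap the Pess and GT curves are meant to expose.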