ICML 2025

Distributionally Robust Policy Learning under Concept Drifts

Jingyuan Wang, Zhimei Ren, Ruohan Zhan, Zhengyuan Zhou

Abstract

Distributionally robust policy learning aims to find a policy that performs well under the worst-case distributional shift, yet most existing methods consider the worst-case joint distribution of the covariate and the outcome. This joint-modeling strategy can be unnecessarily conservative when more is known about the source of the distributional shift. This paper studies a more nuanced problem: robust policy learning under concept drift, where only the conditional relationship between the outcome and the covariate changes. To this end, we first provide a doubly robust estimator for evaluating the worst-case average reward of a given policy under a set of perturbed conditional distributions. We show that this policy value estimator is asymptotically normal even if the nuisance parameters are estimated at a slower-than-$\sqrt{n}$ rate. We then propose a learning algorithm that outputs the policy maximizing the estimated policy value within a given policy class $\Pi$, and show that the sub-optimality gap of the proposed algorithm is of order $\kappa(\Pi)\, n^{-1/2}$, where $\kappa(\Pi)$ is the entropy integral of $\Pi$ under the Hamming distance and $n$ is the sample size. A matching lower bound establishes the optimality of this rate. The proposed methods are implemented and evaluated in numerical studies, where they show substantial improvement over existing benchmarks.
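
To make the notion of a "worst-case average reward" concrete, the following is a minimal sketch of one building block, not the estimator proposed in the paper. It assumes the perturbation set is a Kullback-Leibler (KL) ball of radius `delta` around an empirical reward distribution, which the abstract does not specify, and it ignores covariates, propensities, and the doubly robust correction entirely. Under that assumption, the worst-case mean has the standard dual form $\sup_{\alpha > 0} \{-\alpha \log \mathbb{E}[e^{-Y/\alpha}] - \alpha\delta\}$, which the code maximizes by a golden-section search over $\log \alpha$. The function names, search bounds, and iteration count are all hypothetical choices made for illustration.

```python
import math


def dual_value(rewards, alpha, delta):
    """Dual objective -alpha * log(mean(exp(-y / alpha))) - alpha * delta.

    The log-mean-exp is computed with the usual max-shift for numerical stability.
    """
    scaled = [-y / alpha for y in rewards]
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return -alpha * log_mean_exp - alpha * delta


def kl_worst_case_mean(rewards, delta, lo=-10.0, hi=10.0, iters=200):
    """Approximate the worst-case mean of `rewards` over a KL ball of radius `delta`.

    Assumption (not from the source): the uncertainty set is
    {Q : KL(Q || P_n) <= delta}, with P_n the empirical distribution.
    The dual is concave in alpha, hence unimodal in log(alpha), so a
    golden-section search on log(alpha) in [lo, hi] is sufficient.
    """
    if delta <= 0:
        # No perturbation allowed: the worst case is the plain sample mean.
        return sum(rewards) / len(rewards)
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        # Maximization: keep the sub-interval containing the larger value.
        if dual_value(rewards, math.exp(c), delta) < dual_value(rewards, math.exp(d), delta):
            a = c
        else:
            b = d
    return dual_value(rewards, math.exp((a + b) / 2), delta)


if __name__ == "__main__":
    sample = [0.0, 1.0, 2.0, 3.0]
    for radius in (0.0, 0.1, 0.5):
        print(radius, kl_worst_case_mean(sample, radius))
```

In the concept-drift setting described above, a bound of this kind would be applied to the conditional distribution of the outcome given the covariate and then averaged over the unchanged covariate distribution. This is what makes the resulting value less conservative than perturbing the joint distribution.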