NeurIPS 2020

Distributionally Robust Local Non-parametric Conditional Estimation

Viet Anh Nguyen, Fan Zhang, José H. Blanchet, Erick Delage, Yinyu Ye

28 citations

Abstract

Conditional estimation given specific covariate values (i.e., local conditional estimation or functional estimation) is widely useful, with applications in engineering and the social and natural sciences. Existing data-driven non-parametric estimators mostly focus on structured homogeneous data (e.g., weakly independent and stationary data), so they are sensitive to adversarial noise and may perform poorly when the sample size is small. To alleviate these issues, we propose a new distributionally robust estimator that generates non-parametric local estimates by minimizing the worst-case conditional expected loss over all adversarial distributions in a Wasserstein ambiguity set. We show that, despite being generally intractable, the local estimator can be found efficiently via convex optimization under broadly applicable settings, and that it is robust to corruption and heterogeneity of the data. Experiments with synthetic and MNIST datasets show the competitive performance of this new class of estimators.

In formulation (2), the maximization is taken over all probability measures Q that are within distance ρ, in the ∞-Wasserstein sense, of a benchmark nominal model, which often corresponds to the empirical distribution of the available data. The probability measures Q are constrained so that Q(X ∈ N_γ(x_0)) > 0, which eliminates the complication of conditioning on a set of measure zero.

Contributions. Resting on formulation (2), our main contributions are summarized as follows.

1. We introduce a novel paradigm of non-parametric local conditional estimation based on distributionally robust optimization. In contrast to classical non-parametric conditional estimators, our new class of estimators is endowed by design with robustness features. The estimators are structurally built to mitigate the impact of model contamination, and therefore they may be reasonably applied to heterogeneous data (e.g., non-i.i.d. input).

2. We demonstrate that when the ambiguity set is a type-∞ Wasserstein ball around the empirical measure, the proposed min-max estimation problem can be efficiently solved in many applicable settings, notably including local conditional mean and quantile estimation.

3. We show that this class of type-∞ Wasserstein local conditional estimators can be considered a systematic robustification of the k-nearest neighbor estimator. We also provide further insights on the statistical properties of our approach and empirical evidence, on both synthetic and real datasets, that our approach can provide more accurate estimates in practically relevant settings.

Related work. One can argue that every prediction task in machine learning ultimately relates to conditional estimation, so attempting a full literature survey on non-parametric conditional estimation is an impossible task. Since our contribution is primarily to introduce a novel conceptual paradigm powered by DRO, we focus on discussing well-understood estimators that encompass most of the conceptual ideas used to mitigate the challenges exposed earlier.
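To make the min-max idea described under Contributions more concrete, the following is a minimal Python sketch and not the authors' algorithm. It assumes scalar covariates, the squared loss, and a deliberately simplified adversary that may shift each response y_i by at most rho while leaving the covariates fixed. The type-∞ Wasserstein ball in the text also lets covariates move, which can change which samples fall inside the neighborhood N_γ(x_0). All function names and the toy data are hypothetical.

```python
def neighborhood(xs, x0, gamma):
    """Indices of samples whose covariate lies in {x : |x - x0| <= gamma}."""
    return [i for i, x in enumerate(xs) if abs(x - x0) <= gamma]


def knn_mean(xs, ys, x0, k):
    """Classical k-nearest-neighbor estimate of E[Y | X = x0] (non-robust baseline)."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    nearest = order[:k]
    return sum(ys[i] for i in nearest) / len(nearest)


def worst_case_loss(beta, ys_local, rho):
    """Worst-case mean squared loss under the simplified adversary.

    Assumption: each response may move by at most rho. For the squared loss,
    the worst move pushes y_i directly away from beta by the full budget.
    """
    return sum((abs(y - beta) + rho) ** 2 for y in ys_local) / len(ys_local)


def robust_local_mean(xs, ys, x0, gamma, rho, iters=100):
    """Minimize the worst-case local loss over beta by ternary search."""
    idx = neighborhood(xs, x0, gamma)
    if not idx:
        raise ValueError("no samples in the neighborhood of x0")
    ys_local = [ys[i] for i in idx]
    # The objective is convex in beta and its minimizer lies between the
    # smallest and largest local response, so ternary search applies.
    lo, hi = min(ys_local), max(ys_local)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if worst_case_loss(m1, ys_local, rho) <= worst_case_loss(m2, ys_local, rho):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


# Toy data (hypothetical): three samples near x0 = 0.1, two far away.
xs = [0.0, 0.1, 0.2, 0.9, 1.0]
ys = [0.0, 0.0, 10.0, 50.0, 60.0]

baseline = knn_mean(xs, ys, x0=0.1, k=3)
robust = robust_local_mean(xs, ys, x0=0.1, gamma=0.15, rho=1.0)
```

Under these simplifying assumptions, setting rho = 0 recovers the plain local average (about 3.33 on the toy data), rho = 1 gives an estimate of about 3.0, and larger values of rho pull the estimate toward the median of the local responses. This only illustrates the qualitative effect of a worst-case objective; the tractable reformulations referred to in the text are not reproduced here.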