ICLR 2026

When Shift Happens - Confounding Is to Blame

Abbavaram Gowtham Reddy, Celia Rubio-Madrigal, Rebekka Burkholz, Krikamol Muandet

Cited by 4

Abstract

Distribution shifts introduce uncertainty that undermines the robustness and generalization capabilities of machine learning models. While conventional wisdom suggests that learning causal-invariant representations enhances robustness to such shifts, recent empirical studies present two counterintuitive findings: (i) empirical risk minimization (ERM) can rival or even outperform state-of-the-art out-of-distribution (OOD) generalization methods, and (ii) its OOD generalization performance improves when all available covariates, not just causal ones, are utilized. Drawing on both empirical and theoretical evidence, we attribute this phenomenon to hidden confounding. Shifts in hidden confounding induce changes in data distributions that violate assumptions commonly made by existing OOD generalization approaches. Under such conditions, we prove that effective generalization requires learning environment-specific relationships, rather than relying solely on invariant ones. Furthermore, we show that models augmented with proxies for hidden confounders can mitigate the challenges posed by hidden confounding shifts. These findings offer new theoretical insights and practical guidance for designing robust OOD generalization algorithms and principled covariate selection strategies.

All-variable models vs. causal models. Recently, Nastl and Hardt [57] introduced a benchmark study in which covariates are categorized into four groups: causal (conservatively chosen), arguably causal, anti-causal, and other spurious covariates. They show that, across 16 benchmark datasets, models using all covariates Pareto-dominate those using only the causal or arguably causal subsets on both in-distribution (ID) and OOD data. However, there is limited theoretical work explaining these results. We present scenarios and arguments that explain their experimental findings.
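As a rough illustration of how a hidden confounding shift could produce such a pattern, the following toy simulation fits ordinary least squares (a simple instance of ERM) on a causal covariate alone and on the causal covariate plus a noisy proxy of the hidden confounder. The structural equations, coefficients, noise scales, and all names (`sample_env`, `fit_ols`, `mse`, `results`) are assumptions made for this sketch; they are not taken from the paper or from the benchmark of Nastl and Hardt [57].

```python
import numpy as np


def sample_env(n, conf_mean, rng):
    """Draw n samples from a toy linear model with a hidden confounder U.

    All coefficients and noise scales below are illustrative assumptions.
    """
    u = rng.normal(conf_mean, 1.0, n)      # hidden confounder, never shown to the models
    x = u + rng.normal(0.0, 1.0, n)        # causal covariate, confounded by U
    w = u + rng.normal(0.0, 0.3, n)        # non-causal covariate acting as a proxy for U
    y = 1.0 * x + 2.0 * u + rng.normal(0.0, 0.5, n)
    return x, w, y


def fit_ols(features, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(y)), *features])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def mse(coef, features, y):
    """Mean squared error of a fitted linear model on the given data."""
    design = np.column_stack([np.ones(len(y)), *features])
    return float(np.mean((design @ coef - y) ** 2))


rng = np.random.default_rng(0)

# Training environment: confounder centered at 0.
x_tr, w_tr, y_tr = sample_env(20_000, conf_mean=0.0, rng=rng)
# Test environment: only the mean of the hidden confounder shifts.
x_te, w_te, y_te = sample_env(20_000, conf_mean=3.0, rng=rng)

causal_coef = fit_ols([x_tr], y_tr)        # causal covariate only
all_coef = fit_ols([x_tr, w_tr], y_tr)     # causal covariate + proxy

# Each entry holds (in-distribution MSE, out-of-distribution MSE).
results = {
    "causal_only": (mse(causal_coef, [x_tr], y_tr),
                    mse(causal_coef, [x_te], y_te)),
    "all_covariates": (mse(all_coef, [x_tr, w_tr], y_tr),
                       mse(all_coef, [x_te, w_te], y_te)),
}

for name, (id_mse, ood_mse) in results.items():
    print(f"{name:>15}: ID MSE = {id_mse:.3f}, OOD MSE = {ood_mse:.3f}")
```

In this particular setup, the model that also sees the proxy should have lower error in both environments, because the proxy carries information about the shifted confounder that the causal covariate alone does not. The degradation under shift is reduced rather than removed, and the size of the gap depends entirely on the assumed noise levels.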
For linear causal models, anchor regression [68] introduces a framework that balances two estimation paradigms: models that include all observed covariates and models that focus solely on causal covariates. We aim to explain the impact of adding covariates that are not necessarily causal under a hidden confounding shift. Eastwood et al. [21] show that unstable covariates can boost performance when they carry information about the label, provided they are conditionally independent of the stable covariates given the label. They propose adjusting for the distribution shift using unlabeled data from the test domain. However, when applied to a real-world medical dataset not constructed for this particular problem [8], ERM remains competitive with their method, in line with the findings of Nastl and Hardt [57]. This reflects the broader insight that, under well-specified covariate shifts, maximum likelihood estimation (MLE) achieves minimax optimality for OOD generalization [26]. Yet real-world settings are rarely well-specified because of hidden confounding shift, which is the main focus of this work.

Manifestations of hidden confounding shift

In this section, we provide background on hidden confounding shift and motivate the need to address it using an example involving causal effect identification of observed covariates on the outcome.