ICLR 2026

Transformers with Endogenous In-Context Learning: Bias Characterization and Mitigation

Haotian Wang, Hao Zou, Haoxuan Li, Haoang Chi, Yang Shi, Yuanxing Zhang, Wenjing Yang, Xinwang Liu, Zhouchen Lin

Abstract

In-context learning (ICL) enables pre-trained transformers (TFs) to perform few-shot learning across diverse tasks, fostering growing research into its underlying mechanisms. However, existing studies typically assume a causally sufficient regime, overlooking the spurious correlations and endogenous prediction bias introduced by hidden confounders (HCs). Because HCs are common in real-world data, current understanding of ICL may not reflect actual data structures. To fill this gap, we contribute the first theoretical analysis of a novel problem setup termed Endogenous ICL (EICL), which characterizes the effect of HCs on the pre-training of TFs and on their subsequent ICL predictions. Our theoretical results show that pre-trained TFs exhibit a prediction bias proportional to the confounding strength. To mitigate this bias, we further propose a gradient-free debiasing method named Double-Debiasing (DDbias), which prompts the biased pre-trained TF with a few unconfounded examples twice, once with the original labels and once with the residuals, to yield unbiased ICL predictions. Extensive experiments on regression tasks across diverse TF architectures and data-generation protocols verify both our theoretical results and the effectiveness of DDbias.
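The abstract does not spell out the EICL data-generating process, so the following is only a minimal sketch under our own assumptions. It shows why a predictor fit on confounded in-context examples picks up a bias that grows linearly with the confounding strength. We assume a 1-D linear model in which a hidden confounder `u` drives both the feature `x` and the label `y` with strength `gamma`. We also let ordinary least squares stand in for the ICL predictor; prior work has shown that trained TFs can emulate least squares on linear-regression prompts. For this generator the slope bias is `gamma * Cov(x, u) / Var(x) = gamma / 2`, and it vanishes when `x` is independent of `u`, i.e., for unconfounded examples. This is neither the paper's model nor the DDbias procedure. `confounded_prompt`, `slope_bias`, and their parameters are hypothetical names.

```python
import random


def least_squares_slope(xs, ys):
    """Closed-form 1-D OLS slope (with intercept), a stand-in for the ICL predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def confounded_prompt(n, w, gamma, rng, confounded=True):
    """Hypothetical generator: hidden confounder u affects y and, if confounded, x."""
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # hidden confounder, never shown in the prompt
        # Observed feature: correlated with u only in the confounded regime.
        x = (u if confounded else 0.0) + rng.gauss(0.0, 1.0)
        # Label depends on x (true weight w) and on the hidden confounder u.
        y = w * x + gamma * u + rng.gauss(0.0, 0.1)
        xs.append(x)
        ys.append(y)
    return xs, ys


def slope_bias(w, gamma, n=50000, seed=0, confounded=True):
    """Estimated slope minus the true weight w on a large simulated prompt."""
    rng = random.Random(seed)
    xs, ys = confounded_prompt(n, w, gamma, rng, confounded)
    return least_squares_slope(xs, ys) - w


if __name__ == "__main__":
    for gamma in (0.0, 0.5, 1.0, 2.0):
        # Theory for this generator: bias ~ gamma / 2 if confounded, ~ 0 otherwise.
        print(gamma,
              round(slope_bias(1.0, gamma), 2),
              round(slope_bias(1.0, gamma, confounded=False), 2))
```

The bias doubles when `gamma` doubles, which is the kind of proportionality the abstract refers to. The unconfounded column stays near zero, which suggests why a few unconfounded examples are a natural ingredient for debiasing.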