ICLR 2026

Pi-CCA: Prompt-Invariant CCA Certificates for Replay-Free Continual Multimodal Learning

Jiayu Zhang, Chuangxin Zhao, Canran Xiao, Ruibo Duan, Wenyi Mo, Haoyu Gao, Wenshuo Wang

Abstract

When deployed on non-stationary data streams, foundation vision-language models require continual updates without access to past data. However, naive fine-tuning erodes their zero-shot recognition ability and their robustness to prompt variation. We seek a replay-free principle that preserves pre-trained cross-modal generalization under domain and prompt shifts. We introduce Prompt-Invariant CCA Certificates (PI-CCA), a geometry-first approach that summarizes image-text alignment with a compact certificate capturing the top-k canonical correlation spectrum and the corresponding canonical subspace. During adaptation, we match this certificate using only mini-batch statistics, and we induce prompt robustness by averaging over prompt perturbations. Across MTIL, X-TAIL, VLCL, and ConStruct-VL, PI-CCA achieves state-of-the-art performance among replay-free methods. By optimizing alignment invariants rather than proxy signals, PI-CCA offers a simple, generator-free, constant-memory path to continual adaptation with strong zero-shot retention and resilience to prompt and style shifts.
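The abstract describes the certificate only at a high level. The sketch below is one possible reading, not the authors' formulation. It assumes the certificate is the top-k canonical correlations together with an orthonormal basis of the image-side canonical subspace, both computed from ridge-regularized mini-batch covariances. It also assumes that prompt averaging means averaging the certificate mismatch over several prompt variants of the text features. The function names, the ridge term `eps`, the projector-distance subspace term, the equal weighting of the spectrum and subspace terms, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def inv_sqrt(c: np.ndarray, eps: float) -> np.ndarray:
    """Inverse square root of a symmetric PSD matrix, with a ridge term for stability."""
    w, v = np.linalg.eigh(c + eps * np.eye(c.shape[0]))
    return (v / np.sqrt(w)) @ v.T


def cca_certificate(x: np.ndarray, y: np.ndarray, k: int, eps: float = 1e-3):
    """Hypothetical certificate: top-k canonical correlations plus an orthonormal
    basis of the image-side canonical subspace, from one mini-batch.

    x: (n, dx) image features; y: (n, dy) text features for the same samples.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx, cyy, cxy = x.T @ x / (n - 1), y.T @ y / (n - 1), x.T @ y / (n - 1)
    wx, wy = inv_sqrt(cxx, eps), inv_sqrt(cyy, eps)
    # Canonical correlations are the singular values of the whitened cross-covariance.
    u, s, _ = np.linalg.svd(wx @ cxy @ wy)
    # Map the top-k left singular vectors back to feature space, then orthonormalize.
    q, _ = np.linalg.qr(wx @ u[:, :k])
    return s[:k], q


def certificate_loss(cert_ref, cert_cur) -> float:
    """Spectrum mismatch plus projector distance between the two canonical subspaces."""
    s_ref, q_ref = cert_ref
    s_cur, q_cur = cert_cur
    spectrum = np.sum((s_ref - s_cur) ** 2)
    # The projector distance ignores rotations and sign flips of the basis vectors.
    subspace = np.linalg.norm(q_ref @ q_ref.T - q_cur @ q_cur.T, "fro") ** 2
    return float(spectrum + subspace)


def prompt_averaged_loss(x_ref, ys_ref, x_cur, ys_cur, k: int) -> float:
    """Average the certificate mismatch over prompt variants (one text matrix per variant)."""
    losses = [
        certificate_loss(cca_certificate(x_ref, yr, k), cca_certificate(x_cur, yc, k))
        for yr, yc in zip(ys_ref, ys_cur)
    ]
    return float(np.mean(losses))


# Synthetic stand-in for a frozen reference model and a drifting, adapted model.
rng = np.random.default_rng(0)
n, d, k = 256, 16, 4
z = rng.normal(size=(n, 4))  # shared latent signal linking the two modalities
x_ref = z @ rng.normal(size=(4, d)) + 0.5 * rng.normal(size=(n, d))
proj = rng.normal(size=(4, d))
ys_ref = [z @ proj + 0.5 * rng.normal(size=(n, d)) for _ in range(3)]  # 3 prompt variants
x_cur = x_ref + 0.1 * rng.normal(size=(n, d))  # lightly drifted image features

loss_same = prompt_averaged_loss(x_ref, ys_ref, x_ref, ys_ref, k)
loss_drift = prompt_averaged_loss(x_ref, ys_ref, x_cur, ys_ref, k)
print(loss_same, loss_drift)
```

In this sketch the loss is zero when the adapted features reproduce the reference certificate, and it grows as the canonical spectrum or subspace drifts. The state kept across tasks is only the stored reference certificate (k correlations and a d-by-k basis), so it does not grow with the number of past samples.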