Published as a conference paper at ICLR 2026

MaRS: Memory-Adaptive Routing for Reliable Capacity Expansion and Knowledge Retention

Gang Yan

Abstract

Large pre-trained models (LPMs) serve as universal backbones for vision and language tasks, but continual learning (CL) with frozen LPMs remains challenging: shallow adaptation modules face the stability-plasticity dilemma and are prone to catastrophic forgetting. To address this problem, we propose MARS (Memory-adaptive Router with Statistical control), a modular framework that decouples stable representations from adaptive capacity through three components: a frozen encoder, a slot-based memory router, and a lightweight classifier. On this basis, we design two mechanisms: (i) Statistically-Grounded Slot Expansion (SGSE), which formulates expansion as a statistical decision problem and thereby ensures controlled growth with guarantees on false alarms and detection delay; and (ii) Dual-Stage Contrastive-Distillation Adaptation (DCDA), which integrates new slots through supervised contrastive learning and knowledge distillation, preserving prior knowledge without replaying raw data. Experiments on diverse benchmarks show that MARS achieves state-of-the-art performance in continual learning with frozen LPMs, combining adaptability, efficiency, and retention.

… techniques (Rusu et al., 2016; Yoon et al., 2018; Dong et al., 2024) add new capacity for novel tasks, but they often rely on heuristic triggers that can cause uncontrolled growth. Prototype-based methods (De Lange & Tuytelaars, 2021; Liu et al., 2025; Zhu et al., 2025) compress historical knowledge into compact memory structures, which improves efficiency but leaves them fragile under distribution shift. Although these strategies offer useful insights, they were designed for conventional architectures rather than frozen LPMs. In parameter-efficient settings, shallow adapters have limited expressive power, and heuristic retention strategies offer no formal guarantees. Recent studies have begun to examine continual learning in the context of large pre-trained models.
Adapter-based approaches (Ke et al., 2021a; Wang et al., 2022) improve efficiency but still suffer from forgetting as tasks accumulate. In the vision-language domain, methods such as VLM-CIL (Liu et al., 2023), DIKI (Tang et al., 2024), and CoLeCLIP (Li et al., 2025) highlight both the promise and the fragility of frozen encoders. Parameter-efficient modules preserve adaptability, but retention often depends on heuristic replay or task-specific tuning. Recent designs, including dynamic LoRA ranks and mixture-of-experts adapters (Hu et al., 2022), offer partial relief but still rely on ad hoc expansion rules and lack formal guarantees. Together, these efforts reveal a persistent gap: current methods demonstrate that continual learning with frozen LPMs is feasible, but they do not provide principled mechanisms for expansion and retention.
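The abstract describes SGSE as treating slot expansion as a statistical decision problem with guarantees on false alarms and detection delay, in contrast to the ad hoc expansion rules discussed above, but this section does not specify the test. A classical rule that carries exactly these two kinds of guarantee is Page's CUSUM procedure, and the sketch below uses it purely as an illustration, not as the MARS implementation. It assumes a hypothetical scalar novelty score per sample (for example, the distance to the nearest memory slot) that is roughly Gaussian with known pre-shift mean `mu0`, post-shift mean `mu1 > mu0`, and shared standard deviation `sigma`. Standard CUSUM results give an average run length to false alarm of at least e^h for threshold h, and a worst-case detection delay of roughly h / I for large h, where I = (mu1 - mu0)^2 / (2 sigma^2) is the KL divergence between the two score distributions. All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CusumExpansionTrigger:
    """Hypothetical SGSE-style trigger (illustrative assumption, not the paper's code).

    Runs Page's CUSUM test on a scalar novelty score that is assumed to be
    Gaussian: N(mu0, sigma^2) before a distribution shift, N(mu1, sigma^2) after.
    """
    mu0: float
    mu1: float
    sigma: float
    h: float            # alarm threshold on the cumulative log-likelihood ratio
    stat: float = 0.0   # current CUSUM statistic

    def llr(self, x: float) -> float:
        # log N(x; mu1, sigma^2) - log N(x; mu0, sigma^2), simplified in closed form.
        return (self.mu1 - self.mu0) / self.sigma ** 2 * (x - (self.mu0 + self.mu1) / 2)

    def update(self, x: float) -> bool:
        # CUSUM recursion: accumulate evidence for a shift, never dropping below zero.
        self.stat = max(0.0, self.stat + self.llr(x))
        if self.stat >= self.h:
            self.stat = 0.0  # restart monitoring once a new slot would be allocated
            return True
        return False


def threshold_for_arl(arl0: float) -> float:
    # Since the false-alarm run length is at least exp(h), h = log(ARL0) suffices.
    return math.log(arl0)


def approx_delay(h: float, mu0: float, mu1: float, sigma: float) -> float:
    # First-order detection-delay approximation h / I, with I the Gaussian KL divergence.
    kl = (mu1 - mu0) ** 2 / (2 * sigma ** 2)
    return h / kl
```

As a usage example, with `mu0 = 0`, `mu1 = 1`, `sigma = 1`, and `h = log 1000 ≈ 6.91`, in-distribution scores of 0 contribute -0.5 per step and never raise an alarm, while a sustained shift to scores of 1 adds 0.5 per step and triggers expansion on the 14th shifted sample, close to the approximate delay h / I ≈ 13.8. Raising the target false-alarm run length increases h only logarithmically, which is why a rule of this kind can keep growth controlled without making detection much slower.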