ICLR 2026

Discrete Latent Features Ablate Adversarial Attack: A Robust Prompt Tuning Framework for VLMs

Yang Chen, Yanbin Wei, James Kwok, Yu Zhang

ABSTRACT

While adversarial fine-tuning can enhance the robustness of vision-language models (VLMs), it is computationally expensive. Adversarial prompt tuning has emerged as a practical alternative. However, existing methods are limited by their reliance on continuous image features, which are vulnerable to adversarial perturbations. To mitigate this vulnerability in the feature representation, we propose DEFEAT (Discrete LatEnt FeaturE based Adversarial Training), a robust prompt tuning framework for VLMs. Specifically, DEFEAT introduces a perturbation discrete shield module that reconstructs discrete latent features, together with a logits fusion strategy; these components substantially reduce the discrepancy between clean and adversarial image representations. Moreover, DEFEAT integrates prompt tuning with adversarial training while applying regularization from learnable prompts to hand-crafted prompts, further enhancing adversarial robustness. Extensive experiments across 15 datasets demonstrate the effectiveness of DEFEAT compared with existing adversarial prompt tuning methods. The official code is available at https://github.com/cheny02/DEFEAT-ICLR2026.

* Equal contribution. † Corresponding author.

RELATED WORK

CLIP-based VLMs. By integrating visual and textual modalities, VLMs have significantly enhanced cognitive capabilities and excel in real-world vision tasks (Liu et al., 2023; Zhu et al., 2024). The introduction of CLIP (Radford et al., 2021), trained on about 400 million image-text pairs, was particularly transformative and established a new paradigm for vision-language representation learning. Numerous subsequent works have followed this paradigm and proposed a broad family of CLIP-like models, including ALIGN (Jia et al., 2021), EVA-CLIP (Sun et al., 2023), OpenCLIP (Ilharco et al.,