NeurIPS 2025

HQA-VLAttack: Towards High Quality Adversarial Attack on Vision-Language Pre-Trained Models

Han Liu, Jiaqi Li, Zhi Xu, Xiaotong Zhang, Xiaoming Xu, Fenglong Ma, Yuanman Li, Hong Yu

Abstract

Vision-Language Pre-training (VLP) models have become a cornerstone of cross-modal tasks, achieving remarkable success in applications such as image-text retrieval [39, 4, 7], image captioning [27], and visual grounding [20]. However, research has shown that these models are vulnerable to adversarial attacks [19, 10, 37, 6, 11], which raises significant societal concerns. Adversarial attacks inject imperceptible perturbations into text and image inputs to maliciously manipulate the predictions of victim VLP models.

Existing attacks can be broadly categorized into white-box attacks [37, 21, 32] and black-box attacks [19, 10, 34, 14, 5]. In white-box attacks, attackers have full access to the victim model, allowing them to exploit gradients for highly effective attacks. However, the white-box setting is often too idealistic for real-world scenarios. In contrast, black-box attacks assume only limited access to the victim model, such as confidence scores or prediction labels, making them more practical.

Black-box attacks can be further divided into query-based attacks [34, 14, 5, 18, 17] and transfer-based attacks [19, 10, 35, 38]. Query-based attacks employ an iterative cross-search strategy that repeatedly queries the victim model and uses its feedback to refine adversarial perturbations. While effective, these methods incur substantial query costs, which limits their practicality. Transfer-based attacks instead generate adversarial examples by optimizing them on a surrogate model, relying on feature similarity and generalization to remain effective against unseen victim models without any queries. Because they do not require direct access to the victim model, transfer-based attacks are particularly well suited to real-world adversarial scenarios, making their enhancement a critical research focus.
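To make the transfer-based setting concrete, the following is a minimal toy sketch and not the method proposed in this paper. It assumes two hypothetical linear matching models: a surrogate the attacker can differentiate, and a victim whose weights are similar but never queried. The perturbation is a single signed-gradient (FGSM-style) step computed on the surrogate only. All names (`score`, `fgsm_on_surrogate`, `make_toy_models`) and the values of `eps` and `noise` are illustrative assumptions.

```python
import random


def score(weights, x):
    """Linear matching score; higher means a more confident match."""
    return sum(w * xi for w, xi in zip(weights, x))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_on_surrogate(surrogate_w, x, eps):
    """Take one signed-gradient step that lowers the surrogate score.

    For a linear score, the gradient with respect to x is the weight
    vector itself, so no autodiff is needed in this toy setting.
    """
    return [xi - eps * sign(w) for xi, w in zip(x, surrogate_w)]


def make_toy_models(dim=16, noise=0.1, seed=0):
    """Build a surrogate, a similar (hypothetical) victim, and an input."""
    rng = random.Random(seed)
    surrogate_w = [rng.uniform(-1, 1) for _ in range(dim)]
    # Assumption: the victim shares features with the surrogate up to noise.
    victim_w = [w + rng.uniform(-noise, noise) for w in surrogate_w]
    x = [rng.uniform(0, 1) for _ in range(dim)]
    return surrogate_w, victim_w, x


surrogate_w, victim_w, x = make_toy_models()
eps = 0.05  # per-coordinate perturbation budget (L-infinity)

# The adversarial input is crafted with the surrogate only.
x_adv = fgsm_on_surrogate(surrogate_w, x, eps)

# The victim is evaluated once, purely to check whether the attack transfers.
clean_victim = score(victim_w, x)
adv_victim = score(victim_w, x_adv)
print("victim score dropped:", adv_victim < clean_victim)
```

In this sketch the attack transfers only because the two weight vectors are close by construction. Real VLP models are nonlinear and multimodal, so transferability there depends on how much feature structure the surrogate and victim actually share.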