KDD2022

LeapAttack: Hard-Label Adversarial Attack on Text via Gradient-Based Optimization

Muchao Ye, Jinghui Chen, Chenglin Miao, Ting Wang, Fenglong Ma

16 citations

Abstract

Generating text adversarial examples in the hard-label setting is a more realistic and challenging black-box adversarial attack problem, whose challenge comes from the fact that gradient cannot be directly calculated from discrete word replacements. Consequently, the effectiveness of gradient-based methods for this problem still awaits improvement. In this paper, we propose a gradient-based optimization method named LeapAttack to craft high-quality text adversarial examples in the hard-label setting. To specify, LeapAttack employs the word embedding space to characterize the semantic deviation between the two words of each perturbed substitution by their difference vector. Facilitated by this expression, LeapAttack gradually updates the perturbation direction and constructs adversarial examples in an iterative round trip: firstly, the gradient is estimated by transforming randomly sampled word candidates to continuous difference vectors after moving the current adversarial example near the decision boundary; secondly, the estimated gradient is mapped back to a new substitution word based on the cosine similarity metric. Extensive experimental results show that in the general case LeapAttack can efficiently generate high-quality text adversarial examples with the highest semantic similarity and the lowest perturbation rate in the hard-label setting.