HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text (2402.01806v1)
Abstract: Black-box hard-label adversarial attack on text is a practical and challenging task, as the text data space is inherently discrete and non-differentiable, and only the predicted label is accessible. Research on this problem is still in an embryonic stage, and only a few methods are available. Moreover, existing methods rely on complex heuristic algorithms or unreliable gradient estimation strategies, which are prone to local optima and inevitably consume numerous queries; they therefore struggle to craft satisfactory adversarial examples with high semantic similarity and a low perturbation rate under a limited query budget. To alleviate these issues, we propose a simple yet effective framework, named HQA-Attack, for generating high-quality textual adversarial examples in the black-box hard-label attack scenario. Specifically, after initializing an adversarial example randomly, HQA-Attack first substitutes as many original words back as possible, thus shrinking the perturbation rate. It then leverages the synonym sets of the remaining changed words to further optimize the adversarial example in a direction that improves semantic similarity while preserving the adversarial condition. In addition, during the optimization procedure it searches for a transition synonym for each changed word, thus avoiding traversing the whole synonym set and reducing the number of queries. Extensive experimental results on five text classification datasets, three natural language inference datasets, and two real-world APIs show that the proposed HQA-Attack method significantly outperforms other strong baselines.
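To make the two-stage procedure in the abstract concrete, here is a minimal Python sketch. It assumes a word-substitution setting (the original and adversarial texts have the same number of tokens), and the helpers `query_label`, `synonyms`, and `semantic_sim` are hypothetical placeholders (e.g. a hard-label victim API, counter-fitted synonym sets, and a Universal Sentence Encoder similarity score). This is a simplified reading of the method, not the authors' released implementation.

```python
import random

# --- Hypothetical helpers (illustrative names, not the authors' API) ---
# query_label(text)   -> victim model's predicted label (hard label only)
# synonyms(word)      -> synonym candidates, e.g. from counter-fitted vectors
# semantic_sim(a, b)  -> sentence similarity, e.g. Universal Sentence Encoder


def substitute_back(orig_words, adv_words, is_adv):
    """Stage 1: greedily restore original words while the example stays
    adversarial, shrinking the perturbation rate."""
    adv = list(adv_words)
    changed = {i for i, (o, a) in enumerate(zip(orig_words, adv)) if o != a}
    progress = True
    while progress and changed:
        progress = False
        for i in sorted(changed):
            trial = list(adv)
            trial[i] = orig_words[i]            # try restoring one word
            if is_adv(trial):                   # still fools the model: keep
                adv = trial
                changed.discard(i)
                progress = True
    return adv, changed


def optimize_with_synonyms(orig_words, adv, changed, is_adv,
                           synonyms, semantic_sim):
    """Stage 2: for each remaining changed position, sample a 'transition'
    synonym that keeps the example adversarial, then move toward higher
    semantic similarity without scanning the whole synonym set."""
    orig_text = " ".join(orig_words)
    for i in changed:
        cands = list(synonyms(orig_words[i]))
        random.shuffle(cands)
        # transition word: first sampled synonym that stays adversarial
        pivot = next((c for c in cands
                      if is_adv(adv[:i] + [c] + adv[i + 1:])), None)
        if pivot is None:
            continue
        adv = adv[:i] + [pivot] + adv[i + 1:]
        best = semantic_sim(orig_text, " ".join(adv))
        # rank by similarity locally (no model queries); query the model
        # only for candidates that beat the pivot, keeping the budget small
        for c in sorted(cands, key=lambda w: semantic_sim(
                orig_text, " ".join(adv[:i] + [w] + adv[i + 1:])),
                reverse=True):
            trial = adv[:i] + [c] + adv[i + 1:]
            sim = semantic_sim(orig_text, " ".join(trial))
            if sim <= best:
                break                           # sorted: no gain left
            if is_adv(trial):
                adv, best = trial, sim
                break
    return adv


def hqa_attack(orig_text, init_adv_text, query_label, synonyms, semantic_sim):
    """End-to-end sketch: shrink the perturbation, then polish similarity."""
    orig_words, adv_words = orig_text.split(), init_adv_text.split()
    orig_label = query_label(orig_text)

    def is_adv(words):
        return query_label(" ".join(words)) != orig_label

    adv_words, changed = substitute_back(orig_words, adv_words, is_adv)
    adv_words = optimize_with_synonyms(orig_words, adv_words, changed,
                                       is_adv, synonyms, semantic_sim)
    return " ".join(adv_words)
```

In this reading, the transition synonym acts as a pivot: only candidates that can beat its similarity are ever sent to the model, which is what keeps the query count low compared with exhaustively testing every synonym at every position.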
Authors: Han Liu, Zhi Xu, Xiaotong Zhang, Feng Zhang, Fenglong Ma, Hongyang Chen, Hong Yu, Xianchao Zhang