Self-Supervised Contrastive Learning with Adversarial Perturbations for Defending Word Substitution-based Attacks (2107.07610v3)

Published 15 Jul 2021 in cs.CL

Abstract: In this paper, we present an approach to improve the robustness of BERT language models against word substitution-based adversarial attacks by leveraging adversarial perturbations for self-supervised contrastive learning. We create a word-level adversarial attack that generates hard positives on-the-fly as adversarial examples during contrastive learning. In contrast to previous works, our method improves model robustness without using any labeled data. Experimental results show that our method improves the robustness of BERT against four different word substitution-based adversarial attacks, and that combining our method with adversarial training yields higher robustness than adversarial training alone. As our method improves the robustness of BERT purely with unlabeled data, it opens up the possibility of using large text datasets to train robust language models against word substitution-based adversarial attacks.

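The core objective the abstract describes can be made concrete with a small sketch. Below is a minimal, hypothetical PyTorch implementation of a contrastive loss in which the adversarially perturbed version of each sentence serves as the hard positive and the other sentences in the batch serve as negatives. The function name, the temperature value, and the NT-Xent-style formulation are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def adversarial_contrastive_loss(clean_emb, adv_emb, temperature=0.1):
    """Hypothetical NT-Xent-style loss: each clean sentence embedding is
    pulled toward the embedding of its adversarially perturbed version
    (the hard positive) and pushed away from every other sentence in the
    batch (in-batch negatives). Both inputs: (batch_size, hidden_dim)."""
    z_clean = F.normalize(clean_emb, dim=-1)  # work in cosine-similarity space
    z_adv = F.normalize(adv_emb, dim=-1)
    # Pairwise similarities between clean and adversarial views.
    logits = z_clean @ z_adv.t() / temperature
    # The positive for example i is its own adversarial view at index i.
    targets = torch.arange(z_clean.size(0), device=z_clean.device)
    return F.cross_entropy(logits, targets)
```

In the paper's setup, clean_emb and adv_emb would come from a BERT encoder applied to a sentence and to its word-substituted adversarial counterpart generated on-the-fly; since no labels enter the loss, the objective is fully self-supervised.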
Authors (4)
  1. Zhao Meng (14 papers)
  2. Yihan Dong (2 papers)
  3. Mrinmaya Sachan (124 papers)
  4. Roger Wattenhofer (212 papers)
Citations (9)
