
RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models (2110.07831v1)

Published 15 Oct 2021 in cs.CL and cs.LG

Abstract: Backdoor attacks, which maliciously control a well-trained model's outputs on instances containing specific triggers, have recently been shown to pose serious threats to the safe reuse of deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there is a large gap in robustness between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples and thereby defend against backdoor attacks on NLP models. Moreover, we provide a theoretical analysis of the feasibility of our robustness-aware perturbation-based defense method. Experimental results on sentiment analysis and toxic detection tasks show that our method achieves better defending performance and much lower computational costs than existing online defense methods. Our code is available at https://github.com/lancopku/RAP.
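
To make the screening idea concrete, below is a minimal, hypothetical sketch of how a robustness-aware check of this kind could be applied at inference time. It assumes a classifier exposed as predict_proba(text) that returns the probability of the protected target label; RAP_WORD and DELTA are illustrative placeholders, not values or code from the paper or repository, where the perturbation word is constructed more carefully.

```python
# Hypothetical sketch of a robustness-aware perturbation check (not the authors' code).
# Assumes `predict_proba(text) -> float` returns the probability of the protected
# target label. RAP_WORD and DELTA are illustrative placeholders only.

RAP_WORD = "cf"   # stand-in perturbation word; the paper constructs this more carefully
DELTA = 0.3       # assumed allowed drop in target-label probability for clean inputs

def is_poisoned(text: str, predict_proba) -> bool:
    """Flag an input as likely poisoned if its prediction barely changes
    when the perturbation word is prepended (i.e., it is unusually robust)."""
    original = predict_proba(text)
    perturbed = predict_proba(RAP_WORD + " " + text)
    drop = original - perturbed
    # Clean samples are expected to lose more confidence under the perturbation
    # than poisoned ones, so a small drop suggests a backdoor trigger is present.
    return drop < DELTA
```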

Authors (5)
  1. Wenkai Yang (24 papers)
  2. Yankai Lin (125 papers)
  3. Peng Li (390 papers)
  4. Jie Zhou (688 papers)
  5. Xu Sun (194 papers)
Citations (88)
