Preserving Semantics in Textual Adversarial Attacks (2211.04205v2)
Abstract: The growth of hateful online content, or hate speech, has been associated with a global increase in violent crimes against minorities [23]. Harmful online content can be produced easily, automatically, and anonymously. Although some form of automatic detection is already achieved by NLP text classifiers, these classifiers can be fooled by adversarial attacks. To strengthen existing systems and stay ahead of attackers, we need better adversarial attacks. In this paper, we show that up to 70% of adversarial examples generated by adversarial attacks should be discarded because they do not preserve semantics. We address this core weakness and propose a new, fully supervised sentence embedding technique called Semantics-Preserving-Encoder (SPE). Our method outperforms existing sentence encoders used in adversarial attacks, achieving a 1.2x to 5.1x better real attack success rate. We release our code as a plugin that can be used in any existing adversarial attack to improve its quality and speed up its execution.
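The abstract describes filtering out adversarial examples that do not preserve the meaning of the original sentence by comparing sentence embeddings. Below is a minimal sketch of that idea: candidates whose embedding similarity to the original text falls below a threshold are discarded. The encoder model name (`all-MiniLM-L6-v2` from sentence-transformers) and the 0.8 threshold are illustrative assumptions, not the paper's SPE encoder or its settings.

```python
# Sketch: discarding adversarial candidates that do not preserve semantics,
# using a generic sentence encoder as a stand-in for the paper's SPE.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed stand-in encoder


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def keep_semantics_preserving(original: str, candidates: list[str],
                              threshold: float = 0.8) -> list[str]:
    """Keep only candidates whose embedding is close enough to the original."""
    embeddings = encoder.encode([original] + candidates)
    orig_emb, cand_embs = embeddings[0], embeddings[1:]
    return [cand for cand, emb in zip(candidates, cand_embs)
            if cosine(orig_emb, emb) >= threshold]


# Usage: a paraphrase survives the filter, a garbled rewrite does not.
kept = keep_semantics_preserving(
    "The movie was a complete waste of time.",
    ["The film was a total waste of time.",
     "The movie was a complete waist of thyme."])
print(kept)
```

In an attack framework, this check would run as a constraint on each candidate perturbation, so semantically broken examples are rejected before they count toward attack success.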
- Generating Natural Language Adversarial Examples, 2018.
- Universal Sentence Encoder, 2018.
- Towards Robustness Against Natural Language Word Substitutions, 2021.
- Zachary Laub. Hate Speech on Social Media: Global Comparisons, 2019. Accessed: 2023-01-07.
- RoBERTa: A Robustly Optimized BERT Pretraining Approach, July 2019.
- BERTScore: Evaluating Text Generation with BERT, February 2020.