Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
124 tokens/sec
GPT-4o
8 tokens/sec
Gemini 2.5 Pro Pro
47 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Impact of Adversarial Training on Robustness and Generalizability of Language Models (2211.05523v3)

Published 10 Nov 2022 in cs.CL and cs.AI

Abstract: Adversarial training is widely acknowledged as the most effective defense against adversarial attacks. However, it is also well established that achieving both robustness and generalization in adversarially trained models involves a trade-off. The goal of this work is to provide an in depth comparison of different approaches for adversarial training in LLMs. Specifically, we study the effect of pre-training data augmentation as well as training time input perturbations vs. embedding space perturbations on the robustness and generalization of transformer-based LLMs. Our findings suggest that better robustness can be achieved by pre-training data augmentation or by training with input space perturbation. However, training with embedding space perturbation significantly improves generalization. A linguistic correlation analysis of neurons of the learned models reveals that the improved generalization is due to 'more specialized' neurons. To the best of our knowledge, this is the first work to carry out a deep qualitative analysis of different methods of generating adversarial examples in adversarial training of LLMs.

Citations (8)

Summary

We haven't generated a summary for this paper yet.