Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples (2209.02128v1)

Published 5 Sep 2022 in cs.CL

Abstract: Recent advances in the development of large language models have resulted in public access to state-of-the-art pre-trained language models (PLMs), including Generative Pre-trained Transformer 3 (GPT-3) and Bidirectional Encoder Representations from Transformers (BERT). However, evaluations of PLMs in practice have shown their susceptibility to adversarial attacks during the training and fine-tuning stages of development. Such attacks can result in erroneous outputs, model-generated hate speech, and the exposure of users' sensitive information. While existing research has focused on adversarial attacks during either the training or the fine-tuning of PLMs, there is a deficit of information on attacks made between these two development phases. In this work, we highlight a major security vulnerability in the public release of GPT-3 and further investigate this vulnerability in other state-of-the-art PLMs. We restrict our work to pre-trained models that have not undergone fine-tuning. Further, we underscore token distance-minimized perturbations as an effective adversarial approach, bypassing both supervised and unsupervised quality measures. Following this approach, we observe a significant decrease in text classification quality when evaluating for semantic similarity.
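The abstract does not detail the perturbation mechanics, but the core idea (a handcrafted perturbation that is minimal in token or character distance yet can slip past automated quality checks) can be illustrated with a small sketch. The example below is an assumption-based illustration, not the authors' implementation: it uses a homoglyph character swap as the perturbation, a plain Levenshtein distance as the distance being minimized, and difflib's surface similarity as a dependency-free stand-in for the semantic-similarity measure (embedding-based cosine similarity would be the more realistic choice in practice).

```python
# Hypothetical sketch: a distance-minimized perturbation and two simple
# quality checks. All names, texts, and measures here are illustrative.

import difflib


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


original = "The film was an absolute delight."
# Swap one Latin "e" for a visually identical Cyrillic "е" (U+0435):
# a human reader sees the same sentence, but the byte sequence differs.
perturbed = original.replace("delight", "d\u0435light", 1)

print("edit distance:", levenshtein(original, perturbed))   # 1
print("surface similarity:",
      difflib.SequenceMatcher(None, original, perturbed).ratio())  # ~0.97
```

The sketch only loosely mirrors the abstract's claim: a single-character edit keeps the surface similarity close to 1.0, which is how a distance-minimized perturbation can pass an unsupervised quality measure while still changing the exact input a pre-trained model receives.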

Authors (8)
  1. Hezekiah J. Branch (1 paper)
  2. Jonathan Rodriguez Cefalu (2 papers)
  3. Jeremy McHugh (2 papers)
  4. Leyla Hujer (1 paper)
  5. Aditya Bahl (1 paper)
  6. Daniel del Castillo Iglesias (1 paper)
  7. Ron Heichman (1 paper)
  8. Ramesh Darwishi (1 paper)
Citations (42)
