
Defense-Prefix for Preventing Typographic Attacks on CLIP

Published 10 Apr 2023 in cs.CV | arXiv:2304.04512v3

Abstract: Vision-language pre-training models (VLPs) have exhibited revolutionary improvements in various vision-language tasks. In VLP, some adversarial attacks fool a model into false or absurd classifications. Previous studies addressed these attacks by fine-tuning the model or changing its architecture. However, these methods risk losing the original model's performance and are difficult to apply to downstream tasks. In particular, their applicability to other tasks has not been considered. In this study, we addressed the reduction of the impact of typographic attacks on CLIP without changing the model parameters. To achieve this, we expand the idea of "prefix learning" and introduce our simple yet effective method: Defense-Prefix (DP), which inserts the DP token before a class name to make words "robust" against typographic attacks. Our method can be easily applied to downstream tasks, such as object detection, because the proposed method is independent of the model parameters. Our method significantly improves the accuracy of classification tasks for typographic attack datasets, while maintaining the zero-shot capabilities of the model. In addition, we leverage our proposed method for object detection, demonstrating its high applicability and effectiveness. The codes and datasets are available at https://github.com/azuma164/Defense-Prefix.

Citations (7)

Summary

  • The paper introduces Defense-Prefix, a novel token-based mechanism that counters typographic attacks on CLIP without modifying model parameters.
  • The methodology leverages Defense and Identity Losses to align text features and maintain semantic consistency, achieving up to a 17.70% accuracy improvement on real-world datasets.
  • When applied to RegionCLIP, the approach enhances object detection resilience to adversarial text without fine-tuning, ensuring robust zero-shot learning.


Introduction

The paper "Defense-Prefix for Preventing Typographic Attacks on CLIP" addresses the susceptibility of Vision-Language Pre-trained (VLP) models, specifically CLIP, to typographic attacks. Typographic attacks involve manipulating text within images to elicit misclassifications by the model. The authors introduce a method called Defense-Prefix (DP) to mitigate this vulnerability by enhancing the robustness against such attacks without altering the underlying model parameters. This novel approach leverages a token-based prefix strategy that can be seamlessly integrated into downstream tasks like object detection.

Problem Statement

VLP models like CLIP are known for their high accuracy in zero-shot learning across diverse vision-language tasks. However, they are vulnerable to typographic attacks, where textual content within images leads to misclassification (Figure 1; illustrated in code below). Traditional defenses modify model parameters through methods such as fine-tuning or architectural changes, which can degrade performance and complicate downstream task integration.

Figure 1: (a) Image of a dog with a yellow tag that reads "mouse". (b) CLIP's resulting misclassification of the image.
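To make the vulnerability concrete, here is a minimal zero-shot classification sketch using the openai/CLIP package. The image path and candidate class list are illustrative assumptions mirroring Figure 1, not assets from the paper.

```python
# Minimal CLIP zero-shot classification sketch illustrating a typographic
# attack. "dog_with_mouse_tag.jpg" is a hypothetical image like Figure 1:
# a dog wearing a tag with the word "mouse" written on it.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("dog_with_mouse_tag.jpg")).unsqueeze(0).to(device)
class_names = ["dog", "mouse", "cat"]
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

# With "mouse" rendered in the image, CLIP frequently puts the highest
# probability on "mouse" rather than "dog".
print(dict(zip(class_names, probs[0].tolist())))
```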

Defense-Prefix Methodology

Defense-Prefix introduces a learned token, [DP], inserted before class names in text prompts processed by CLIP's text encoder. This strategy builds on class-prefix learning, commonly used in subject-driven image generation, to create text features resistant to typographic manipulation. The proposed DP learning involves two key components:

  1. Defense Loss: This cross-entropy loss ensures the learned DP token effectively neutralizes typographic attacks by aligning text features (Figures 2 and 3).
  2. Identity Loss: Utilizes KL-divergence to maintain semantic consistency between the original and DP-prefixed text features, preserving the model's classification accuracy on unaltered datasets (Figure 2). A hedged training sketch follows the figure captions below.

    Figure 2: Method overview. We keep the image encoder and text encoder of CLIP frozen. Our method trains only the DP vector, which is a word embedding for [DP].

    Figure 3: Typographic attack datasets. (Left: a sample from synthetic typographic attack datasets, Right: a sample from our real-world typographic attack dataset.)
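Reading the method as described (frozen encoders, one trainable word embedding, a cross-entropy defense loss on attacked images, and a KL-divergence identity loss), a minimal PyTorch training sketch might look as follows. The prompt template, the placeholder-token mechanism, the text-to-text formulation of the identity loss, and the loss weight lam are our assumptions and need not match the authors' released code.

```python
# A minimal Defense-Prefix training sketch: CLIP stays frozen and only one
# word-embedding vector (dp_vec) is optimized. Prompt template, placeholder
# mechanics, identity-loss form, and loss weighting are assumptions.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
for p in model.parameters():
    p.requires_grad_(False)  # the CLIP encoders are never updated

embed_dim = model.token_embedding.embedding_dim
dp_vec = torch.randn(embed_dim, device=device, requires_grad=True)  # the only trainable parameter
optimizer = torch.optim.Adam([dp_vec], lr=1e-3)

def encode_text_with_dp(class_names, use_dp):
    """Encode prompts, optionally swapping a placeholder token "X" for the
    learnable DP embedding, i.e. "a photo of a [DP] {class}"."""
    prompts = [f"a photo of a X {c}" if use_dp else f"a photo of a {c}"
               for c in class_names]
    tokens = clip.tokenize(prompts).to(device)
    x = model.token_embedding(tokens)
    if use_dp:
        # tokens are [SOT, a, photo, of, a, X, ...]; the placeholder sits at index 5
        x[:, 5, :] = dp_vec.to(x.dtype)
    x = x + model.positional_embedding.to(x.dtype)
    x = model.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
    x = model.ln_final(x)
    # pool at the EOT token, exactly as CLIP's encode_text does
    x = x[torch.arange(x.shape[0]), tokens.argmax(dim=-1)] @ model.text_projection
    return F.normalize(x, dim=-1)

def train_step(attacked_images, labels, class_names, lam=1.0):
    text_dp = encode_text_with_dp(class_names, use_dp=True)
    text_plain = encode_text_with_dp(class_names, use_dp=False)
    with torch.no_grad():
        img = F.normalize(model.encode_image(attacked_images), dim=-1)
    # Defense loss: attacked images must still classify as their true class
    defense_loss = F.cross_entropy(100.0 * img @ text_dp.T, labels)
    # Identity loss (our formulation): keep the similarity structure of the
    # DP-prefixed text features close to that of the plain text features
    identity_loss = F.kl_div(
        F.log_softmax(100.0 * text_dp @ text_plain.T, dim=-1),
        F.softmax(100.0 * text_plain @ text_plain.T, dim=-1),
        reduction="batchmean")
    loss = defense_loss + lam * identity_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference, classification simply uses encode_text_with_dp(class_names, use_dp=True) in place of the standard prompts; nothing in the image pathway changes.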

Experimental Results

The authors validate their approach on both synthetic and real-world typographic attack datasets. Defense-Prefix yields significant gains, improving classification accuracy under typographic attack by 9.61% on the synthetic datasets and 17.70% on the real-world dataset, while accuracy on the original, unattacked datasets remains nearly unchanged.

The RTA-100 dataset, curated for the study, provides a real-world typographic attack dataset for training and evaluation, further demonstrating DP's effectiveness (Figure 4). A sketch of how such attacked images can be synthesized follows the caption below.

Figure 4: Images sampled from our typographic attack COCO dataset. The dataset consists of images from COCO with synthesized text.
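For context, synthetic typographic attack images of the kind shown in Figure 4 can be produced by rendering a misleading class name onto a natural image. The following Pillow sketch is illustrative only; the font, placement, and styling are not the paper's rendering recipe, and the file paths are hypothetical.

```python
# Hedged sketch: build a synthetic typographic attack image by pasting a
# misleading class name onto a natural image.
from PIL import Image, ImageDraw, ImageFont

def add_typographic_attack(image_path, attack_text, out_path):
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    # the default bitmap font keeps the sketch dependency-free;
    # ImageFont.truetype() would give larger, styled text
    font = ImageFont.load_default()
    # white text on a black box in the top-left corner
    box = draw.textbbox((10, 10), attack_text, font=font)
    draw.rectangle(box, fill="black")
    draw.text((10, 10), attack_text, fill="white", font=font)
    img.save(out_path)

# e.g. overlay the word "mouse" on a COCO dog image (paths hypothetical)
add_typographic_attack("coco_dog.jpg", "mouse", "coco_dog_attacked.jpg")
```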

Application to Object Detection

By applying DP in the RegionCLIP framework for object detection, the method shows substantial resilience against typographic attacks without compromising accuracy in standard inference scenarios. The implementation does not require fine-tuning RegionCLIP, emphasizing the method's compatibility with existing VLP applications (Figure 5). A sketch of this prompt-side transfer follows the caption below.

Figure 5: Visualization of RegionCLIP and RegionCLIP+Ours zero-shot inference on the typographic attack COCO dataset.
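Because Defense-Prefix lives entirely on the text side, transferring it to a CLIP-based detector reduces to rebuilding the detector's per-class text embeddings with the learned DP vector; no detector weights change. A minimal sketch, reusing the encode_text_with_dp helper from the training sketch above (the truncated class list and export format are illustrative assumptions):

```python
# Export DP-prefixed class embeddings for a CLIP-based detector head.
# Reuses encode_text_with_dp() from the training sketch; class list truncated.
import torch

coco_classes = ["person", "bicycle", "car", "dog", "cat"]

with torch.no_grad():
    text_weights = encode_text_with_dp(coco_classes, use_dp=True)  # [num_classes, dim]

# A RegionCLIP-style head scores each L2-normalized region feature r against
# these embeddings: logits = logit_scale * r @ text_weights.T
torch.save(text_weights.cpu(), "dp_class_embeddings.pt")
```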

Conclusion

Defense-Prefix presents a scalable, non-intrusive solution that enhances VLP model robustness against typographic attacks without sacrificing original performance. The method preserves zero-shot capability, adds only a single learned embedding vector, and is applicable across various downstream tasks. Future work could explore similar strategies against other forms of adversarial attack on VLP models.

This paper elucidates a versatile approach, contributing significantly to secure and reliable real-world implementations of VLP models.
