
Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables (1905.04016v1)

Published 10 May 2019 in cs.CV and cs.AI

Abstract: In this work, we study the robustness of a CNN+RNN based image captioning system when subjected to adversarial noises. We propose to fool an image captioning system into generating targeted partial captions for an image polluted by adversarial noises, even if the targeted captions are totally irrelevant to the image content. A partial caption indicates that the words at some locations in the caption are observed, while words at other locations are not restricted. This is the first work to study exact adversarial attacks with targeted partial captions. Due to the sequential dependencies among words in a caption, we formulate the generation of adversarial noises for targeted partial captions as a structured output learning problem with latent variables. Both the generalized expectation maximization algorithm and structural SVMs with latent variables are then adopted to optimize the problem. The proposed methods generate highly successful attacks against three popular CNN+RNN based image captioning models. Furthermore, the proposed attack methods are used to understand the inner mechanism of image captioning systems, providing guidance for further improving automatic image captioning systems towards human captioning.
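
The paper's code is not reproduced here; as a rough illustration of the formulation the abstract describes, the sketch below shows a GEM-style alternating optimization for a targeted partial-caption attack: the latent (unobserved) caption positions are filled with the captioner's currently most likely words, and the adversarial noise is then updated to raise the likelihood of the completed target caption. The toy captioner, vocabulary size, observed positions and word ids, step size, and noise budget are all illustrative assumptions, not the authors' setup.

```python
# Minimal sketch (not the authors' code) of an alternating attack on a
# captioner with a targeted *partial* caption: some word positions are fixed
# to target words, the rest are latent and re-estimated each iteration.
import torch
import torch.nn as nn

VOCAB, HID, T = 50, 32, 6          # toy vocabulary size, hidden size, caption length

class ToyCaptioner(nn.Module):
    """Stand-in for a CNN+RNN captioner (a linear 'encoder' feeding a GRU)."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Linear(3 * 16 * 16, HID)
        self.embed = nn.Embedding(VOCAB, HID)
        self.rnn = nn.GRU(HID, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def log_probs(self, image, caption):
        """Per-position log P(w_t | image, w_<t), teacher-forced on `caption`."""
        h0 = self.encode(image.flatten(1)).unsqueeze(0)          # (1, B, HID)
        inp = self.embed(caption[:, :-1])                        # shifted caption
        start = torch.zeros_like(inp[:, :1])                     # start-of-sentence
        feats, _ = self.rnn(torch.cat([start, inp], dim=1), h0)
        return torch.log_softmax(self.out(feats), dim=-1)        # (B, T, VOCAB)

model = ToyCaptioner().eval()
image = torch.rand(1, 3, 16, 16)
noise = torch.zeros_like(image, requires_grad=True)

# Targeted partial caption: positions 1 and 3 must emit these (arbitrary) word ids;
# all other positions are latent variables.
observed = {1: 7, 3: 23}
caption = torch.randint(0, VOCAB, (1, T))
opt = torch.optim.Adam([noise], lr=0.05)

for step in range(100):
    # E-like step: fill latent positions with the currently most likely words
    # (a crude greedy completion standing in for the paper's inference step).
    with torch.no_grad():
        lp = model.log_probs((image + noise).clamp(0, 1), caption)
        best = lp.argmax(dim=-1)
        for t in range(T):
            caption[0, t] = observed.get(t, best[0, t].item())
    # M-like step: update the noise to raise the completed caption's likelihood.
    lp = model.log_probs((image + noise).clamp(0, 1), caption)
    loss = -lp.gather(-1, caption.unsqueeze(-1)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        noise.clamp_(-8 / 255, 8 / 255)      # keep the perturbation imperceptible
```

In the paper this alternation is formalized through generalized EM and structural SVMs with latent variables; the greedy completion above is only a simplified stand-in for that latent-variable inference.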

Authors (7)
  1. Yan Xu (258 papers)
  2. Baoyuan Wu (107 papers)
  3. Fumin Shen (50 papers)
  4. Yanbo Fan (46 papers)
  5. Yong Zhang (660 papers)
  6. Heng Tao Shen (117 papers)
  7. Wei Liu (1135 papers)
Citations (55)

Summary

We haven't generated a summary for this paper yet.