Intermediate Level Adversarial Attack for Enhanced Transferability (1811.08458v1)
Abstract: Neural networks are vulnerable to adversarial examples, malicious inputs crafted to fool trained models. Adversarial examples often exhibit black-box transfer, meaning that adversarial examples for one model can fool another model. However, adversarial examples may be overfit to exploit the particular architecture and feature representation of a source model, resulting in sub-optimal black-box transfer attacks against other target models. This leads us to introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation at a pre-specified intermediate layer of the source model. We show that our method can effectively achieve this goal and that we can select a nearly optimal layer of the source model to perturb without any knowledge of the target models.
- Qian Huang
- Zeqi Gu
- Isay Katsman
- Horace He
- Pian Pawakapan
- Zhiqiu Lin
- Serge Belongie
- Ser-Nam Lim
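
Below is a minimal PyTorch sketch of the idea described in the abstract: take an adversarial example produced by some baseline attack and fine-tune it so that the perturbation it induces at a chosen intermediate layer grows along the direction set by the original attack (the paper's projection-based ILA loss). The function name `ila_attack` and all hyperparameter values are illustrative assumptions, not the authors' reference implementation.

```python
import torch


def ila_attack(model, layer, x, x_adv, eps=8 / 255, lr=0.01, n_iter=10):
    """Hypothetical ILA sketch: fine-tune an existing adversarial example
    x_adv by maximizing the projection of its mid-layer perturbation onto
    the direction induced by the original attack."""
    # Freeze model weights; only the input is optimized.
    for p in model.parameters():
        p.requires_grad_(False)
    model.eval()

    # Capture the output of the pre-specified intermediate layer.
    feats = {}
    handle = layer.register_forward_hook(
        lambda mod, inp, out: feats.update(h=out)
    )

    with torch.no_grad():
        model(x)
        h_clean = feats["h"].detach()          # features of the clean input
        model(x_adv)
        h_adv = feats["h"].detach()            # features of the initial attack
    direction = (h_adv - h_clean).flatten(1)   # fixed mid-layer direction

    x_new = x_adv.clone().detach().requires_grad_(True)
    for _ in range(n_iter):
        model(x_new)
        delta = (feats["h"] - h_clean).flatten(1)
        # Minimizing the negative dot product maximizes the projection of
        # the new mid-layer perturbation onto the original direction.
        loss = -(delta * direction).sum()
        loss.backward()
        with torch.no_grad():
            x_new -= lr * x_new.grad.sign()
            # Keep the result a valid image inside the eps ball around x.
            x_new.copy_(torch.min(torch.max(x_new, x - eps), x + eps))
            x_new.clamp_(0.0, 1.0)
        x_new.grad.zero_()

    handle.remove()
    return x_new.detach()
```

In use, `x_adv` would come from any standard white-box attack on the source model (e.g. I-FGSM), and the fine-tuned example returned by `ila_attack` would then be evaluated against the black-box target models; the choice of `layer` corresponds to the near-optimal source-model layer the paper shows can be picked without target knowledge.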