- The paper introduces the Label Universal Targeted Attack (LUTA), a novel adversarial technique that forces a deep learning model to predict a specific target label for *any* sample from a chosen source class, differing from attacks targeting individual samples or causing general misclassification.
- LUTA employs stochastic gradient optimization, with variants constrained by the ℓ∞ and ℓ2 norms and an unbounded variant, LUTA-U.
- The technique boasts high fooling ratios (averaging >80%) with minimal perturbation and low leakage to non-source classes, demonstrating efficacy across diverse models and datasets like ImageNet and VGGFace, including successful demonstrations in the physical world.
- LUTA provides valuable insights for understanding model robustness and vulnerabilities; its unbounded variant, LUTA-U, specifically serves as a tool for model autopsy, potentially revealing non-optimization-based attack methods and aiding interpretability.
Insightful Overview of "Label Universal Targeted Attack"
The paper "Label Universal Targeted Attack," authored by Naveed Akhtar et al., presents an innovative approach within the domain of adversarial attacks on deep learning models, specifically introducing the Label Universal Targeted Attack (LUTA). This technique is designed to coerce a deep learning model into predicting a target label of the attacker’s choice for any sample from a specified source class. The novelty of LUTA lies in its ability to affect an entire class with a single perturbation, which is a significant departure from both targeted attacks on specific samples and universal untargeted attacks that misclassify any input to any incorrect class.
Core Concept and Methodology
LUTA is proposed as an iterative algorithm deploying stochastic gradient-based optimization, where log-probability maximization of the target label is conducted while suppressing information leakage to non-source classes. This paper explores three variants of LUTA, characterized by different norm constraints on the perturbations: bounded by ℓ∞ and ℓ2 norms, and an unbounded form termed LUTA-U. The latter is particularly interesting due to its potential use as a tool for model autopsy to explore decision boundaries.
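To make the procedure concrete, the following is a minimal PyTorch-style sketch of such an update loop. It assumes a pretrained classifier `model`, a data loader `source_loader` yielding source-class images, a chosen `target_label`, and a norm bound `eps`; these names are illustrative. The sketch only captures the core idea of optimizing a single shared perturbation to maximize the target label's log-probability, not the authors' exact algorithm, which also includes an explicit leakage-suppression mechanism and other refinements not shown here.

```python
# Minimal sketch of a LUTA-style loop, NOT the authors' implementation.
# Assumptions (hypothetical names): a PyTorch classifier `model`, a
# DataLoader `source_loader` over source-class images scaled to [0, 1],
# a `target_label` index, and an l_inf bound `eps` (pass eps=None for an
# unbounded, LUTA-U-style perturbation).
import torch
import torch.nn.functional as F

def luta_sketch(model, source_loader, target_label, eps=15 / 255, epochs=10, lr=1e-2):
    model.eval()
    for p in model.parameters():            # the classifier itself stays frozen
        p.requires_grad_(False)

    delta = None                             # one perturbation for the whole source class
    for _ in range(epochs):
        for x, _ in source_loader:
            if delta is None:
                delta = torch.zeros_like(x[0]).requires_grad_(True)
            logits = model(torch.clamp(x + delta, 0.0, 1.0))
            # maximize the target label's log-probability over the batch
            loss = -F.log_softmax(logits, dim=1)[:, target_label].mean()
            loss.backward()
            with torch.no_grad():
                delta -= lr * delta.grad.sign()      # signed gradient step
                if eps is not None:
                    delta.clamp_(-eps, eps)          # project onto the l_inf ball
                delta.grad.zero_()
    return delta.detach()
```

The signed gradient step and fixed step size are simplifications; the point is that a single `delta`, optimized over many source-class batches, is what makes the attack label-universal. The ℓ2-bounded variant would replace the clamp with a projection onto an ℓ2 ball, and LUTA-U simply omits the projection.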
Extensive experiments demonstrate LUTA's efficacy on models such as VGG-16 and ResNet-50 trained on ImageNet, and on the VGGFace model for identity switching. These experiments used thousands of source-class samples and a broad range of target labels, consistently achieving high fooling ratios while keeping the perturbations imperceptible (norm bounds of η = 15 for ℓ∞ and η = 4500 for ℓ2).
Numerical Results and Observations
The empirical results convey the robustness of LUTA across multiple architectures and datasets, underscoring its potency. The perturbations not only achieve high fooling rates, averaging over 80% in most settings, but also maintain relatively low leakage to non-source classes, an essential property for a targeted attack that aims to avoid raising suspicion.
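Since fooling ratio and leakage are the two headline quantities, a small evaluation helper along the following lines clarifies what is being measured. The loader names are hypothetical, and the leakage definition used here (the fraction of non-source-class samples whose prediction changes under the perturbation) is one plausible reading of the metric rather than the paper's exact formulation.

```python
# Hypothetical evaluation helper for the two quantities discussed above:
# fooling ratio on held-out source-class samples, and leakage on samples
# from non-source classes. `delta` is the perturbation returned by the
# earlier sketch; loaders and names are illustrative.
import torch

@torch.no_grad()
def fooling_and_leakage(model, delta, source_loader, other_loader, target_label):
    model.eval()

    fooled = total_src = 0
    for x, _ in source_loader:
        pred = model(torch.clamp(x + delta, 0.0, 1.0)).argmax(dim=1)
        fooled += (pred == target_label).sum().item()
        total_src += x.size(0)

    leaked = total_other = 0
    for x, _ in other_loader:                # samples from non-source classes
        clean = model(x).argmax(dim=1)
        pert = model(torch.clamp(x + delta, 0.0, 1.0)).argmax(dim=1)
        leaked += (pert != clean).sum().item()
        total_other += x.size(0)

    return fooled / total_src, leaked / total_other
```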
A particularly compelling result is LUTA's extension to the physical world, where adversarial samples were printed and then classified via a live webcam feed, indicating that the perturbations remain effective in real-world conditions without any digital manipulation of the input.
Theoretical Implications and Future Directions
LUTA introduces a new dimension to adversarial example generation by focusing on class-level manipulation, which could inform future research on improving model robustness at a more foundational level. The observed perturbation patterns, namely the emergence of target-label features, suggest that non-optimization-based attack methods may be possible, which could reduce computational cost in future work.
The unbounded LUTA variant, LUTA-U, particularly enriches the understanding of model decision profiles, revealing nuanced geometric relationships in the representations learned by deep models. This autopsy-like functionality opens avenues for model interpretability and insights into architectural vulnerabilities.
Conclusion
This research contributes a significant methodological advancement in adversarial attack strategies, demonstrating its efficacy both in digital and physical settings. By refining the control of adversarial perturbations to class-specific label transfer, it prompts crucial considerations in model robustness and defense mechanisms. Future research could build on these insights, potentially yielding more resilient model architectures or novel applications in model interpretability, thus broadening the scope and understanding of adversarial impacts in deep learning.