Adversarial robustness in standalone image recognition

Determine techniques that make standalone image recognition models reliably robust to adversarial perturbations, so that such perturbations no longer cause misclassification. Doing so would resolve the long-standing robustness challenge for single-task image classifiers.

Background

The paper notes that, despite progress in multimodal settings, adversarial robustness for standalone image classifiers has historically come with steep accuracy trade-offs, and widely used defenses have struggled to withstand strong attacks. The paper presents this as a field-level challenge, distinct from its own focus on circuit breakers for generative systems.

The authors emphasize that their circuit-breaking approach provides robustness in multimodal systems, but they explicitly acknowledge that robust adversarial defenses for standalone image recognition remain an open problem.

References

Notably, while adversarial robustness in standalone image recognition remains an open challenge, circuit breakers allow the larger multimodal system to reliably withstand image "hijacks" that aim to produce harmful content.

— Improving Alignment and Robustness with Circuit Breakers (2406.04313 - Zou et al., 6 Jun 2024) in Abstract
