Adversarial robustness in standalone image recognition
Determine techniques that make standalone image recognition models adversarially robust, that is, techniques that reliably prevent adversarial perturbations from causing misclassification. Such techniques would resolve the long-standing robustness challenge for single-task image classifiers.
References
Notably, while adversarial robustness in standalone image recognition remains an open challenge, circuit breakers allow the larger multimodal system to reliably withstand image "hijacks" that aim to produce harmful content.
— Improving Alignment and Robustness with Circuit Breakers
(arXiv:2406.04313, Zou et al., 6 Jun 2024), in Abstract