- The paper introduces a standardized evaluation benchmark employing AutoAttack to consistently assess worst-case adversarial robustness in deep learning models.
- The study rigorously defines threat models and excludes architectures with gradient or stochastic issues to ensure genuine robustness claims.
- The benchmark features a public leaderboard and model zoo that reveal robustness gaps and inform more reliable defense strategies.
Essay on "RobustBench: a standardized adversarial robustness benchmark"
The paper "RobustBench: a standardized adversarial robustness benchmark" addresses the critical issue of evaluating adversarial robustness in the field of machine learning, particularly in the context of deep learning models for image classification. Despite substantial research in this area, inconsistencies in evaluating adversarial defenses have persisted, often leading to overestimation of models' robustness. The paper introduces a comprehensive benchmarking framework called RobustBench, aiming to provide a standardized and reliable evaluation of adversarial robustness.
Key Contributions
- Robustness Evaluation Standardization: The authors propose a standardized benchmark built on AutoAttack, an ensemble of diverse, parameter-free attacks that combines white-box and black-box methods. Its systematic use yields a strong estimate of worst-case robust accuracy and resolves discrepancies previously observed in robustness claims across studies (see the code sketch after this list).
- Model and Threat Model Specifications: The paper rigorously defines threat models, focusing primarily on the well-studied ℓ∞ and ℓ2 perturbation sets (e.g., ε = 8/255 for ℓ∞ and ε = 0.5 for ℓ2 on CIFAR-10). To keep robustness claims genuine, the benchmark restricts eligible model architectures, excluding defenses that rely on zero gradients, stochastic elements, or internal optimization loops at inference time, since such mechanisms tend to break gradient-based evaluation rather than confer true robustness.
- Leaderboard and Model Zoo: A public leaderboard tracks the progress of adversarial robustness research, enabling direct comparison of defenses and revealing which strategies are effective. It is complemented by a curated Model Zoo that offers easy, unified access to the collected robust models for downstream applications and further research.
- Comprehensive Performance Analysis: The authors conduct a thorough analysis of collected models, exploring robustness effects on distribution shifts, calibration, out-of-distribution detection, and other metrics like fairness and privacy leakage.
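As a concrete illustration of how leaderboard entries are produced, the sketch below loads a model from the Model Zoo and evaluates it under the standard CIFAR-10 ℓ∞ threat model (ε = 8/255) with AutoAttack. It is a minimal sketch based on the public `robustbench` and `autoattack` packages; the specific model name (`Carmon2019Unlabeled`), the number of test examples, and the batch size are illustrative choices, not prescriptions from the paper.

```python
import torch

from autoattack import AutoAttack
from robustbench.data import load_cifar10
from robustbench.utils import load_model

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load a robust model from the Model Zoo; the model name is illustrative
# (any leaderboard entry can be referenced by its identifier).
model = load_model(model_name='Carmon2019Unlabeled',
                   dataset='cifar10',
                   threat_model='Linf').to(device).eval()

# A small CIFAR-10 test subset with pixel values in [0, 1].
x_test, y_test = load_cifar10(n_examples=256)

# Standard AutoAttack ensemble (APGD-CE, APGD-T, FAB-T, Square Attack)
# under the l_inf threat model with the benchmark's epsilon of 8/255.
adversary = AutoAttack(model, norm='Linf', eps=8 / 255,
                       version='standard', device=device)
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)

# Robust accuracy: the fraction of points still classified correctly
# on the adversarial examples produced by the ensemble.
with torch.no_grad():
    preds = model(x_adv.to(device)).argmax(dim=1).cpu()
robust_acc = (preds == y_test).float().mean().item()
print(f'Robust accuracy: {robust_acc:.1%}')
```

Because AutoAttack is parameter-free, the same call applies to any submitted model, which is what makes the resulting robust-accuracy numbers directly comparable across leaderboard entries.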
Numerical Results and Implications
The benchmark hosts over 120 models, and re-evaluation with AutoAttack reveals significant gaps between claimed and measured robustness: the robustness reported for several published defenses was substantially overestimated. By making evaluation consistent and by highlighting defense strategies that hold up under standardized attacks, RobustBench encourages a shift towards more reliable evaluation practices and aims to expedite progress in adversarial defense research.
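A leaderboard-style comparison between claimed and measured robustness can be reproduced with the higher-level benchmark helper exposed by the `robustbench` package. The following is a minimal sketch based on the package's documented usage; the exact keyword arguments (`n_examples`, `batch_size`, etc.) are assumptions and may differ across package versions.

```python
import torch

from robustbench.eval import benchmark
from robustbench.utils import load_model

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Any Model Zoo entry (or a custom torch.nn.Module) can be re-evaluated
# under the same standardized threat model used by the leaderboard.
model = load_model(model_name='Carmon2019Unlabeled',
                   dataset='cifar10',
                   threat_model='Linf').to(device).eval()

# Runs clean evaluation plus AutoAttack and returns both accuracies,
# which can then be compared against the robustness claimed in the paper.
# NOTE: argument names follow the robustbench README and may vary by version.
clean_acc, robust_acc = benchmark(model,
                                  dataset='cifar10',
                                  threat_model='Linf',
                                  eps=8 / 255,
                                  n_examples=1000,
                                  batch_size=128,
                                  device=device)
print(f'Clean: {clean_acc:.1%}, robust (AutoAttack): {robust_acc:.1%}')
```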
Theoretical and Practical Implications
Theoretically, this standardized evaluation framework lays the groundwork for more rigorous analysis of adversarial defenses, potentially guiding researchers towards genuinely robust model designs. Practically, RobustBench serves as a valuable resource for developers seeking reliable models in security-critical applications, such as autonomous systems and healthcare.
Future Directions and Developments
Future work is likely to extend RobustBench's capabilities, incorporating new threat models and broadening its scope beyond image classification. The benchmark may evolve to address real-world challenges, like adaptive attacks in dynamic environments or emerging adversarial settings in different domains.
In conclusion, RobustBench represents a significant step towards consolidating adversarial robustness evaluation methodologies. By setting a high standard for robustness claims, it fosters transparency and advances the state of the art in building genuinely robust machine learning models. This initiative not only strengthens existing defensive strategies but also equips the research community with a practical toolkit for tackling evolving adversarial challenges.