- The paper introduces a novel Perceptual Adversarial Similarity Score (PASS) that quantifies adversarial perturbations in line with human perception.
- The paper presents a hot/cold adversarial generation approach that creates diverse perturbations to improve network training.
- The paper demonstrates that incorporating hard positive examples into training extends decision boundaries and significantly lowers error rates.
Adversarial Diversity and Hard Positive Generation: An Analytical Overview
Introduction
The research paper "Adversarial Diversity and Hard Positive Generation" by Andras Rozsa, Ethan M. Rudd, and Terrance E. Boult addresses a critical vulnerability of state-of-the-art deep neural networks: their susceptibility to adversarial examples, small input perturbations that cause a network to misclassify. The authors propose methods to quantify and generate diverse adversarial examples and introduce the concept of hard positive generation, in which larger, non-minimal perturbations are used as data augmentation to improve network robustness.
Contributions
The authors contribute to the field of adversarial machine learning in several ways:
- Perceptual Adversarial Similarity Score (PASS): They introduce PASS, a measure that quantifies how perceptible an adversarial image is to a human observer, aligning far more closely with human judgment than traditional Lp-norm metrics (a sketch of PASS follows this list).
- Diverse Adversarial Generation: They propose a novel hot/cold generation strategy that produces multiple distinct adversarial perturbations for each input image, greatly enlarging the pool of adversarial samples.
- Hard Positive Generation: They advocate including non-minimal perturbations, or hard positives, in training; these can encompass semantically meaningful structure and prove more effective at extending decision boundaries and improving robustness than the nearest (minimal) adversarial examples.
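Concretely, PASS first aligns the perturbed image to the original with a homography, discounting small translations and rotations that humans barely notice, and then scores the aligned pair with the structural similarity index: PASS(x_adv, x) = SSIM(psi(x_adv, x), x). The sketch below is a minimal illustration of that two-step recipe; the function names are ours, and OpenCV's ECC-based alignment stands in for whichever homography estimator the authors used.

```python
# Minimal PASS sketch: homography alignment followed by SSIM.
# Illustrative only -- names and the ECC alignment are our choices.
import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim

def pass_score(original: np.ndarray, perturbed: np.ndarray) -> float:
    """PASS for a pair of single-channel images with values in [0, 255]."""
    orig = original.astype(np.float32)
    pert = perturbed.astype(np.float32)

    # Estimate a homography registering the perturbed image to the
    # original (ECC maximizes correlation between the two images).
    warp = np.eye(3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)
    try:
        _, warp = cv2.findTransformECC(
            orig, pert, warp, cv2.MOTION_HOMOGRAPHY, criteria, None, 5
        )
        aligned = cv2.warpPerspective(
            pert, warp, (orig.shape[1], orig.shape[0]),
            flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP,
        )
    except cv2.error:
        aligned = pert  # if alignment fails, score the raw pair

    # PASS(x_adv, x) = SSIM(psi(x_adv, x), x)
    return ssim(orig, aligned, data_range=255.0)
```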
Methodology
The authors demonstrate their approach on prominent architectures and datasets: LeNet on MNIST, and GoogLeNet and ResidualNet on ImageNet. Key aspects include:
- Fast Gradient Strategies: Building on the fast gradient sign (FGS) method, they develop a fast gradient value (FGV) approach that keeps the raw per-pixel gradient magnitudes rather than only their signs, yielding a noticeably different adversarial direction (see the first sketch after this list).
- Hot/Cold Approach: Working at the network's penultimate (scoring) layer, this method designates a "hot" target class to move toward and a "cold" class, the original label, to move away from; the derivative of this feature-level objective guides the perturbation, enabling targeted and diverse adversarial directions (see the second sketch after this list).
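To make the FGS/FGV contrast concrete, here is a minimal PyTorch sketch of our own (the paper does not provide this code; `model`, `eps`, and the [0, 1] pixel range are assumptions). FGS moves every pixel by the same amount along the sign of the loss gradient; FGV keeps the raw gradient values, so pixels with larger gradients move further, producing a different direction.

```python
# Sketch contrasting fast gradient sign (FGS) and fast gradient value
# (FGV); hypothetical helper, not the authors' implementation.
import torch
import torch.nn.functional as F

def fast_gradient(model, x, label, eps, mode="value"):
    """One-step adversarial perturbation of a batch x in [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    grad = torch.autograd.grad(loss, x)[0]

    if mode == "sign":
        step = grad.sign()  # FGS: uniform per-pixel magnitude
    else:
        # FGV: preserve relative gradient magnitudes; normalized here so
        # that eps is comparable across the two modes.
        step = grad / grad.abs().max().clamp_min(1e-12)

    return (x + eps * step).clamp(0, 1).detach()
```

The paper searches for the smallest scaling that actually changes the prediction; a fixed `eps` is used above for brevity.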
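In the same hedged spirit, the following sketch approximates the hot/cold objective with the difference between the "hot" (target) and "cold" (original) class scores, a simplification of the paper's feature-layer construction, and scales along the resulting direction until the prediction flips.

```python
# Hot/cold sketch: move toward a target ("hot") class and away from the
# original ("cold") class. Assumes a batch of one image in [0, 1].
import torch

def hot_cold_direction(model, x, cold_label, hot_label):
    """Gradient of (hot score - cold score) with respect to the input."""
    x = x.clone().detach().requires_grad_(True)
    scores = model(x)
    objective = scores[0, hot_label] - scores[0, cold_label]
    grad = torch.autograd.grad(objective, x)[0]
    return grad / grad.abs().max().clamp_min(1e-12)

def hot_cold_example(model, x, cold_label, hot_label, step=0.01, max_iters=100):
    """Line-search the step size until the classification changes."""
    direction = hot_cold_direction(model, x, cold_label, hot_label)
    for i in range(1, max_iters + 1):
        candidate = (x + i * step * direction).clamp(0, 1)
        with torch.no_grad():
            pred = model(candidate).argmax(dim=1).item()
        if pred != cold_label:
            return candidate  # first scaling that flips the prediction
    return None  # no flip within the search budget
```

Because any class other than the original can serve as the hot target, one input image yields many distinct perturbations, which is the source of the diversity the authors exploit.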
Numerical Results
In their experiments, the authors highlight:
- The diverse adversarial generation approach yields better accuracy and robustness against adversarial examples. Fine-tuning LeNet on a mixture of adversarial and hard positive images reduces the MNIST error rate below what adversarial training alone achieves (a training sketch follows this list).
- Applying the hot/cold strategy to ImageNet reduced GoogLeNet's top-1 and top-5 error rates, delivering a larger improvement per image than augmentation with multiple center crops.
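As an illustration of that fine-tuning recipe, this sketch mixes a clean training set with generated hard positives (labeled with their source images' original classes) and fine-tunes with plain SGD. All names and hyperparameters here are our assumptions, not details from the paper; the hard positives would come from generators like those sketched above.

```python
# Hypothetical fine-tuning loop on clean data plus hard positives.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def finetune_with_hard_positives(model, clean_ds, hp_images, hp_labels,
                                 epochs=5, lr=1e-3):
    # Hard positives keep the labels of the images they were made from,
    # so training pulls the decision boundary out past them.
    augmented = ConcatDataset([clean_ds, TensorDataset(hp_images, hp_labels)])
    loader = DataLoader(augmented, batch_size=64, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
    return model
```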
Implications and Future Directions
The findings carry significant implications for the training of more resilient models:
- Practical Enhancements: Training on a diverse set of adversarial and hard positive examples improves a network's generalization, reducing misclassification of perturbed inputs.
- Refinement of Robustness Metrics: The proposed PASS measure could become a new standard for evaluating the perceptibility of adversarial attacks, guiding both adversarial defenses and evaluations.
Future work could explore the application of these methods to emerging architectures like transformer-based models in domains beyond image classification, such as natural language processing or reinforcement learning systems. Expanding on their theoretical underpinnings and integrating advanced augmentation into broader AI systems would further solidify their utility in robust AI applications.