
Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples (1901.08360v1)

Published 24 Jan 2019 in cs.LG and stat.ML

Abstract: State-of-the-art neural networks are vulnerable to adversarial examples; they can easily misclassify inputs that are imperceptibly different than their training and test data. In this work, we establish that the use of cross-entropy loss function and the low-rank features of the training data have responsibility for the existence of these inputs. Based on this observation, we suggest that addressing adversarial examples requires rethinking the use of cross-entropy loss function and looking for an alternative that is more suited for minimization with low-rank features. In this direction, we present a training scheme called differential training, which uses a loss function defined on the differences between the features of points from opposite classes. We show that differential training can ensure a large margin between the decision boundary of the neural network and the points in the training dataset. This larger margin increases the amount of perturbation needed to flip the prediction of the classifier and makes it harder to find an adversarial example with small perturbations. We test differential training on a binary classification task with CIFAR-10 dataset and demonstrate that it radically reduces the ratio of images for which an adversarial example could be found -- not only in the training dataset, but in the test dataset as well.

Citations (52)

Summary

  • The paper demonstrates that minimizing standard cross-entropy loss on low-rank data can yield decision boundaries with substantially narrower margins than the hard-margin SVM solution.
  • Theoretical and empirical analyses show that the penultimate-layer features of neural networks trained with gradient descent tend to be low-rank, which contributes to adversarial vulnerability.
  • Differential training, which optimizes loss on feature differences, is shown to enhance adversarial robustness without compromising test accuracy.

This paper, "Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples" (1901.08360), investigates the underlying causes for the vulnerability of deep neural networks to adversarial examples, which are imperceptible perturbations to inputs that cause misclassification. The authors propose that the standard practice of minimizing cross-entropy loss with gradient-based methods, combined with the tendency of neural networks to learn low-rank features in their penultimate layers, leads to decision boundaries with poor margins around the training data. This small margin makes the network susceptible to small perturbations.

The paper presents several key theoretical and empirical findings:

  1. Poor Margins with Cross-Entropy on Low-Rank Data: The paper theoretically demonstrates that if a linear classifier is trained using cross-entropy loss and gradient descent on training data that lies on a low-dimensional affine subspace, the resulting decision boundary can have a significantly smaller margin compared to the optimal hard-margin SVM solution. This suggests that even for linearly separable data, cross-entropy minimization doesn't guarantee a large margin if the data has certain structural properties like low rank.
  2. Low-Rank Features in Neural Networks: The authors provide a theoretical proposition suggesting that the outputs of the penultimate layer of a neural network trained with gradient descent and cross-entropy loss tend to be low-rank, particularly when the final layer is linear and initialized to zero. Empirical evidence on CIFAR-10 confirms this tendency: even with nonlinear activations and different optimizers (Adam, momentum, with/without batch normalization), the features remain much lower rank than the feature space dimension (a rank-measuring sketch follows this list).
  3. Connection to Adversarial Examples: The combination of poor margins caused by cross-entropy on low-rank features implies that small perturbations in the penultimate layer's feature space can easily push a data point across the decision boundary, leading to misclassification. This translates to small perturbations in the input space if the mapping from input to features has a bounded Lipschitz constant.
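
To make the rank observations concrete, here is a minimal NumPy sketch for measuring the effective rank of penultimate-layer features; `activations` (an (N, d) array collected from a trained network) and the tolerance are illustrative assumptions, not details from the paper:

```python
import numpy as np

def effective_rank(features: np.ndarray, tol: float = 1e-3) -> int:
    """Count singular values above `tol` times the largest one."""
    # Center the features so a constant offset does not inflate the rank.
    s = np.linalg.svd(features - features.mean(axis=0), compute_uv=False)
    return int((s > tol * s[0]).sum())

# Hypothetical usage, with `activations` gathered from a forward pass:
# print(effective_rank(activations), "of", activations.shape[1], "dimensions")
```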

To address this issue and improve the margin, the paper proposes a training scheme called Differential Training. This approach modifies the loss function to operate on the differences between features of points from opposite classes, rather than on individual data points.

  • For Linear Classifiers: A loss function based on the log-sigmoid of the inner product of $w$ with the difference between feature vectors from the two classes, $\sum_{i \in I} \sum_{j \in J} \log\left(1 + e^{-w^\top (x_i - y_j)}\right)$, is proposed. The paper proves that minimizing this loss with gradient descent on linearly separable data converges to the maximum-margin hyperplane direction, as in hard-margin SVM. The bias term needs to be set separately based on the resulting weight vector.
  • For Nonlinear Classifiers (Neural Networks): Using the log-sigmoid loss on feature differences was found to improve the margin in feature space but not necessarily in input space. A squared loss on the difference between network outputs for pairs from opposite classes, $\sum_{i \in I} \sum_{j \in J} \left( w^\top \phi_\theta(x_i) - w^\top \phi_\theta(y_j) - 1 \right)^2$, is proposed and empirically shown to yield larger margins in input space. The intuition is that the squared error loss may encourage a smaller Lipschitz constant for the feature mapping, translating the feature-space margin into an input-space margin. A sketch of both losses follows this list.
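
As a concrete illustration of the two losses, here is a minimal PyTorch sketch; the names (`w`, `feats_pos`, `feats_neg`) are illustrative, and summing over all pairs within a mini-batch rather than over the full dataset is an implementation assumption, not the paper's prescription:

```python
import torch
import torch.nn.functional as F

def log_sigmoid_diff_loss(w, feats_pos, feats_neg):
    """sum_{i,j} log(1 + exp(-w^T (x_i - y_j))) over all pairs in the batch."""
    diffs = feats_pos.unsqueeze(1) - feats_neg.unsqueeze(0)  # (P, N, d) pairwise differences
    margins = diffs @ w                                      # (P, N) signed margins
    return F.softplus(-margins).sum()  # softplus(-m) = log(1 + e^{-m})

def squared_diff_loss(w, feats_pos, feats_neg):
    """sum_{i,j} (w^T phi(x_i) - w^T phi(y_j) - 1)^2 over all pairs."""
    scores_pos = feats_pos @ w  # (P,) scores for one class
    scores_neg = feats_neg @ w  # (N,) scores for the other
    gaps = scores_pos.unsqueeze(1) - scores_neg.unsqueeze(0) - 1.0
    return (gaps ** 2).sum()
```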

Practical Implementation and Results:

  • Training Cost: The differential training loss sums over pairs of points from opposite classes ($|I| \times |J|$ terms), which can be far more than the number of individual data points ($|I| + |J|$). The paper suggests that in practice, especially for well-separated data, only a subset of pairs (potentially those near the current decision boundary) may be needed for stochastic gradient descent to achieve good results, mitigating the increased computational cost (see the pair-sampling sketch after this list).
  • Experiments on CIFAR-10: The authors trained a convolutional neural network on a binary classification task (planes vs. horses) from CIFAR-10 using both standard cross-entropy and differential training (with the squared loss on feature differences). Both methods achieved similar standard test accuracy.
  • Robustness: When attacked with Projected Gradient Descent (PGD) and Carlini-Wagner (C&W) attacks, the network trained with differential training achieved significantly higher accuracy on adversarial examples generated from both the training and test datasets than the cross-entropy-trained network, indicating improved robustness (an illustrative PGD sketch also follows this list).
  • Generalization of Robustness: A notable finding is that differential training maintained similar robustness levels on adversarial examples generated from the training set and from the test set (see the paper's PGD-attack figure). This contrasts with some robust optimization methods, which have shown a drop in robustness on unseen data. The authors suggest this aligns with findings on Siamese networks, which also use paired data and perform well with limited data.
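
A minimal PyTorch sketch of the pair subsampling suggested in the training-cost item above; `pos_data`/`neg_data` hold the two classes' examples, and the hard-pair heuristic is one plausible instantiation of "pairs near the current decision boundary", not the paper's exact recipe:

```python
import torch

def sample_pairs(pos_data, neg_data, batch_size):
    """Draw a random mini-batch of opposite-class pairs."""
    i = torch.randint(len(pos_data), (batch_size,))
    j = torch.randint(len(neg_data), (batch_size,))
    return pos_data[i], neg_data[j]

def hardest_pair_indices(scores_pos, scores_neg, k):
    """Indices of the k pairs with the smallest score gap (closest to the boundary)."""
    gaps = scores_pos.unsqueeze(1) - scores_neg.unsqueeze(0)  # (P, N) pairwise gaps
    flat_idx = gaps.flatten().argsort()[:k]                   # k smallest gaps
    n = scores_neg.numel()
    return flat_idx // n, flat_idx % n                        # (i, j) index pairs
```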

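An illustrative untargeted L-infinity PGD attack of the kind used in the robustness evaluation; the scalar-score model interface, labels in {-1, +1}, inputs in [0, 1], and the eps/alpha/steps values are assumptions, not the paper's settings:

```python
import torch

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=20):
    """Untargeted L_inf PGD against a scalar-score binary classifier."""
    # Random start inside the eps-ball, clipped to the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Decreasing y * score pushes the point toward the decision boundary.
        loss = -(y * model(x_adv).squeeze()).sum()
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()             # ascent step on the loss
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)   # project back to the ball
    return x_adv.detach()
```
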
Implementation Considerations:

  • The core implementation difference lies in the loss function and the data sampling strategy during training. Instead of feeding mini-batches of individual images and their labels, mini-batches would consist of pairs of images from opposite classes.
  • For the squared loss above, each term is the squared deviation from 1 of the difference between the final network output scores for a positive and a negative example, pushing that score difference toward 1.
  • Determining the optimal bias term $b$ for nonlinear classifiers after training is not as straightforward as in the linear case and might require calibration on a validation set, or an implicitly soft-margin behavior through the loss function (a calibration sketch follows this list).
  • The paper mentions the potential need for heuristics to manage the large number of pairs, such as focusing on pairs that are currently misclassified or have a small margin.
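
One plausible calibration routine for the bias, per the validation-set option above, assuming held-out scalar scores $w^\top \phi_\theta(x)$ and labels in {-1, +1}; this is a sketch, not a procedure from the paper:

```python
import numpy as np

def calibrate_bias(scores_val: np.ndarray, labels_val: np.ndarray) -> float:
    """Choose b so that sign(score + b) maximizes validation accuracy."""
    order = np.sort(scores_val)
    thresholds = (order[:-1] + order[1:]) / 2  # midpoints between sorted scores
    accs = [np.mean(np.sign(scores_val - t) == labels_val) for t in thresholds]
    return float(-thresholds[int(np.argmax(accs))])  # b = -best_threshold
```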

In summary, the paper provides evidence that standard cross-entropy training, particularly on data leading to low-rank features, results in poor margins and vulnerability to adversarial attacks. Differential training, by minimizing a loss on feature differences, offers a promising alternative that improves margin and significantly enhances adversarial robustness and its generalization to unseen data, albeit potentially at the cost of increased computational complexity due to pair sampling.