Adversarial Examples in Random CNNs
- The paper demonstrates that adversarial examples can be generated in random CNNs within an ℓ₂-distance proportional to ‖x‖₂/√d, matching information-theoretic limits.
- It employs isoperimetric inequalities, Fourier-spectral analysis, and Gaussian process arguments to rigorously characterize the models' susceptibility.
- The findings reveal that inherent high-dimensional geometric properties, not training specifics, fundamentally limit the robustness of convolutional architectures.
Adversarial examples in random convolutional neural networks (CNNs) are small input perturbations, often imperceptible in the ℓ₂-norm, that can abruptly flip the output of a randomly initialized CNN with high probability. Recent theoretical advances have established that such vulnerabilities are not a byproduct of training or specific architectural choices, but an intrinsic feature of high-dimensional, random convolutional models. The phenomenon is now understood through rigorous analysis involving isoperimetric inequalities, spectral geometry, Fourier representations of convolution operators, and Gaussian process arguments. The resulting robustness thresholds match the information-theoretic limits: for a d-dimensional input x, adversarial examples can be found within ℓ₂-distance O(‖x‖₂/√d) of x, which is essentially the minimal possible for any classifier uniformly over the input space.
1. Mathematical Foundations of Adversarial Vulnerability in Random CNNs
The vulnerability of random CNNs to small-norm adversarial examples is established through several distinct, yet conceptually aligned mathematical frameworks:
- Isoperimetric Approach: Using the isoperimetric inequality on the special orthogonal group SO(d), one proves that, for a Lipschitz-invariant classifier defined via random convolutions, the decision boundary must, with overwhelming probability, be close (in ℓ₂) to any given input x. Specifically, for random nets constructed from regular/Xavier convolutional layers and standard activations (odd or ReLU, with technical assumptions on network width and depth), there exists a perturbation δ that flips the sign of the output, with ‖δ‖₂ controlled in terms of the spectral norm of x (i.e., the operator norm of the matricized input) (Daniely, 14 Jun 2025).
- Fourier-Spectral Analysis: Random convolutional layers, when represented in the Fourier basis, decompose into (almost) independent low-dimensional blocks. With random Gaussian weights, the spectral norms and minimum singular values of these blocks are tightly controlled, ensuring well-conditioning. This facilitates lower-bounding the input gradient and ensures its robustness within small balls, which directly underpins adversarial construction via gradient-based methods (Daniely et al., 3 Feb 2026).
- Gaussian Process and Covering Number Arguments: For infinite-width limits, the output of a random deep network converges to a Gaussian process indexed by the input vector. Standard entropy-integral and concentration arguments (Borell–TIS inequality, Dudley integral) show that, in any ℓ_p norm (1 ≤ p ≤ ∞), the minimal adversarial perturbation distance is bounded by O(‖x‖_p/√d) (Palma et al., 2020, Montanari et al., 2022).
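The conditioning claim in the Fourier-spectral bullet can be checked directly for a single random circular-convolution layer: in the Fourier basis the convolution operator is diagonal, so its singular values are simply the moduli of the filter's DFT coefficients. A minimal numpy sketch (our own illustration, not code from the cited papers):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256
c = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)       # Xavier-style random filter

# Circulant matrix of circular convolution with filter c (first column is c).
C = np.column_stack([np.roll(c, j) for j in range(d)])

# In the Fourier basis the operator is diagonal, so its singular values
# are the moduli of the DFT coefficients of the filter.
sing = np.abs(np.fft.fft(c))
assert np.allclose(np.sort(np.linalg.svd(C, compute_uv=False)), np.sort(sing))

print(f"smallest singular value: {sing.min():.4f}, largest: {sing.max():.4f}")
```

With Gaussian filter weights the DFT coefficients are (nearly independent) complex Gaussians, so the smallest singular value is bounded away from zero with high probability, which is the well-conditioning property that underpins the gradient lower bounds.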
2. Network Architectures, Random Initialization, and Theoretical Assumptions
The analytical results are valid across a range of convolutional architectures with the following features:
- Layer Structure: Multiple convolutional layers of fixed, constant depth L; layers can be either group-convolutional (with weight-sharing determined by a finite abelian group G) or conventional spatial convolutions. Minimal depth and width conditions ensure well-posedness of the spectral estimates and covering number arguments (Daniely et al., 3 Feb 2026, Daniely, 14 Jun 2025).
- Activation Functions: Assumptions typically hold for a broad class of activations σ (including ReLU, smooth functions, or odd functions for symmetric cases), with non-vanishing average squared derivative to guarantee meaningful gradients.
- Random Initialization: Weights are drawn i.i.d. from appropriate Gaussian distributions (Xavier, He, or similar scaling), yielding statistically isotropic or SO(d)-invariant random functions.
- Output Mapping: The final network output is scalar-valued, constructed as an inner product of the last-layer activations with an independent random vector.
These conditions are broad enough to encompass most conventional untrained CNNs and, in the limit, theoretical random convolutional function classes (Daniely et al., 3 Feb 2026, Daniely, 14 Jun 2025).
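To make these assumptions concrete, here is a minimal sketch (our own construction, not code from the cited papers) of such a random net: a fixed number of circular convolutions with Xavier-scaled Gaussian filters, ReLU activations, and a scalar output formed as an inner product with an independent random readout vector:

```python
import numpy as np

def random_cnn(d, depth, seed=0):
    """Random circular-conv net: Xavier Gaussian filters, ReLU, random readout."""
    rng = np.random.default_rng(seed)
    filters = [rng.normal(0.0, np.sqrt(2.0 / d), size=d) for _ in range(depth)]
    readout = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)

    def f(x):
        h = x
        for c in filters:
            conv = np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(h)))  # circular conv
            h = np.maximum(conv, 0.0)                                   # ReLU
        return float(readout @ h)

    return f

f = random_cnn(d=512, depth=3)
x = np.random.default_rng(1).normal(size=512)
print(f(x))   # a single random scalar logit
```

The FFT-based convolution here corresponds to the cyclic-group case of the group convolutions discussed above; spatial convolutions with zero padding would require only a change to the convolution line.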
3. Existence, Construction, and Size of Adversarial Examples
All approaches confirm that, with overwhelming probability over the random weights, for any input x of dimension d, there exists a perturbation δ with

‖δ‖₂ ≤ O(‖x‖₂/√d)

which flips the sign of the classifier's output. This claim holds for:
- Random CNNs with any reasonable smooth or ReLU activation (Daniely, 14 Jun 2025, Daniely et al., 3 Feb 2026, Montanari et al., 2022).
- Networks of constant depth and sufficiently large width per layer.
- Various ℓ_p-norms, with ‖δ‖_p = O(‖x‖_p/√d) scaling for all 1 ≤ p ≤ ∞ (Palma et al., 2020).
Notably, this scaling is essentially optimal, as no classifier can be more robust in the worst-case scenario.
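A minimal, hedged illustration of this scaling (our own toy example) is the depth-zero case f(x) = ⟨w, x⟩ with Gaussian w, where the exact minimal sign-flipping perturbation is the projection onto w and has size |f(x)|/‖w‖₂ ≈ ‖x‖₂/√d:

```python
import numpy as np

rng = np.random.default_rng(0)
ratios = []
for d in (100, 400, 1600, 6400):
    w = rng.normal(size=d)                    # random linear "classifier"
    x = rng.normal(size=d)
    # Slightly overshoot the exact projection onto the boundary to flip the sign.
    delta = -1.01 * (w @ x) * w / (w @ w)
    assert np.sign(w @ (x + delta)) == -np.sign(w @ x)
    ratios.append(np.linalg.norm(delta) / (np.linalg.norm(x) / np.sqrt(d)))

print(ratios)  # O(1) ratios: ||delta||_2 tracks ||x||_2 / sqrt(d)
```

The deep random-CNN results above say, in effect, that the full nonlinear networks behave no better than this linear baseline at scale.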
4. Explicit Construction: Gradient-Based Adversarial Attacks
Explicit adversarial perturbations can be computed through a single gradient-descent or "fast gradient sign" step:
- Gradient Computation: The input gradient ∇f(x) is guaranteed to have ℓ₂-norm bounded away from zero, uniformly over most inputs x of interest.
- Perturbation Size and Step: Setting x′ = x − s·sign(f(x))·∇f(x)/‖∇f(x)‖₂, with step size s proportional to ‖x‖₂/√d, ensures that the sign of f at x′ is flipped. The typical step thus has ‖x′ − x‖₂ = O(‖x‖₂/√d) (Daniely et al., 3 Feb 2026, Montanari et al., 2022).
- Success Probability: With high probability (1 − o(1), already for moderate d), a single such step suffices and the output switches class.
The Gaussian conditioning argument further ensures that the joint distribution of (f(x), f(x′)) follows a precisely characterized bivariate normal law, leading to nearly certain attack success as d increases (Montanari et al., 2022).
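The gradient-based attack can be sketched on a small random network. This is a hedged toy version (a two-layer tanh net standing in for the CNNs analysed above, with repeated small normalized-gradient steps rather than the papers' single calibrated step); it flips the sign with a total perturbation far smaller than ‖x‖₂:

```python
import numpy as np

rng = np.random.default_rng(0)
d, width = 400, 2000
W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(width, d))   # Xavier-scaled layer
a = rng.normal(0.0, 1.0 / np.sqrt(width), size=width)    # random readout

f = lambda x: float(a @ np.tanh(W @ x))
grad = lambda x: W.T @ (a * (1.0 - np.tanh(W @ x) ** 2))

x0 = rng.normal(size=d)
x, s0 = x0.copy(), np.sign(f(x0))
for _ in range(500):                     # small steps of length ~||x0||/sqrt(d)
    if np.sign(f(x)) != s0:
        break
    g = grad(x)
    x -= s0 * 0.02 * np.linalg.norm(x0) / np.sqrt(d) * g / np.linalg.norm(g)

flipped = np.sign(f(x)) != s0
dist = np.linalg.norm(x - x0)
print(flipped, dist / np.linalg.norm(x0))   # flip at small relative distance
```

Because the gradient norm stays bounded below along the path, the accumulated perturbation needed to cross the boundary remains a vanishing fraction of ‖x₀‖₂ as d grows.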
5. Geometric and Group-Theoretic Underpinnings
The existence of adversarial examples in random CNNs reflects deep properties of high-dimensional geometry:
- SO(d)-Invariance and Isoperimetry: Random group-convolutional layers confer SO(d)-invariance to the output function on the input orbit, subjecting the network to sharp isoperimetric laws. Concentration of measure implies that for any sizable region (such as the decision region of a binary classifier), nearly all points are within distance O(‖x‖₂/√d) of the boundary, directly yielding adversarial examples (Daniely, 14 Jun 2025).
- Fourier Diagonalization: For abelian group CNNs, the convolution operator diagonalizes block-wise in the Fourier basis, so robustness cannot be achieved by increasing width or by randomizing filter arrangements.
Geometric inseparability thus emerges as the foundational cause, unaffected by training or typical regularization.
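The Fourier-diagonalization fact can be verified concretely in the simplest (cyclic-group, 1-D) case, where each Fourier mode is an exact eigenvector of the convolution's circulant matrix; a small sketch under that assumption:

```python
import numpy as np

n = 64
rng = np.random.default_rng(0)
c = rng.normal(size=n)
C = np.column_stack([np.roll(c, j) for j in range(n)])  # circulant, first column c

lam = np.fft.fft(c)                                     # eigenvalues of C
k = 5
v = np.exp(2j * np.pi * k * np.arange(n) / n)           # k-th Fourier mode
print(np.allclose(C @ v, lam[k] * v))                   # True: (lam[k], v) is an eigenpair
```

For a general finite abelian group the same statement holds block-wise via the group's characters, which is why randomizing filter arrangements or widening layers cannot escape the spectral constraints.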
6. Empirical and Experimental Findings
Theoretical guarantees are largely validated by empirical studies:
- Experimental Verification: On random and trained networks (e.g., LeNet, ResNet, shallow and deep convolutional nets), ℓ₂- and ℓ∞-bounded adversarial attacks (FGSM, PGD, Carlini–Wagner) find perturbations of size on the order of ‖x‖₂/√d, in agreement with theory (Palma et al., 2020).
- Effect of Training: Training on natural data (e.g., MNIST, CIFAR10) does not significantly increase robustness for ℓ₂ or ℓ∞ attacks; in some settings, the adversarial distance even decreases for out-of-distribution data (Palma et al., 2020).
- Architectural Variants: Modifications such as instance-wise random masking applied at shallow layers (e.g., "Random Mask" CNNs) increase robustness against black-box attacks, albeit at the cost of reduced expressivity or accuracy. Remaining adversarial examples in such architectures can induce perceptible changes in semantics, sometimes fooling humans as well (Luo et al., 2020).
7. Implications for Robustness, Defenses, and Theoretical Limits
Current theoretical evidence dictates several key implications:
- Universality of Vulnerability: Robustness to norm-bounded perturbations cannot be fundamentally improved by architecture alone in generic isotropic CNNs unless depth increases with dimension, weight-sharing is broken, or input preprocessing disrupts SO(d)-invariance (Daniely, 14 Jun 2025, Daniely et al., 3 Feb 2026).
- Limits of Training: Even adversarial training or large-scale pretraining cannot circumvent the high-dimensional isoperimetric barrier for standard data-unaware random initializations.
- Defensive Strategies: Breaking the invariance structure via architectural randomization (masks, spatial permutations), input-dependent pre-processing, or learned anisotropic filters can offer partial gains, but no parameter-free defense can exceed the ‖x‖₂/√d scaling without leveraging data-distribution properties (Luo et al., 2020, Daniely et al., 3 Feb 2026).
- Conceptual Clarity: The formalism suggests a need to refine the definition of adversarial examples, potentially focusing on perceptual metrics or human label invariance rather than strict norm constraints (Luo et al., 2020).
In summary, adversarial examples in random convolutional neural networks are a universal and quantitatively sharp phenomenon dictated by high-dimensional geometry, spectral characteristics of random convolutions, and group-theoretic invariance. The information-theoretic lower bounds on adversarial distance tightly constrain the space of possible defenses and set a baseline for understanding robustness in both theory and practice (Daniely et al., 3 Feb 2026, Daniely, 14 Jun 2025, Palma et al., 2020, Montanari et al., 2022, Luo et al., 2020).