modReLU Activation Functions
- modReLU is a modified rectified linear activation that caps outputs at a defined threshold, preventing unbounded activation.
- Its design enforces 1-Lipschitz continuity and confines gradient flow to a safe range, thereby reducing adversarial perturbation amplification.
- Empirical results on MNIST demonstrate that careful $\tau$ selection can significantly boost robustness against FGSM, PGD, and CW attacks while preserving accuracy.
The modReLU (modified or capped ReLU) activation function is a variant of the commonly used rectified linear unit (ReLU) designed to enhance adversarial robustness in neural networks. It replaces the unbounded positive output of standard ReLU with an upper cap, limiting each neuron's activation to a specified threshold. This simple architectural modification constrains layerwise perturbation amplification, yielding substantial improvements in robustness to adversarial attacks on small-scale vision tasks, demonstrated quantitatively on MNIST benchmark models (Sooksatra et al., 2024).
1. Formal Definition
modReLU is defined by introducing an explicit upper bound parameter $\tau > 0$ to the canonical ReLU function. For input $z$, the function is:
- For $z \le 0$, $\mathrm{modReLU}_\tau(z) = 0$.
- For $0 < z < \tau$, $\mathrm{modReLU}_\tau(z) = z$.
- For $z \ge \tau$, $\mathrm{modReLU}_\tau(z) = \tau$.
Equivalently, $\mathrm{modReLU}_\tau(z) = \min(\max(z, 0), \tau)$. When $\tau \to \infty$, modReLU reduces to standard ReLU. Empirically, values such as $\tau \in \{0.01, 0.1, 1\}$ have been employed, with larger $\tau$ used for illustration.
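The piecewise definition collapses to a one-line clamp. A minimal NumPy sketch (the function name `modrelu` is illustrative, not from the cited work):

```python
import numpy as np

def modrelu(z, tau):
    """Capped ReLU: clamp activations to the range [0, tau]."""
    return np.minimum(np.maximum(z, 0.0), tau)

z = np.array([-2.0, 0.5, 3.0])
print(modrelu(z, tau=1.0))     # cap active on the last entry: [0.  0.5 1. ]
print(modrelu(z, tau=np.inf))  # tau -> infinity recovers standard ReLU
```

With `tau=np.inf` the cap never binds, so the output coincides with `np.maximum(z, 0.0)`, matching the reduction to standard ReLU noted above.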
2. Theoretical Motivation and Properties
Classical ReLU activations, unbounded above, facilitate rapid model training but also enable layerwise amplification of small adversarial perturbations, undermining robustness. By capping output at $\tau$, modReLU prohibits “blow-up” of internal signals, directly limiting how much a perturbation can propagate or amplify through the network.
Analytically:
- Standard ReLU’s local Lipschitz constant is 1, but its unbounded output allows perturbations to accumulate across layers.
- modReLU enforces both 1-Lipschitz continuity and an output range restriction, $0 \le \mathrm{modReLU}_\tau(z) \le \tau$, bounding amplification across all layers simultaneously.
A trade-off emerges: selecting $\tau$ too small exacerbates vanishing gradients, especially in deep or wide architectures. $\tau$ must be sufficiently large to preserve trainability, yet sufficiently small to mitigate adversarial growth.
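Both analytical properties can be checked numerically. A small sketch (variable names illustrative) verifying that the activation is 1-Lipschitz and that its bounded range caps any activation-level difference at $\tau$:

```python
import numpy as np

def modrelu(z, tau):
    return np.minimum(np.maximum(z, 0.0), tau)

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
delta = 0.1 * rng.normal(size=1000)

# 1-Lipschitz: the activation never magnifies a pre-activation perturbation
gap = np.abs(modrelu(a + delta, 1.0) - modrelu(a, 1.0))
assert np.all(gap <= np.abs(delta) + 1e-12)

# Range restriction: any two activations differ by at most tau,
# no matter how large the input-side perturbation is
tau = 0.1
assert np.all(np.abs(modrelu(a + 5.0, tau) - modrelu(a, tau)) <= tau)
```

The second check is the key difference from plain ReLU, for which a shift of $5.0$ in the pre-activation can shift the output by up to $5.0$.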
3. Gradients and Backpropagation
modReLU is piecewise-linear with two flat (inactive) regions. The (sub)gradient with respect to $z$ is

$$\frac{d}{dz}\,\mathrm{modReLU}_\tau(z) = \begin{cases} 1 & 0 < z < \tau \\ 0 & z < 0 \ \text{or} \ z > \tau. \end{cases}$$

At $z = 0$ or $z = \tau$, any subgradient in $[0, 1]$ is valid; commonly, $1$ is taken at $z = 0$ and $0$ at $z = \tau$ by convention. During backpropagation, nonzero gradients therefore propagate only for $0 \le z < \tau$; outside this region, gradients vanish. This restricts parameter updates when activations saturate at the cap, necessitating careful $\tau$ selection in very deep or wide dense layers.
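Autodiff frameworks derive this mask automatically from the clamp; for a manual backward pass it can be written explicitly. A sketch (function name illustrative) using the boundary convention stated above:

```python
import numpy as np

def modrelu_grad(z, tau):
    """(Sub)gradient of modReLU w.r.t. z: 1 on the linear region, 0 where flat.
    Boundary convention (as in the text): gradient 1 at z == 0, 0 at z == tau."""
    return ((z >= 0.0) & (z < tau)).astype(float)

print(modrelu_grad(np.array([-1.0, 0.0, 0.5, 1.0, 2.0]), tau=1.0))
# -> [0. 1. 1. 0. 0.]
```

Note that the mask is zero both below $0$ and at or above $\tau$; monitoring the fraction of saturated units is one way to detect the vanishing-gradient regime described above.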
4. Implementation and Network Integration
modReLU can directly replace standard ReLU layers in existing architectures. Practical integration involves:
- Selecting specific network layers for capping, often favoring bottleneck or early layers.
- Simple Python or TensorFlow/Keras implementations (see table below) facilitate rapid adoption.
| Context | Implementation Example | Usage Example |
|---|---|---|
| Function definition | `tf.minimum(tf.maximum(z, 0.0), tau)` | Replace any ReLU layer with modReLU |
| Gradient handling | Gradient flows only for $0 \le z < \tau$ | Monitor saturation and vanishing gradients |
| Keras layer | `CappedReLU(tau)` | E.g., insert after `Conv2D` |
This approach incurs negligible computational overhead (notably for convolutional nets). In dense networks, careful $\tau$ tuning is required to avoid excessive saturation or vanishing gradients, especially as network depth increases.
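As a framework-free illustration of the integration pattern, here is a dense layer with a capped activation and matching backward pass. The class name `DenseModReLU` and its interface are hypothetical, standing in for the Keras `CappedReLU` layer in the table above:

```python
import numpy as np

def modrelu(z, tau):
    return np.minimum(np.maximum(z, 0.0), tau)

class DenseModReLU:
    """Minimal dense layer with a capped activation (NumPy sketch)."""
    def __init__(self, in_dim, out_dim, tau, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, in_dim ** -0.5, size=(in_dim, out_dim))
        self.b = np.zeros(out_dim)
        self.tau = tau

    def forward(self, x):
        self.z = x @ self.W + self.b   # cache pre-activations for backward
        return modrelu(self.z, self.tau)

    def backward(self, grad_out):
        # gradient flows only through the linear region 0 <= z < tau
        mask = ((self.z >= 0.0) & (self.z < self.tau)).astype(float)
        return (grad_out * mask) @ self.W.T
```

Replacing the `modrelu` call with `np.maximum(self.z, 0.0)` recovers an ordinary ReLU layer, which is what makes modReLU a drop-in substitution.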
5. Experimental Protocol
Empirical analysis centered on MNIST digit classification was conducted under the following setup:
- Models: Three-layer dense nets with various capping patterns; two-layer variants and architectures with different widths/orderings.
- Optimizer: Adam, trained for 20 epochs on unperturbed data.
- Adversarial Attacks: FGSM, PGD ($10$ iterations, step size $0.01$), and CW (learning rate $0.01$).
- Metrics:
- Test accuracy (clean and adversarially attacked)
- Attack success rates
- Layerwise hidden perturbation growth
- Zero-gradient distance (distance from an input to the nearest point where the gradient vanishes under PGD)
- Sensitivity-map sum (summed maximal class-gradient differences per input pixel).
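The layerwise perturbation-growth metric can be sketched as follows. The helper name and the exact measurement protocol here are illustrative, not taken from the cited study; the sketch records the $L_2$ norm of the clean-vs-adversarial hidden-state gap after each capped layer:

```python
import numpy as np

def modrelu(z, tau):
    return np.minimum(np.maximum(z, 0.0), tau)

def hidden_perturbation_norms(x, x_adv, weights, tau):
    """L2 norm of the clean-vs-adversarial hidden-state gap at each layer
    (illustrative version of the layerwise perturbation-growth metric)."""
    norms, h, h_adv = [], x, x_adv
    for W in weights:
        h, h_adv = modrelu(h @ W, tau), modrelu(h_adv @ W, tau)
        norms.append(np.linalg.norm(h_adv - h))
    return norms

rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.1, size=(64, 64)) for _ in range(3)]
x = rng.normal(size=64)
x_adv = x + 0.05 * rng.normal(size=64)
print(hidden_perturbation_norms(x, x_adv, weights, tau=0.1))
```

Because every capped activation lies in $[0, \tau]$, each per-layer norm here is bounded by $\tau\sqrt{d}$ for layer width $d$, regardless of how large the input perturbation is; with unbounded ReLU no such a-priori bound exists.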
6. Empirical Results
Quantitative findings for two-hidden-layer MNIST classifiers, with modReLU applied to the bottleneck layer, are summarized below:
| $\tau$ | Adv. Train | Clean (%) | FGSM (%) | PGD (%) | CW (%) |
|---|---|---|---|---|---|
| ReLU (∞) | none | 98.49 | 41.77 | 9.47 | 0.00 |
| 1 | none | 98.46 | 41.24 | 7.45 | 0.00 |
| 0.1 | none | 98.06 | 68.04 | 39.79 | 5.56 |
| 0.01 | none | 97.88 | 92.37 | 89.61 | 8.07 |
| ReLU | FGSM | 98.26 | 91.44 | 85.12 | 0.19 |
| 1 | FGSM | 98.35 | 92.46 | 81.88 | 0.18 |
| 0.1 | FGSM | 98.18 | 93.00 | 90.37 | 3.50 |
| 0.01 | FGSM | 97.10 | 94.07 | 96.36 | 8.21 |
| ReLU | PGD | 98.67 | 91.85 | 86.74 | 0.10 |
| 1 | PGD | 98.49 | 93.32 | 87.09 | 0.11 |
| 0.1 | PGD | 98.09 | 92.64 | 92.85 | 3.62 |
| 0.01 | PGD | 96.55 | 89.21 | 95.43 | 8.00 |
Additional observations:
- Capping only bottleneck layers maximizes robust accuracy.
- For small $\tau$, hidden-layer perturbation growth under adversarial input is substantially attenuated.
- Decreasing $\tau$ reduces sensitivity-map sums roughly proportionally, trending from 26 toward 0.
- With PGD adversarial training, modReLU at $\tau = 0.01$ achieves 95.43% PGD-robust accuracy versus 86.74% for standard ReLU with PGD.
7. Implications, Trade-offs, and Best Practices
modReLU delivers markedly improved robustness, especially in combination with adversarial training. For PGD and FGSM attacks, robust accuracy gains are dramatic with minimal reduction in standard accuracy. CW attack robustness improves as well, but to a lesser extent.
Key trade-offs:
- Negligible computational overhead in convolutional layers.
- Too small a $\tau$ induces vanishing gradients and potential underfitting, particularly in wide or deep MLPs.
- Capping only certain layers (e.g., bottlenecks) balances accuracy and robustness.
- FGSM adversarial training, when combined with modReLU, suffices to reach PGD-level robustness, streamlining retraining.
Recommended practice:
- Sweep $\tau$ values (e.g., over $\{1, 0.1, 0.01\}$) to optimize the accuracy–robustness trade-off.
- Prefer capping in convolutional feature maps or bottleneck layers; exercise caution with fully connected nets.
- Potential research avenues include adaptive or per-channel $\tau$, selective capping at input/output, scalability to larger datasets and architectures, and formal investigation of global Lipschitz constraints imposed by modReLU (Sooksatra et al., 2024).
modReLU constitutes a simple, low-overhead architectural measure for enhancing adversarial robustness, readily integrable into standard deep learning workflows and empirically validated across a range of model variants and adversarial threat models.