Randomized Directional Smoothing Techniques
- Randomized directional smoothing is a set of methods that smooth non-smooth functions by averaging directional perturbations.
- It underpins techniques in optimization, control, and model security by enabling effective gradient estimation and stabilization.
- Practical variants include orthogonal direction sampling for zeroth-order optimization, anisotropic Gaussian smoothing for faster convergence, and directional embedding smoothing for robust vision-language models.
Randomized directional smoothing encompasses a family of techniques in optimization, control, robustness certification, and model security, exploiting directional randomization and averaging to smooth non-smooth functions, stabilize optimization, or increase model robustness. These methods generalize classical randomized smoothing by replacing isotropic perturbations with structured, often directionally-aligned, noise. The canonical instantiations include randomized zeroth-order optimization with orthogonal directions, directional embedding smoothing for model defense, anisotropic Gaussian smoothing for accelerated convergence, and parameter-space smoothing for certifiable robustness. These techniques share the principle of Monte Carlo approximation of non-local quantities using random perturbations along selected directions or distributional axes.
1. Mathematical Foundations of Randomized Directional Smoothing
Randomized directional smoothing replaces a potentially non-smooth or brittle function $f$ by a locally averaged (smoothed) version along chosen directions. The canonical scalar-valued smoothing takes the form

$$f_\Sigma(x) = \mathbb{E}_{u \sim \mathcal{N}(0, \Sigma)}\left[f(x + u)\right],$$

where $\Sigma$ is a symmetric, positive definite covariance matrix that determines the smoothing geometry. Isotropic smoothing takes $\Sigma = \sigma^2 I$; directional or anisotropic smoothing selects $\Sigma$ with nonuniform eigenvalues or aligns noise with specific directions.
In the context of zeroth-order gradient estimation, one typically draws $\ell$ random directions $u_1, \dots, u_\ell$ forming an orthonormal basis of a random subspace of $\mathbb{R}^d$ and constructs a Monte Carlo gradient approximation:

$$\hat{g}(x) = \frac{d}{\ell} \sum_{i=1}^{\ell} \frac{f(x + h u_i) - f(x)}{h}\, u_i,$$

with $h$ the smoothing radius. For the general anisotropic setting, the gradient of the smoothed function can be written as

$$\nabla f_\Sigma(x) = \Sigma^{-1}\, \mathbb{E}_{u \sim \mathcal{N}(0, \Sigma)}\left[f(x + u)\, u\right],$$

as established in anisotropic smoothing frameworks (Starnes et al., 2024).
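As a rough illustration of anisotropic Gaussian smoothing, the sketch below estimates the gradient of the smoothed function by Monte Carlo. It assumes the smoothed function is the Gaussian average of $f$ with covariance $\Sigma$, for which the standard identity $\nabla f_\Sigma(x) = \Sigma^{-1}\mathbb{E}[(f(x+u) - f(x))\,u]$ holds (subtracting $f(x)$ does not change the expectation). The function name `smoothed_gradient`, the test function, and the sample count are illustrative choices, not taken from the cited work.

```python
import numpy as np

def smoothed_gradient(f, x, cov, num_samples=20000, seed=0):
    """Monte Carlo estimate of Sigma^{-1} E[(f(x + u) - f(x)) u], u ~ N(0, Sigma)."""
    rng = np.random.default_rng(seed)
    # Draw u ~ N(0, Sigma) using the Cholesky factor L, where Sigma = L L^T.
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((num_samples, x.size))
    u = z @ chol.T
    # Subtracting f(x) keeps the expectation unchanged (E[u] = 0) but lowers variance.
    diffs = np.array([f(x + ui) for ui in u]) - f(x)
    moment = (diffs[:, None] * u).mean(axis=0)
    return np.linalg.solve(cov, moment)

# Non-smooth toy objective (an assumption for illustration): |x_0| + 0.5 * x_1^2
f = lambda x: abs(x[0]) + 0.5 * x[1] ** 2
x0 = np.array([1.0, 2.0])
cov = np.diag([0.01, 0.04])  # more smoothing along the second axis
grad = smoothed_gradient(f, x0, cov)
```

Away from the kink at $x_0 = 0$, the estimate should be close to the ordinary gradient, here roughly $(1, 2)$, up to Monte Carlo error.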
Directional embedding smoothing for vision-LLMs perturbs each token embedding by adding Gaussian noise aligned with that embedding vector, which ensures noise is injected strictly along the embedding direction (Wang et al., 16 Mar 2026).
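The idea of noise aligned with each embedding can be sketched as follows. The exact noise form used in the cited work is not reproduced here; this sketch assumes a scalar Gaussian coefficient multiplied by the unit vector of each embedding, and the function name `directional_noise` and the toy dimensions are hypothetical.

```python
import numpy as np

def directional_noise(embeddings, sigma, rng):
    """Add scalar Gaussian noise to each token embedding along its own direction.

    Assumed form (not confirmed by the source): e + eps * e / ||e||, eps ~ N(0, sigma^2).
    """
    norms = np.linalg.norm(embeddings, axis=-1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)  # guard against zero vectors
    eps = rng.normal(0.0, sigma, size=norms.shape)  # one scalar per token
    return embeddings + eps * unit

rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 8))  # 4 tokens, 8-dimensional toy embeddings
noisy = directional_noise(emb, sigma=0.1, rng=rng)
```

Under this assumed form, every perturbation is parallel to its token embedding, so only the embedding's magnitude changes, in contrast with isotropic noise, which also rotates it.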
2. Algorithmic Instantiations and Implementations
Several algorithmic schemes operationalize randomized directional smoothing, depending on the domain and goal:
- Zeroth-order Optimization with Orthogonal Directions: At each iteration, draw $\ell$ random orthonormal vectors $u_1, \dots, u_\ell$ and compute finite-difference estimates in these directions. The estimate is lifted to the full space by a linear map. This underpins Randomized Directional Smoothing (RDS) and spherical smoothing for optimization (Kozak et al., 2021).
- Anisotropic Gaussian Smoothing (AGS): Build the smoothing covariance $\Sigma$ adaptively from curvature or recent gradients. Monte Carlo estimators of the smoothed gradient $\nabla f_\Sigma$ are used in the AGS-GD, AGS-SGD, and AGS-Adam algorithms. Adaptation strategies leverage the local Hessian or gradient moments to orient smoothing along high-curvature directions (Starnes et al., 2024).
- Directional Embedding Smoothing in RESTA: For robust vision-LLMs, perturb each token embedding directionally as above, generate multiple noisy embedding sequences, autoregressively decode them in parallel, and select each next token by majority vote. The smoothing relies on injecting Gaussian noise strictly aligned with each token vector (Wang et al., 16 Mar 2026).
- Randomized Smoothing for Control and Certification: In optimal control of nonsmooth systems, perturbed states are averaged to define smooth surrogate dynamics amenable to gradient-based methods. In image certification, transformation parameters are smoothed with Gaussian noise, and error bounds are established in parameter space for robust prediction (Lidec et al., 2022, Fischer et al., 2020).
The following table summarizes core algorithmic settings:
| Application Domain | Smoothing Directionality | Reference |
|---|---|---|
| Zeroth-order Optimization | Random orthonormal basis | (Kozak et al., 2021) |
| Anisotropic Gradient-based Opt. | Adaptive covariance (Σ) | (Starnes et al., 2024) |
| Vision-LLM Defense | Embedding vector direction | (Wang et al., 16 Mar 2026) |
| Control/Certification | State or parameter space | (Lidec et al., 2022, Fischer et al., 2020) |
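The zeroth-order scheme with orthogonal directions can be sketched as below. This is a minimal illustration under stated assumptions, not the reference implementation: directions come from a QR factorization of a Gaussian matrix, forward differences are used, and the lift to the full space is a $d/\ell$-scaled linear map. The names `orthonormal_directions`, `zo_gradient`, and `zo_descent`, and all default parameters, are hypothetical.

```python
import numpy as np

def orthonormal_directions(d, ell, rng):
    """Return a d x ell matrix with orthonormal columns (QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((d, ell)))
    return q

def zo_gradient(f, x, ell, h, rng):
    """Gradient estimate from forward differences along ell orthonormal directions."""
    d = x.size
    U = orthonormal_directions(d, ell, rng)
    fx = f(x)
    slopes = np.array([(f(x + h * U[:, i]) - fx) / h for i in range(ell)])
    # Lift to the full space; the d / ell factor compensates for sampling
    # only an ell-dimensional subspace (up to O(h) finite-difference bias).
    return (d / ell) * (U @ slopes)

def zo_descent(f, x0, ell=5, h=1e-6, step=0.05, iters=500, seed=0):
    """Plain gradient descent driven by the zeroth-order estimate."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - step * zo_gradient(f, x, ell, h, rng)
    return x

f = lambda x: 0.5 * np.sum(x ** 2)  # smooth convex toy objective
x_final = zo_descent(f, np.ones(20))
```

Each iteration costs $\ell + 1$ function evaluations, so the number of directions trades per-iteration cost against the variance of the estimate.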
3. Theoretical Properties and Convergence Analyses
Randomized directional smoothing introduces bias and variance trade-offs that are quantifiable both for estimation accuracy and optimization convergence:
- For RDS in optimization, explicit bounds on the estimator's bias and variance are available. With an appropriate step size and smoothing radius, convergence rates are established for convex $L$-smooth objectives, with linear convergence to a neighborhood under the Polyak-Łojasiewicz (PL) condition (Kozak et al., 2021).
- For AGS methods, convergence rates are obtained in both convex and nonconvex settings, with the radius of the convergence ball dictated by the smoothing covariance and the step sizes. Anisotropic smoothing recovers the classical isotropic bounds in the special case $\Sigma = \sigma^2 I$ but often yields improved empirical convergence, particularly in ill-conditioned or ridge-like landscapes (Starnes et al., 2024).
- For RESTA with directional embedding smoothing, no formal robustness certificates are proved for vision-LLMs; effectiveness is motivated by the disruption of local adversarial paths in embedding space (Wang et al., 16 Mar 2026).
- In randomized smoothing-based certification for parameterized transformations, explicit certification radii in parameter space are derived, guaranteeing robustness to geometric perturbations up to a provable threshold (Fischer et al., 2020).
4. Practical Considerations and Hyperparameter Selection
Effective application of randomized directional smoothing relies on careful tuning of parameters:
- Number of Random Directions / Samples: This count balances estimator variance against per-iteration cost. In high dimensions, a modest number of directions is common for optimization, while for model defense a small number of samples suffices for majority voting (Kozak et al., 2021, Wang et al., 16 Mar 2026).
- Noise Scale: The smoothing radius determines the bias-variance trade-off. Too small a radius increases variance; too large a radius introduces excessive bias. Practical heuristics for choosing the scale are reported for optimization and for embeddings in Gemma (Starnes et al., 2024, Wang et al., 16 Mar 2026).
- Covariance Design (Anisotropy): For AGS, the covariance $\Sigma$ is often built from the local Hessian or from moving gradient moments to adapt directionality (Starnes et al., 2024).
- Algorithmic Stability: Aggressive adaptation of $\Sigma$ can violate smoothness assumptions. Conservative updates and periodic adaptation are recommended (Starnes et al., 2024).
- System Deployment: In robust inference (e.g., RESTA), randomized directional smoothing acts as a lightweight, inference-time layer. Composability with other defenses is possible but does not obviate the need for broader system-level security, including red teaming and alignment training (Wang et al., 16 Mar 2026).
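The bias-variance role of the noise scale can be seen in a one-dimensional toy case. The sketch below uses a forward difference with step $h$ and assumes, purely for illustration, that function evaluations carry additive Gaussian noise of standard deviation `s`; neither the noise model nor the specific values come from the cited works.

```python
import numpy as np

def forward_difference(f, x, h):
    """Forward-difference slope of f at x with step h."""
    return (f(x + h) - f(x)) / h

# Bias: for f(x) = x^2 the forward difference equals 2x + h, so the bias is h.
f = lambda x: x ** 2
bias_small = forward_difference(f, 1.0, 1e-3) - 2.0
bias_large = forward_difference(f, 1.0, 1e-1) - 2.0

def noisy_slope_std(h, s=1e-3, trials=5000, seed=0):
    """Empirical std of the slope at x = 1 when each evaluation has N(0, s^2) noise."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, s, size=(trials, 2))
    slopes = ((1.0 + h) ** 2 + noise[:, 0] - (1.0 + noise[:, 1])) / h
    return slopes.std()

# Variance: the evaluation noise is divided by h, so a smaller h amplifies it.
std_small = noisy_slope_std(1e-3)
std_large = noisy_slope_std(1e-1)
```

In this toy setting, shrinking $h$ by a factor of 100 cuts the bias by the same factor but inflates the noise-induced standard deviation by a factor of 100, which is why the scale has to be tuned rather than driven to zero.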
5. Empirical Results and Applications
Empirical evidence demonstrates the utility of randomized directional smoothing across domains:
- Model Robustness and Security: On the JailBreakV-28K suite, RESTA with directional noise reduced attack success rate on LLaVA-1.5-7B from 50.13% to 25.93% with only a minor utility loss (ScienceQA score drop from 64.07% to 61.42%). Isotropic noise failed to obtain comparable gains (Wang et al., 16 Mar 2026).
- Optimization: RDS attains provable convergence rates for convex objectives and linear convergence up to a neighborhood for PL objectives. Empirically, a moderate subspace dimension provides the best trade-off between cost and convergence (Kozak et al., 2021).
- Control: In optimal control of systems with non-smooth dynamics (Coulomb friction, impacts), randomized smoothing enables second-order methods (R-DDP) to succeed where classical methods or pure RL either fail or require excessive samples. R-DDP matched state-of-the-art RL with substantially fewer samples on contact-rich tasks (Lidec et al., 2022).
- Certification: Robustness certificates against geometric transformations were established for smoothing in parameter space, providing high-confidence intervals and individual certifiable radii for image classification (Fischer et al., 2020).
6. Variants, Special Cases, and Extensions
Randomized directional smoothing admits several reductions and related methodologies:
- Spherical Smoothing: Drawing a single direction ($\ell = 1$) recovers classical two-point estimation along a single random direction (Kozak et al., 2021).
- Coordinate Descent: Employing coordinate axes as directions recovers randomized coordinate descent.
- Anisotropic vs. Isotropic Smoothing: Taking $\Sigma = \sigma^2 I$ yields isotropic smoothing (classical randomized smoothing as in Cohen et al.), while a general $\Sigma$ enables highly directional or adaptive smoothing (Starnes et al., 2024).
- Embedding Smoothing vs. Parameter Space Smoothing: In vision-LLMs, smoothing is performed in the embedding space; in geometric robustness it is performed in transformation parameter space (Wang et al., 16 Mar 2026, Fischer et al., 2020).
A plausible implication is that adapting the covariance or smoothing geometry further, guided by application-specific priors or learned structural information, may improve the security-utility or convergence trade-off.
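The coordinate-axis reduction listed in this section can be checked on a toy problem. The sketch below assumes a forward-difference estimator scaled by $d/\ell$, where $\ell$ is the number of directions; the helper `directional_estimate` and the linear test function are illustrative only.

```python
import numpy as np

def directional_estimate(f, x, U, h):
    """Forward-difference gradient estimate from the orthonormal columns of U,
    scaled by d / ell (d = dimension, ell = number of directions)."""
    d, ell = U.shape
    fx = f(x)
    slopes = np.array([(f(x + h * U[:, i]) - fx) / h for i in range(ell)])
    return (d / ell) * (U @ slopes)

a = np.array([1.0, -2.0, 3.0])
f = lambda x: a @ x  # linear, so finite differences are exact up to rounding
x = np.zeros(3)

# All coordinate axes (ell = d): the ordinary finite-difference gradient.
full = directional_estimate(f, x, np.eye(3), h=1e-3)

# One coordinate axis (ell = 1): a scaled partial derivative, the kind of
# update used by randomized coordinate descent.
single = directional_estimate(f, x, np.eye(3)[:, [1]], h=1e-3)
```

With all axes the estimate equals the gradient `a`; with a single axis it is nonzero only in that coordinate, and averaging over a uniformly chosen axis would recover the gradient in expectation.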
7. Limitations and Future Directions
While randomized directional smoothing significantly advances optimization efficiency, robustness, and model security, several limitations and open questions persist:
- Certification Gaps: While formal certificates exist in parameter-space smoothing, defenses like RESTA-directional remain heuristic, lacking formal robustness or failure bounds in adversarial settings (Wang et al., 16 Mar 2026).
- Adversarial Adaptivity: Defensive smoothing may only temporarily mitigate attack success until adaptive adversaries circumvent the smoothed geometry.
- Compositional Effects: How randomized smoothing interacts with system-level composition—multiple defense layers, alignment training, and red teaming—remains an area for empirical and theoretical investigation (Wang et al., 16 Mar 2026).
Future work is expected to close the certification gaps for adversarial robustness, develop more sophisticated mechanisms for adaptive smoothing geometry, and establish compositional guarantees in multi-layer robust learning systems.