
Randomized Directional Smoothing Techniques

Updated 24 April 2026
  • Randomized directional smoothing is a set of methods that smooth non-smooth functions by averaging directional perturbations.
  • It underpins techniques in optimization, control, and model security by enabling effective gradient estimation and stabilization.
  • Practical variants include orthogonal direction sampling for zeroth-order optimization, anisotropic Gaussian smoothing for faster convergence, and directional embedding smoothing for robust vision-language models.

Randomized directional smoothing encompasses a family of techniques in optimization, control, robustness certification, and model security, exploiting directional randomization and averaging to smooth non-smooth functions, stabilize optimization, or increase model robustness. These methods generalize classical randomized smoothing by replacing isotropic perturbations with structured, often directionally-aligned, noise. The canonical instantiations include randomized zeroth-order optimization with orthogonal directions, directional embedding smoothing for model defense, anisotropic Gaussian smoothing for accelerated convergence, and parameter-space smoothing for certifiable robustness. These techniques share the principle of Monte Carlo approximation of non-local quantities using random perturbations along selected directions or distributional axes.

1. Mathematical Foundations of Randomized Directional Smoothing

Randomized directional smoothing replaces a potentially non-smooth or brittle function $f:\mathbb{R}^d\to\mathbb{R}$ by a locally averaged (smoothed) version along chosen directions. The canonical scalar-valued smoothing takes the form

$$f_\Sigma(x) = \mathbb{E}_{u \sim \mathcal{N}(0, \Sigma)}[f(x + u)]$$

where $\Sigma$ is a symmetric, positive definite covariance matrix that determines the smoothing geometry. Isotropic smoothing takes $\Sigma = \sigma^2 I$; directional or anisotropic smoothing selects $\Sigma$ with nonuniform eigenvalues or aligns noise with specific directions.
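
The definition above can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical test function $f(x) = |x_1|$; the helper name `smoothed_value` and both covariance choices are illustrative and not taken from the cited papers.

```python
import numpy as np

def smoothed_value(f, x, cov, n_samples=100_000, seed=0):
    """Monte Carlo estimate of f_Sigma(x) = E_{u ~ N(0, cov)}[f(x + u)].

    `f` must accept an (n, d) array of points and return n values.
    """
    rng = np.random.default_rng(seed)
    u = rng.multivariate_normal(np.zeros(x.size), cov, size=n_samples)
    return float(np.mean(f(x + u)))

def abs_first_coordinate(points):
    """Non-smooth test function f(x) = |x_1|, applied row-wise (illustrative)."""
    return np.abs(points[:, 0])

x0 = np.zeros(2)
cov_iso = 0.25 * np.eye(2)          # isotropic: Sigma = sigma^2 I with sigma = 0.5
cov_aniso = np.diag([0.01, 1.0])    # anisotropic: std 0.1 along x_1, std 1.0 along x_2

iso_value = smoothed_value(abs_first_coordinate, x0, cov_iso)
aniso_value = smoothed_value(abs_first_coordinate, x0, cov_aniso)
```

At the kink $x = 0$ the exact smoothed value is $\sigma_1\sqrt{2/\pi}$, where $\sigma_1$ is the noise standard deviation along the first coordinate, so the anisotropic covariance here smooths the kink less than the isotropic one does.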

In the context of zeroth-order gradient estimation, one typically draws random directions $\{u_i\}_{i=1}^m$ forming an orthonormal basis and constructs a Monte Carlo gradient approximation:

$$g_h(x) = \frac{1}{h}\sum_{i=1}^{m} [f(x + h u_i) - f(x)]\, u_i$$

with $h > 0$ the smoothing radius. For the general anisotropic setting, the gradient of the smoothed function can be written as

$$\nabla f_\Sigma(x) = \Sigma^{-1}\, \mathbb{E}_{u \sim \mathcal{N}(0,\Sigma)}[u\, f(x+u)]$$

as established in anisotropic smoothing frameworks (Starnes et al., 2024).
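
The finite-difference estimator $g_h$ can be sketched as follows. This is an illustration only, assuming NumPy and assuming the orthonormal directions are obtained by a QR factorization of a Gaussian matrix (one common way to sample them; the cited work may use a different construction). The function names and the linear test objective are hypothetical.

```python
import numpy as np

def random_orthonormal_directions(d, m, rng):
    """Return a (d, m) matrix whose columns are m orthonormal vectors in R^d (m <= d)."""
    q, _ = np.linalg.qr(rng.standard_normal((d, m)))
    return q

def directional_gradient_estimate(f, x, h, m, rng):
    """g_h(x) = (1/h) * sum_i [f(x + h u_i) - f(x)] u_i over m orthonormal directions."""
    directions = random_orthonormal_directions(x.size, m, rng)
    fx = f(x)
    diffs = np.array([f(x + h * directions[:, i]) - fx for i in range(m)])
    return directions @ diffs / h

a = np.array([3.0, -2.0, 0.5, 1.0])

def linear(x):
    """Illustrative objective with known gradient a."""
    return float(a @ x)

rng = np.random.default_rng(0)
x0 = np.ones(4)
g_full = directional_gradient_estimate(linear, x0, h=1e-4, m=4, rng=rng)  # m = d
g_sub = directional_gradient_estimate(linear, x0, h=1e-4, m=2, rng=rng)   # m < d
```

With $m = d$ the estimate recovers the full gradient of a smooth function up to an error that vanishes with $h$; with $m < d$ it approximates the projection of the gradient onto the span of the sampled directions, using only $m + 1$ function evaluations.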

Directional embedding smoothing for vision-LLMs perturbs each token embedding $e \in \mathbb{R}^d$ by adding Gaussian noise aligned with $e$ itself, which ensures noise is injected strictly along the embedding direction (Wang et al., 16 Mar 2026).
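
To make "noise along the embedding direction" concrete, here is a minimal sketch under a loudly stated assumption: each token embedding $e$ is perturbed as $e + \sigma z\, e/\|e\|$ with a scalar $z \sim \mathcal{N}(0, 1)$. This specific form is a guess at one plausible parameterization, not the formula of Wang et al.; the function name and array shapes are likewise hypothetical.

```python
import numpy as np

def directional_noise(embeddings, sigma, rng):
    """Perturb each row e of a (T, d) array along its own direction.

    Assumed form (not from the source): e + sigma * z * e / ||e||, z ~ N(0, 1).
    Rows are assumed to be nonzero.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    z = rng.standard_normal((embeddings.shape[0], 1))  # one scalar draw per token
    return embeddings + sigma * z * unit

rng = np.random.default_rng(0)
tokens = rng.standard_normal((5, 8))     # 5 hypothetical token embeddings, d = 8
noisy = directional_noise(tokens, sigma=0.1, rng=rng)

# Each noisy embedding stays on the line spanned by the original embedding.
cosines = np.sum(noisy * tokens, axis=1) / (
    np.linalg.norm(noisy, axis=1) * np.linalg.norm(tokens, axis=1)
)
```

Under this assumed form the perturbation only rescales each embedding, so its direction is preserved while its norm is randomized.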

2. Algorithmic Instantiations and Implementations

Several algorithmic schemes operationalize randomized directional smoothing, depending on the domain and goal:

  • Zeroth-order Optimization with Orthogonal Directions: At each iteration, draw $m$ random orthonormal vectors $\{u_i\}_{i=1}^m$ and compute finite-difference estimates in these directions. The estimate is lifted to the full space by a linear map. This approach underpins Randomized Directional Smoothing (RDS) and spherical smoothing for optimization (Kozak et al., 2021).
  • Anisotropic Gaussian Smoothing (AGS): Build a smoothing covariance $\Sigma$ that adapts to curvature or recent gradients. Monte Carlo estimators of $\nabla f_\Sigma$ are used in the AGS-GD, AGS-SGD, and AGS-Adam algorithms. Adaptation strategies leverage local Hessian or gradient moments to orient smoothing along high-curvature directions (Starnes et al., 2024).
  • Directional Embedding Smoothing in RESTA: For robust vision-LLMs, perturb each token embedding directionally as above, generate multiple noisy embedding sequences, autoregressively decode them in parallel, and select each next token by majority vote. The smoothing relies on injecting Gaussian noise strictly aligned with each token vector (Wang et al., 16 Mar 2026).
  • Randomized Smoothing for Control and Certification: In optimal control of nonsmooth systems, perturbed states are averaged to define smooth surrogate dynamics amenable to gradient-based methods. In image certification, transformation parameters are smoothed with Gaussian noise, and error bounds are established in parameter space for robust prediction (Lidec et al., 2022, Fischer et al., 2020).
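
For the AGS item in the list above, the Monte Carlo estimator of $\Sigma^{-1}\mathbb{E}[u f(x+u)]$ can be sketched as below. This is a hedged illustration assuming NumPy, not the AGS-GD implementation: the diagonal covariance and the quadratic objective are hypothetical, and subtracting the baseline $f(x)$ is a standard variance-reduction step added here (it leaves the expectation unchanged because $\mathbb{E}[u] = 0$).

```python
import numpy as np

def smoothed_gradient_estimate(f, x, cov, n_samples, rng):
    """Monte Carlo estimate of Sigma^{-1} E[u f(x + u)] with u ~ N(0, Sigma).

    `f` must accept an (n, d) array of points and return n values.
    """
    u = rng.multivariate_normal(np.zeros(x.size), cov, size=n_samples)
    values = f(x + u) - f(x[None, :])        # baseline f(x); unbiased since E[u] = 0
    weighted = (u * values[:, None]).mean(axis=0)
    return np.linalg.solve(cov, weighted)    # applies Sigma^{-1} without forming it

curvatures = np.array([100.0, 1.0])          # ill-conditioned quadratic (illustrative)

def quadratic(points):
    return 0.5 * np.sum(curvatures * points**2, axis=1)

rng = np.random.default_rng(0)
x0 = np.array([1.0, 1.0])
cov = np.diag([1e-4, 1e-2])                  # hypothetical anisotropic covariance
grad_est = smoothed_gradient_estimate(quadratic, x0, cov, 200_000, rng)
```

For a quadratic objective, Gaussian smoothing only adds a constant, so the smoothed gradient equals the true gradient `curvatures * x0` and the estimate can be compared against it directly.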

The following table summarizes core algorithmic settings:

| Application Domain | Smoothing Directionality | Reference |
| --- | --- | --- |
| Zeroth-order Optimization | Random orthonormal basis | (Kozak et al., 2021) |
| Anisotropic Gradient-based Opt. | Adaptive covariance ($\Sigma$) | (Starnes et al., 2024) |
| Vision-LLM Defense | Embedding vector direction | (Wang et al., 16 Mar 2026) |
| Control/Certification | State or parameter space | (Lidec et al., 2022, Fischer et al., 2020) |
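
The majority-vote aggregation mentioned for the vision-LLM defense can be sketched in a few lines. This is a toy illustration only: `decode_next_token` and `toy_decoder` are hypothetical stand-ins for one decoding step of a real model, and tie-breaking by first occurrence is an assumption.

```python
from collections import Counter

def majority_vote(candidate_tokens):
    """Return the most frequent token; ties go to the first one encountered."""
    counts = Counter(candidate_tokens)
    token, _ = counts.most_common(1)[0]
    return token

def smoothed_next_token(decode_next_token, noisy_inputs):
    """Decode once per noisy input and aggregate the proposals by majority vote."""
    return majority_vote([decode_next_token(z) for z in noisy_inputs])

def toy_decoder(noisy_score):
    """Hypothetical decoding step: thresholds a noisy scalar score."""
    return "yes" if noisy_score > 0.0 else "no"

noisy_scores = [0.4, -0.1, 0.7, 0.2, -0.3]   # five noisy copies of one input
chosen = smoothed_next_token(toy_decoder, noisy_scores)
```

In a real pipeline each noisy input would be a perturbed embedding sequence and the vote would be repeated at every decoding step.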

3. Theoretical Properties and Convergence Analyses

Randomized directional smoothing introduces bias and variance trade-offs that are quantifiable both for estimation accuracy and optimization convergence:

  • For RDS in optimization, explicit bounds on the bias and the variance of the gradient estimator are available. For convex objectives with Lipschitz-continuous gradients, appropriate choices of step size and smoothing radius give explicit convergence rates, with linear convergence to a neighborhood under the Polyak-Łojasiewicz (PL) condition (Kozak et al., 2021).
  • For AGS methods, convergence rates are established in both convex and nonconvex settings, with the radius of the convergence ball dictated by the smoothing covariance and the step size. Anisotropic smoothing recovers the classical isotropic bounds in the special case $\Sigma = \sigma^2 I$ but often leads to improved empirical convergence, particularly in ill-conditioned or ridge-like landscapes (Starnes et al., 2024).
  • For RESTA with directional embedding smoothing, no formal robustness certificates are proved for vision-LLMs; effectiveness is motivated by the disruption of local adversarial paths in embedding space (Wang et al., 16 Mar 2026).
  • In randomized smoothing-based certification for parameterized transformations, explicit certification radii in parameter space are derived, guaranteeing robustness to geometric perturbations up to a provable threshold (Fischer et al., 2020).
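
As a worked illustration of where the finite-difference bias comes from, the following textbook bound assumes that $\nabla f$ is $L$-Lipschitz and that $u_1, \dots, u_m$ are orthonormal with $U = [u_1 \cdots u_m]$. It is a generic derivation, not the specific constants of Kozak et al.; the PL condition is stated with an assumed constant $\mu > 0$ and minimum value $f^*$.

```latex
% Assumptions: \nabla f is L-Lipschitz; u_1, ..., u_m orthonormal; U = [u_1 ... u_m].
\begin{align*}
\bigl| f(x + h u_i) - f(x) - h \langle \nabla f(x), u_i \rangle \bigr|
  &\le \tfrac{L}{2} h^2, \\
\Bigl| \tfrac{1}{h}\bigl[ f(x + h u_i) - f(x) \bigr] - \langle \nabla f(x), u_i \rangle \Bigr|
  &\le \tfrac{L}{2} h, \\
\bigl\| g_h(x) - U U^{\top} \nabla f(x) \bigr\|
  &\le \tfrac{L}{2} h \sqrt{m}.
\end{align*}
% Polyak-Lojasiewicz (PL) condition:
\[
\tfrac{1}{2} \, \| \nabla f(x) \|^2 \;\ge\; \mu \bigl( f(x) - f^* \bigr).
\]
```

The last inequality in the first display follows because the per-direction errors are coefficients on orthonormal vectors, so the deterministic bias of $g_h$ relative to the projected gradient shrinks linearly in $h$.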

4. Practical Considerations and Hyperparameter Selection

Effective application of randomized directional smoothing relies on careful tuning of parameters:

  • Number of Random Directions / Samples: The sample count balances estimator variance against per-iteration cost, and suitable values differ between high-dimensional optimization and majority voting for model defense (Kozak et al., 2021, Wang et al., 16 Mar 2026).
  • Noise Scale: The smoothing radius determines the bias-variance trade-off. Too small a radius increases variance; too large a radius introduces excessive bias. Practical heuristics are given separately for optimization and for embeddings in Gemma (Starnes et al., 2024, Wang et al., 16 Mar 2026).
  • Covariance Design (Anisotropy): For AGS, $\Sigma$ is often built from the local Hessian or moving gradient moments to adapt directionality (Starnes et al., 2024).
  • Algorithmic Stability: Aggressive adaptation of $\Sigma$ can violate smoothness assumptions. Conservative updates and periodic adaptation are recommended (Starnes et al., 2024).
  • System Deployment: In robust inference (e.g., RESTA), randomized directional smoothing acts as a lightweight, inference-time layer. Composability with other defenses is possible but does not obviate the need for broader system-level security, including red teaming and alignment training (Wang et al., 16 Mar 2026).
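
The noise-scale trade-off in the list above can be illustrated on a one-dimensional toy problem. This sketch uses only the Python standard library and makes several assumptions: the objective is $f(x) = |x|$, the estimator is the plain Gaussian-smoothing form $\frac{1}{\sigma^2}\,\mathrm{mean}(u f(x+u))$ without any baseline, and the sample sizes and function names are arbitrary choices.

```python
import math
import random
import statistics

def f(x):
    return abs(x)

def smoothed_grad_estimate(x, sigma, n, rng):
    """Monte Carlo estimate of d/dx E[f(x + u)], u ~ N(0, sigma^2), in 1-D."""
    total = 0.0
    for _ in range(n):
        u = rng.gauss(0.0, sigma)
        total += u * f(x + u)
    return total / (n * sigma**2)

def bias_and_spread(x, sigma, n=200, trials=300, seed=0):
    """Bias of the smoothed derivative and spread of its estimator at x != 0."""
    rng = random.Random(seed)
    estimates = [smoothed_grad_estimate(x, sigma, n, rng) for _ in range(trials)]
    # For f = |x| the smoothed derivative is erf(x / (sigma * sqrt(2))).
    exact_smoothed = math.erf(x / (sigma * math.sqrt(2.0)))
    true_grad = 1.0 if x > 0 else -1.0
    bias = abs(exact_smoothed - true_grad)
    spread = statistics.stdev(estimates)
    return bias, spread

bias_small, spread_small = bias_and_spread(0.5, sigma=0.05)
bias_large, spread_large = bias_and_spread(0.5, sigma=1.0)
```

In this toy setting the small radius gives a nearly unbiased but noisy estimate, while the large radius gives a stable estimate of a visibly different (smoothed) derivative.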

5. Empirical Results and Applications

Empirical evidence demonstrates the utility of randomized directional smoothing across domains:

  • Model Robustness and Security: On the JailBreakV-28K suite, RESTA with directional noise reduced attack success rate on LLaVA-1.5-7B from 50.13% to 25.93% with only a minor utility loss (ScienceQA score drop from 64.07% to 61.42%). Isotropic noise failed to obtain comparable gains (Wang et al., 16 Mar 2026).
  • Optimization: RDS has convergence guarantees for convex objectives and converges linearly up to a neighborhood for PL objectives. Empirically, a moderate subspace dimension provides the best trade-off between cost and convergence (Kozak et al., 2021).
  • Control: In optimal control of systems with non-smooth dynamics (Coulomb friction, impacts), randomized smoothing enables second-order methods (R-DDP) to succeed where classical methods or pure RL either fail or require excessive samples. R-DDP matched state-of-the-art RL while using only a fraction of the samples on contact-rich tasks (Lidec et al., 2022).
  • Certification: Robustness certificates against geometric transformations were established for smoothing in parameter space, providing high-confidence intervals and individual certifiable radii for image classification (Fischer et al., 2020).

6. Variants, Special Cases, and Extensions

Randomized directional smoothing admits several reductions and related methodologies:

  • Spherical Smoothing: Using a single direction ($m = 1$) recovers classical two-point estimation along a single random direction (Kozak et al., 2021).
  • Coordinate Descent: Employing coordinate axes as directions recovers randomized coordinate descent.
  • Anisotropic vs. Isotropic Smoothing: Taking $\Sigma = \sigma^2 I$ yields isotropic smoothing (classical randomized smoothing as in Cohen et al.), while a general $\Sigma$ enables highly directional or adaptive smoothing (Starnes et al., 2024).
  • Embedding Smoothing vs. Parameter Space Smoothing: In vision-LLMs, smoothing is performed in the embedding space; in geometric robustness it is performed in transformation parameter space (Wang et al., 16 Mar 2026, Fischer et al., 2020).
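
The first two reductions in the list above amount to different choices of direction set in the same finite-difference estimator. The sketch below, in standard-library Python, is illustrative: the function name, the linear test objective, and the step size are assumptions, and a full set of coordinate axes is used for clarity where a coordinate-descent method would pick a random subset.

```python
import math
import random

def directional_fd_gradient(f, x, directions, h=1e-6):
    """(1/h) * sum_i [f(x + h u_i) - f(x)] u_i for a given list of directions."""
    fx = f(x)
    grad = [0.0] * len(x)
    for u in directions:
        shifted = [xi + h * ui for xi, ui in zip(x, u)]
        coeff = (f(shifted) - fx) / h
        for j, uj in enumerate(u):
            grad[j] += coeff * uj
    return grad

def linear(x):
    """Illustrative objective with gradient (3, -2, 0.5)."""
    return 3.0 * x[0] - 2.0 * x[1] + 0.5 * x[2]

point = [1.0, 1.0, 1.0]

# Coordinate axes as directions: plain coordinate-wise forward differences.
coordinate_axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
grad_coords = directional_fd_gradient(linear, point, coordinate_axes)

# A single random unit direction: two function evaluations per estimate.
rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in point]
norm = math.sqrt(sum(v * v for v in raw))
unit = [v / norm for v in raw]
grad_single = directional_fd_gradient(linear, point, [unit])
```

With a single direction the estimate is the directional derivative times that direction, so it is always parallel to the sampled vector.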

A plausible implication is that adapting the covariance or smoothing geometry further, guided by application-specific priors or learned structural information, may improve the security-utility or convergence trade-off.

7. Limitations and Future Directions

While randomized directional smoothing significantly advances optimization efficiency, robustness, and model security, several limitations and open questions persist:

  • Certification Gaps: While formal certificates exist in parameter-space smoothing, defenses like RESTA-directional remain heuristic, lacking formal robustness or failure bounds in adversarial settings (Wang et al., 16 Mar 2026).
  • Adversarial Adaptivity: Defensive smoothing may only temporarily mitigate attack success until adaptive adversaries circumvent the smoothed geometry.
  • Compositional Effects: How randomized smoothing interacts with system-level composition—multiple defense layers, alignment training, and red teaming—remains an area for empirical and theoretical investigation (Wang et al., 16 Mar 2026).

Future work is expected to close the certification gaps for adversarial robustness, develop more sophisticated mechanisms for adaptive smoothing geometry, and establish compositional guarantees in multi-layer robust learning systems.
