Randomized Directional Smoothing Techniques

Updated 24 April 2026

Randomized directional smoothing is a set of methods that smooth non-smooth functions by averaging directional perturbations.
It underpins techniques in optimization, control, and model security by enabling effective gradient estimation and stabilization.
Practical variants include orthogonal direction sampling for zeroth-order optimization, anisotropic Gaussian smoothing for faster convergence, and directional embedding smoothing for robust vision-language models.

Randomized directional smoothing encompasses a family of techniques in optimization, control, robustness certification, and model security, exploiting directional randomization and averaging to smooth non-smooth functions, stabilize optimization, or increase model robustness. These methods generalize classical randomized smoothing by replacing isotropic perturbations with structured, often directionally-aligned, noise. The canonical instantiations include randomized zeroth-order optimization with orthogonal directions, directional embedding smoothing for model defense, anisotropic Gaussian smoothing for accelerated convergence, and parameter-space smoothing for certifiable robustness. These techniques share the principle of Monte Carlo approximation of non-local quantities using random perturbations along selected directions or distributional axes.

1. Mathematical Foundations of Randomized Directional Smoothing

Randomized directional smoothing replaces a potentially non-smooth or brittle function $f:\mathbb{R}^d\to\mathbb{R}$ by a locally averaged (smoothed) version along chosen directions. The canonical scalar-valued smoothing takes the form

$f_\Sigma(x) = \mathbb{E}_{u \sim \mathcal{N}(0, \Sigma)}[f(x + u)]$

where $\Sigma$ is a symmetric, positive definite covariance matrix that determines the smoothing geometry. Isotropic smoothing takes $\Sigma = \sigma^2 I$; directional or anisotropic smoothing selects $\Sigma$ with nonuniform eigenvalues or aligns noise with specific directions.
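
As a rough illustration of the definition above, the following sketch estimates $f_\Sigma(x)$ by plain Monte Carlo sampling. The function name `smoothed_value`, the sample count, and the test function $f(x) = |x_1| + |x_2|$ are assumptions made for this example and do not come from the cited works.

```python
import numpy as np

def smoothed_value(f, x, cov, n_samples=20000, seed=0):
    """Monte Carlo estimate of f_Sigma(x) = E_{u ~ N(0, cov)}[f(x + u)]."""
    rng = np.random.default_rng(seed)
    # Draw u ~ N(0, cov) as u = L z, where cov = L L^T and z ~ N(0, I).
    chol = np.linalg.cholesky(cov)
    u = rng.standard_normal((n_samples, x.size)) @ chol.T
    return float(np.mean([f(x + ui) for ui in u]))

# Hypothetical non-smooth test function with a kink at the origin.
f = lambda x: np.abs(x).sum()
x0 = np.zeros(2)

# Isotropic smoothing: standard deviation 0.5 in both coordinates.
iso = smoothed_value(f, x0, 0.25 * np.eye(2))
# Anisotropic smoothing: standard deviation 0.5 along x_1, 0.05 along x_2.
aniso = smoothed_value(f, x0, np.diag([0.25, 0.0025]))
```

For this particular $f$ the smoothed value at the origin has a closed form, since $\mathbb{E}|z| = s\sqrt{2/\pi}$ for $z \sim \mathcal{N}(0, s^2)$, so the two estimates can be checked against $(0.5 + 0.5)\sqrt{2/\pi}$ and $(0.5 + 0.05)\sqrt{2/\pi}$ respectively.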

In the context of zeroth-order gradient estimation, one typically draws random directions $\{u_i\}_{i=1}^m$ forming an orthonormal set (a full basis when $m = d$) and constructs a Monte Carlo gradient approximation:

$g_h(x) = \frac{1}{h}\sum_{i=1}^{m} [f(x + h u_i) - f(x)] u_i$

with $h > 0$ the smoothing radius. For the general anisotropic setting, the gradient of the smoothed function can be written as

$\nabla f_\Sigma(x) = \Sigma^{-1} \mathbb{E}_{u \sim \mathcal{N}(0,\Sigma)}[u f(x+u)]$

as established in anisotropic smoothing frameworks (Starnes et al., 2024).
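
The identity for $\nabla f_\Sigma$ can be turned into a sampling-based estimator. The sketch below is one minimal way to do so, not the implementation of the cited work: the name `smoothed_gradient`, the quadratic test objective, and the subtraction of the baseline $f(x)$ are assumptions of this example. The baseline does not change the expectation because $\mathbb{E}[u] = 0$, and it is used here only to reduce variance.

```python
import numpy as np

def smoothed_gradient(f, x, cov, n_samples=200_000, seed=0):
    """Estimate grad f_Sigma(x) = cov^{-1} E_{u ~ N(0, cov)}[u f(x + u)].

    `f` must accept a batch of points with shape (n, d) and return shape (n,).
    """
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(cov)
    u = rng.standard_normal((n_samples, x.size)) @ chol.T  # rows u_i ~ N(0, cov)
    # Baseline subtraction (an assumption of this sketch): E[u] = 0, so
    # replacing f(x + u) by f(x + u) - f(x) leaves the expectation unchanged.
    diffs = f(x + u) - f(x[None, :])
    moment = (u * diffs[:, None]).mean(axis=0)  # approximates E[u f(x + u)]
    return np.linalg.solve(cov, moment)         # apply cov^{-1}

# Hypothetical ill-conditioned quadratic f(x) = 0.5 x^T A x.
A = np.diag([1.0, 10.0])
f = lambda X: 0.5 * np.einsum("ni,ij,nj->n", X, A, X)
x = np.array([1.0, 1.0])
cov = np.diag([0.04, 0.01])  # less noise along the high-curvature axis
grad_est = smoothed_gradient(f, x, cov)
```

For a quadratic, Gaussian smoothing only adds the constant $\tfrac{1}{2}\operatorname{tr}(A\Sigma)$, so the smoothed gradient equals $Ax$ and the estimate should be close to $(1, 10)$ up to sampling error.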

Directional embedding smoothing for vision-LLMs perturbs each token embedding $e \in \mathbb{R}^d$ by adding Gaussian noise that is a random scalar multiple of $e$ itself, which ensures noise is injected strictly along the embedding direction (Wang et al., 16 Mar 2026).
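
A minimal sketch of such a directional perturbation is given below. The exact noise scaling used in the cited work is not reproduced here; this example assumes the form $\tilde{e} = e + \epsilon\, e / \|e\|_2$ with $\epsilon \sim \mathcal{N}(0, \sigma^2)$ drawn independently per token, and the function name `perturb_along_direction` is hypothetical.

```python
import numpy as np

def perturb_along_direction(embeddings, sigma, rng):
    """Add scalar Gaussian noise to each token embedding along its own direction.

    Assumed form (not taken from the source): e_tilde = e + eps * e / ||e||,
    with eps ~ N(0, sigma^2) drawn independently for every token.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)  # guard against zero vectors
    eps = rng.normal(0.0, sigma, size=(embeddings.shape[0], 1))
    return embeddings + eps * unit

rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 8))  # toy sequence: 4 tokens, 8-dim embeddings
noisy = perturb_along_direction(emb, sigma=0.1, rng=rng)

# The perturbation of each token has no component orthogonal to that token.
delta = noisy - emb
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
orthogonal_part = delta - (delta * unit).sum(axis=1, keepdims=True) * unit
```

Under this assumed form each token changes only in length (and possibly sign), never in direction, which is the property that distinguishes directional noise from isotropic noise.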

2. Algorithmic Instantiations and Implementations

Several algorithmic schemes operationalize randomized directional smoothing, depending on the domain and goal:

Zeroth-order Optimization with Orthogonal Directions: At each iteration, draw $m \le d$ random orthonormal vectors $u_1, \dots, u_m$ and compute finite-difference estimates in these directions. The estimate is lifted to the full space by a linear map. This underpins Randomized Directional Smoothing (RDS) and spherical smoothing for optimization (Kozak et al., 2021).
Anisotropic Gaussian Smoothing (AGS): Build the smoothing covariance $\Sigma$ adaptively from curvature or recent gradients. Monte Carlo estimators of $\nabla f_\Sigma$ are used in the AGS-GD, AGS-SGD, and AGS-Adam algorithms. Adaptation strategies leverage local Hessian or gradient moments to orient smoothing along high-curvature directions (Starnes et al., 2024).
Directional Embedding Smoothing in RESTA: For robust vision-LLMs, perturb each token embedding directionally as above, generate multiple noisy embedding sequences, autoregressively decode them in parallel, and select next tokens by majority vote. The smoothing relies on injection of Gaussian noise strictly aligned with each token vector (Wang et al., 16 Mar 2026).
Randomized Smoothing for Control and Certification: In optimal control of nonsmooth systems, perturbed states are averaged to define smooth surrogate dynamics amenable to gradient-based methods. In image certification, transformation parameters are smoothed with Gaussian noise, and error bounds are established in parameter space for robust prediction (Lidec et al., 2022, Fischer et al., 2020).
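
The first scheme in the list can be sketched as follows. This is an illustrative reading of the estimator $g_h$ from Section 1, not the reference implementation of Kozak et al.: the helper names, the QR-based sampling of orthonormal directions, the step size, and the quadratic test objective are all assumptions of this example.

```python
import numpy as np

def random_orthonormal(d, m, rng):
    """Return a d x m matrix with orthonormal columns (QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((d, m)))
    return q

def directional_gradient(f, x, h, m, rng):
    """g_h(x) = (1/h) * sum_i [f(x + h u_i) - f(x)] u_i over m orthonormal u_i."""
    U = random_orthonormal(x.size, m, rng)
    fx = f(x)
    diffs = np.array([f(x + h * U[:, i]) - fx for i in range(m)])
    return U @ diffs / h  # lift the m directional estimates back to R^d

# Hypothetical smooth test objective with minimizer at the origin.
f = lambda x: 0.5 * np.dot(x, x)

rng = np.random.default_rng(0)
x = np.ones(10)
for _ in range(200):
    # m = 3 function differences per step instead of d = 10.
    x = x - 0.5 * directional_gradient(f, x, h=1e-6, m=3, rng=rng)
```

Each iteration costs $m + 1$ function evaluations. With $m < d$ the unscaled estimator approximates the projection of the gradient onto a random $m$-dimensional subspace; because $\mathbb{E}[UU^\top] = (m/d) I$ for uniformly random orthonormal columns, some formulations rescale by $d/m$, which this sketch omits to match the formula as written in Section 1.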

The following table summarizes core algorithmic settings:

Application Domain	Smoothing Directionality	Reference
Zeroth-order Optimization	Random orthonormal basis	(Kozak et al., 2021)
Anisotropic Gradient-based Opt.	Adaptive covariance (Σ)	(Starnes et al., 2024)
Vision-LLM Defense	Embedding vector direction	(Wang et al., 16 Mar 2026)
Control/Certification	State or parameter space	(Lidec et al., 2022, Fischer et al., 2020)

3. Theoretical Properties and Convergence Analyses

Randomized directional smoothing introduces bias and variance trade-offs that are quantifiable both for estimation accuracy and optimization convergence:

For RDS in optimization, the estimator bias and variance are explicitly bounded. For convex $L$-smooth objectives, appropriate choices of step-size and smoothing radius yield explicit convergence rates, with linear convergence to a neighborhood under the Polyak-Łojasiewicz (PL) condition (Kozak et al., 2021).
In AGS methods, convergence rates are obtained in both convex and nonconvex settings, with the radii of the convergence balls dictated by the smoothing covariance $\Sigma$ and the step-sizes. Anisotropic smoothing recovers classical isotropic bounds in the special case $\Sigma = \sigma^2 I$ but often leads to improved empirical convergence, particularly in ill-conditioned or ridge-like landscapes (Starnes et al., 2024).
For RESTA with directional embedding smoothing, no formal robustness certificates are proved for vision-LLMs; effectiveness is motivated by the disruption of local adversarial paths in embedding space (Wang et al., 16 Mar 2026).
In randomized smoothing-based certification for parameterized transformations, explicit certification radii in parameter space are derived, guaranteeing robustness to geometric perturbations up to a provable threshold (Fischer et al., 2020).

4. Practical Considerations and Hyperparameter Selection

Effective application of randomized directional smoothing relies on careful tuning of parameters:

Number of Random Directions / Samples ($m$): $m$ balances estimator variance and per-iteration cost. In high-dimensional optimization a number of directions well below the ambient dimension $d$ is common, while for model defense a modest number of noisy samples suffices for majority voting (Kozak et al., 2021, Wang et al., 16 Mar 2026).
Noise Scale ($\sigma$ or $h$): The smoothing radius determines the bias-variance trade-off. Too small a radius increases variance; too large a radius introduces excessive bias. Practical heuristics for choosing the radius differ between optimization and embedding smoothing, for example for embeddings in Gemma (Starnes et al., 2024, Wang et al., 16 Mar 2026).
Covariance Design (Anisotropy): For AGS, $\Sigma$ is often built using local Hessian information or moving gradient moments to adapt directionality (Starnes et al., 2024).
Algorithmic Stability: Aggressive adaptation of $\Sigma$ can violate smoothness assumptions. Conservative updates and periodic adaptation are recommended (Starnes et al., 2024).
System Deployment: In robust inference (e.g., RESTA), randomized directional smoothing acts as a lightweight, inference-time layer. Composability with other defenses is possible but does not obviate the need for broader system-level security, including red teaming and alignment training (Wang et al., 16 Mar 2026).
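
To make the role of the sample count in inference-time smoothing concrete, the sketch below wraps a single decoding step in a majority vote over noisy copies of the input embeddings. It is a toy under stated assumptions, not the RESTA implementation: `smoothed_next_token`, `toy_next_token`, and the random `readout` matrix are hypothetical stand-ins for a real decoder, and the directional noise form $e + \epsilon\, e / \|e\|_2$ is assumed.

```python
from collections import Counter
import numpy as np

def smoothed_next_token(next_token_fn, embeddings, sigma, n_samples, rng):
    """Majority vote over next-token predictions from noisy embedding copies."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)
    votes = []
    for _ in range(n_samples):
        # Assumed directional noise: a scalar Gaussian per token, along the token.
        eps = rng.normal(0.0, sigma, size=(embeddings.shape[0], 1))
        votes.append(next_token_fn(embeddings + eps * unit))
    return Counter(votes).most_common(1)[0][0]

# Toy stand-in for one decoder step (hypothetical): mean-pool, then linear readout.
rng = np.random.default_rng(0)
readout = rng.standard_normal((8, 5))  # 8-dim embeddings, 5-token vocabulary
toy_next_token = lambda emb: int(np.argmax(emb.mean(axis=0) @ readout))

emb = rng.standard_normal((4, 8))
clean = toy_next_token(emb)
voted_no_noise = smoothed_next_token(toy_next_token, emb, 0.0, 7, rng)
voted_noisy = smoothed_next_token(toy_next_token, emb, 0.5, 7, rng)
```

The wrapper needs `n_samples` forward passes per generated token, which is the per-step cost that the sample count trades against vote stability. With `sigma = 0.0` every copy is identical, so the vote reproduces the unsmoothed prediction.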

5. Empirical Results and Applications

Empirical evidence demonstrates the utility of randomized directional smoothing across domains:

Model Robustness and Security: On the JailBreakV-28K suite, RESTA with directional noise reduced attack success rate on LLaVA-1.5-7B from 50.13% to 25.93% with only a minor utility loss (ScienceQA score drop from 64.07% to 61.42%). Isotropic noise failed to obtain comparable gains (Wang et al., 16 Mar 2026).
Optimization: RDS attains the convergence rates established for convex objectives and converges linearly up to a neighborhood for PL objectives. Empirically, a moderate subspace dimension $m$ provides the best trade-off between cost and convergence (Kozak et al., 2021).
Control: In optimal control of systems with non-smooth dynamics (Coulomb friction, impacts), randomized smoothing enables second-order methods (R-DDP) to succeed where classical methods or pure RL either fail or require excessive samples. R-DDP matched state-of-the-art RL with substantially fewer samples on contact-rich tasks (Lidec et al., 2022).
Certification: Robustness certificates against geometric transformations were established for smoothing in parameter space, providing high-confidence intervals and individual certifiable radii for image classification (Fischer et al., 2020).

6. Variants, Special Cases, and Extensions

Randomized directional smoothing admits several reductions and related methodologies:

Spherical Smoothing: $m = 1$ recovers classical two-point estimation along a single random direction (Kozak et al., 2021).
Coordinate Descent: Employing coordinate axes as directions recovers randomized coordinate descent.
Anisotropic vs. Isotropic Smoothing: Taking $\Sigma = \sigma^2 I$ yields isotropic smoothing (classical randomized smoothing as in Cohen et al.), while a general $\Sigma$ enables highly directional or adaptive smoothing (Starnes et al., 2024).
Embedding Smoothing vs. Parameter Space Smoothing: In vision-LLMs, smoothing is performed in the embedding space; in geometric robustness it is performed in transformation parameter space (Wang et al., 16 Mar 2026, Fischer et al., 2020).
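
The first two reductions can be checked numerically with a single estimator that takes the direction matrix as an argument. The sketch below is illustrative only; the name `estimate_along` and the cubic test function are assumptions of this example.

```python
import numpy as np

def estimate_along(f, x, U, h):
    """(1/h) * sum_i [f(x + h u_i) - f(x)] u_i for the columns u_i of U."""
    fx = f(x)
    diffs = np.array([f(x + h * u) - fx for u in U.T])  # U.T iterates columns of U
    return U @ diffs / h

# Hypothetical smooth test function with gradient 3 x^2 (elementwise).
f = lambda x: np.sum(x ** 3)
x = np.array([1.0, -2.0, 0.5])
true_grad = 3 * x ** 2

# Coordinate axes as directions: the full forward-difference gradient.
coord = estimate_along(f, x, np.eye(3), 1e-6)

# m = 1: a two-point estimate along one random unit direction.
rng = np.random.default_rng(0)
u = rng.standard_normal(3)
u /= np.linalg.norm(u)
single = estimate_along(f, x, u[:, None], 1e-6)
```

With the identity matrix the estimator reduces to coordinate-wise forward differences, and passing a single randomly chosen column of the identity gives one step's direction in randomized coordinate descent. With $m = 1$ the output is approximately the projection $(\nabla f(x)^\top u)\, u$ of the gradient onto the sampled direction.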

A plausible implication is that adapting the covariance or smoothing geometry, guided by application-specific priors or learned structural information, may further improve the security-utility or convergence trade-off.

7. Limitations and Future Directions

While randomized directional smoothing significantly advances optimization efficiency, robustness, and model security, several limitations and open questions persist:

Certification Gaps: While formal certificates exist in parameter-space smoothing, defenses like RESTA-directional remain heuristic, lacking formal robustness or failure bounds in adversarial settings (Wang et al., 16 Mar 2026).
Adversarial Adaptivity: Defensive smoothing may only temporarily mitigate attack success until adaptive adversaries circumvent the smoothed geometry.
Compositional Effects: How randomized smoothing interacts with system-level composition (multiple defense layers, alignment training, and red teaming) remains an area for empirical and theoretical investigation (Wang et al., 16 Mar 2026).

Future work is expected to close the certification gaps for adversarial robustness, develop more sophisticated mechanisms for adaptive smoothing geometry, and establish compositional guarantees in multi-layer robust learning systems.


References

Starnes et al. (2024). Anisotropic Gaussian Smoothing for Gradient-based Optimization.

Wang et al. (2026). Directional Embedding Smoothing for Robust Vision Language Models.

Kozak et al. (2021). Zeroth order optimization with orthogonal random directions.

Lidec et al. (2022). Leveraging Randomized Smoothing for Optimal Control of Nonsmooth Dynamical Systems.

Fischer et al. (2020). Certified Defense to Image Transformations via Randomized Smoothing.
