DISBELIEVE: Distance-Constrained Adversarial Perturbation

Updated 24 April 2026

Distance-Constrained Adversarial Perturbation is a method that maximizes misclassification by optimizing perturbations while keeping them within prescribed norm and geometric bounds.
It integrates techniques like gradient ascent, projection, and proximal operators to ensure attacks remain close to benign references in both federated and input-space contexts.
Empirical evaluations on benchmarks such as CheXpert, CIFAR-10, and MNIST demonstrate that DISBELIEVE effectively degrades model performance while evading detection by robust aggregation defenses.

Distance-Constrained Adversarial Perturbation (DISBELIEVE) encompasses a class of adversarial attack methodologies that maximize misclassification or model degradation objectives under explicit distance constraints. Typical instantiations arise in both federated learning and classical neural network settings, formalized via optimization over client parameter updates or input perturbations subject to norm or geometric bounds. The common thread is an adversary who seeks to evade defense mechanisms by ensuring that the crafted adversarial perturbations (to models or inputs) remain close—according to some prescribed metric—to benign references, while still having maximal negative impact.

1. Mathematical Formulations and Formal Objectives

Central to distance-constrained adversarial perturbation is the adversarial optimization problem:

Federated setting: Given the benign update set $\{w_i\}_{i\in\mathcal B}$ and a malicious update $w_{\mathrm{adv}}$, maximize the global classification loss subject to $\|w_{\mathrm{adv}} - \mu^{\mathrm{param}}\|_2^2 \leq P_{\mathrm{dist}}$:

$\begin{aligned} &\max_{w_{\mathrm{adv}}} L_{\mathrm{global}}\left(\{w_i\}_{i\in\mathcal B}, w_{\mathrm{adv}}\right) \\ &\;\textrm{s.t.}\;\; \|w_{\mathrm{adv}} - \mu^{\mathrm{param}}\|_2^2 \leq P_{\mathrm{dist}} \end{aligned}$

where $\mu^{\mathrm{param}}$ is the mean of the malicious clients' parameters and $P_{\mathrm{dist}}$ is the intra-malicious “spread”, a pairwise squared distance among those clients (Joshi et al., 2023).
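As a minimal sketch of this constraint, the snippet below computes the malicious mean and spread from a stack of flattened parameter vectors and checks whether a candidate update is feasible. The function names `spread_stats` and `is_feasible` are illustrative, and taking the maximum pairwise squared distance as $P_{\mathrm{dist}}$ is an assumption made for the example.

```python
import numpy as np

def spread_stats(malicious_params):
    """Return (mean, max pairwise squared L2 distance) over the rows."""
    mu = malicious_params.mean(axis=0)
    diffs = malicious_params[:, None, :] - malicious_params[None, :, :]
    p_dist = float((diffs ** 2).sum(axis=-1).max())  # assumed definition of the spread
    return mu, p_dist

def is_feasible(w_adv, mu, p_dist):
    """Check ||w_adv - mu||_2^2 <= p_dist."""
    return float(np.sum((w_adv - mu) ** 2)) <= p_dist

# Three hypothetical malicious clients with 2-dimensional parameters.
params = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
mu, p_dist = spread_stats(params)
inside = is_feasible(mu + np.array([1.0, 1.0]), mu, p_dist)
outside = is_feasible(mu + np.array([3.0, 0.0]), mu, p_dist)
```

Here the largest pairwise squared distance is 8, so a candidate at squared distance 2 from the mean is feasible while one at squared distance 9 is not.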

Input-space perturbation: For a classifier $f:X\to\Delta_n$ and base sample $x$ with label $y$, find $u$ that meets a misclassification criterion $\hat{y}(u)\neq y$ while $m(u;x) \leq \epsilon$ for a metric $m$, or minimize $m(u;x)$ subject to $\max_{i\neq y} f_i(u) > f_y(u)$, where $f_i$ denotes class logits (Pooladian et al., 2019).

Both paradigms may use auxiliary constraints or surrogate formulations (e.g., log-barrier or projected gradient) that enforce the boundedness of the attack in $\ell_p$ norms, total variation, or task-specific geometry.

2. Algorithms and Optimization Schemes

2.1 Model Poisoning in Federated Learning

The DISBELIEVE algorithm (for federated systems) proceeds as follows:

Gather the malicious clients' parameters or gradients, then compute their empirical mean and the (max or min) pairwise intra-cluster squared distance.
Initialize a malicious proxy model or gradient at the intra-malicious mean.
Iteratively update the proxy to maximize classification loss on local malicious client data, using gradient ascent.
Projection step: After each update, if the candidate $w_{\mathrm{adv}}$ deviates beyond the allowed radius from the mean, project it radially back onto the sphere of feasible $\ell_2$ distance to maintain $\|w_{\mathrm{adv}} - \mu^{\mathrm{param}}\|_2^2 \leq P_{\mathrm{dist}}$.
Terminate when both the constraint and loss plateau. The resulting $w_{\mathrm{adv}}$ is submitted as the aggregated malicious update (Joshi et al., 2023).
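The steps above can be sketched as projected gradient ascent. This is an illustration under stated assumptions, not the authors' implementation: the model is a linear classifier with logistic loss on synthetic data, the spread is the maximum pairwise squared distance, and a fixed step count stands in for the plateau test. All function names are hypothetical.

```python
import numpy as np

def logistic_loss_and_grad(w, X, y):
    """Mean logistic loss and its gradient for labels y in {-1, +1}."""
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))
    sig = 0.5 * (1.0 - np.tanh(0.5 * margins))  # sigmoid(-margin), stable form
    grad = -(X * (sig * y)[:, None]).mean(axis=0)
    return loss, grad

def project_to_ball(w, center, radius_sq):
    """Radially project w onto {v : ||v - center||_2^2 <= radius_sq}."""
    delta = w - center
    dist_sq = float(delta @ delta)
    if dist_sq <= radius_sq:
        return w
    return center + delta * np.sqrt(radius_sq / dist_sq)

def disbelieve_sketch(malicious_params, X, y, lr=0.5, steps=100):
    mu = malicious_params.mean(axis=0)
    diffs = malicious_params[:, None, :] - malicious_params[None, :, :]
    p_dist = float((diffs ** 2).sum(axis=-1).max())  # assumed spread definition
    w_adv = mu.copy()                                # start at the malicious mean
    for _ in range(steps):
        _, grad = logistic_loss_and_grad(w_adv, X, y)
        # ascent step on the local loss, then projection back into the ball
        w_adv = project_to_ball(w_adv + lr * grad, mu, p_dist)
    return w_adv, mu, p_dist

# Synthetic local data and four hypothetical malicious clients.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = np.where(X @ w_true >= 0, 1.0, -1.0)
malicious_params = w_true + 0.1 * rng.normal(size=(4, 3))
w_adv, mu, p_dist = disbelieve_sketch(malicious_params, X, y)
```

Because the logistic loss is convex in the weights, each projected ascent step in this toy setting cannot decrease the loss, so the returned update is at least as harmful as the mean while staying inside the allowed radius.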

2.2 Proximal-Gradient Input Attacks

For input-space attacks (ProxLogBarrier framework):

Reformulate the hard constraint (misclassification at minimal metric distance) as an unconstrained objective with a log-barrier penalizing violation of the class decision constraint.
At each iteration, take a gradient step w.r.t. the log-barrier penalized objective, followed by the corresponding proximal operator for the chosen metric (e.g., $\ell_0$, $\ell_1$, $\ell_2$, $\ell_\infty$, or TV).
After each update, ensure the sample remains misclassified by backtracking along the line segment joining the previous and current iterates.
The process iterates with decaying barrier coefficients to increasingly sharpen constraint enforcement (Pooladian et al., 2019).
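A toy version of this loop is sketched below for the $\ell_1$ metric and a linear binary classifier. It is a simplified illustration, not the ProxLogBarrier reference implementation: the classifier, the parameter names (`lam`, `decay`, `step`), their default values, and the best-iterate tracking are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_log_barrier_l1(x, w, b, u0, lam=1.0, decay=0.9, step=0.1, iters=200):
    """Toy l1 attack on the classifier sign(w @ u + b).

    x is assumed to be classified positive; u0 must already be misclassified.
    """
    def gap(u):
        # positive exactly when u lies on the negative (misclassified) side
        return -(w @ u + b)

    assert gap(u0) > 0, "initial point must be misclassified"
    u, best = u0.copy(), u0.copy()
    for _ in range(iters):
        # gradient of the barrier term -lam * log(gap(u)) is lam * w / gap(u)
        v = u - step * lam * w / gap(u)
        # proximal step for step * ||u - x||_1, applied to the perturbation
        cand = x + soft_threshold(v - x, step)
        # backtrack along the segment [u, cand] until misclassification holds
        t = 1.0
        while gap(u + t * (cand - u)) <= 0 and t > 1e-8:
            t *= 0.5
        nxt = u + t * (cand - u)
        if gap(nxt) > 0:
            u = nxt
        if np.abs(u - x).sum() < np.abs(best - x).sum():
            best = u.copy()
        lam *= decay  # decaying barrier coefficient
    return best

x = np.array([1.0, 1.0])
w = np.array([1.0, 0.5])
b = 0.0
u0 = np.array([-2.0, -2.0])  # deliberately large initial perturbation
u_adv = prox_log_barrier_l1(x, w, b, u0)
```

The barrier gradient pushes the iterate away from the decision boundary while the proximal step pulls the perturbation toward zero, so the returned point stays misclassified and is closer to `x` in $\ell_1$ than the starting point.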

Closed-form proximal operators for various distances enable efficient projection, even for non-smooth metrics such as $\ell_0$ and TV.
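For reference, three standard closed-form proximal operators are sketched below. They are textbook formulas rather than code from the cited paper, and the function names are illustrative; in an attack they would be applied to the perturbation $u - x$.

```python
import numpy as np

def prox_l1(v, t):
    """Soft-thresholding: prox of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_l2(v, t):
    """Block soft-thresholding: prox of t * ||.||_2 (non-squared norm)."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def prox_l0(v, t):
    """Hard-thresholding: prox of t * ||.||_0, keeps entries with |v_i| > sqrt(2t)."""
    return np.where(np.abs(v) > np.sqrt(2.0 * t), v, 0.0)

v = np.array([3.0, -0.5, 1.0])
shrunk = prox_l1(v, 1.0)   # small entries are zeroed, large ones shrink by t
sparse = prox_l0(v, 0.5)   # entries are either kept unchanged or zeroed
scaled = prox_l2(np.array([3.0, 4.0]), 1.0)
```

The $\ell_1$ operator shrinks every coordinate, the $\ell_0$ operator only selects coordinates, and the $\ell_2$ operator rescales the whole vector.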

3. Distance Constraints versus Robust Defenses

A critical insight in DISBELIEVE-based attacks is that many robust aggregation rules—KRUM, Trimmed Mean, DOS (COPOD-based outlier scoring)—use explicit or implicit distance metrics to detect poisoning. If the adversarial update is constructed to remain within the intra-client (benign or malicious) spread, it will not be flagged as anomalous. Specifically:

KRUM selects the most “central” update by neighbor distance; a carefully crafted $w_{\mathrm{adv}}$ that is not outlying can be accepted.
Trimmed Mean removes only coordinatewise outliers; by remaining within coordinate bounds derived from the normal update distribution, $w_{\mathrm{adv}}$ is not rejected.
DOS assigns outlier risk based on distance from the cluster centroid; staying near the center maximizes aggregation weight for the attack (Joshi et al., 2023).
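The first two cases can be illustrated with a small hand-built example. The sketch below implements a basic Krum selection and a coordinate-wise trimming check; the helper names and the five two-dimensional updates are hypothetical, and DOS is omitted.

```python
import numpy as np

def krum_select(updates, num_byzantine):
    """Index of the update with the smallest sum of squared distances
    to its n - f - 2 nearest neighbours (the Krum score)."""
    n = len(updates)
    d2 = ((updates[:, None, :] - updates[None, :, :]) ** 2).sum(axis=-1)
    k = n - num_byzantine - 2
    scores = []
    for i in range(n):
        others = np.delete(d2[i], i)  # drop the distance to itself
        scores.append(np.sort(others)[:k].sum())
    return int(np.argmin(scores))

def within_trim_range(updates, idx, trim):
    """True if every coordinate of updates[idx] survives trimming of the
    `trim` smallest and `trim` largest values per coordinate."""
    s = np.sort(updates, axis=0)
    lo, hi = s[trim], s[-trim - 1]
    return bool(np.all((updates[idx] >= lo) & (updates[idx] <= hi)))

benign = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
near = np.vstack([benign, benign.mean(axis=0)])  # crafted update at the centre
far = np.vstack([benign, [10.0, 10.0]])          # naive outlying update

near_pick = krum_select(near, num_byzantine=1)
far_pick = krum_select(far, num_byzantine=1)
near_kept = within_trim_range(near, 4, trim=1)
far_kept = within_trim_range(far, 4, trim=1)
```

In this example the centred update (index 4) is the one Krum selects and it survives trimming, whereas the outlying update is neither selected nor kept.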

This principle extends to metric-based input defenses: if adversarial perturbations keep within certified radii, many provable- or heuristic-robust classifiers cannot guarantee protection.

4. Theoretical Guarantees and Provable Robustness

Certain distance-constrained attack strategies admit formal guarantees on approximation quality and robustness:

For a point $x$ and classifier $f$, the minimal adversarial perturbation $d(x)$ (Euclidean or other norm) is bounded within a constant factor $\rho$ by root-finding solutions along the model gradient under mild smoothness and boundary-regularity assumptions. Concretely,

$d(x) \leq \|\hat{x} - x\| \leq \rho\, d(x)$

for $\hat{x}$ derived from projected line search along $\nabla f(x)$ (Brau et al., 2022).

The distance constraint radius $\|\hat{x} - x\|/\rho$ can serve as a certified region: no adversarial example $u$ with $\|u - x\| < \|\hat{x} - x\|/\rho$ can alter classification, providing a practical robustness certificate with quantifiable error (Brau et al., 2022).
In the ProxLogBarrier algorithm, convergence to first-order stationary points for the penalized composite objective is guaranteed under standard properties (Lipschitz gradient, prox-existence), though not global optimality due to nonconvexity (Pooladian et al., 2019).
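The root-finding idea can be sketched with a plain bisection along the normalized gradient direction. This is a simplified illustration, not the procedure of Brau et al.: the function name, the scalar margin `f`, and the search bound `t_max` are assumptions. For a linear margin the gradient is normal to the decision boundary, so the estimate coincides with the true minimal Euclidean distance; for nonlinear models it is only an upper estimate.

```python
import numpy as np

def boundary_distance_along_gradient(f, grad_f, x, t_max=10.0, tol=1e-10):
    """Bisection for a step length t at which the scalar margin f changes
    sign along the normalised gradient direction starting at x."""
    f0 = f(x)
    direction = -np.sign(f0) * grad_f(x)  # move so as to reduce |f|
    direction = direction / np.linalg.norm(direction)
    lo, hi = 0.0, t_max
    if np.sign(f(x + hi * direction)) == np.sign(f0):
        raise ValueError("no sign change within t_max")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sign(f(x + mid * direction)) == np.sign(f0):
            lo = mid
        else:
            hi = mid
    return hi, x + hi * direction

# Linear margin f(x) = w @ x + b, whose exact boundary distance is |f(x)| / ||w||.
w = np.array([3.0, 4.0])
b = -5.0
x0 = np.array([3.0, 4.0])
t_hat, x_hat = boundary_distance_along_gradient(
    lambda z: w @ z + b, lambda z: w, x0
)
exact = abs(w @ x0 + b) / np.linalg.norm(w)
```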

5. Empirical Performance and Benchmarks

Distance-constrained adversarial perturbations, when deployed as per DISBELIEVE, produce significant degradation in both federated and centralized learning settings:

Federated model poisoning (parameter/gradient-based): On datasets such as CheXpert-Small, HAM10000, BreakHis, and CIFAR-10, DISBELIEVE yields severe drops in global AUC under all evaluated defenses. For example, on CheXpert-Small under the DOS defense, the AUC drops from 0.71 (no attack) to 0.44 with DISBELIEVE, outperforming other attacks such as LIE and Min-Max (Joshi et al., 2023).
Input-space attacks: On MNIST, CIFAR-10, and ImageNet, ProxLogBarrier achieves higher attack success rates and smaller perturbations under multiple metrics; e.g., on CIFAR-10 (undefended, $\ell_0$), the fraction of images fooled at a given number of pixel changes and the median perturbation size both outperform previous $\ell_0$-specialized attacks (Pooladian et al., 2019).
Robustness validation: Empirical evaluations confirm that when attacks are restricted to certified radii (the estimated distances to the decision boundary), they fail on nearly all tested points within this neighborhood, corroborating theoretical robustness claims (Brau et al., 2022).

6. Connections across Domains and Practical Recommendations

Distance-constrained adversarial perturbation frameworks unify several attack and certification strategies across federated and centralized settings:

The adversarial “budget” paradigm—maximizing loss under metric bounds—arises identically at the model and input levels.
Closed-form proximal operators and geometric reduction underpin efficient and generalizable algorithm designs (applicable to $\ell_p$ norms, the $\ell_0$ counting norm, the TV seminorm, and others).
Distance-based certificate methodologies deliver both practical protection and tight theoretical approximation bounds, especially in the local boundary neighborhood.

Practical recommendations include carefully choosing hyperparameters (barrier coefficient, its decay rate, gradient step sizes, and proximity parameter), initializing with perturbations large enough to guarantee misclassification, and focusing on attack radii commensurate with empirical and theoretical guarantees (Pooladian et al., 2019).

7. Summary Table: Key Algorithmic Elements

Attack Variant	Distance Metric / Constraint	Optimization Core
DISBELIEVE (federated)	Intra-malicious $\ell_2$ bound	Loss-maximization + projection
ProxLogBarrier (input)	Any (closed-form prox: $\ell_p$, $\ell_0$, TV)	Proximal gradient + log-barrier
Certified perturbation	Euclidean ($\ell_2$) minimal distance	Gradient-aligned root-finding

These approaches collectively demonstrate that distance-constrained adversarial perturbation constitutes a principled and broadly effective methodology for evading detection-based defenses and for probing the certified robustness of neural systems (Joshi et al., 2023, Pooladian et al., 2019, Brau et al., 2022).

References

Joshi et al. (2023). DISBELIEVE: Distance Between Client Models is Very Essential for Effective Local Model Poisoning Attacks.

Pooladian et al. (2019). A principled approach for generating adversarial images under non-smooth dissimilarity metrics.

Brau et al. (2022). On the Minimal Adversarial Perturbation for Deep Neural Networks with Provable Estimation Error.
