
DISBELIEVE: Distance-Constrained Adversarial Perturbation

Updated 24 April 2026
  • Distance-Constrained Adversarial Perturbation is a method that maximizes misclassification by optimizing perturbations while keeping them within prescribed norm and geometric bounds.
  • It integrates techniques like gradient ascent, projection, and proximal operators to ensure attacks remain close to benign references in both federated and input-space contexts.
  • Empirical evaluations on benchmarks such as CheXpert, CIFAR-10, and MNIST demonstrate that DISBELIEVE effectively degrades model performance while evading detection by robust aggregation defenses.

Distance-Constrained Adversarial Perturbation (DISBELIEVE) encompasses a class of adversarial attack methodologies that maximize misclassification or model degradation objectives under explicit distance constraints. Typical instantiations arise in both federated learning and classical neural network settings, formalized via optimization over client parameter updates or input perturbations subject to norm or geometric bounds. The common thread is an adversary who seeks to evade defense mechanisms by ensuring that the crafted adversarial perturbations (to models or inputs) remain close—according to some prescribed metric—to benign references, while still having maximal negative impact.

1. Mathematical Formulations and Formal Objectives

Central to distance-constrained adversarial perturbation is the adversarial optimization problem:

  • Federated setting: Given the benign update set $\{w_i\}$ and a malicious update $w_{\mathrm{adv}}$, maximize the global classification loss subject to a squared-distance bound around the mean $\mu^{\mathrm{param}}$ of the malicious clients' parameters:

$$\begin{aligned} &\max_{w_{\mathrm{adv}}} L_{\mathrm{global}}\left(\{w_i\}_{i\in\mathcal B}, w_{\mathrm{adv}}\right) \\ &\;\textrm{s.t.}\;\; \|w_{\mathrm{adv}} - \mu^{\mathrm{param}}\|_2^2 \leq P_{\mathrm{dist}} \end{aligned}$$

where $P_{\mathrm{dist}}$ is the intra-malicious “spread” (Joshi et al., 2023).

  • Input-space perturbation: For a classifier $f: X \to \Delta_n$ and base sample $x$ with label $y$, find $u$ that meets the misclassification criterion $\hat{y}(u) \neq y$ while keeping $m(u;x) \leq \epsilon$ for a chosen metric $m$, or that minimizes $m(u;x)$ subject to a misclassification constraint expressed through the class logits (Pooladian et al., 2019).

Both paradigms may use auxiliary constraints or surrogate formulations (e.g., log-barrier or projected gradient) that keep the attack bounded in a chosen norm, total variation, or task-specific geometry.

2. Algorithms and Optimization Schemes

2.1 Model Poisoning in Federated Learning

The DISBELIEVE algorithm (for federated systems) proceeds as follows:

  1. Gather the malicious clients' parameters or gradients, then compute their empirical mean and the (maximum or minimum) pairwise intra-cluster squared distance.
  2. Initialize a malicious proxy model or gradient at the intra-malicious mean.
  3. Iteratively update the proxy to maximize classification loss on local malicious client data, using gradient ascent.
  4. Projection step: After each update, if the candidate $w_{\mathrm{adv}}$ deviates beyond the allowed radius from the mean, project it radially back onto the sphere of feasible $\ell_2$ distance to maintain $\|w_{\mathrm{adv}} - \mu^{\mathrm{param}}\|_2^2 \leq P_{\mathrm{dist}}$.
  5. Terminate when both the constraint and loss plateau. The resulting $w_{\mathrm{adv}}$ is submitted as the aggregated malicious update (Joshi et al., 2023).
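
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the logistic loss stands in for the classification loss on malicious client data, the maximum pairwise squared distance is used as the budget, a fixed iteration count replaces the plateau test, and all function names (`intra_spread`, `project_to_ball`, `craft_malicious_update`) are hypothetical.

```python
import numpy as np

def intra_spread(updates, reduce=np.max):
    """Return the mean of the malicious updates and a pairwise squared-distance budget."""
    updates = np.asarray(updates, dtype=float)
    mu = updates.mean(axis=0)
    diffs = updates[:, None, :] - updates[None, :, :]
    sq = (diffs ** 2).sum(axis=-1)
    iu = np.triu_indices(len(updates), k=1)  # each unordered pair once
    return mu, float(reduce(sq[iu]))

def project_to_ball(w, center, sq_radius):
    """Radially project w onto {v : ||v - center||_2^2 <= sq_radius}."""
    delta = w - center
    sq_norm = float(delta @ delta)
    if sq_norm <= sq_radius:
        return w
    return center + delta * np.sqrt(sq_radius / sq_norm)

def logistic_loss_and_grad(w, X, y):
    """Mean logistic loss and its gradient for labels y in {0, 1} (toy stand-in loss)."""
    z = X @ w
    loss = np.mean(np.logaddexp(0.0, z) - y * z)
    grad = X.T @ (1.0 / (1.0 + np.exp(-z)) - y) / len(y)
    return float(loss), grad

def craft_malicious_update(updates, X, y, lr=0.5, steps=100):
    """Projected gradient ascent on the loss, started at the intra-malicious mean."""
    mu, p_dist = intra_spread(updates)
    w = mu.copy()
    for _ in range(steps):
        _, grad = logistic_loss_and_grad(w, X, y)
        # Ascent step, then projection back into the distance budget.
        w = project_to_ball(w + lr * grad, mu, p_dist)
    return w, mu, p_dist
```

Because the radial projection is the Euclidean projection onto the ball and the stand-in loss is convex, each iteration in this sketch cannot decrease the loss while the iterate never leaves the budget. A real attack on a deep model would not have that monotonicity property.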

2.2 Proximal-Gradient Input Attacks

For input-space attacks (ProxLogBarrier framework):

  1. Reformulate the hard constraint (misclassification at minimal metric distance) as an unconstrained objective with a log-barrier penalizing violation of the class decision constraint.
  2. At each iteration, take a gradient step w.r.t. the log-barrier penalized objective, followed by the corresponding proximal operator for the chosen metric (e.g., an $\ell_p$-type norm or TV).
  3. After each update, ensure that the sample remains misclassified by backtracking along the line segment joining the previous and current iterates.
  4. The process iterates with decaying barrier coefficients to increasingly sharpen constraint enforcement (Pooladian et al., 2019).
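
One plausible way to write the surrogate and the resulting iteration is given below. The notation is assumed for illustration rather than taken from the source: $f_i$ are class scores, $c$ is the true label of $x$, $\lambda$ is the barrier coefficient, and $h$ is the step size.

```latex
% Log-barrier surrogate: the barrier is finite only while u is misclassified.
\min_{u}\; m(u;x) + \phi_\lambda(u),
\qquad
\phi_\lambda(u) = -\lambda \log\Big(\max_{i \neq c} f_i(u) - f_c(u)\Big)

% Proximal-gradient step: gradient step on the barrier, prox on the metric.
u^{k+1} = \operatorname{prox}_{h\, m(\cdot\,;x)}\Big(u^{k} - h\,\nabla \phi_\lambda(u^{k})\Big)
```

Under this form, shrinking $\lambda$ weakens the barrier and lets the iterates approach the decision boundary, which is why a decaying schedule sharpens the constraint.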

Closed-form proximal operators for various distances enable efficient projection, even for non-smooth metrics, including non-smooth norms and TV.
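
As an illustration of what "closed-form" means here, the following sketch implements three standard textbook proximal maps. The choice of these three is an assumption made for the example, not the source's list of metrics, and the helper `prox_distance` is a hypothetical name for applying a prox to the perturbation $u - x$ rather than to $u$ itself.

```python
import numpy as np

def prox_l1(v, t):
    """Prox of t * ||.||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sq_l2(v, t):
    """Prox of (t / 2) * ||.||_2^2: uniform shrinkage toward zero."""
    return v / (1.0 + t)

def prox_l2(v, t):
    """Prox of t * ||.||_2: block soft-thresholding of the whole vector."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def prox_distance(u, x, t, prox):
    """Prox of t * m(. - x): shift to the base sample, apply prox, shift back."""
    return x + prox(u - x, t)
```

For example, `prox_distance(u, x, t, prox_l1)` pulls `u` toward `x` and sets coordinates whose perturbation is below `t` exactly back to the base sample, which is how a non-smooth metric can still be handled with a cheap elementwise update.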

3. Distance Constraints versus Robust Defenses

A critical insight in DISBELIEVE-based attacks is that many robust aggregation rules—KRUM, Trimmed Mean, DOS (COPOD-based outlier scoring)—use explicit or implicit distance metrics to detect poisoning. If the adversarial update is constructed to remain within the intra-client (benign or malicious) spread, it will not be flagged as anomalous. Specifically:

  • KRUM selects the most “central” update by neighbor distance; a carefully crafted $w_{\mathrm{adv}}$ that is not outlying can be accepted.
  • Trimmed Mean removes only coordinatewise outliers; by remaining within coordinate bounds derived from the normal update distribution, $w_{\mathrm{adv}}$ is not rejected.
  • DOS assigns outlier risk based on distance from the cluster centroid; staying near the center maximizes aggregation weight for the attack (Joshi et al., 2023).
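
A toy version of two of these rules shows why staying inside the spread matters. This is a sketch with simplified, assumed implementations (`krum_select` and `trimmed_mean` are hypothetical names), written from the standard definitions of Krum and coordinatewise trimmed mean; DOS is not covered.

```python
import numpy as np

def krum_select(updates, n_byzantine):
    """Krum: score each update by the summed squared distance to its
    n - f - 2 nearest neighbours and select the lowest score."""
    U = np.asarray(updates, dtype=float)
    n = len(U)
    k = n - n_byzantine - 2
    sq = ((U[:, None, :] - U[None, :, :]) ** 2).sum(axis=-1)
    scores = np.empty(n)
    for i in range(n):
        others = np.delete(sq[i], i)  # exclude the distance to itself
        scores[i] = np.sort(others)[:k].sum()
    return int(np.argmin(scores)), scores

def trimmed_mean(updates, trim):
    """Coordinatewise mean after dropping the `trim` largest and smallest values."""
    U = np.sort(np.asarray(updates, dtype=float), axis=0)
    return U[trim:len(U) - trim].mean(axis=0)

# Five benign updates around the origin, plus one submitted by the attacker.
benign = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, -0.1]]
far_attack = benign + [[5.0, 5.0]]     # obvious outlier
near_attack = benign + [[0.05, 0.05]]  # stays inside the benign spread

far_choice, _ = krum_select(far_attack, n_byzantine=1)
near_choice, _ = krum_select(near_attack, n_byzantine=1)
```

In this toy setup the distant update is never selected by Krum and is trimmed away coordinatewise, whereas the update placed inside the benign spread obtains the lowest Krum score and is the one selected (index 5).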

This principle extends to metric-based input defenses: if adversarial perturbations keep within certified radii, many provable- or heuristic-robust classifiers cannot guarantee protection.

4. Theoretical Guarantees and Provable Robustness

Certain distance-constrained attack strategies admit formal guarantees on approximation quality and robustness:

  • For a point $x$ and classifier $f$, the minimal adversarial perturbation (in the Euclidean or another norm) can be approximated to within a constant factor by root-finding along the model gradient, under mild smoothness and boundary-regularity assumptions. Concretely, the estimate derived from a projected line search along the gradient direction bounds the true minimal distance up to that constant factor (Brau et al., 2022).

  • The estimated distance can serve as a certified radius: no adversarial example within that radius of the input can alter the classification, providing a practical robustness certificate with quantifiable error (Brau et al., 2022).
  • In the ProxLogBarrier algorithm, convergence to first-order stationary points for the penalized composite objective is guaranteed under standard properties (Lipschitz gradient, prox-existence), though not global optimality due to nonconvexity (Pooladian et al., 2019).
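
The idea of root-finding along the gradient can be illustrated with plain bisection. This is a simplified stand-in, not the procedure of Brau et al.: it assumes a scalar decision function `g` whose sign gives the predicted class, a caller-supplied gradient `grad_g`, and a search interval `t_max` that contains exactly one sign change. All names are hypothetical.

```python
import numpy as np

def boundary_distance_along_gradient(g, grad_g, x, t_max=10.0, tol=1e-10, max_iter=200):
    """Bisection for the smallest t with a sign change of g along the
    normalized gradient direction pointing toward the decision boundary."""
    x = np.asarray(x, dtype=float)
    g0 = g(x)
    if g0 == 0:
        return 0.0  # already on the boundary
    d = grad_g(x)
    d = -np.sign(g0) * d / np.linalg.norm(d)  # direction that drives g toward 0
    lo, hi = 0.0, t_max
    if np.sign(g(x + hi * d)) == np.sign(g0):
        raise ValueError("no sign change within t_max along the gradient direction")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.sign(g(x + mid * d)) == np.sign(g0):
            lo = mid  # still on the original side of the boundary
        else:
            hi = mid  # crossed (or hit) the boundary
        if hi - lo < tol:
            break
    return hi  # distance to a point at or just past the boundary
```

For a linear decision function the returned value matches the exact distance $|g(x)| / \|\nabla g\|_2$. For a curved boundary it is only an upper estimate of the minimal distance, which is where approximation bounds of the kind cited above come in.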

5. Empirical Performance and Benchmarks

Distance-constrained adversarial perturbations, when deployed as per DISBELIEVE, produce significant degradation in both federated and centralized learning settings:

  • Federated model poisoning (parameter/gradient-based): On datasets such as CheXpert-Small, HAM10000, BreakHis, and CIFAR-10, DISBELIEVE yields severe drops in global AUC under all evaluated defenses. For example, on CheXpert-Small under DOS, the AUC drops from 0.71 (no attack) to 0.44 with DISBELIEVE, a larger degradation than other attacks such as LIE and Min-Max achieve (Joshi et al., 2023).
  • Input-space attacks: On MNIST, CIFAR-10, and ImageNet, ProxLogBarrier achieves higher attack success rates and smaller perturbations under multiple metrics; e.g., on an undefended CIFAR-10 model, it fools images with fewer pixel changes than previous attacks specialized to the same metric (Pooladian et al., 2019).
  • Robustness validation: Empirical evaluations confirm that when attacks are restricted to the certified radii given by the distance estimates, they fail on nearly all tested points within this neighborhood, corroborating the theoretical robustness claims (Brau et al., 2022).

6. Connections across Domains and Practical Recommendations

Distance-constrained adversarial perturbation frameworks unify several attack and certification strategies across federated and centralized settings:

  • The adversarial “budget” paradigm—maximizing loss under metric bounds—arises identically at the model and input levels.
  • Closed-form proximal operators and geometric reduction underpin efficient and generalizable algorithm designs (applicable to several norms, the TV seminorm, and others).
  • Distance-based certificate methodologies deliver both practical protection and tight theoretical approximation bounds, especially in the local boundary neighborhood.

Practical recommendations include choosing the hyperparameters carefully (barrier coefficient, its decay rate, gradient step size, and proximity parameter), initializing with a perturbation large enough to cause misclassification, and restricting attacks to radii commensurate with empirical and theoretical guarantees (Pooladian et al., 2019).

7. Summary Table: Key Algorithmic Elements

Attack Variant | Distance Metric / Constraint | Optimization Core
DISBELIEVE (federated) | Intra-malicious $\ell_2$ bound | Loss maximization + projection
ProxLogBarrier (input) | Any metric with a closed-form prox (e.g., TV) | Proximal gradient + log-barrier
Certified perturbation | Euclidean ($\ell_2$) minimal distance | Gradient-aligned root-finding

These approaches collectively demonstrate that distance-constrained adversarial perturbation constitutes a principled and broadly effective methodology for evading detection-based defenses and for probing the certified robustness of neural systems (Joshi et al., 2023, Pooladian et al., 2019, Brau et al., 2022).
