Cross-Image Contagious Constraint

Updated 5 February 2026
  • A cross-image contagious constraint is a mechanism that propagates perturbations or imposed properties across multiple images, coupling local changes with global model behavior.
  • It leverages transformer self-attention in LAMP and gradient-guided coupling in PCD to enforce adversarial effects and synchronized attribute interactions.
  • Empirical evaluations reveal significant gains in attack success rates and effective enforcement of inter-image attributes in both adversarial and generative settings.

A cross-image contagious constraint is a formal mechanism by which information, perturbations, or properties imposed on a subset of images in a multi-image or joint generative setting are deliberately made to propagate to, or “infect,” the representations of other images or tokens in the model’s computational graph. Its primary use is to engineer global coupling or adversarial effects so that local manipulations have maximal system-wide impact. The paradigm has been developed in two distinct contexts:

(1) as a targeted strategy for amplifying adversarial vulnerability in multi-image multimodal LLMs (MLLMs);

(2) as a mechanism for enforcing attribute interaction or contrast in coupled diffusion-based joint generation.

Both rely on cross-component “contagion”: a change made locally reliably influences other parts of the input or generative space (Ishmam et al., 29 Jan 2026, Luan et al., 14 Aug 2025).

1. Formal Definitions and Mathematical Construction

In the context of multi-image MLLMs (Ishmam et al., 29 Jan 2026), consider an input comprising $M$ images $\{x_1,\ldots,x_M\}$ and a text prompt $t$. A subset of $k < M$ images is perturbed by universal adversarial perturbations $\{\delta_1,\ldots,\delta_k\}$ with $\lVert \delta_i\rVert_\infty \leq \epsilon$. The full input sequence $s$ is constructed by interleaving the visual and text tokens. The positions of “noisy” (perturbed) tokens form the set $\mathcal{N}$, and those of “clean” tokens form $\mathcal{C}$. For each of the $L$ transformer decoder layers $\ell$ and each of the $H$ attention heads $h$, let $A^{(\ell)}_{h,i,j}\in\mathbb{R}$ denote the attention weight from query token $i$ to key token $j$.
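The sketch below shows one way the position sets $\mathcal{N}$ and $\mathcal{C}$ could be derived from an interleaved layout. It is minimal and rests on two hypothetical assumptions:

- the sequence is described as a list of text and image segments with known token counts;
- the perturbed images are those with index below $k$.

The helper `split_positions` and its segment format are illustrative, not LAMP's code.

```python
from typing import List, Tuple

def split_positions(layout: List[tuple], k: int) -> Tuple[List[int], List[int]]:
    """Return (noisy, clean) token positions for an interleaved sequence.

    Hypothetical segment format: ("text", n_tokens) or ("image", image_index, n_tokens).
    Tokens of images with index < k are noisy (perturbed); text tokens and
    tokens of unperturbed images are clean.
    """
    noisy, clean = [], []
    pos = 0
    for seg in layout:
        if seg[0] == "image":
            _, img_idx, n = seg
            target = noisy if img_idx < k else clean
        else:
            _, n = seg
            target = clean
        target.extend(range(pos, pos + n))
        pos += n
    return noisy, clean

layout = [("text", 2), ("image", 0, 3), ("image", 1, 3), ("text", 4)]
noisy, clean = split_positions(layout, k=1)
# noisy == [2, 3, 4]; clean == [0, 1, 5, 6, 7, 8, 9, 10, 11]
```

Text tokens always land in $\mathcal{C}$, which matches the description of clean tokens as including text.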

The cross-image contagious constraint is formulated as the “contagious loss”:

$$\mathcal{L}_{\mathrm{adv}}^{\mathrm{ctg}} = -\frac{1}{L H}\sum_{\ell=1}^{L}\sum_{h=1}^{H}\sum_{i\in\mathcal{C}}\sum_{j\in\mathcal{N}} A^{(\ell)}_{h,i,j}$$

Minimizing $\mathcal{L}_{\mathrm{adv}}^{\mathrm{ctg}}$ directly maximizes the total attention mass flowing from clean to noisy tokens, averaged over layers and heads. This forces clean representations to be computed as mixtures that include contaminated (perturbed) information. The effect indirectly amplifies the reach of the adversarial intervention, even when only $k \ll M$ images are modified (Ishmam et al., 29 Jan 2026).
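The next block is a minimal NumPy sketch of the loss above. It assumes the attention weights of all layers and heads have already been stacked into one array of shape `(L, H, T, T)`, and the function name is illustrative. Because each attention row sums to 1, the loss lies between $-|\mathcal{C}|$ (all clean attention on noisy tokens) and $0$ (none).

```python
import numpy as np

def contagious_loss(attn: np.ndarray, clean, noisy) -> float:
    """Contagious loss for attention weights of shape (L, H, T, T).

    attn[l, h, i, j] is the attention from query token i to key token j.
    Returns -(1 / (L*H)) * sum_{l,h} sum_{i in clean, j in noisy} attn[l, h, i, j].
    """
    n_layers, n_heads = attn.shape[:2]
    # Select clean query rows first, then noisy key columns (two steps avoid
    # NumPy broadcasting the two index lists against each other).
    block = attn[:, :, clean, :][:, :, :, noisy]
    return -float(block.sum()) / (n_layers * n_heads)

# Uniform attention over 4 tokens: each clean row sends 0.5 to the noisy pair.
attn = np.full((2, 2, 4, 4), 0.25)
loss = contagious_loss(attn, clean=[0, 1], noisy=[2, 3])  # -1.0
```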

In coupled diffusion models (Luan et al., 14 Aug 2025), cross-image contagious constraints are implemented through differentiable coupling cost functions $c(x, y)$. The key design choice is that the gradients $\nabla_x c$ and $\nabla_y c$ induce the desired “contagious” attribute interactions between two or more images. To achieve this, $c$ is constructed to be low when the target global property (such as the samples belonging to distinct classes) holds, so that descending $c$ enforces that property. For instance, in age-contrast face-pair generation, choosing

$$c_{\mathrm{XOR}}(x, y) = -\sum_{a \in \{\mathrm{Y}, \mathrm{O}\}} \big[p(a\mid x)\,(1-p(a\mid y)) + p(a\mid y)\,(1-p(a\mid x))\big]$$

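where $p(a\mid x)$ denotes the probability that image $x$ exhibits age attribute $a$ (Y for “young,” O for “old”), causes the negative gradients to drive one image toward “young” when the other is “old,” and vice versa. This enforces attribute contrast via gradient-based “infection” (Luan et al., 14 Aug 2025).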
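The next sketch shows why the negative gradient produces contrast by evaluating $c_{\mathrm{XOR}}$ directly on attribute probabilities. It assumes a binary age attribute, so that $p(\mathrm{O}\mid\cdot) = 1 - p(\mathrm{Y}\mid\cdot)$. In PCD the gradient flows through a classifier into the image latents. Here we differentiate with respect to the probabilities themselves, which is a deliberate simplification. The helper names are hypothetical.

```python
def c_xor(u: float, v: float) -> float:
    """c_XOR for a binary attribute (assumption: p(O|.) = 1 - p(Y|.)).

    u = p(Y | x), v = p(Y | y).
    """
    pairs = [(u, v), (1.0 - u, 1.0 - v)]  # (p(a|x), p(a|y)) for a in {Y, O}
    return -sum(px * (1.0 - py) + py * (1.0 - px) for px, py in pairs)

def dc_dv(u: float, v: float, h: float = 1e-6) -> float:
    """Central finite-difference derivative of c_XOR with respect to v."""
    return (c_xor(u, v + h) - c_xor(u, v - h)) / (2.0 * h)

# x is confidently "young" (u = 0.9); y starts undecided (v = 0.5).
u, v = 0.9, 0.5
for _ in range(5):
    # Gradient-descent step on v, kept inside [0, 1].
    v = min(1.0, max(0.0, v - 0.1 * dc_dv(u, v)))
# v ends at 0.0: y is pushed to "old" because x is "young".
```

Under the binary assumption the cost simplifies to $c_{\mathrm{XOR}} = -2(u + v - 2uv)$. Its derivative in $v$ is $-2(1 - 2u)$, which is positive whenever $x$ is more likely young ($u > 0.5$). Descent therefore lowers $p(\mathrm{Y}\mid y)$, and by symmetry the same holds with the roles of $x$ and $y$ swapped.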

2. Mechanistic Propagation and Architectural Dependencies

The contagious constraint exploits either the self-attention mechanism of transformers or the synchronized gradient flow of latent samplers. In MLLMs, maximizing $\sum_{i\in\mathcal{C},\,j\in\mathcal{N}}A^{(\ell)}_{h,i,j}$ biases clean tokens (including text tokens) to aggregate their hidden representations from noisy, adversarially altered sources at every decoder layer. The effect compounds with depth: each layer’s representations incorporate more “infection” from the initial perturbations. This biases subsequent predictions or generations even though most image inputs remain unmodified.
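The compounding effect can be seen in a toy model built on strong simplifying assumptions:

- one fixed attention matrix is applied at every layer;
- there are no residual connections or MLPs;
- noisy tokens attend only to noisy tokens;
- each clean token places a fixed weight $w$ on noisy tokens.

Under these assumptions, the share of each clean representation that originates from noisy inputs is $1-(1-w)^\ell$ after $\ell$ layers. The function below is hypothetical and not part of LAMP.

```python
import numpy as np

def noisy_share_per_layer(n_clean: int, n_noisy: int, w: float, n_layers: int):
    """Mean share of each clean token's representation derived from noisy inputs.

    Toy model: h^{l+1} = A h^l with one fixed row-stochastic matrix A.
    """
    T = n_clean + n_noisy
    A = np.zeros((T, T))
    A[:n_clean, :n_clean] = (1.0 - w) / n_clean   # clean -> clean
    A[:n_clean, n_clean:] = w / n_noisy           # clean -> noisy
    A[n_clean:, n_clean:] = 1.0 / n_noisy         # noisy -> noisy only
    share = np.zeros(T)
    share[n_clean:] = 1.0   # layer 0: only the noisy tokens are "infected"
    out = []
    for _ in range(n_layers):
        share = A @ share   # attention mixing spreads the infected share
        out.append(float(share[:n_clean].mean()))
    return out

shares = noisy_share_per_layer(n_clean=2, n_noisy=2, w=0.2, n_layers=3)
# approximately [0.2, 0.36, 0.488], i.e. 1 - 0.8**l for l = 1, 2, 3
```

Residual connections would slow this growth in a real decoder. The toy only shows that even a modest per-layer weight on noisy tokens accumulates with depth.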

In test-time coupled diffusion, the negative gradients of $c(x,y)$ are injected into each individual chain (e.g., $X$ and $Y$) during Langevin or denoising steps. This guidance “contagiously” synchronizes the attribute distributions: a property change in one sample’s latent directly influences the generative trajectory of the other. The cross-image contagious effect is achieved entirely through the structure of $c$ and the architectural coupling, without retraining the component model weights (Luan et al., 14 Aug 2025).

3. Integration into End-to-End Objectives and Training

In LAMP (Ishmam et al., 29 Jan 2026), the contagious constraint is instantiated as one component ($\mathcal{L}_{\mathrm{adv}}^{\mathrm{ctg}}$) in a weighted sum of five adversarial objectives:

$$\mathcal{L}_\mathrm{adv} = \lambda_1 \mathcal{L}^{\mathrm{lm}} + \lambda_2 \mathcal{L}^{\mathrm{dec}} + \lambda_3 \mathcal{L}^{h} + \lambda_4 \mathcal{L}_{\mathrm{adv}}^{\mathrm{ctg}} + \lambda_5 \mathcal{L}_{\mathrm{adv}}^{\mathrm{ias}}$$

where $\lambda_4 = 1.2$ by default. The total loss is minimized with respect to the perturbations $\delta$ while the model parameters stay frozen. The contagious objective therefore does not act in isolation. It works alongside:

- log-likelihood minimization;
- hidden-state decorrelation;
- Hausdorff-distance maximization for attention;
- index-attention suppression.

All terms are optimized through a shared backward pass. Only $\delta$ is updated, and $\ell_\infty$ clipping enforces the perturbation-norm bound (see Section 4 for pseudocode).

In Projected Coupled Diffusion (PCD), the cross-image contagious constraint is embedded in alternating steps:

(1) a coupled guidance update using the cost $c$;

(2) a projection step that enforces hard constraints on each variable.

The resulting samples (e.g., $(X_T, Y_T)$) satisfy marginal fidelity, the hard constraints, and the global contagious properties. These are required in applications such as image pairing, manipulation, or coordinated robotics (Luan et al., 14 Aug 2025).

4. Algorithmic Realization

LAMP perturbation optimization (schematic):

```
for each training epoch:
  for each minibatch of samples {(x_1, …, x_M, t)}:
    # Apply δ to the first k images only; the remaining images stay clean
    x'_i = x_i + δ_i   for i = 1…k
    build interleaved token sequence s from (x'_1, …, x'_k, x_{k+1}, …, x_M, t)
    # Forward pass: collect attention weights for all layers/heads
    model(s) → { A^{(ℓ)}_h }   for ℓ = 1…L, h = 1…H
    # Compute contagious loss
    L_ctg = 0
    for ℓ in 1…L:
      for h in 1…H:
        L_ctg += sum_{i ∈ C, j ∈ N} A^{(ℓ)}_{h,i,j}
    L_ctg = -L_ctg / (L*H)
    # Combine all loss terms; backpropagate w.r.t. δ only (model frozen)
    L_total = λ₁ L_lm + λ₂ L_dec + λ₃ L_h + λ₄ L_ctg + λ₅ L_ias
    g = ∇_δ L_total
    δ ← δ - η * g
    δ ← clip(δ, -ε, +ε)
```
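The following is a runnable toy version of the inner update, under explicit assumptions:

- a single-layer, single-head attention with random frozen weights stands in for the MLLM;
- only the contagious term is optimized, and the other four LAMP losses are omitted;
- gradients are approximated by central finite differences instead of backpropagation;
- the perturbation is added directly to token embeddings rather than to image pixels.

All names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)           # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(tokens, Wq, Wk):
    q, k = tokens @ Wq, tokens @ Wk
    return softmax(q @ k.T / np.sqrt(k.shape[1]))   # (T, T), rows sum to 1

def contagious_loss(delta, base, clean, noisy, Wq, Wk):
    tokens = base.copy()
    tokens[noisy] += delta                           # perturb only noisy positions
    A = attention(tokens, Wq, Wk)
    return -A[np.ix_(clean, noisy)].sum()            # L = H = 1: no averaging needed

def numeric_grad(f, x, h=1e-5):
    """Central finite differences (stand-in for autograd in this sketch)."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = h
        g[idx] = (f(x + step) - f(x - step)) / (2 * h)
    return g

rng = np.random.default_rng(0)
T, d = 6, 4
base = rng.normal(size=(T, d))                       # frozen clean embeddings
Wq = rng.normal(size=(d, d)) / np.sqrt(d)            # frozen toy weights
Wk = rng.normal(size=(d, d)) / np.sqrt(d)
clean, noisy = [0, 1, 2, 3], [4, 5]
eps, eta = 0.5, 0.05

def f(dlt):
    return contagious_loss(dlt, base, clean, noisy, Wq, Wk)

delta = np.zeros((len(noisy), d))
history = [f(delta)]
for _ in range(100):
    delta = delta - eta * numeric_grad(f, delta)     # descent step on delta only
    delta = np.clip(delta, -eps, eps)                # enforce ||delta||_inf <= eps
    history.append(f(delta))
```

A real implementation would use automatic differentiation through the frozen model and the full weighted objective. Finite differences are used here only to keep the sketch dependency-light.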

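PCD sampling (schematic):

  • Initialize $X_0 \sim \mathcal{N}(0, I)$, $Y_0 \sim \mathcal{N}(0, I)$
  • For $t = 0, \dots, T-1$:

    1. Coupled LMC step:

       $X \gets X - \gamma\delta\nabla_x c + \delta s_X + \sqrt{2\delta}\,\epsilon_X$

       $Y \gets Y - \gamma\delta\nabla_y c + \delta s_Y + \sqrt{2\delta}\,\epsilon_Y$

    2. Projection:

       $X \gets \mathrm{Proj}_{\mathcal K_X}(X)$

       $Y \gets \mathrm{Proj}_{\mathcal K_Y}(Y)$

  • Return $(X_T, Y_T)$

The symbols in the sampling loop are as follows:

- $\delta$ is the Langevin step size, distinct from the LAMP perturbation $\delta$;
- $\gamma$ weights the coupling cost;
- $s_X$ and $s_Y$ are the score estimates of the two marginals;
- $\epsilon_X, \epsilon_Y \sim \mathcal{N}(0, I)$ are fresh Gaussian noise draws;
- $\mathcal K_X$ and $\mathcal K_Y$ are the hard-constraint sets.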
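A minimal NumPy sketch of the PCD loop on a toy problem where everything is analytic rests on these assumptions:

- both marginals are standard Gaussians, so $s_X(x) = -x$;
- the coupling cost is $c(x, y) = xy$, which is low when $x$ and $y$ have opposite signs and serves as a simple stand-in for a contrast constraint;
- both hard-constraint sets are the interval $[-2, 2]$, whose projection is clipping.

Both chains are updated simultaneously from the previous state, which is one reasonable reading of the pseudocode. The function names are hypothetical.

```python
import numpy as np

def std_normal_score(z):
    return -z                                       # score of N(0, 1)

def projected_coupled_lmc(gamma, n_chains=4000, n_steps=1000, step=0.05,
                          bound=2.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(n_chains)               # X_0 ~ N(0, I)
    Y = rng.standard_normal(n_chains)               # Y_0 ~ N(0, I)
    for _ in range(n_steps):
        grad_x_c, grad_y_c = Y, X                   # c(x, y) = x * y
        eps_x = rng.standard_normal(n_chains)
        eps_y = rng.standard_normal(n_chains)
        # 1. Coupled LMC step (`step` plays the role of delta in the pseudocode)
        X_new = X - gamma * step * grad_x_c + step * std_normal_score(X) \
            + np.sqrt(2 * step) * eps_x
        Y_new = Y - gamma * step * grad_y_c + step * std_normal_score(Y) \
            + np.sqrt(2 * step) * eps_y
        # 2. Projection onto K_X = K_Y = [-bound, bound]
        X, Y = np.clip(X_new, -bound, bound), np.clip(Y_new, -bound, bound)
    return X, Y

X, Y = projected_coupled_lmc(gamma=0.5)
X0, Y0 = projected_coupled_lmc(gamma=0.0)
print(np.corrcoef(X, Y)[0, 1], np.corrcoef(X0, Y0)[0, 1])
```

Without projection, the coupled target $\propto \exp(-x^2/2 - y^2/2 - \gamma xy)$ has correlation $-\gamma$. The first printed value should therefore be clearly negative, and the second (uncoupled) value close to zero. Every sample stays inside $[-2, 2]$. This toy also meets the conditions mentioned in Section 6: Gaussian marginals are log-concave, and intervals are convex.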

Both procedures are designed so that, even with only partial intervention, the targeted properties or adversarial effects propagate jointly across all coupled inputs.

5. Empirical Validation and Observed Effects

Evidence for the contribution of the cross-image contagious constraint comes from both adversarial and generative applications.

In LAMP, ablation studies show that removing $\mathcal{L}_{\mathrm{adv}}^{\mathrm{ctg}}$ alone lowers the attack success rate (ASR) from 73.43% (full LAMP) to 70.32%. The ASR falls further, to 67.12%, when index-attention suppression is also omitted. The Contamination Index (CI) rises from 0.14 to 0.31 when $\mathcal{L}_{\mathrm{adv}}^{\mathrm{ctg}}$ is included, alongside a 12.3 percentage point ASR gain. Together these results corroborate that redistributing attention toward noisy tokens is critical for attack effectiveness.

Attention-map visualizations further reveal the “infection” phenomenon. On clean inputs, attention remains on text tokens. Under the contagious constraint, attention mass shifts to perturbed image tokens even when only two of four images are modified. These effects hold across seven MLLMs and five benchmarks, with LAMP outperforming strong baselines by a mean margin of 19.5 percentage points (Ishmam et al., 29 Jan 2026).
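The ablation figures above translate into the following absolute ASR differences and relative CI increase. This is simple arithmetic on the reported numbers, not additional results:

$$73.43\% - 70.32\% = 3.11\ \text{pp}, \qquad 73.43\% - 67.12\% = 6.31\ \text{pp}, \qquad \frac{0.31}{0.14} \approx 2.2$$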

In PCD, the use of contagious constraints enables reliable enforcement of attribute relationships (such as XORed properties in paired generation) as validated in face-pair and manipulation experiments. The projected coupled diffusion framework both ensures hard constraint satisfaction and achieves the target cross-image property with high empirical consistency (Luan et al., 14 Aug 2025).

6. Comparative Analysis and Theoretical Guarantees

Both transformer-based (LAMP) and diffusion-based (PCD) frameworks instantiate cross-image contagious constraints via gradient-driven coupling, but differ in implementation and theoretical properties:

  • In LAMP, propagation occurs via self-attention mixing in frozen model architectures, relying heavily on layer-wise attention dynamics to mediate adversarial “spread.”
  • In PCD, propagation is explicit in the negative gradients of the coupling cost $c(x, y)$, and projection steps additionally keep each sample inside its feasible set.

Projected LMC theory guarantees convergence to the constrained target distribution if each marginal distribution is log-concave and all feasible sets are convex. In this context, the contagious constraint acts as a synchronizing force, while projection rigidly enforces hard-set compliance (Luan et al., 14 Aug 2025). In contrast, no analogous convergence result exists for the cross-image contagious objective in LAMP. Its empirical gains in adversarial efficacy are, however, consistent across the evaluated models and benchmarks (Ishmam et al., 29 Jan 2026).

A plausible implication is that the general principle of cross-input contagious constraints can be abstracted and adapted to other architectures that explicitly mix representations across instances. This requires that the coupling mechanism interact strongly enough with the computational graph. It suggests broader utility wherever imposing a global property through local intervention is advantageous.

7. Applications, Limitations, and Broader Impact

Cross-image contagious constraints are of interest for both adversarial robustness evaluation and complex attribute coordination in joint generation tasks. In adversarial MLLMs, they reveal that carefully structured perturbations can infect the entire multi-image, multi-modal input sequence without direct manipulation of every constituent image. In coupled diffusion models, they provide a mechanism by which inter-image relationships (e.g., contrast, similarity) can be reliably enforced at test-time without model retraining.

The main limitation is architectural dependency: effectiveness is reduced in systems with weak cross-input interaction or minimal representation sharing. Additionally, the projection mechanism in PCD yields theoretical guarantees under convexity, but such rigid enforcement may not extend cleanly to non-convex or highly structured input spaces.

Overall, cross-image contagious constraints furnish a rigorous and generalizable framework for propagating local changes to global effects in multi-input modeling, with demonstrated efficacy in both adversarial and constructive cross-modal settings (Ishmam et al., 29 Jan 2026, Luan et al., 14 Aug 2025).
