Cross-Image Contagious Constraint
- Cross-image contagious constraint is a mechanism that propagates perturbations across multiple images, coupling local changes with global model behavior.
- It leverages transformer self-attention in LAMP and gradient-guided coupling in PCD to enforce adversarial effects and synchronized attribute interactions.
- Empirical evaluations reveal significant gains in attack success rates and effective enforcement of inter-image attributes in both adversarial and generative settings.
A cross-image contagious constraint is a formal mechanism whereby information, perturbations, or imposed properties on a subset of images—within a multi-image or joint generative setting—are systematically designed to propagate or “infect” representations associated with other images or tokens in the model’s computational graph. The primary use of such constraints is to engineer global coupling or adversarial effects such that local manipulations yield maximum system-wide impact. This paradigm has been developed in distinct contexts: (1) as a targeted adversarial vulnerability amplification strategy in multi-image multimodal LLMs (MLLMs), and (2) as a mechanism for enforcing attribute interaction or contrast in coupled diffusion-based joint generation. Both use the principle of cross-component “contagion,” making local changes produce reliable influence elsewhere in the input or generative space (Ishmam et al., 29 Jan 2026, Luan et al., 14 Aug 2025).
1. Formal Definitions and Mathematical Construction
In the context of multi-image MLLMs (Ishmam et al., 29 Jan 2026), consider an input comprising $M$ images $x_1, \dots, x_M$ and a text prompt $t$. A subset of $k$ images is perturbed by universal adversarial perturbations $\delta_i$ such that $x'_i = x_i + \delta_i$ for $i = 1, \dots, k$. The full input sequence is constructed by interleaving visual and text tokens, with “noisy” (perturbed) token positions in set $\mathbb{N}$ and “clean” token positions in set $\mathbb{C}$. For each transformer decoder layer $\ell$ and attention head $h$, let $A^{(\ell)}_{h,i,j}$ denote the attention from query token $i$ to key token $j$.
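The cross-image contagious constraint is formulated as the “contagious loss”:

$$\mathcal{L}_{\text{ctg}} = -\frac{1}{LH} \sum_{\ell=1}^{L} \sum_{h=1}^{H} \sum_{i \in \mathbb{C}} \sum_{j \in \mathbb{N}} A^{(\ell)}_{h,i,j}$$

where $L$ and $H$ are the numbers of decoder layers and attention heads, and the inner sums run over clean query positions $i \in \mathbb{C}$ and noisy key positions $j \in \mathbb{N}$. Minimizing $\mathcal{L}_{\text{ctg}}$ directly maximizes the average attention flowing from clean to noisy tokens, forcing clean representations to be computed as mixtures with contaminated (perturbed) information. This indirectly amplifies the reach of the adversarial intervention, even when only $k$ of the $M$ images are modified (Ishmam et al., 29 Jan 2026).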
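The contagious loss can be illustrated with a minimal sketch in plain Python. It assumes the attention weights are already available as a nested list `attn[layer][head][query][key]`; the function name `contagious_loss` and the toy attention values are illustrative and not taken from the LAMP implementation.

```python
def contagious_loss(attn, clean_idx, noisy_idx):
    """Negative mean (over layers and heads) of clean-to-noisy attention mass.

    attn[l][h][i][j] is the attention weight from query token i to key token j.
    """
    num_layers = len(attn)
    num_heads = len(attn[0])
    total = 0.0
    for layer in attn:
        for head in layer:
            # Sum attention paid by clean queries to noisy keys.
            total += sum(head[i][j] for i in clean_idx for j in noisy_idx)
    return -total / (num_layers * num_heads)


# Toy input: 4 tokens, tokens 0-1 are noisy, tokens 2-3 are clean.
uniform_head = [[0.25] * 4 for _ in range(4)]          # every query attends evenly
focused_head = [[0.5, 0.5, 0.0, 0.0] for _ in range(4)]  # all mass on noisy keys

uniform_attn = [[uniform_head, uniform_head], [uniform_head, uniform_head]]
focused_attn = [[focused_head, focused_head], [focused_head, focused_head]]

uniform_loss = contagious_loss(uniform_attn, clean_idx=[2, 3], noisy_idx=[0, 1])
focused_loss = contagious_loss(focused_attn, clean_idx=[2, 3], noisy_idx=[0, 1])
```

The loss is lower (more negative) for the focused pattern than for the uniform one, which is the direction the optimizer pushes the perturbations.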
In coupled diffusion models (Luan et al., 14 Aug 2025), cross-image contagious constraints are implemented through differentiable coupling cost functions defined jointly over the images being generated. The key design is for the gradients of the cost with respect to each image to induce the desired “contagious” attribute interactions between two or more images, typically by making the cost small whenever the desired global property (such as distinct classification) holds across samples. For instance, in age-contrast face-pair generation, the cost is chosen so that its negative gradients drive one image toward “young” when the other is “old,” and vice versa, thereby enforcing attribute contrast via gradient-based “infection” (Luan et al., 14 Aug 2025).
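To make the gradient-based contrast concrete, the following sketch uses a hypothetical cost on scalar latents. The sigmoid "age classifier" `old_prob` and the cost `coupling_cost` are assumptions chosen for illustration; they are not the cost used by Luan et al. The cost is high when both samples fall in the same age class, so following its negative gradient should separate the pair.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def old_prob(x):
    # Hypothetical stand-in for an age classifier: scalar latent -> P("old").
    return sigmoid(x)


def coupling_cost(x, y):
    # Hypothetical cost: large when both samples share the same age class.
    px, py = old_prob(x), old_prob(y)
    return px * py + (1.0 - px) * (1.0 - py)


def cost_grads(x, y):
    # Analytic gradients of coupling_cost with respect to x and y.
    px, py = old_prob(x), old_prob(y)
    dpx, dpy = px * (1.0 - px), py * (1.0 - py)  # sigmoid derivatives
    return dpx * (2.0 * py - 1.0), dpy * (2.0 * px - 1.0)


# Both samples start on the "old" side; x is slightly less old than y.
x, y = 0.2, 1.0
initial_cost = coupling_cost(x, y)
step = 0.5
for _ in range(200):
    gx, gy = cost_grads(x, y)
    x, y = x - step * gx, y - step * gy  # follow the negative gradient
final_cost = coupling_cost(x, y)
```

In this toy run the sample that starts closer to the class boundary is pushed to the "young" side, after which the other sample is pushed further toward "old", so each sample's state steers the other's update.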
2. Mechanistic Propagation and Architectural Dependencies
The contagious constraint exploits the self-attention mechanism of transformers or the synchronized gradient flow of latent samplers. In MLLMs, maximizing the attention that clean tokens pay to noisy tokens biases clean tokens (including text) to aggregate their hidden representations from noisy, adversarially altered sources at every decoder layer. The effect compounds with depth: each layer's representation incorporates more “infection” from the initial perturbations, which biases subsequent predictions or generations even though the majority of image inputs remain unmodified.
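The layer-wise compounding can be sketched with a deliberately simplified model. It tracks one scalar "contamination" score per token (1 for noisy tokens, 0 for clean ones) and assumes a 50/50 residual mix at every layer; the function `propagate`, the residual ratio, and the attention masses are all illustrative assumptions rather than properties of any real MLLM.

```python
def propagate(clean_to_noisy_mass, num_layers, num_noisy=2, num_clean=2):
    """Return the mean contamination of clean tokens after each layer (toy model)."""
    contamination = [1.0] * num_noisy + [0.0] * num_clean
    history = []
    for _ in range(num_layers):
        noisy_avg = sum(contamination[:num_noisy]) / num_noisy
        clean_avg = sum(contamination[num_noisy:]) / num_clean
        # Value a clean query reads out: a mix of noisy and clean sources.
        attended = (clean_to_noisy_mass * noisy_avg
                    + (1.0 - clean_to_noisy_mass) * clean_avg)
        # Assumed residual connection: keep half, add half of the attended value.
        new_clean = [0.5 * c + 0.5 * attended for c in contamination[num_noisy:]]
        contamination = contamination[:num_noisy] + new_clean
        history.append(sum(new_clean) / num_clean)
    return history


low = propagate(clean_to_noisy_mass=0.1, num_layers=4)
high = propagate(clean_to_noisy_mass=0.6, num_layers=4)
```

Under these assumptions contamination grows at every layer, and it grows faster when clean tokens place more attention mass on noisy tokens, which is the quantity the contagious loss increases.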
In test-time coupled diffusion, the negative gradients of the coupling cost are injected into each individual chain (for example, the two chains of a generated image pair) during Langevin or denoising steps. This guidance “contagiously” synchronizes the attribute distributions, causing property modifications in one sample’s latent to directly influence the generative trajectory of the other. The “cross-image contagious” effect is achieved entirely through the structure of the cost and the architectural coupling, without retraining the component model weights (Luan et al., 14 Aug 2025).
3. Integration into End-to-End Objectives and Training
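In LAMP (Ishmam et al., 29 Jan 2026), the contagious constraint is instantiated as one component ($\mathcal{L}_{\text{ctg}}$) in a weighted sum of five adversarial objectives:

$$\mathcal{L}_{\text{total}} = \lambda_1 \mathcal{L}_{\text{lm}} + \lambda_2 \mathcal{L}_{\text{dec}} + \lambda_3 \mathcal{L}_{\text{h}} + \lambda_4 \mathcal{L}_{\text{ctg}} + \lambda_5 \mathcal{L}_{\text{ias}}$$

where the weights $\lambda_1, \dots, \lambda_5$ are set to fixed defaults. The total loss is minimized with respect to $\delta$ (the perturbations), leaving the model parameters frozen. The contagious objective is thus not isolated but functions in concert with log-likelihood minimization ($\mathcal{L}_{\text{lm}}$), hidden state decorrelation ($\mathcal{L}_{\text{dec}}$), Hausdorff distance maximization for attention ($\mathcal{L}_{\text{h}}$), and index-attention suppression ($\mathcal{L}_{\text{ias}}$). All terms are optimized using a shared backward pass, with only $\delta$ receiving updates, followed by clipping to enforce the perturbation norm bound (see Section 4 for pseudocode).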
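A single optimization step on the perturbation can be sketched as a weighted sum of loss terms followed by a clipped gradient step. The helper names `total_loss` and `update_perturbation`, the unit weights, and all numeric values below are placeholders chosen for illustration; they are not the defaults reported for LAMP, and the gradient is supplied by hand rather than obtained from a model.

```python
def total_loss(terms, weights):
    # Weighted sum of the individual objectives, keyed by name.
    return sum(weights[name] * value for name, value in terms.items())


def update_perturbation(delta, grad, lr, eps):
    """One gradient-descent step on delta, then clipping to [-eps, eps]."""
    return [max(-eps, min(eps, d - lr * g)) for d, g in zip(delta, grad)]


# Placeholder loss values and unit weights (assumptions, not LAMP defaults).
terms = {"lm": 2.0, "dec": 0.5, "h": -1.0, "ctg": -0.25, "ias": 0.25}
weights = {name: 1.0 for name in terms}
loss = total_loss(terms, weights)

# Hand-written gradient of the total loss with respect to three entries of delta.
delta = [0.0, 0.02, -0.03]
grad = [1.0, -2.0, 0.5]
new_delta = update_perturbation(delta, grad, lr=0.01, eps=0.03)
```

Only `delta` changes in this step; in the setting described above the model parameters stay frozen, and the clipping keeps every entry of the perturbation inside the allowed range.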
In Projected Coupled Diffusion (PCD), the cross-image contagious constraint is structurally embedded in the alternating steps of (1) a coupled guidance update using the coupling cost, and (2) a projection step enforcing hard constraints on each variable. The resulting joint samples satisfy marginal fidelity, the hard constraints, and the global contagious properties, as required for application contexts such as image pairing, manipulation, or coordinated robotics (Luan et al., 14 Aug 2025).
4. Algorithmic Realization
LAMP Cross-Image Contagion Pseudocode (Ishmam et al., 29 Jan 2026)
```
for each training epoch:
    for each minibatch of samples {(x₁…x_M, t)}:
        # Apply δ to the first k images
        x'_i = x_i + δ_i  for i = 1…k
        build interleaved token sequence s

        # Forward pass: compute attention weights for all layers/heads
        model(s) → { A^{(ℓ)}_{h} }  for ℓ = 1…L, h = 1…H

        # Compute contagious loss
        L_ctg = 0
        for ℓ in 1…L:
            for h in 1…H:
                L_ctg += sum_{i∈ℂ, j∈ℕ} A^{(ℓ)}_{h,i,j}
        L_ctg = -L_ctg / (L*H)

        # Combine all loss terms, backpropagate w.r.t. δ
        L_total = λ₁ L_lm + λ₂ L_dec + λ₃ L_h + λ₄ L_ctg + λ₅ L_ias
        compute ∇_δ L_total
        δ ← δ - η * ∇_δ L_total
        δ ← clip(δ, -ε, +ε)
```
PCD Cross-Image Contagion Algorithm (Luan et al., 14 Aug 2025)
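- Initialize the coupled chains, one per image.
- For each sampling step:
  - Coupled LMC step: update each chain with a Langevin (or denoising) step, adding the negative gradient of the coupling cost with respect to that chain.
  - Projection: project each chain onto its feasible set to enforce the hard constraints.
- Return the final joint samples.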
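The alternation of a coupled Langevin step and a projection can be sketched on a one-dimensional toy problem. Everything specific here is an assumption made for illustration: both marginals are standard Gaussians (so the scores are `-x` and `-y`), the coupling cost is the quadratic `0.5 * coupling * (x - y)**2`, and the feasible sets are intervals, so projection reduces to clipping. This is not the configuration used in the PCD paper.

```python
import math
import random


def project(value, lo, hi):
    # Euclidean projection onto the interval [lo, hi], a convex set.
    return max(lo, min(hi, value))


def pcd_sample(num_steps, step, coupling, bounds_x, bounds_y, rng):
    x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(num_steps):
        # Marginal scores of N(0, 1) are -x and -y. The negative gradients of
        # the quadratic cost are -coupling*(x - y) for x and +coupling*(x - y) for y.
        drift_x = -x - coupling * (x - y)
        drift_y = -y + coupling * (x - y)
        x_new = x + step * drift_x + noise_scale * rng.gauss(0.0, 1.0)
        y_new = y + step * drift_y + noise_scale * rng.gauss(0.0, 1.0)
        # Projection step: enforce each variable's hard constraint.
        x = project(x_new, *bounds_x)
        y = project(y_new, *bounds_y)
    return x, y


def run(coupling, seed=0, num_chains=200):
    rng = random.Random(seed)
    pairs = [pcd_sample(300, 0.01, coupling, (0.0, 2.0), (-2.0, 2.0), rng)
             for _ in range(num_chains)]
    mean_gap = sum(abs(x - y) for x, y in pairs) / num_chains
    mean_y = sum(y for _, y in pairs) / num_chains
    return pairs, mean_gap, mean_y


uncoupled_pairs, uncoupled_gap, uncoupled_mean_y = run(coupling=0.0)
coupled_pairs, coupled_gap, coupled_mean_y = run(coupling=5.0)
```

In this toy setup only `x` is constrained to be non-negative, yet with coupling enabled the samples of `y` shift toward positive values as well, a small-scale instance of a constraint on one variable propagating to the other.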
These formulations ensure that, even with only partial intervention, the targeted properties or adversarial effects are propagated jointly and systematically across all coupled inputs.
5. Empirical Validation and Observed Effects
The contribution of the cross-image contagious constraint is supported by evidence from both adversarial and generative applications.
In LAMP, ablation studies show that removing the contagious loss alone lowers attack success rate (ASR) from 73.43% (full LAMP) to 70.32%, a drop of 3.11 percentage points, and to 67.12%, a drop of 6.31 points, when index-attention suppression is also omitted. The Contamination Index (CI) rises from 0.14 to 0.31 with the inclusion of the contagious loss, parallel to a 12.3 percentage point ASR gain, corroborating that attention redistribution toward noisy tokens is critical for attack effectiveness. Visualization of attention maps further reveals the “infection” phenomenon: in clean settings attention remains on text, whereas under the contagious constraint attention mass shifts to perturbed image tokens even when only two out of four images are modified. These effects are robust across seven MLLMs and five benchmarks, with LAMP outperforming strong baselines by a mean margin of 19.5 percentage points (Ishmam et al., 29 Jan 2026).
In PCD, the use of contagious constraints enables reliable enforcement of attribute relationships (such as XORed properties in paired generation) as validated in face-pair and manipulation experiments. The projected coupled diffusion framework both ensures hard constraint satisfaction and achieves the target cross-image property with high empirical consistency (Luan et al., 14 Aug 2025).
6. Comparative Analysis and Theoretical Guarantees
Both transformer-based (LAMP) and diffusion-based (PCD) frameworks instantiate cross-image contagious constraints via gradient-driven coupling, but differ in implementation and theoretical properties:
- In LAMP, propagation occurs via self-attention mixing in frozen model architectures, relying heavily on layer-wise attention dynamics to mediate adversarial “spread.”
- In PCD, propagation is explicit in the negative gradients of the coupling cost, and is further constrained by projection steps that ensure feasibility of the marginals.
Projected LMC theory guarantees convergence to the constrained target distribution if each marginal distribution is log-concave and all feasible sets are convex. The contagious constraint, in this context, acts as a synchronizing force, while projection rigidly enforces hard-set compliance (Luan et al., 14 Aug 2025). In contrast, there is no analogous strict convergence result for cross-image contagious objectives in LAMP, but empirical gains in adversarial efficacy are consistent and reproducible (Ishmam et al., 29 Jan 2026).
A plausible implication is that the general principle of cross-input contagious constraints can be abstracted and adapted across most architectures with explicit cross-instance representation mixing, provided the coupling mechanism interacts sufficiently “aggressively” with the computational graph. This suggests broader utility wherever global property imposition via local intervention is advantageous.
7. Applications, Limitations, and Broader Impact
Cross-image contagious constraints are of interest for both adversarial robustness evaluation and complex attribute coordination in joint generation tasks. In adversarial MLLMs, they reveal that carefully structured perturbations can infect the entire multi-image, multi-modal input sequence without direct manipulation of every constituent image. In coupled diffusion models, they provide a mechanism by which inter-image relationships (e.g., contrast, similarity) can be reliably enforced at test-time without model retraining.
The main limitation is the architectural dependency: effectiveness is reduced in systems with weak cross-input interaction or minimal representation sharing. Additionally, while projection mechanisms in PCD yield theoretical guarantees under convexity, such rigid enforcement may not extend cleanly to non-convex or highly structured input spaces.
Overall, cross-image contagious constraints furnish a rigorous and generalizable framework for propagating local changes to global effects in multi-input modeling, with demonstrated efficacy in both adversarial and constructive cross-modal settings (Ishmam et al., 29 Jan 2026, Luan et al., 14 Aug 2025).