
Entropy-bank Guided Adversarial Attacks

Updated 2 January 2026
  • EGA is a targeted adversarial method that exploits token-level uncertainty in vision-language models to efficiently alter output trajectories.
  • It selects approximately 20% of high-entropy tokens and uses a reusable entropy bank to improve attack transferability across different model architectures.
  • Empirical evaluations show over 93% attack success and a significant increase in harmful content conversion compared to global perturbation methods.

Entropy-bank Guided Adversarial Attacks (EGA) refers to a targeted adversarial methodology designed to exploit uncertainty-driven vulnerabilities within autoregressive vision-language models (VLMs). EGA leverages the observation that only a small subset (approximately 20%) of token prediction steps—marked by high token-level entropy—primarily governs output trajectories. By concentrating perturbations at these “critical decision points,” EGA enables both model-specific and cross-model semantic degradation with greater efficiency and transferability than prior global or random attacks. The approach is distinguished by the construction of a reusable entropy bank, allowing for selective, high-impact targeting of tokens that are empirically most susceptible to adversarial manipulation (He et al., 26 Dec 2025).

1. Entropy in Vision-Language Model Decoding

In the context of EGA, entropy serves as a measure of prediction uncertainty at each step of autoregressive VLM generation. For a VLM $f_\theta : I \to \{y_1, \ldots, y_T\}$ that autoregressively produces a token sequence conditioned on an image $I$, the token-level entropy at time $t$ is given by

$$H_t := -\sum_{w \in \mathcal{V}} p_t(w) \log p_t(w)$$

where $p_t(w) = P(y_t = w \mid I, y_{<t})$ is the model’s predicted probability for token $w$ at step $t$ and $\mathcal{V}$ is the vocabulary. A single teacher-forced pass on clean images yields an entropy profile $\{H_1, \ldots, H_T\}$, quantifying generative instability at each position.
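As an illustration, the entropy profile can be computed directly from the per-step token distributions collected during a teacher-forced pass. This is a minimal NumPy sketch; the function name and the `(T, V)` array layout are assumptions for exposition, not details from the source:

```python
import numpy as np

def entropy_profile(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Token-level entropy H_t for each decoding step.

    probs: array of shape (T, V) whose rows are the model's predicted
    next-token distributions p_t(w) from a single teacher-forced pass.
    Returns an array of shape (T,) with H_t = -sum_w p_t(w) log p_t(w).
    """
    p = np.clip(probs, eps, 1.0)            # avoid log(0) on zero-probability tokens
    return -(probs * np.log(p)).sum(axis=1)
```

A uniform distribution over a vocabulary of size $V$ yields the maximum $H_t = \log V$, while a one-hot (fully confident) prediction yields $H_t = 0$.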

2. Selection and Characterization of High-Entropy Tokens

EGA identifies the tokens responsible for the majority of generation “forks” by selecting the top quantile of positions with the highest entropies. Specifically, entropies are ranked in descending order, and a set $S_q = \{\sigma(1), \ldots, \sigma(k)\}$ is formed, where $k = \lceil q \cdot T \rceil$ and $q$ is the selection ratio (default $q = 0.20$). Alternatively, a threshold $\tau$ can define the set $S = \{ t \mid H_t \geq \tau \}$ with $|S|/T \approx 0.20$. Empirical findings establish that this ≈20% subset governs the bulk of generative trajectory changes, thus presenting an optimal attack surface.
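The top-quantile rule above amounts to ranking positions by entropy and keeping the first $\lceil q \cdot T \rceil$. A minimal sketch (the helper name is illustrative):

```python
import math

def select_high_entropy(entropies, q=0.20):
    """Return the set S_q of the top ceil(q * T) positions by entropy."""
    T = len(entropies)
    k = math.ceil(q * T)
    # rank positions by descending entropy and keep the first k
    ranked = sorted(range(T), key=lambda t: entropies[t], reverse=True)
    return set(ranked[:k])
```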

3. Construction and Integration of the Entropy Bank

EGA constructs a reusable entropy bank $B$ to generalize across images and models. This bank comprises the top $K$ tokens from the vocabulary $\mathcal{V}$ with the highest empirical “flip-rate,” calculated as:

$$\mathrm{FlipRate}(w) = \frac{\#\,\text{images where token } w \text{ is replaced in a high-entropy position}}{\text{total images}}$$

Tokens are ranked by flip-rate, and the bank $B$ consists of the top-$K$ most “flippable.” At attack time, per-image high-entropy masks $S_q$ are augmented with positions whose tokens fall in the bank: $S_\text{bank} = \{ t \mid \hat{y}_t \in B \}$, yielding the mask $S_\text{tr} = S_q \cup S_\text{bank}$. This augmentation increases attack effectiveness and enables transferability across VLM architectures.
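These two steps — flip-rate ranking over a corpus and per-image mask augmentation — can be sketched as follows. The data layout (one set of replaced tokens per image) and the function names are illustrative assumptions, not APIs from the source:

```python
from collections import Counter

def build_entropy_bank(flip_records, K=100):
    """flip_records: list (one entry per image) of the sets of tokens that
    were replaced at a high-entropy position in that image.
    Returns the top-K tokens by empirical flip-rate."""
    n_images = len(flip_records)
    counts = Counter()
    for replaced in flip_records:
        counts.update(replaced)  # each image contributes at most once per token
    ranked = sorted(counts, key=lambda w: counts[w] / n_images, reverse=True)
    return set(ranked[:K])

def attack_mask(S_q, decoded_tokens, bank):
    """S_tr = S_q ∪ {t | y_hat_t in bank}."""
    S_bank = {t for t, w in enumerate(decoded_tokens) if w in bank}
    return S_q | S_bank
```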

4. EGA Attack Optimization Procedure

EGA optimizes adversarial perturbations by maximizing the average entropy over the selected positions $S_\text{tr}$, under an $L_\infty$ norm constraint on the pixel-level perturbation:

$$\max_\delta \frac{1}{|S_\text{tr}|} \sum_{t \in S_\text{tr}} H_t\big(f_\theta(v_0 + \delta,\, \hat{y}_{<t})\big) \qquad \text{subject to} \quad \|\delta\|_\infty \leq \varepsilon$$

where $v_0 = \psi(I)$ is the normalized pixel input and $\varepsilon = 8/255$ is the standard perturbation budget. Projected gradient ascent with momentum (or the Adam optimizer) is performed over 300 steps, with a periodic mask refresh (every $R = 50$ steps). Exact pseudocode is provided in the source (He et al., 26 Dec 2025), specifying the key steps: initial greedy decoding, entropy computation, mask formation, iterative gradient updates, and final adversarial decoding.
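The outer optimization loop can be sketched independently of any particular VLM framework. Here `grad_fn` stands in for backpropagation of the masked-entropy objective and `refresh_mask_fn` for recomputing $S_\text{tr}$; both are hypothetical hooks, and the momentum-sign-PGD form is one common realization of "projected gradient ascent with momentum," not the source's exact pseudocode:

```python
import numpy as np

def ega_attack(v0, grad_fn, refresh_mask_fn, eps=8 / 255,
               steps=300, step_size=1 / 255, refresh_every=50, momentum=0.6):
    """Momentum PGD ascent on the masked-entropy objective under an
    L_inf budget: ||delta||_inf <= eps, with pixels kept in [0, 1]."""
    delta = np.zeros_like(v0)
    g_acc = np.zeros_like(v0)
    mask = refresh_mask_fn(v0 + delta)           # initial mask S_tr
    for i in range(steps):
        if i > 0 and i % refresh_every == 0:
            mask = refresh_mask_fn(v0 + delta)   # periodic mask refresh
        g = grad_fn(v0 + delta, mask)            # d(mean entropy) / d(input)
        g_acc = momentum * g_acc + g             # momentum accumulation
        delta = delta + step_size * np.sign(g_acc)       # ascent step
        delta = np.clip(delta, -eps, eps)                # project onto L_inf ball
        delta = np.clip(v0 + delta, 0.0, 1.0) - v0       # keep pixels valid
    return delta
```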

5. Empirical Results and Baseline Comparisons

Quantitative evaluation on standard VLMs (Qwen2.5-VL-7B-Instruct, InternVL3.5-4B, LLaVA-1.5-7B) demonstrates that EGA achieves attack success rates (ASR) of 93.1%–94.8%, with $\Delta$CIDEr scores near 0.85–0.88. Crucially, EGA induces harmful content (violence, self-harm, hate, etc.) in 37.3%–47.1% of outputs, more than doubling the harmful rate achieved by global-entropy (MIE) baselines (14%–23%) and greatly exceeding standard methods such as PGD, VLA, and COA (all <2% harmful conversion). In visual question answering, similar patterns are observed, with EGA producing 25%–29% harmful conversions and retaining >80% ASR. Transferability metrics show EGA achieves 17%–26% harmful rates on unseen target models, compared with <12% for transferred XTA or MIE attacks.

| Model | ASR (%) | ΔCIDEr | Harmful Rate (%) |
| --- | --- | --- | --- |
| Qwen2.5-VL-7B-Instruct | 94.8 | 0.88 | 42.5 |
| InternVL3.5-4B | 93.8 | 0.86 | 37.3 |
| LLaVA-1.5-7B | 93.1 | 0.85 | 47.1 |

6. Key Insights and Implications

Empirical findings reveal that approximately 20% of tokens—those with the highest entropy—govern the majority of generative variability. Restricting adversarial perturbations to these “decision tokens” not only preserves attack efficacy but leads to disproportionate semantic drift, enabling conversion of 35%–49% of benign outputs to harmful content, a vulnerability underreported by prior global-entropy attacks. High-entropy tokens recur across different VLM architectures, enabling practical cross-model transfer of attacks.

A salient implication is that robustness interventions should focus on stabilizing next-token distributions specifically at high-entropy forks, for example, via uncertainty regularization or dynamic token masking, instead of merely increasing average-case robustness. EGA thus exposes a concentrated vulnerability in modern VLM decoding and motivates reevaluation of multimodal safety mechanisms (He et al., 26 Dec 2025).

7. Practical Considerations and Implementation

EGA is implemented under a standardized $L_\infty$-norm constraint on normalized pixels ($\varepsilon = 8/255$ by default), with optimizer options including Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$) or momentum-based PGD ($\mu = 0.6$). Perturbations are iteratively updated, and the selected entropy mask may optionally be refreshed to account for evolving generative trajectories. Parameters such as the token budget ($q = 0.20$), entropy bank size ($K = 100$), and mask refresh interval ($R = 50$) are configurable, with efficacy robust across reasonable variations. Decoding is performed greedily with typical sequence-length restrictions.
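Collecting the defaults reported above into a single configuration object makes the hyperparameter surface explicit. The field names are illustrative; only the numeric defaults come from the source:

```python
from dataclasses import dataclass

@dataclass
class EGAConfig:
    """Default EGA hyperparameters reported in the source
    (field names are illustrative, not the authors' identifiers)."""
    eps: float = 8 / 255            # L_inf perturbation budget epsilon
    q: float = 0.20                 # high-entropy token selection ratio
    bank_size: int = 100            # entropy bank size K
    refresh_every: int = 50         # mask refresh interval R
    steps: int = 300                # optimization iterations
    adam_betas: tuple = (0.9, 0.999)  # Adam (beta1, beta2)
    momentum: float = 0.6           # mu for momentum-based PGD
```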

A plausible implication is that the EGA framework and entropy bank concept may be adapted to broader autoregressive, multimodal, or sequence-generation settings where token-level uncertainty can be exploited for selective adversarial control.
