
Query-Adaptive Offset Optimization in LVLMs

Updated 12 January 2026
  • QAO is a targeted method that refines LVLM activation editing by applying query-specific correction offsets based on image-text context.
  • It employs a lightweight MLP-based offset estimator to adjust the general factual steering vector, mitigating hallucination without altering the backbone model.
  • Implemented within the AFTER framework, QAO achieves up to a 16.3% reduction in hallucination, offering precise and scalable correction of model activations.

Query-Adaptive Offset Optimization (QAO) is a mechanism designed to refine the activation editing of Large Vision-Language Models (LVLMs) by introducing query-specific correction vectors. QAO operates within the AFTER framework, which addresses object hallucination caused by language bias in LVLMs. Unlike generic activation editing approaches, QAO enables precise, per-query steering of internal model representations via a lightweight offset estimator, mitigating the risk of over- or under-correction across diverse user queries (Wang et al., 5 Jan 2026).

1. Motivation and Problem Formulation

LVLMs are vulnerable to object hallucination: erroneously generating mentions of objects, attributes, or relations not grounded in the provided image. This hallucination stems largely from language bias, which can induce systematic misalignment between visual evidence and textual output. AFTER's Factual-Augmented Activation Steering (FAS) computes a general steering vector $\bar{\mathbf{d}}$ that moves internal model activations toward fact-augmented semantics, regardless of the query context. However, because each query $q$ may reference distinct visual or conceptual entities, applying a uniform edit can insufficiently address instance-specific hallucination. QAO responds by introducing a query-conditioned residual offset $\Delta(q)$ that modulates the editing vector for each incoming query, adaptively refining the intervention.

2. Query-Aware Offset Estimator Architecture

The central component of QAO is a single-layer MLP $\mathcal{G}$, parameterized by a weight matrix $W$ and bias $b$, which projects the self-attention head activations $\mathbf{z}$ (already encoding the visual-textual "query" context) into predicted offset vectors:

$$\mathcal{G}(\mathbf{z}) = W\mathbf{z} + b$$

Optionally, a ReLU nonlinearity may be incorporated. This design avoids modifying the backbone LVLM parameters, preserving the overall model structure and relying solely on $\mathbf{z}$ (the attention output at a given layer) as the implicit encoding of the query context. A plausible implication is that this design streamlines deployment and minimizes computational overhead.
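As a concrete illustration, the estimator can be sketched in a few lines of NumPy. This is a minimal sketch, not the reference implementation; the initialization scale and the optional ReLU flag are assumptions.

```python
import numpy as np

class OffsetEstimator:
    """Single-layer offset estimator G(z) = W z + b.

    Illustrative sketch: the width d matches the head activation, and the
    ReLU is the optional nonlinearity mentioned in the text.
    """

    def __init__(self, d, use_relu=False, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.02, size=(d, d))  # small random init (assumption)
        self.b = np.zeros(d)
        self.use_relu = use_relu

    def __call__(self, z):
        out = z @ self.W.T + self.b
        return np.maximum(out, 0.0) if self.use_relu else out

# The predicted offset has the same dimension as the head activation,
# so it can be added directly to the general steering vector.
d = 8
g = OffsetEstimator(d)
z = np.ones(d)
offset = g(z)
```

Because the output lives in the same space as $\mathbf{z}$, no extra projection is needed before the edit is applied.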

3. Mathematical Formulation

QAO is formulated atop the FAS steering vector and introduces mechanisms to extract, estimate, and apply query-specific offsets:

  • General steering vector:

$$\bar{\mathbf{d}} = \frac{1}{n|\mathbf{X}|} \sum_{x\in\mathbf{X}} \sum_{i=1}^n (\mathbf{z}_i^+ - \mathbf{z}_i)$$

where $\mathbf{z}_i^+$ is the activation when the model is fed the trusted, fact-augmented text $t^+$ with query $q_i$, while $\mathbf{z}_i$ is the original activation from $(x, q_i)$.

  • Query-specific optimal vector and residual:

$$\tilde{\mathbf{d}}_i = \mathbf{z}_i^+ - \mathbf{z}_i,\quad \mathbf{o}_i = \tilde{\mathbf{d}}_i - \bar{\mathbf{d}}$$

  • Offset estimator and final edited activation:

$$\widehat{\mathbf{o}}_i = \mathcal{G}(\mathbf{z}_i)$$

$$\Delta(q_i) = \bar{\mathbf{d}} + \mathcal{G}(\mathbf{z}_i)$$

$$\mathbf{h}' = \mathbf{h} + \alpha\,\Delta(q_i)$$

When editing multiple attention heads indexed by $k = 1, \ldots, H$, the update is:

$$\mathbf{h}^{l+1} = \mathbf{h}^l + \mathrm{Concat}_{k=1}^H\left(\mathbf{z}^{l,k} + \alpha\left[\bar{\mathbf{d}} + \mathcal{G}(\mathbf{z}^{l,k})\right]\right)W_o^l$$
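The relationships above can be checked numerically. The sketch below uses random stand-in activations for a single head; the dimension, the number of pairs, and $\alpha$ are illustrative values only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_pairs, alpha = 8, 5, 7.0

# Paired activations: z_plus from the fact-augmented input, z from the original.
z_plus = rng.normal(size=(n_pairs, d))
z = rng.normal(size=(n_pairs, d))

# General steering vector d_bar: mean activation shift over all pairs (FAS).
d_bar = (z_plus - z).mean(axis=0)

# Query-specific optimal vectors d_tilde_i and residual offsets o_i.
d_tilde = z_plus - z
offsets = d_tilde - d_bar   # residuals average to zero by construction

# Edited activation for query i, using the true residual in place of G(z_i).
i = 0
h = rng.normal(size=d)
h_edited = h + alpha * (d_bar + offsets[i])
```

With the exact residual, the edit $\bar{\mathbf{d}} + \mathbf{o}_i$ recovers the query-specific vector $\tilde{\mathbf{d}}_i$; the trained estimator approximates this residual from $\mathbf{z}_i$ alone.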

4. Optimization Objective and Training

The offset estimator $\mathcal{G}$ is trained to minimize the squared $\ell_2$ norm between its predicted offset and the target residual offset for each annotated instance:

$$\mathcal{L}_{\mathcal{G}} = \frac{1}{|\mathbf{X}|\,n} \sum_{x\in\mathbf{X}} \sum_{i=1}^n \left\|\mathcal{G}(\mathbf{z}_i) - \mathbf{o}_i\right\|_2^2$$

Crucially, FAS's $\bar{\mathbf{d}}$ provides general factual guidance, while QAO's $\mathcal{G}$ specializes in refining the residual offset $\mathbf{o}_i$ associated with each query, mitigating the propensity for inappropriate overcorrection. No additional regularization terms are applied to $\mathcal{G}$ in the referenced implementation. Training uses a single-layer $d \to d$ projection, a typical learning rate of $10^{-4}$, and batches comprising roughly 500 COCO images times $n$ queries, iterated over several epochs.
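A self-contained sketch of this objective is given below, with synthetic stand-ins for the activation/offset pairs and plain gradient descent in place of the paper's (unspecified here) optimizer. The dimensions, learning rate, and step count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 8, 64

# Synthetic training pairs (z_i, o_i); in AFTER these come from COCO images.
Z = rng.normal(size=(n, d))
W_true = rng.normal(scale=0.1, size=(d, d))
O = Z @ W_true.T                    # targets generated by a hidden linear map

# Linear estimator G(z) = W z + b trained on the squared L2 loss.
W = np.zeros((d, d))
b = np.zeros(d)
lr = 1e-2
for _ in range(2000):
    pred = Z @ W.T + b
    err = pred - O                  # gradient of 0.5 * mean ||err||^2
    W -= lr * err.T @ Z / n
    b -= lr * err.mean(axis=0)

final_loss = np.mean(np.sum((Z @ W.T + b - O) ** 2, axis=1))
```

Because the targets here are exactly linear in the inputs, the loss is driven close to zero; with real activations the estimator only approximates the residual offsets.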

5. Integration with FAS and Editing Workflow

The end-to-end operation of QAO within AFTER proceeds as follows:

  1. Sampling a small set $\mathbf{X}$ of images (e.g., from the COCO dataset).
  2. For each image $x$, generating fact-augmented text $t^+$ via a factuality function $\mathcal{F}$.
  3. For each image-query pair, computing activations $\mathbf{z}_i^+$ from $(t^+, q_i)$ and $\mathbf{z}_i$ from $(x, q_i)$.
  4. Computing the general steering vector $\bar{\mathbf{d}}$ (FAS step).
  5. Determining the query-specific disparity $\tilde{\mathbf{d}}_i$, forming the residual offsets $\mathbf{o}_i$, and training the MLP $\mathcal{G}$.
  6. At inference, collecting each head's $\mathbf{z}$ for a new pair $(x, q)$, applying the edit $\Delta(q) = \bar{\mathbf{d}} + \mathcal{G}(\mathbf{z})$, and updating the hidden state to $\mathbf{h}'$.

This pipeline allows for plug-and-play, inference-time mitigation of hallucination in LVLMs without retraining or fine-tuning the backbone model.

6. Implementation Details

Key parameters and routines are as follows:

  • Hidden state dimension $d \approx 1024$
  • Number of edited heads $K = 64$
  • Editing strength $\alpha = 7$
  • Offset estimator $\mathcal{G}$: single linear layer mapping $d \to d$
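Given these values, the estimator's footprint is easy to verify. The quick check below assumes one plain $d \to d$ linear layer with bias; whether $\mathcal{G}$ is shared across heads is not specified here.

```python
# Parameter count of a single d -> d linear offset estimator with bias.
d = 1024
n_params = d * d + d      # weight matrix plus bias vector
print(n_params)           # 1049600: roughly 1M parameters, negligible next to an LVLM
```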

The simplified pseudocode below summarizes the operational mechanics:

# Step 1: FAS -- compute the general steering vector bar_d
D_sum = 0
for x in sample_images:
    t_plus = textualize_facts(x)          # fact-augmented text t+
    for q in query_list:
        z_plus = LVLM_forward(x_text=t_plus, query=q)
        z_orig = LVLM_forward(image=x, query=q)
        D_sum += (z_plus - z_orig)
bar_d = D_sum / (len(sample_images) * len(query_list))

# Step 2: build training pairs (z_i, o_i) for the offset estimator
train_pairs = []
for x in sample_images:
    t_plus = textualize_facts(x)
    for q in query_list:
        z_plus = LVLM_forward(x_text=t_plus, query=q)
        z_orig = LVLM_forward(image=x, query=q)
        d_precise = z_plus - z_orig       # query-specific optimal vector
        o_target = d_precise - bar_d      # residual offset o_i
        train_pairs.append((z_orig, o_target))

# Step 3: train G by minimizing the squared L2 loss
initialize_parameters(G)                  # weight W and bias b
for epoch in range(num_epochs):
    for z, o_star in minibatches(train_pairs):
        o_hat = G(z)
        loss = squared_l2_norm(o_hat - o_star)
        backpropagate(loss)
        update_parameters(G)

# Step 4: inference-time editing
def edited_forward(x, q):
    h = initial_encoding(x, q)
    for l in layers:
        for k in top_K_heads:
            z = self_attn_head_output(h, head=k)
            offset = G(z)                 # query-adaptive residual
            z_corrected = z + alpha * (bar_d + offset)
            replace_head_output(h, head=k, value=z_corrected)
        h = next_layer(h)
    return decode(h)

7. Significance and Empirical Impact

QAO, as instantiated in AFTER, advances the granularity and adaptability of activation editing for LVLM hallucination mitigation. By moving from a coarse, query-agnostic steering vector to fine-grained, query-adaptive offsets, AFTER achieved up to a 16.3% reduction in hallucination relative to baseline on the AMBER benchmark, measured over three widely adopted LVLMs. A plausible implication is that QAO's modularity and low-cost deployment facilitate scalable improvement of factual reliability in cross-modal AI systems without costly retraining (Wang et al., 5 Jan 2026).
