Query-Adaptive Offset Optimization in LVLMs
- QAO is a targeted method that refines LVLM activation editing by applying query-specific correction offsets based on image-text context.
- It employs a lightweight MLP-based offset estimator to adjust the general factual steering vector, mitigating hallucination without altering the backbone model.
- Implemented within the AFTER framework, QAO achieves up to a 16.3% reduction in hallucination, offering precise and scalable correction of model activations.
Query-Adaptive Offset Optimization (QAO) is a mechanism designed to refine the activation editing of Large Vision-Language Models (LVLMs) by introducing query-specific correction vectors. QAO operates within the AFTER framework, which addresses object hallucination caused by language bias in LVLMs. Unlike generic activation editing approaches, QAO enables precise, per-query steering of internal model representations by leveraging a lightweight offset estimator, thereby mitigating the risk of over- or under-correction for diverse user queries (Wang et al., 5 Jan 2026).
1. Motivation and Problem Formulation
LVLMs exhibit vulnerability to object hallucination: erroneously generating mentions of objects, attributes, or relations not grounded in the provided image data. The prevalence of hallucination stems from language bias, which can induce systematic misalignment between visual evidence and textual output. AFTER's Factual-Augmented Activation Steering (FAS) computes a general steering vector $\bar{d}$ that moves internal model activations toward fact-augmented semantics, regardless of the query context. However, because each query may reference distinct visual or conceptual entities, applying a uniform edit can insufficiently address instance-specific hallucination. QAO responds by introducing a query-conditioned residual offset $o_{x,q}$, modulating the editing vector for each incoming query and thereby adaptively refining the intervention.
2. Query-Aware Offset Estimator Architecture
The central component of QAO is a single-layer Multi-Layer Perceptron (MLP) $G$, parameterized by a weight matrix $W$ and bias $b$, which projects the self-attention head activations $z$ (already encoding the visual-textual “query” context) into predicted offset vectors:

$$G(z) = Wz + b$$

Optionally, a ReLU nonlinearity may be incorporated. This design avoids modifying the backbone LVLM parameters, preserving the overall model structure and relying solely on $z$ (the attention output at a given layer) as the implicit encoder for the query context. A plausible implication is that this design streamlines deployment and minimizes computational overhead.
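A minimal sketch of such an estimator is shown below, assuming a PyTorch implementation; the class name `OffsetEstimator`, the `head_dim` parameter, and the `use_relu` flag are illustrative choices, not names from the paper:

```python
import torch
import torch.nn as nn

class OffsetEstimator(nn.Module):
    """Single-layer offset estimator G(z) = W z + b (illustrative sketch).

    Maps a self-attention head activation z, which implicitly encodes the
    image-text query context, to a query-specific offset vector.
    """
    def __init__(self, head_dim: int, use_relu: bool = False):
        super().__init__()
        self.proj = nn.Linear(head_dim, head_dim)  # parameters W and b
        self.act = nn.ReLU() if use_relu else nn.Identity()

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.act(self.proj(z))
```

Because the estimator touches only a head-sized activation, its parameter count is negligible relative to the backbone, consistent with the low-overhead design noted above.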
3. Mathematical Formulation
QAO is formulated atop the FAS steering vector and introduces mechanisms to extract, estimate, and apply query-specific offsets:
- General steering vector:

$$\bar{d} = \frac{1}{|\mathcal{X}|\,|\mathcal{Q}|} \sum_{x \in \mathcal{X}} \sum_{q \in \mathcal{Q}} \left( z^{+}_{x,q} - z_{x,q} \right)$$

where $z^{+}_{x,q}$ is the activation from the model fed a trusted, fact-augmented text $t^{+}$ and query $q$, while $z_{x,q}$ is the original activation from the image-query pair $(x, q)$.
- Query-specific optimal vector and residual:

$$d_{x,q} = z^{+}_{x,q} - z_{x,q}, \qquad o_{x,q} = d_{x,q} - \bar{d}$$

- Offset estimator and final edited activation:

$$G(z) = Wz + b, \qquad \tilde{z} = z + \alpha \left( \bar{d} + G(z) \right)$$

When editing multiple attention heads indexed by $k$, the update is:

$$\tilde{z}_k = z_k + \alpha \left( \bar{d} + G(z_k) \right), \qquad k \in \{1, \dots, K\}$$
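To make these definitions concrete, the following sketch computes $\bar{d}$, the residual offsets, and an edited activation on random stand-in tensors; the dimension, pair count, and $\alpha$ value are illustrative assumptions:

```python
import torch

head_dim, n_pairs, alpha = 128, 6, 1.0           # illustrative values
z_plus = torch.randn(n_pairs, head_dim)          # activations under fact-augmented text
z_orig = torch.randn(n_pairs, head_dim)          # original activations

bar_d = (z_plus - z_orig).mean(dim=0)            # general steering vector (FAS)
d_q   = z_plus - z_orig                          # query-specific optimal vectors
o_q   = d_q - bar_d                              # residual offsets (QAO targets)

G = torch.nn.Linear(head_dim, head_dim)          # offset estimator G(z) = Wz + b
z_edited = z_orig + alpha * (bar_d + G(z_orig))  # final edited activations
```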
4. Optimization Objective and Training
The offset estimator $G$ is trained to minimize the squared norm between its predicted offset $G(z_{x,q})$ and the target residual offset $o_{x,q}$ for each annotated instance:

$$\mathcal{L}(W, b) = \sum_{(x,q)} \left\| G(z_{x,q}) - o_{x,q} \right\|_2^2$$
Crucially, FAS’s $\bar{d}$ provides general factual guidance, while QAO’s $G$ specializes in refining the “residual” offset associated with each query, mitigating the propensity for inappropriate overcorrection. No additional regularization terms are introduced on $W$ or $b$ in the referenced implementation. Training employs a single-layer projection matching the head-activation dimension, a standard learning rate, and batches drawn from roughly 500 COCO images paired with multiple queries, iterated over several epochs.
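A minimal training loop consistent with this objective is sketched below, using stand-in activations and residual targets; the Adam optimizer, learning rate, batch size, and epoch count are illustrative placeholders rather than values from the paper:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

head_dim, n_samples = 128, 512                        # illustrative sizes
z_all = torch.randn(n_samples, head_dim)              # stand-in head activations z
o_all = torch.randn(n_samples, head_dim)              # stand-in residual targets o
loader = DataLoader(TensorDataset(z_all, o_all), batch_size=64, shuffle=True)

estimator = torch.nn.Linear(head_dim, head_dim)       # G(z) = Wz + b
opt = torch.optim.Adam(estimator.parameters(), lr=1e-3)  # placeholder optimizer/LR

for epoch in range(5):                                # "several epochs," per the text
    for z, o_star in loader:
        loss = (estimator(z) - o_star).pow(2).sum(dim=-1).mean()  # squared-norm loss
        opt.zero_grad()
        loss.backward()
        opt.step()
```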
5. Integration with FAS and Editing Workflow
The end-to-end operation of QAO within AFTER proceeds as follows:
- Sampling a small set of images $\mathcal{X}$ (e.g., from the COCO dataset).
- For each image $x$, generating fact-augmented text $t^{+}$ via a factuality function.
- For image-query pairs $(x, q)$, computing activations $z^{+}_{x,q}$ from $(t^{+}, q)$ and $z_{x,q}$ from $(x, q)$.
- Computing the general steering vector $\bar{d}$ (FAS step).
- Determining the query-specific disparity $d_{x,q} = z^{+}_{x,q} - z_{x,q}$, forming offsets $o_{x,q} = d_{x,q} - \bar{d}$, and training the MLP $G$.
- At inference, collecting each head's activation $z_k$ for a new pair $(x, q)$, applying the edit $\tilde{z}_k = z_k + \alpha(\bar{d} + G(z_k))$, and updating the hidden state.
This pipeline allows for plug-and-play, inference-time mitigation of hallucination in LVLMs without retraining or fine-tuning the backbone model.
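One way to realize this plug-and-play property is a PyTorch forward hook that rewrites a chosen head's slice of an attention module's output at inference time. The sketch below simplifies real LVLM internals by assuming the hooked module emits concatenated per-head activations; `make_editing_hook` and all shapes are hypothetical:

```python
import torch
import torch.nn as nn

def make_editing_hook(bar_d, estimator, alpha, head_idx, head_dim):
    """Forward hook that adds alpha * (bar_d + G(z)) to one head's slice."""
    def hook(module, inputs, output):
        out = output.clone()
        sl = slice(head_idx * head_dim, (head_idx + 1) * head_dim)
        z = out[..., sl]                                 # this head's activation
        out[..., sl] = z + alpha * (bar_d + estimator(z))
        return out                                       # returned value replaces output
    return hook

# Toy demonstration on a stand-in "attention output" module (hypothetical shapes).
head_dim, num_heads = 8, 4
attn_out = nn.Linear(head_dim * num_heads, head_dim * num_heads)
bar_d = torch.zeros(head_dim)                            # would come from the FAS step
estimator = nn.Linear(head_dim, head_dim)                # trained offset estimator G
handle = attn_out.register_forward_hook(
    make_editing_hook(bar_d, estimator, alpha=1.0, head_idx=2, head_dim=head_dim))
y = attn_out(torch.randn(1, head_dim * num_heads))       # edited forward pass
handle.remove()                                          # restores the unedited module
```

Since removing the hook restores the original behavior, the backbone stays untouched when editing is disabled, matching the no-retraining design.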
6. Implementation Details
Key parameters and routines are as follows:
- Hidden state dimension $d$
- Quantity of edited heads $K$ (the top-$K$ attention heads)
- Editing strength $\alpha$
- Offset estimator $G$: single linear layer, mapping $\mathbb{R}^{d} \to \mathbb{R}^{d}$
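For concreteness, these knobs can be bundled into a small configuration object, sketched below; every default value is a placeholder, since the paper's exact settings are not reproduced here:

```python
from dataclasses import dataclass

@dataclass
class QAOConfig:
    head_dim: int = 128   # head activation dimension d (placeholder)
    num_heads: int = 16   # number of edited heads K (placeholder)
    alpha: float = 1.0    # editing strength (placeholder)
```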
The simplified pseudocode below summarizes the operational mechanics:
```python
# Stage 1: compute the general steering vector (FAS)
D_sum = 0
for x in sample_images:
    t_plus = textualize_facts(x)
    for q in query_list:
        z_plus = LVLM_forward(x_text=t_plus, query=q)
        z_orig = LVLM_forward(image=x, query=q)
        D_sum += (z_plus - z_orig)
bar_d = D_sum / (len(sample_images) * len(query_list))

# Stage 2: build residual-offset training pairs
train_pairs = []
for x in sample_images:
    t_plus = textualize_facts(x)
    for q in query_list:
        z_plus = LVLM_forward(x_text=t_plus, query=q)
        z_orig = LVLM_forward(image=x, query=q)
        d_precise = z_plus - z_orig      # query-specific steering vector
        o_target = d_precise - bar_d     # residual offset (training target)
        train_pairs.append((z_orig, o_target))

# Stage 3: train the offset estimator G
initialize_parameters(G)                 # W, b
for epoch in range(E):
    for (z, o_star) in minibatches(train_pairs):
        o_hat = G(z)
        loss = norm(o_hat - o_star) ** 2
        backpropagate(loss)
        update(W, b)

# Stage 4: inference-time activation editing
def edited_forward(x, q):
    h = initial_encoding(x, q)
    for l in layers:
        for k in top_K_heads:
            z = self_attn_head_output(h, head=k)
            offset = G(z)
            z_corrected = z + alpha * (bar_d + offset)
            # replace head output with z_corrected
        h = next_layer(h)
    return decode(h)
```
7. Significance and Empirical Impact
QAO, as instantiated in AFTER, advances the granularity and adaptability of activation editing for LVLM hallucination mitigation. By moving from a coarse, query-agnostic steering vector to fine-grained, query-adaptive offsets, AFTER achieved up to a 16.3% reduction in hallucination relative to baseline on the AMBER benchmark, measured over three widely adopted LVLMs. A plausible implication is that QAO's modularity and low-cost deployment facilitate scalable improvement of factual reliability in cross-modal AI systems without costly retraining (Wang et al., 5 Jan 2026).