
CHILI: Inhibiting Hallucinations in Vision-Language Models

Updated 11 October 2025
  • The paper introduces CHILI, a method that decomposes CLIP activations into localized object and context components to suppress hallucinated detections.
  • It employs median filtering and IoU-based spatial weighting to isolate true visual evidence from contextual bias in concept bottleneck models.
  • CHILI enhances model faithfulness and interpretability, offering more reliable explanations for high-stakes applications such as medical imaging and legal evidence analysis.

Concept Hallucination Inhibition via Localized Interpretability (CHILI) denotes a family of methodologies that selectively inhibit model hallucinations by leveraging localized, interpretable signals from within high-capacity vision-language architectures. While hallucination in this context refers to models generating concepts (objects, attributes, or relations) inconsistent with the input image, CHILI seeks to disentangle and suppress such spurious activations—typically by localizing and analyzing internal feature contributions associated with the purported concept. CHILI advances explainable AI paradigms, particularly for concept bottleneck models (CBMs) built atop foundational vision-language encoders such as CLIP, where model explanations and factuality both depend on accurate, concept-aware representations.

1. Background: Hallucination and Concept Bottleneck Models

Hallucination in the context of vision-language modeling is the model’s tendency to signal the presence (or absence) of high-level semantic concepts (e.g., “tail”, “wing”, “bridge”) in an image, even when that signal results from contextual or class-associative bias rather than direct visual evidence. This issue is especially consequential for CBMs that use CLIP in a zero-shot fashion: such models explain predictions by explicit reference to concept detectors, but the faithfulness of these explanations is compromised if concept presence is due to class-level context rather than object localization (Kazmierczak et al., 8 Oct 2025).

CLIP-based CBMs encode an image via a ViT backbone, interfacing with the concept space by measuring the cosine similarity between projected image and text embeddings. This embedding, however, is an entanglement of object evidence and context—yielding explanations that may be conceptually plausible but visually unfaithful.
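The zero-shot concept-scoring step can be sketched as a cosine similarity between an image embedding and a bank of concept text embeddings; random vectors stand in for actual CLIP encoder outputs here:

```python
import numpy as np

def concept_scores(image_emb: np.ndarray, concept_embs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one image embedding and a bank of concept
    text embeddings, as used by CLIP-based concept bottleneck models."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    return txt @ img  # one score per concept

rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)           # stand-in for a CLIP image embedding
concept_embs = rng.normal(size=(3, 512))   # stand-ins for e.g. "tail", "wing", "bridge"
scores = concept_scores(image_emb, concept_embs)
```

The scores are then thresholded or fed to a linear layer in the CBM; CHILI's point is that each score entangles object and context evidence.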

2. Disentangling Image Embeddings in CLIP-based Architectures

CHILI introduces a mechanism to explicitly decompose the CLIP image embedding into localized “object” and diffuse “context” contributions. Formally, each image I is encoded into patch tokens, which, through a stack of L transformer layers and H attention heads, generate vector contributions

$$m_{i,l,h} \equiv P \cdot W_O^{(l,h)}\left( \alpha^{(l,h)}_{\text{cls},i}\, v^{(l,h)}_i \right)$$

where $P$ is the final projection, $W_O^{(l,h)}$ is the output projection for head $h$, $\alpha^{(l,h)}_{\text{cls},i}$ the attention from the class token to image token $i$ in layer $l$, and $v^{(l,h)}_i$ the corresponding value vector.
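A minimal NumPy sketch of this per-head contribution, with toy dimensions standing in for the actual ViT shapes (the function name and array sizes are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def head_contributions(alpha_cls, V, W_O, P):
    """Per-token contributions m_i = P . W_O (alpha_cls_i * v_i) for one
    (layer, head) pair.
    Shapes: alpha_cls (N,), V (N, d_h), W_O (d, d_h), P (d_out, d)."""
    weighted = alpha_cls[:, None] * V   # attention-scaled value vectors
    return (P @ W_O @ weighted.T).T     # (N, d_out): one vector per image token

rng = np.random.default_rng(0)
N, d_h, d, d_out = 4, 8, 16, 6          # toy dimensions
alpha_cls = rng.normal(size=N)
V = rng.normal(size=(N, d_h))
W_O = rng.normal(size=(d, d_h))
P = rng.normal(size=(d_out, d))
m = head_contributions(alpha_cls, V, W_O, P)
```

Because the map is linear in the tokens, summing the per-token contributions recovers the head's total output, which is what makes the additive decomposition in the next equation possible.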

The global CLIP similarity is decomposed as

$$S(I, T) = \langle M_\text{img}(I), M_\text{text}(T) \rangle = \sum_{l=1}^L \sum_{h=1}^H \sum_{i=0}^N A_{i,l,h}(T) + \varepsilon$$

with $A_{i,l,h}(T) = \langle m_{i,l,h}, M_\text{text}(T)\rangle$. The residual $\varepsilon$ encompasses non-attention-derived components.

Crucially, CHILI partitions each activation map $A_{l,h}$ into spatially interpretable object and context terms:

$$A_{l,h} = A_{l,h}^\text{(Object)} + A_{l,h}^\text{(Context)}$$

This decomposition forms the basis for localized interpretability and subsequent hallucination inhibition.

3. Localizing and Filtering Concept Activations

The disambiguation of object and context is operationalized in CHILI by:

a. Filtering Pseudo-Registers: Vision transformers often accumulate global high-norm activations ("pseudo-registers") that contribute to class-level or context-centric signals. These are removed by applying a local median filter $f_m(A_{l,h})$ to each activation map, isolating spatially meaningful content.
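This step can be illustrated with a hand-rolled 3x3 median filter as a stand-in for $f_m$ (the kernel size and edge padding are assumptions, not the paper's exact settings): an isolated high-norm spike, the signature of a pseudo-register, is concentrated in the residual $A - f_m(A)$, while the median-filtered map keeps the spatially coherent activation.

```python
import numpy as np

def median_filter3(A: np.ndarray) -> np.ndarray:
    """3x3 median filter with edge padding (pure-NumPy stand-in for f_m)."""
    P = np.pad(A, 1, mode="edge")
    windows = np.stack([P[r:r + A.shape[0], c:c + A.shape[1]]
                        for r in range(3) for c in range(3)])
    return np.median(windows, axis=0)

A = np.zeros((8, 8))
A[2:5, 2:5] = 1.0   # spatially coherent "object" activation blob
A[6, 6] = 10.0      # isolated high-norm spike, mimicking a pseudo-register
filtered = median_filter3(A)  # keeps the blob, suppresses the spike
residual = A - filtered       # concentrates the pseudo-register signal
```

The median is robust to isolated outliers, so the coherent blob passes through while the single-pixel spike is removed.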

b. Spatial Weighting via IoU Calibration: Using a probe dataset with ground-truth segmentations $G$ for the target concept, CHILI quantifies each layer-head pair's object-localization capacity by calculating the Intersection over Union (IoU) between the thresholded median-filtered activation $\operatorname{hm}(A_{l,h})$ and the segmentation. The IoU-based weight is

$$w_{l,h} = \mathbb{E}\left[1 - \exp\left(-\alpha\, \operatorname{IoU}(\operatorname{hm}(A_{l,h}), G)\right)\right]$$

with $\alpha$ a temperature parameter.
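A sketch of this weighting over a toy probe set; the binarization threshold and the value of $\alpha$ are assumptions chosen for illustration:

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection over Union of two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def head_weight(act_maps, gt_masks, alpha=5.0, thresh=0.5):
    """w = E[1 - exp(-alpha * IoU(thresholded activation, G))] over a probe
    set; `thresh` binarizes each activation map (assumed, not from the paper)."""
    ious = [iou(a > thresh, g) for a, g in zip(act_maps, gt_masks)]
    return float(np.mean(1.0 - np.exp(-alpha * np.array(ious))))

g = np.zeros((8, 8), bool); g[2:5, 2:5] = True       # ground-truth segmentation
a_good = np.zeros((8, 8)); a_good[2:5, 2:5] = 1.0    # head that localizes the object
a_bad = np.zeros((8, 8)); a_bad[6, :] = 1.0          # head firing on context only
w_good = head_weight([a_good], [g])
w_bad = head_weight([a_bad], [g])
```

A well-localized head receives a weight near 1, while a context-only head receives a weight of 0, so its activation is routed entirely into the context term.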

c. Weighted Activation Decomposition: The filtered activation is split as

$$A_{l,h}^\text{(Object)} = w_{l,h}\,f_m(A_{l,h}),\quad A_{l,h}^\text{(Context)} = (1 - w_{l,h})\,f_m(A_{l,h})$$

Aggregating over $l, h, i$ yields $S^\text{(Object)}$ and $S^\text{(Context)}$, enabling the CLIP similarity score to be written as

$$S(I,T) = S^\text{(Object)} + S^\text{(Context)} + \varepsilon$$

This explicit decomposition forms the interpretative basis for distinguishing between correctly-grounded and contextually-induced concept activations.
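The split-and-aggregate steps above can be sketched end to end, with random arrays standing in for the filtered activations and IoU-derived weights:

```python
import numpy as np

rng = np.random.default_rng(0)
L, H, N = 2, 3, 16   # toy counts: layers, heads, patch tokens
A = rng.normal(size=(L, H, N))   # stand-ins for filtered activations f_m(A_{l,h})
w = rng.uniform(size=(L, H))     # stand-ins for IoU-derived weights w_{l,h}
eps = 0.1                        # residual (non-attention) term, assumed known

A_obj = w[..., None] * A           # object share of each (layer, head) map
A_ctx = (1.0 - w)[..., None] * A   # context share
S_obj, S_ctx = A_obj.sum(), A_ctx.sum()
S = S_obj + S_ctx + eps            # recovers S(I,T) = S_obj + S_ctx + eps
```

Since the two shares sum back to the original activations exactly, the decomposition is lossless: it reallocates evidence between object and context without changing the total similarity score.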

4. Hallucination Inhibition and Saliency-Based Explanations

The output $S^\text{(Object)}$ isolates the localized, visually supported evidence for the target concept, suppressing spurious activations that arise from context. This not only inhibits concept hallucination but also allows the model explanation to be direct, visual, and faithful.

Further, the object-localized activation maps can be used for saliency-based XAI explanations (e.g., via DeepSHAP), producing heatmaps that highlight the image regions responsible for a given concept detection with greater precision and interpretability. Thus, CHILI provides not only hallucination inhibition but also improved, interpretable visual explanations.
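As a simple illustration (not DeepSHAP itself), per-token object contributions can be folded back into a patch-grid heatmap; the 7x7 grid is an assumed ViT patch layout, and the normalization is a common visualization convention rather than the paper's procedure:

```python
import numpy as np

def object_heatmap(token_scores: np.ndarray, grid=(7, 7)) -> np.ndarray:
    """Fold per-token object contributions (summed over layers/heads) into a
    patch-grid heatmap normalized to [0, 1] for display."""
    h = token_scores.reshape(grid)
    h = h - h.min()
    return h / h.max() if h.max() > 0 else h

rng = np.random.default_rng(0)
tokens = rng.uniform(size=49)   # stand-ins for aggregated object contributions
heat = object_heatmap(tokens)   # ready for upsampling/overlay on the image
```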

5. Implications for XAI, Model Faithfulness, and Downstream Applications

By disambiguating object evidence from contextual bias, CHILI enhances the faithfulness of explanations generated by CBMs and other CLIP-based models. Because $S^\text{(Object)}$ is directly supported by localized pixel activations corresponding to the concept, explanations become trustworthy for high-stakes applications (e.g., medical imaging, legal evidence analysis).

This suggests that CHILI improves both the factuality of concept predictions and the reliability of model explanations in settings where class-context confounds are prevalent. A plausible implication is that CHILI also mitigates failure modes where background cues induce false-positive concept detection—reducing both model hallucination rate and false explanatory signals.

6. Limitations and Prospects for Future Development

Several limitations are articulated:

  • The decomposition assumes layer-head independence, disregarding higher-order interdependencies.
  • Not all high-norm activation patterns ("pseudo-registers") are clearly interpretable; their precise function remains ambiguous.
  • Filtering out contextual features can, in certain cases, reduce overall accuracy if genuine object evidence is ambiguous or occluded.

Future work may extend the CHILI framework to non-ViT or multimodal transformer architectures, or model cross-layer and cross-head interactions to better account for complex feature entanglements. The balance between retaining informative contextual signals and suppressing context-induced hallucination is another axis for potential refinement.

7. Summary Table: Core Components of CHILI in CLIP-based CBMs

| Component | Purpose | Formulation/Operation |
|---|---|---|
| Median filter $f_m$ | Isolate spatial from global activation | $A_{l,h} - f_m(A_{l,h})$ |
| IoU-based head weighting | Quantify localization ability | $w_{l,h}$ as above |
| Object/context decomposition | Partition local vs. context activations | $A_{l,h}^\text{(Object)}, A_{l,h}^\text{(Context)}$ |
| Activation aggregation | Isolate object-supporting score | $S^\text{(Object)}$ |

This summary table organizes the essential steps underpinning the CHILI methodology, as described in (Kazmierczak et al., 8 Oct 2025).

In sum, CHILI (Concept Hallucination Inhibition via Localized Interpretability) provides a principled and technically rigorous approach to suppressing hallucinated concept detections and enhancing both the faithfulness and interpretability of vision-language AI systems by localizing and analyzing the source of semantic activations within the model.
