In-Context Anomaly Detection (ICAD)
- In-Context Anomaly Detection (ICAD) is a family of techniques that incorporate task-specific context to identify anomalies relative to local reference sets rather than global statistics.
- It employs diverse methodologies—from classical statistical models to advanced foundation models—to compare test samples with contextual exemplars, achieving state-of-the-art detection performance.
- ICAD enhances interpretability by quantifying deviations through calibrated scores such as conformal p-values (with detection quality typically evaluated via AUROC), and it applies to varied data types such as images, text, and tabular data.
In-Context Anomaly Detection (ICAD) refers to a family of anomaly and out-of-distribution (OOD) detection methods that incorporate contextual or reference information directly into the detection process. Unlike global anomaly detectors that define normality purely in terms of population statistics, ICAD methods compare each test instance against a local or task-specific reference set, context, or prompt, allowing for robust detection across heterogeneous modalities, domains, and scenarios. The ICAD paradigm supports both classical statistical models and modern foundation models (e.g., LLMs, LVLMs), achieving state-of-the-art results for specialized, generalist, and explainable anomaly detection tasks.
1. Core Principles and Formal Definitions
ICAD frameworks define anomaly (or OOD) status by measuring how discrepant a candidate sample is relative to a reference context or set of exemplars. Several mathematical formulations exist:
- Reference-Set Discrepancy: Given a reference set $R_\tau$ of normal samples from task $\tau$, and a test instance $x$, the anomaly decision is
$$\hat{y}(x) = \mathbb{1}\left[\, s(x, R_\tau) > \delta \,\right],$$
where $s(\cdot,\cdot)$ is a modality- or architecture-specific discrepancy/scoring function, $\delta$ is a decision threshold, and $\mathbb{1}[\cdot]$ is the indicator function. This schema is adopted in "ICAD-LLM" (Wu et al., 1 Dec 2025).
- Contextual Feature Partitioning: In classical tabular settings, features are split into contextual ($C$) and behavioral ($B$) subsets. For each point $x = (x_C, x_B)$, the context $x_C$ induces a local reference group (e.g., $k$-NN in $C$ space). The anomaly score measures how atypical $x_B$ is given $x_C$ (Li et al., 2023, Calikus et al., 2021).
- Knowledge Graph Context: In semantic scenes, object anomaly is defined relative to entity-relation graphs, with context provided by the observed configuration of objects and scenes (Vaska et al., 2022).
- Conformal p-Values: In specialized LLM OOD detection, ICAD is instantiated via inductive conformal prediction, calibrating a non-conformity measure (NCM) on in-domain reference data and computing p-values for new inputs (Gupta et al., 4 Sep 2025).
ICAD consistently frames anomaly as "deviation relative to context/reference set," aligning with both statistical and in-context learning paradigms.
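The reference-set discrepancy schema above can be sketched in a few lines. This is a minimal illustration, not any paper's exact method: the embeddings, the mean-dissimilarity aggregation, and the threshold value are all illustrative assumptions.

```python
import math

def cosine_dissimilarity(u, v):
    # 1 - cos(u, v); larger values mean the vectors are more discrepant.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def anomaly_decision(x, reference_set, threshold):
    # s(x, R): mean dissimilarity to the reference exemplars (illustrative
    # aggregation); the indicator 1[s > threshold] gives the anomaly flag.
    score = sum(cosine_dissimilarity(x, r) for r in reference_set) / len(reference_set)
    return score, int(score > threshold)

# Hypothetical embeddings: normal references cluster near (1, 0).
R = [(1.0, 0.0), (0.9, 0.1), (1.0, 0.05)]
score_in, flag_in = anomaly_decision((0.95, 0.05), R, threshold=0.5)   # flag 0
score_out, flag_out = anomaly_decision((0.0, 1.0), R, threshold=0.5)   # flag 1
```

In practice the vectors would be model embeddings (e.g., from an LLM encoder) and the threshold would be calibrated on held-out normal data rather than fixed by hand.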
2. Methodologies and Model Instantiations
ICAD supports a rich set of architectures, scoring functions, and model classes. The following table summarizes representative frameworks and their core methodology:
| Method & Reference | Context Source | Anomaly Score Type |
|---|---|---|
| ICAD-LLM (Wu et al., 1 Dec 2025) | Reference set, in-prompt | Cosine dissimilarity in LLM embedding |
| InCTRL (Zhu et al., 11 Mar 2024) | Few-shot normal images | Residual in image/patch/text space |
| Polysemantic Dropout (Gupta et al., 4 Sep 2025) | In-domain calibration data | Dropout-tolerance NCM, conformal p-value |
| IADGPT (Zhao et al., 14 Aug 2025) | Few-shot support images | LLM logits, cross-attention map, free-text |
| Facade (Kantchelian et al., 9 Dec 2024) | Social/network context | Contrastive lift in contextual embedding |
| WisCon (Calikus et al., 2021) | Enumerated feature subsets | Ensemble behavioral anomaly scores |
| QCAD (Li et al., 2023) | k-NN in context features | QRF-based conditional tail anomaly score |
| KG-ICAD (Vaska et al., 2022) | Scene/object co-occurrence | KG link plausibility score |
Approaches range from tree-based statistical models (QRF, iForest, clustering) to deep learning backbones (LLMs, LVLMs, contrastive encoders) and graph embeddings. Scoring functions include nonconformity measures, likelihoods, residuals, contrastive distances, and specially constructed p-values or ensembles.
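The contextual-partitioning row of the table can be made concrete with a toy scorer. This is a deliberately simplified stand-in for QCAD's QRF-based conditional score: it uses a plain z-style deviation over the $k$ context-nearest neighbours, and the data and variable names are invented for illustration.

```python
def contextual_score(x_context, x_behavior, data, k=3):
    # data: list of (context, behavior) pairs. Find the k nearest
    # neighbours in *context* space only, then score how far x_behavior
    # falls from the behaviors observed in that local reference group.
    neighbours = sorted(data, key=lambda d: abs(d[0] - x_context))[:k]
    behaviors = [b for _, b in neighbours]
    mu = sum(behaviors) / len(behaviors)
    sd = (sum((b - mu) ** 2 for b in behaviors) / len(behaviors)) ** 0.5
    return abs(x_behavior - mu) / (sd if sd > 0 else 1.0)

# Hypothetical example: context = machine load, behavior = power draw.
data = [(1, 2), (2, 4), (3, 6), (10, 20), (11, 22), (12, 24)]

typical = contextual_score(2, 4, data)      # behavior 4 fits context 2
contextual = contextual_score(11, 4, data)  # 4 occurs globally, but is
                                            # far below its context's norm
```

The point of the example is the defining ICAD behavior: a value that is unremarkable under global statistics can still be flagged because it deviates from its local reference group.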
3. Context Construction, Calibration, and Learning
A central challenge in ICAD is the construction and management of context/reference sets:
- Explicit Reference Sets: Approaches such as ICAD-LLM and InCTRL require normal samples at inference, which are selected via heuristics, random sampling, or retrieved based on similarity to the query (Wu et al., 1 Dec 2025, Zhu et al., 11 Mar 2024).
- Context Partitioning: In tabular and industrial data, features are split into context and behavior, either by domain knowledge or automated enumeration (WisCon) (Calikus et al., 2021, Li et al., 2023).
- Calibration Procedures: Conformal ICAD methods (e.g., Polysemantic Dropout) require calibration nonconformity scores computed on a (held-out) in-domain set; all test scoring is relative to this calibration structure (Gupta et al., 4 Sep 2025).
- Contrastive Learning: In settings with no anomaly labels, contrastive objectives over context-action or context-behavior pairs support score calibration (Facade) (Kantchelian et al., 9 Dec 2024).
Active learning, context pruning, and adaptive weighting of contexts further refine and strengthen the resulting ensemble of contextual detectors (WisCon, QCAD).
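The calibration procedure for conformal ICAD reduces to a simple rank computation. A minimal sketch, assuming a generic nonconformity score (the calibration values below are invented for illustration):

```python
def conformal_p_value(test_score, calibration_scores):
    # Inductive conformal p-value: rank of the test nonconformity score
    # among the held-out calibration scores, with the +1 correction.
    n = len(calibration_scores)
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (n + 1)

# Hypothetical nonconformity scores from an in-domain calibration split.
calibration = [0.10, 0.20, 0.15, 0.30, 0.25]

p_in = conformal_p_value(0.18, calibration)   # unremarkable score -> larger p
p_out = conformal_p_value(0.90, calibration)  # extreme score -> small p
```

A test input is flagged as OOD when its p-value falls below a chosen level α; under exchangeability of calibration and test scores, this keeps the false-alarm probability at most α.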
4. Modalities, Generalization, and Unified Models
Recent advances demonstrate that ICAD is broadly applicable across data modalities and problem settings:
- Time Series, Tabular, Logs: ICAD-LLM, via modality-aware encoding and in-context learning, achieves high F1/AUROC across time series, tabular, and log benchmarks, including strong zero-shot transfer without retraining (Wu et al., 1 Dec 2025).
- Images and Vision-Language: InCTRL and IADGPT extend ICAD to visual anomaly detection, leveraging few-shot image prompts and local/global residual encoding, and enabling unified detection, localization, and reasoning outputs (Zhu et al., 11 Mar 2024, Zhao et al., 14 Aug 2025).
- Knowledge Graphs and Semantic Context: Link-prediction-based ICAD frameworks detect objects anomalous within the context of a given scene or object constellation, achieving up to 99.9% top-3 accuracy on Visual Genome-derived anomalies (Vaska et al., 2022).
- Large Foundation Models: LLMs and LVLMs have been adapted to ICAD via prompt-based in-context learning, few-shot demonstration, and contrastive/LoRA fine-tuning. Chain-of-thought prompting further enhances interpretability in some settings (Jin et al., 24 Jul 2024).
The defining property of modern ICAD models is their ability to process, compare, and adapt across diverse data formats, leveraging rich context with minimal to zero task-specific retraining.
5. Theoretical Guarantees, Metrics, and Empirical Performance
ICAD frameworks enable statistical guarantees, robust calibration, and demonstrable empirical gains:
- Conformal Validity: Formal guarantees on false alarm probabilities are obtained via conformal prediction (p-value uniformity under exchangeability), as in Polysemantic Dropout (Gupta et al., 4 Sep 2025).
- Metrics: Standard evaluation includes AUROC, AUC-PR, F1-score, accuracy, Precision@k, Top-k accuracy, and empirical false-alarm curves. Detection improvements of up to 37% AUROC over baselines have been reported in domain-specialized LLM OOD (Gupta et al., 4 Sep 2025), with ICAD-LLM and InCTRL surpassing or matching task-specific AD in generalist settings (Wu et al., 1 Dec 2025, Zhu et al., 11 Mar 2024).
- Tradeoffs: ICAD methods frequently trade computational overhead (multi-pass inference, context management) against calibration robustness. Finite calibration data, reference-set diversity, and model-specific efficiency bottlenecks are recurring concerns.
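As a concrete reading of the AUROC numbers cited above: AUROC equals the probability that a randomly drawn anomaly receives a higher score than a randomly drawn normal sample (the Mann-Whitney statistic), which makes it threshold-free. A minimal sketch with made-up scores:

```python
def auroc(normal_scores, anomalous_scores):
    # AUROC = P(anomalous score > normal score), computed over all
    # cross pairs; ties count as 1/2 (the Mann-Whitney convention).
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            wins += 1.0 if a > n else 0.5 if a == n else 0.0
    return wins / (len(normal_scores) * len(anomalous_scores))

# 8 of 9 cross pairs are correctly ordered -> AUROC = 8/9.
example = auroc([0.1, 0.2, 0.3], [0.25, 0.8, 0.9])
```

An AUROC of 0.5 corresponds to a scorer that ranks anomalies no better than chance; 1.0 means every anomaly outscores every normal sample.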
6. Interpretability, Extensions, and Outstanding Challenges
Interpretability and modular extension are prominent themes in current ICAD research:
- Feature Contribution and Visualization: QCAD and InCTRL expose per-feature or per-region anomaly contributions, supporting explainable AD. Beanplot and attention visualizations enhance transparency (Li et al., 2023, Zhu et al., 11 Mar 2024).
- Ensemble and Modular Design: Adaptive context weighting, pruning, and aggregation are critical for high performance in data-rich and multi-modal environments (WisCon, QCAD, Polysemantic Dropout) (Calikus et al., 2021, Li et al., 2023, Gupta et al., 4 Sep 2025).
- Open Challenges: Issues include automated reference/context selection, scaling to streaming/online scenarios, extending to new modalities (audio, video), and energy-efficient inference. Threshold tuning remains data- and application-specific in absence of error control (outside conformal settings) (Wu et al., 1 Dec 2025).
- Outlook: Ongoing research seeks to further unify ICAD across generative models, temporal/spatiotemporal data, and autonomous/embodied agents. Application-specific extensions target high-stakes domains such as medicine, cybersecurity, law, and finance (Zhao et al., 14 Aug 2025, Kantchelian et al., 9 Dec 2024, Gupta et al., 4 Sep 2025).
7. Representative Experimental Results
Select empirical outcomes highlight the advantages and breadth of the ICAD paradigm:
| Setting | Metric | ICAD Model | Performance | Reference |
|---|---|---|---|---|
| Med-specialized LLM OOD | AUROC | Polysemantic Dropout | Up to 0.91 (+8% over baseline) | (Gupta et al., 4 Sep 2025) |
| Industrial Vision (MVTec-AD) | Image AUC | IADGPT (4-shot) | 97.3 (vs. 97.1/95.8 for baselines) | (Zhao et al., 14 Aug 2025) |
| General Tabular (ADBench 18) | AUROC | ICAD-LLM | 95.13 (OFO best: 94.83) | (Wu et al., 1 Dec 2025) |
| Security/Insider Threat | FPR | Facade | 0.0003% best-case, <0.01% overall | (Kantchelian et al., 9 Dec 2024) |
| Contextual (QCAD on synth.) | PRC-Ā | QCAD | Ranked 1st on 9/10 sets | (Li et al., 2023) |
| KG-ICAD (VG Detector) | Top-3 Acc. | KG-ICAD | 99.9% | (Vaska et al., 2022) |
These results establish ICAD as an empirically validated, theoretically principled, and highly extensible approach for state-of-the-art anomaly detection under diverse data, context, and task conditions.