
Case-Based Reasoning (CBR) Overview

Updated 27 August 2025
  • Case-Based Reasoning (CBR) is an AI approach that solves new problems by retrieving and adapting solutions from similar, previously solved cases stored in a case library.
  • It employs an iterative cycle of retrieve, reuse, revise, and retain steps, with enhancements such as fuzzy logic and regression-based weighting to handle imprecision and uncertainty.
  • Applications span medical diagnosis, industrial fault detection, legal argumentation, and deep learning integration, demonstrating its robust adaptability and explainability.

Case-Based Reasoning (CBR) is an artificial intelligence methodology in which solutions to new problems are derived by adapting solutions from similar, previously solved problems (cases) stored in a dedicated case library. Unlike rule-based or purely statistical approaches, CBR operates by retrieving cases that are most similar to a given target problem, reusing or adapting their solutions, revising the proposed answer as needed, and retaining the new solution and its context in the case base for future reference. CBR is fundamental in many domains, especially where explicit domain knowledge is incomplete, rapidly evolving, or inherently experiential.

1. Formal Principles and CBR Cycle

Case-Based Reasoning is characterized by the iterative execution of four principal steps:

  1. Retrieve: Select the most similar historical cases (source cases) with respect to the current problem (target case), typically employing a similarity function S(\cdot, \cdot) defined over appropriate case descriptors or features.
  2. Reuse: Adapt the solution or parts of the solution from the retrieved case(s) to address differences between the new and existing problems.
  3. Revise: Evaluate and, if necessary, iteratively refine the proposed solution (often via simulation, domain rules, or expert override).
  4. Retain: Save the new case and its (evaluated) solution in the library, ensuring the system’s knowledge base grows through experience (Voskoglou et al., 2014, Oliveira et al., 2020).

Mathematically, the retrieval step often maximizes a similarity function,

\text{best match} = \operatorname{argmax}_{c_i \in \text{CaseBase}}\, S(c_i, t),

where c_i is a stored case and t is the representation of the target problem.
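The four-step cycle and the argmax retrieval above can be sketched in a few lines. This is a minimal illustration, not an implementation from the cited works; the `Case` structure and the toy overlap similarity are assumptions for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    features: dict   # descriptor -> value
    solution: object # stored outcome

@dataclass
class CaseBase:
    cases: list = field(default_factory=list)

    def retrieve(self, target: dict, similarity) -> Case:
        # best match = argmax over stored cases c_i of S(c_i, t)
        return max(self.cases, key=lambda c: similarity(c.features, target))

    def retain(self, case: Case) -> None:
        # grow the library with the newly solved, evaluated case
        self.cases.append(case)

def overlap_similarity(a: dict, b: dict) -> float:
    # Toy S(.,.): fraction of shared descriptors with equal values.
    keys = set(a) & set(b)
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)

# Retrieve -> Reuse -> Revise -> Retain
cb = CaseBase([Case({"fever": True, "cough": True}, "flu"),
               Case({"fever": False, "cough": True}, "cold")])
target = {"fever": True, "cough": False}
best = cb.retrieve(target, overlap_similarity)  # Retrieve
proposed = best.solution                        # Reuse (copy the solution)
# Revise: domain rules or expert override would adjust `proposed` here.
cb.retain(Case(target, proposed))               # Retain
```

In a real system the similarity function, adaptation rules, and revision step are all domain-specific; only the cycle's control flow is generic.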

2. Similarity Evaluation and Knowledge Imperfection Handling

Classical CBR systems compute overall similarity by aggregating local similarity functions \Phi over each descriptor d_i:

\mathrm{MR} = \prod_{i=1}^{m} \left[ \Phi_\text{Value}(d_i) \cdot \Phi_\text{State}(d_i) \cdot \Phi_\text{Presence}(d_i) \cdot \Phi_\text{OM}(d_i) \right],

where each \Phi term captures a distinct aspect of similarity (value, component state, presence, operating mode) (Bitar et al., 2012).
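The MR product aggregation can be sketched as follows. The four local terms here are illustrative stand-ins, not the functions defined in Bitar et al. (2012); each is assumed to return a value in [0, 1].

```python
import math

# Hypothetical local similarity terms for one descriptor d_i.
def phi_value(src, tgt):     # closeness of descriptor values
    return math.exp(-abs(src["value"] - tgt["value"]))

def phi_state(src, tgt):     # component-state agreement
    return 1.0 if src["state"] == tgt["state"] else 0.5

def phi_presence(src, tgt):  # descriptor present in both cases (assumed here)
    return 1.0

def phi_om(src, tgt):        # operating-mode agreement
    return 1.0 if src["mode"] == tgt["mode"] else 0.8

def matching_rate(source_case, target_case):
    # MR = product over descriptors d_i of the four local terms.
    mr = 1.0
    for d_src, d_tgt in zip(source_case, target_case):
        mr *= (phi_value(d_src, d_tgt) * phi_state(d_src, d_tgt)
               * phi_presence(d_src, d_tgt) * phi_om(d_src, d_tgt))
    return mr

src = [{"value": 80.0, "state": "on", "mode": "normal"}]
tgt = [{"value": 80.0, "state": "on", "mode": "normal"}]
print(matching_rate(src, tgt))  # identical cases -> 1.0
```

Because the terms multiply, a single local mismatch (any \Phi near 0) drives the whole matching rate toward 0, which is the intended conjunctive behavior.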

The robustness of similarity evaluation in real-world CBR systems requires explicit management of knowledge imperfections:

  • Imprecision: Fuzzy logic and possibility theory are used to model and correct descriptor values. For example, temperature sensor readings are assigned fuzzy membership degrees

\mu(x) = \begin{cases} 1, & x = 80 \\ 0, & |x - 80| \geq 20 \\ p(x), & 0 < |x - 80| < 20 \end{cases}

where p(x) is piecewise linear, and the imprecise target value may be corrected to the centroid of the representative fuzzy set.
  • Uncertainty: Descriptors with uncertain values are omitted from similarity aggregation, preventing unreliable data from biasing retrieval.
  • Incompleteness: Adaptation measures refine the candidate set by weighting each descriptor according to domain-specific importance, such as abnormal versus normal component modes (w_i), ensuring that critical features disproportionately influence the selection (Bitar et al., 2012).
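The membership function in the imprecision example can be sketched directly. The target value 80 and tolerance 20 follow the example above; using a linear p(x) (a triangular membership function) is one common choice, assumed here.

```python
def fuzzy_membership(x, target=80.0, tol=20.0):
    """Triangular fuzzy membership degree of a sensor reading x.

    mu(x) = 1 at x = target, 0 when |x - target| >= tol,
    and linear in between (the piecewise-linear p(x)).
    """
    dist = abs(x - target)
    if dist >= tol:
        return 0.0
    return 1.0 - dist / tol

# A reading of 85 is still a fairly good match for "about 80";
# a reading of 105 is outside the tolerance band entirely.
print(fuzzy_membership(85))   # 0.75
print(fuzzy_membership(105))  # 0.0
```

Descriptors whose membership degree is too low (or whose values are uncertain) can then simply be dropped from the similarity aggregation, as described above.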

3. Hybrid and Data-Driven Extensions

Recent CBR methods extend the classic paradigm by integrating statistical and data-driven approaches to enhance similarity, adaptability, and interpretability:

  • Logistic Regression Weighting: Attributes and cases are weighted using regression-derived significance measures; e.g., attribute weights \omega_a are proportional to the Wald statistic, and case weights \omega_p derive from model residuals. The kNN fusion is then

s_p = \frac{\sum_{p'} \omega_{p'}\, d(p, p')^{-1}\, y_{p'}}{\sum_{p'} \omega_{p'}\, d(p, p')^{-1}},

where d(p, p') is the LR-weighted distance (Campillo-Gimenez et al., 2013). This enhances robustness against irrelevant or noisy attributes in clinical decision support.
  • Evolutionary and Automated Approaches: The local and global similarity function parameters can be automatically tuned via evolutionary computation such as Particle Swarm Optimization (PSO), optimizing both discriminative power and the interpretability of attributions. Posterior probabilities for class membership are estimated by softmax-weighted voting among retrieved neighbors, facilitating risk scoring and explainable prediction (Li et al., 2021).
  • Integration with Deep Learning and LLMs: CBR is now deployed as explicit persistent memory for LLMs, with vectorized case retrieval conducted via deep-learned embeddings and approximate nearest neighbor (ANN) methods. Similarity is computed, e.g., by cosine similarity in embedding space,

S(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|},

and retrieved cases supply context for further neural adaptation or prompt engineering (Watson, 2023, Marom, 9 Jan 2025).
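Embedding-based case retrieval with cosine similarity can be sketched as below. The case names and vectors are illustrative; a production system would replace the exact linear scan with an ANN index (e.g., FAISS or HNSW) over the same similarity.

```python
import math

def cosine_similarity(x, y):
    # S(x, y) = (x . y) / (||x|| ||y||)
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def retrieve_top_k(query, case_embeddings, k=2):
    # Exact nearest-neighbour scan over the case base.
    scored = sorted(case_embeddings.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [case_id for case_id, _ in scored[:k]]

# Toy 3-d "embeddings"; real systems use deep-learned vectors.
cases = {"pump-fault":   [0.9, 0.1, 0.0],
         "valve-leak":   [0.1, 0.9, 0.1],
         "sensor-drift": [0.8, 0.2, 0.1]}
print(retrieve_top_k([1.0, 0.0, 0.0], cases))
```

The retrieved case identifiers would then be used to fetch the full cases, which supply in-context examples or adaptation targets for the LLM.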

4. CBR in Argumentation and Explainable Reasoning

CBR has strong ties with computational argumentation and legal AI. In Abstract Argumentation-based CBR (AA-CBR), each case is modeled as an argument that may attack (contradict) or support others according to specificity and outcome (Paulino-Passos et al., 2020, Gould et al., 7 Jul 2025):

  • Attack relations capture when one case (argument) is more specific than another and has a conflicting outcome.
  • Support relations (in sAA-CBR) permit cases to reinforce others with the same outcome, preventing pathological "spikes" (cases that never contribute to any debate) and increasing interpretability.

Accepted arguments are computed using grounded extension semantics (iterated defense) as per Dung’s framework; classification is determined by whether the default/majority argument survives the argumentative process.
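The grounded extension can be sketched as a least-fixed-point iteration of Dung's characteristic function. The three-argument example is illustrative: a default argument attacked by a more specific case, which is in turn attacked by an even more specific counterexample.

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of an abstract argumentation framework.

    attacks: set of (attacker, attacked) pairs. Iterates the
    characteristic function F(S) = {a | every attacker of a is
    attacked by some member of S} from the empty set up to its
    least fixed point (iterated defense).
    """
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    extension = set()
    while True:
        defended = {a for a in arguments
                    if all(any((d, b) in attacks for d in extension)
                           for b in attackers[a])}
        if defended == extension:
            return extension
        extension = defended

args = {"default", "specific", "counter"}
atts = {("specific", "default"), ("counter", "specific")}
print(sorted(grounded_extension(args, atts)))  # ['counter', 'default']
```

Here the default argument survives because its only attacker is itself defeated, so classification would fall back to the default outcome, mirroring the acceptance criterion described above.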

Model properties include:

  • Cautious Monotonicity: Guaranteeing that adding self-confirming cases won’t invalidate prior inferences (Paulino-Passos et al., 2020).
  • Spike-freeness: Ensuring all cases contribute to the reasoning process by means of supporting relations (Gould et al., 7 Jul 2025).

5. Application Domains

CBR is applied in a wide array of domains where analogical or experiential knowledge is paramount:

  • Medical Diagnosis and Decision Support: Systems leverage existing patient records to guide diagnostic and treatment decisions, with hybrid retrieval (attribute/case weighting) yielding performance robust to data noise and feature irrelevance (Campillo-Gimenez et al., 2013, Voskoglou et al., 2014).
  • Industrial Fault Diagnosis: Fuzzy and adaptive similarity functions permit correct identification even under imprecise, incomplete, or uncertain sensor readings (Bitar et al., 2012).
  • Natural Language Processing and QA: Structured CBR systems (e.g., with MultiNet graphs) supply candidate answer validation and reranking; retrieval-augmented models now employ graph edit measures integrated with learning-to-rank (Weis, 2015, Das et al., 2021).
  • Knowledge Graph Reasoning: Probabilistic CBR techniques for open-world KG completion employ path-based similarity and dynamic clustering, supporting both interpretability and online adaptation (Das et al., 2020).
  • Text Generation: CBR generates structured natural language such as weather forecasts or test scripts by retrieving, adapting, and revising templates corresponding to past, similar data instances (Adeyanju, 2015, Guo et al., 26 Mar 2025).
  • Financial Risk and Drug-Drug Interaction Prediction: Fully automated, explainable CBR pipelines provide economic justification and mechanistic interpretation for credit and pharmaceutical predictions; CBR-enhanced LLM frameworks further enable analogical case reasoning for predictive tasks (Li et al., 2021, 2505.23034).

6. Methodological Advances and Modeling Guidance

Modern CBR modeling benefits from:

  • Two-layer modeling methodologies: Separation between abstract problem representation (objectives, attributes, tolerances, context) and concrete system implementation (case structuring, similarity measures, evaluation functions) (Oliveira et al., 2020).
  • Multimodal Retrieval-Augmented Generation: General frameworks (e.g., MCBR-RAG) extend CBR to multimodal domains by learning both textual and latent (embedding) representations for retrieval and adaptation, empirically improving accuracy in structured domains such as board games and mathematical expression problems (Marom, 9 Jan 2025).
  • Reinforcement and Learning-based Optimization: Combined supervised and reinforcement learning stages have been deployed to align case-based retrieval and script generation with production requirements, notably by fine-tuning retrievers with contrastive objectives and reward-based LLM adaptation (Guo et al., 26 Mar 2025).
  • Persistent Memory for LLM Agents: Persistent, vector-based CBR memory enables LLMs to maintain context, reduce hallucinations, and support autonomously evolving reasoning systems; this synergy is relevant for artificial general intelligence (Watson, 2023, Hatalis et al., 9 Apr 2025).

7. Strengths, Limitations, and Research Directions

CBR exhibits notable strengths in interpretability, continual learning, and domain adaptability:

  • Decision processes are explainable through explicit matching with prior cases and feature attributions.
  • Learning is incremental (retention of new cases) and amenable to on-the-fly adaptation without retraining.
  • Hybridization with other learning approaches (logic, neural networks, fuzzy and probabilistic models) enables CBR to tackle uncertainty, incomplete knowledge, and dynamic data distributions (Voskoglou et al., 2014, Li et al., 2021, Hatalis et al., 9 Apr 2025).

However, CBR's dependence on the quality, coverage, and curation of the case base can limit generalization and may lead to anecdotal biases if the repository is not sufficiently representative (Voskoglou et al., 2014, Weis, 2015). Advances in automated case weighting, clustering, argumentation-based filtering (for “surprising” cases), and representative sampling are crucial for improving scalability and reliability (Paulino-Passos et al., 2020, 2505.23034).

Ongoing research trends include:

  • Deeper integration with continuous and multimodal representations for retrieval and adaptation (Marom, 9 Jan 2025).
  • Modeling of cognitive features such as curiosity, introspection, and self-reflection in CBR-enabled autonomous agents (Hatalis et al., 9 Apr 2025).
  • Exploration of quantum computing methods (qCBR) for enhanced scalability and nonclassical retrieval/adaptation (Atchade-Adelomou et al., 2021).
  • Embedding CBR as an explicit, explainable, and updateable memory alongside generative models for AGI (Watson, 2023).

CBR thus persists as a robust, explainable, and versatile approach to problem-solving, increasingly central in contemporary neuro-symbolic and autonomous AI systems.
