Correct Answer KB: Storage, Retrieval, Evolution
- CorrectKB is a specialized knowledge base that systematically stores validated diagnostic cases, including structured patient data and chain-of-thought summaries.
- The system employs cosine similarity for retrieval, augmenting current multi-agent consultations with contextually relevant, confirmed past cases.
- Integration of CorrectKB within a multi-agent, consensus-driven framework boosts diagnostic accuracy and supports continuous self-evolution in AI reasoning.
A Correct Answer Knowledge Base (CorrectKB) is a specialized knowledge repository designed to systematically store, retrieve, and operationalize verified correct answers derived from domain-specific problem-solving or consultation sessions. CorrectKB is architected to ensure that high-quality, validated reasoning and outcomes inform and improve future automated and collaborative decision-making processes. In the context of advanced AI frameworks such as MDTeamGPT, CorrectKB is tightly integrated within a multi-agent, multi-round consultation and self-evolving architecture, supporting ongoing improvement in accuracy and rationality of machine-augmented reasoning (Chen et al., 18 Mar 2025).
1. Core Function: Storage and Structuring of Correct Diagnostic Experiences
CorrectKB acts primarily as an archive of successful diagnostic cases, each consisting of the original question, patient background information, the final validated answer, and a machine- and human-readable summary of the chain-of-thought (denoted as S_final⁴ in the system). Storage into CorrectKB occurs only after the multi-disciplinary discussion has concluded and the outcome is validated by a Safety and Ethics Reviewer. The reasoning trace is extracted by the Chain-of-Thought Reviewer and is recorded alongside key case features in a structured format:
{
  "Question": <...>,
  "Answer": <...>,
  "Summary of S_final⁴": <...>
}
This selective archival ensures the knowledge base contains only outcome-verified, consensus-backed, and rational explanatory traces, supporting trust in subsequent retrieval and reuse.
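The validation-gated archival described above can be sketched in a few lines. This is a minimal illustration rather than the MDTeamGPT implementation: the names `CorrectKBEntry`, `CorrectKB`, and `add_if_validated` are hypothetical, the `validated` flag stands in for the Safety and Ethics Reviewer's decision, and the field values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CorrectKBEntry:
    """One archived case, mirroring the Question / Answer / Summary fields."""
    question: str
    answer: str
    summary: str  # summary of the final-round chain of thought


@dataclass
class CorrectKB:
    """Hypothetical in-memory store; a real system would persist entries."""
    entries: List[CorrectKBEntry] = field(default_factory=list)

    def add_if_validated(self, entry: CorrectKBEntry, validated: bool) -> bool:
        """Archive the entry only when the reviewer confirmed the outcome."""
        if not validated:
            return False
        self.entries.append(entry)
        return True


kb = CorrectKB()
accepted = kb.add_if_validated(
    CorrectKBEntry("question 1 (placeholder)", "answer (placeholder)", "summary (placeholder)"),
    validated=True,
)
rejected = kb.add_if_validated(
    CorrectKBEntry("question 2 (placeholder)", "answer (placeholder)", "summary (placeholder)"),
    validated=False,
)
print(len(kb.entries))  # only the validated case is stored
```

Keeping the gate inside the store, rather than at each call site, makes it harder for an unvalidated case to enter the archive by accident.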
2. Retrieval and Embedding for Prompt Augmentation
When faced with a new consultation, CorrectKB is queried to retrieve the most relevant previous cases. The patient's background and question are converted into embedding vectors using a text embedding model such as text-embedding-3-small. Cosine similarity is computed between the embedded query and every entry in CorrectKB, and the top-K most similar validated cases are retrieved. These retrieved cases are then packaged as a prompt augmentation and supplied to the specialist doctor agents engaged in the current multi-agent reasoning session.
This retrieval mechanism leverages both the content and the structured chain-of-thought histories within each CorrectKB entry, effectively biasing the system towards reasoning paths and solutions that have been validated in relevant, contextually similar previous scenarios.
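The similarity search can be illustrated with a small, dependency-free sketch. It assumes the embeddings have already been computed; the three-dimensional vectors below are toy stand-ins for real embedding output, and the function names `cosine_similarity` and `top_k_similar` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    stored: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k stored case ids most similar to the query, best first."""
    scored = [(case_id, cosine_similarity(query, vec)) for case_id, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings of archived cases (assumption).
stored_cases = [
    ("case-a", [1.0, 0.0, 0.0]),
    ("case-b", [0.0, 1.0, 0.0]),
    ("case-c", [0.9, 0.1, 0.0]),
]
matches = top_k_similar([1.0, 0.0, 0.0], stored_cases, k=2)
print([case_id for case_id, _ in matches])
```

A linear scan like this is adequate for a small knowledge base; a larger archive would typically use a vector index instead, which is outside what the text describes.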
3. Integration with Multi-Agent Consensus and Evolution Process
CorrectKB enters the consultation pipeline after consensus aggregation. In each consultation cycle:
- The Lead Physician collects agent responses and extracts four consensus categories: Consistency, Conflict, Independence, and Integration.
- The final curated outputs (typically S_final⁴ and S_{i-1}⁴, the last two rounds’ processed summaries) are used to limit the cognitive burden for the next round.
- Upon a correct diagnosis (as per the Safety and Ethics Reviewer), the case’s key data and reasoning are stored in CorrectKB.
During subsequent consultations, CorrectKB augments the ongoing discussion through prompt enrichment—providing agents with targeted, relevant prior reasoning chains, thereby supporting more efficient convergence to correct and rational outcomes.
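Prompt enrichment of this kind might look as follows. The sketch assumes that retrieved cases arrive as dictionaries with `Question`, `Answer`, and `Summary` keys and that the augmented prompt is plain text; the layout and the function name `build_augmented_prompt` are illustrative choices, not the format used by MDTeamGPT.

```python
from typing import Dict, List


def build_augmented_prompt(
    background: str,
    question: str,
    retrieved: List[Dict[str, str]],
) -> str:
    """Prepend validated prior cases to the current case description."""
    lines = ["Validated prior cases:"]
    for i, case in enumerate(retrieved, start=1):
        lines.append(f"[{i}] Question: {case['Question']}")
        lines.append(f"    Answer: {case['Answer']}")
        lines.append(f"    Reasoning summary: {case['Summary']}")
    lines.append("")  # blank line separates prior cases from the current one
    lines.append(f"Current patient background: {background}")
    lines.append(f"Current question: {question}")
    return "\n".join(lines)


prompt = build_augmented_prompt(
    background="patient background (placeholder)",
    question="current question (placeholder)",
    retrieved=[
        {
            "Question": "prior question (placeholder)",
            "Answer": "prior answer (placeholder)",
            "Summary": "prior reasoning summary (placeholder)",
        }
    ],
)
print(prompt)
```

Placing the prior cases before the current case keeps the question itself at the end of the prompt, which is one common convention; the source does not specify an ordering.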
4. Algorithmic Operationalization and System Self-Evolution
The formal integration of CorrectKB within the system’s summary and evolution stage is captured as follows:
- If the current round's diagnosis is validated, the tuple (Question, Answer, Summary of S_final⁴) is appended to CorrectKB.
- On the next case, cosine similarity is computed between the current patient background and all CorrectKB (and ChainKB) records.
- The top-K matches are retrieved (K is fixed in the experiments); their reasoning summaries are used to enhance the consultation prompt, directly feeding into the system's prompt-engineering pipeline for all specialist agents.
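In short, retrieval-enhanced prompting maps the current case and the top-K retrieved reasoning summaries to the prompt that every specialist agent receives.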
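One way to write this retrieval step down is sketched here in assumed notation. None of the symbols are taken from the source: $x$ denotes the current case (patient background and question), $E(\cdot)$ the embedding model, $r_j$ a stored record with reasoning summary $s_j$, $\mathcal{R}_K(x)$ the retrieved set, $P(x)$ the augmented prompt, and $\oplus$ concatenation.

```latex
\[
\operatorname{sim}(x, r_j) = \frac{E(x) \cdot E(r_j)}{\lVert E(x) \rVert \, \lVert E(r_j) \rVert},
\qquad
\mathcal{R}_K(x) = \operatorname{top\text{-}K}_{\, r_j \in \mathrm{KB}} \; \operatorname{sim}(x, r_j),
\qquad
P(x) = x \oplus \{\, s_j : r_j \in \mathcal{R}_K(x) \,\}
\]
```

Here $\mathrm{KB}$ stands for the records searched, which per the steps above include both CorrectKB and ChainKB.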
5. Role in Self-Evolution, Generalization, and Rationality
By accumulating only safety-validated, correct reasoning traces, CorrectKB provides a feedback mechanism central to the framework’s self-evolution capability:
- CorrectKB helps the system quickly converge to consensus by surfacing relevant reasoning paths, minimizing repeated errors, and compressing the search space.
- The archive is leveraged not only for exact match scenarios but also for analogical reasoning in novel cases.
- Experimental results indicate that across both MedQA and PubMedQA, enabling CorrectKB and prompt augmentation yields substantial accuracy improvements, supporting effective cross-dataset generalization (e.g., a 3.6% accuracy increase on MedQA when using KBs derived from PubMedQA and vice versa).
6. Interaction with Chain-of-Thought Knowledge Base and Residual Discussion Structure
CorrectKB operates in tandem with ChainKB (which archives learning from error cases) and is combined with a residual discussion structure, in which only carefully curated summaries from the most recent rounds are included in the consultation context to mitigate cognitive overload. The Lead Physician's structured aggregation ensures that only salient consensus and conflict information is retained, further refining the input to the agents.
By incorporating CorrectKB within this architecture, the system achieves:
- High efficiency, avoiding information pollution by summarizing and focusing prompts.
- Improved diagnosis rationality, as specialists are guided by structured historic reasoning.
- Progressive self-evolution, as successful reasoning processes are increasingly represented in the knowledge base.
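The residual discussion structure can be pictured as a sliding window over per-round summaries. The sketch below assumes a window of two rounds, matching the "last two rounds" mentioned earlier, and models each summary with the four consensus categories; `RoundSummary` and `ResidualContext` are hypothetical names, and the summary text is a placeholder.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class RoundSummary:
    """Lead Physician output for one round, split into the four categories."""
    round_index: int
    consistency: List[str] = field(default_factory=list)
    conflict: List[str] = field(default_factory=list)
    independence: List[str] = field(default_factory=list)
    integration: List[str] = field(default_factory=list)


class ResidualContext:
    """Keeps only the most recent `window` round summaries in context."""

    def __init__(self, window: int = 2) -> None:
        # deque(maxlen=...) silently drops the oldest item when full.
        self._recent: Deque[RoundSummary] = deque(maxlen=window)

    def add_round(self, summary: RoundSummary) -> None:
        self._recent.append(summary)

    def visible_rounds(self) -> List[int]:
        """Round indices whose summaries the agents would still see."""
        return [s.round_index for s in self._recent]


context = ResidualContext(window=2)
for i in range(1, 5):
    context.add_round(
        RoundSummary(round_index=i, consistency=[f"agreed point, round {i} (placeholder)"])
    )
visible = context.visible_rounds()
print(visible)  # rounds 1 and 2 have been dropped from the context
```

Bounding the context this way keeps prompt length roughly constant regardless of how many rounds a consultation takes.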
7. Empirical Validation and System Impact
Ablation studies and cross-dataset experiments demonstrate that CorrectKB is a critical factor for improved diagnostic outcomes on multi-round, multi-specialist agent tasks. For example, with the full system—residual discussion, lead aggregation, and both CorrectKB and ChainKB enabled—MDTeamGPT achieved almost state-of-the-art accuracy (90.1% on MedQA, 83.9% on PubMedQA). Cross-dataset transfer further confirms that CorrectKB allows for generalizable, reusable, and contextually robust reasoning templates.
CorrectKB, as instantiated and validated in MDTeamGPT, constitutes an integrated, evolving, and accuracy-centric repository of validated question-answer-reasoning tuples designed to guide, regularize, and enrich multi-agent machine reasoning in complex, high-stakes domains. Its operational role—encompassing storage, efficient retrieval, prompt augmentation, and self-evolution—is substantiated by both formal algorithmic constructs and empirical results (Chen et al., 18 Mar 2025).