CA-GPT: Domain-Attuned Generative AI

Updated 18 December 2025
  • CA-GPT is a family of domain-attuned generative AI systems that combine LLMs with context-specific workflows for educational assessment and clinical decision support.
  • It leverages prompt engineering, automated validation, and retrieval methods to enforce strict compliance with accreditation standards and clinical guidelines.
  • Quantitative results demonstrate CA-GPT’s superior performance in exam generation and OCT analysis, reducing operator variability and improving outcome standardization.

CA-GPT denotes a family of domain-attuned generative AI systems that integrate LLMs with context-specific workflows for decision support and assessment generation. Two principal instantiations are documented in the literature: (1) an assessment-generation engine for accreditation compliance in education (Aboalela, 2023), and (2) a clinical decision-support engine for intravascular imaging interpretation, particularly optical coherence tomography (OCT) in percutaneous coronary intervention (PCI) (Fang et al., 11 Dec 2025). Both leverage prompt engineering, automated validation, and domain-anchored retrieval or mapping to align AI output with strict professional standards or outcomes.

1. System Architectures and Integration

1.1 Educational Accreditation-Compliant Assessment Generation

The assessment-side CA-GPT system is structured to automate the generation of exam questions aligned with stipulated accreditation standards (e.g., ABET, NCAAA):

  • Data ingestion: Incorporates curriculum databases (syllabi, lecture notes), accreditation profiles (ABET SO₁–SO₆, NCAAA Knowledge/Skills/Values domains), and a Bloom taxonomy verb library (six hierarchical levels).
  • Pre-processing: Instructors specify the course, topic, and target outcomes; the system retrieves allowed action verbs per mapped accreditation criteria.
  • Prompt construction: Templates enforce topical and verb constraints: “Please generate N questions that use one of the verbs {v₁,…,v_k} to assess [ABET SO₂.2, NCAAA Skills].”
  • Generation phase: ChatGPT API returns question batches.
  • Post-processing/validation: Action verbs are extracted via NLP; a scoring function computes verb-to-outcome alignment and triggers automated revision if thresholds are unmet.
  • Faculty review: Valid output is submitted for approval or minor manual editing (Aboalela, 2023).
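The ingestion-to-prompt steps above can be sketched as a small template builder. This is an illustrative sketch only; the function and parameter names are assumptions, and the prompt wording paraphrases the template quoted in the source:

```python
def build_exam_prompt(n_questions, topic, outcome_id, allowed_verbs):
    """Construct a constrained generation prompt of the kind described
    above (hypothetical helper; exact production wording may differ)."""
    verb_list = ", ".join(sorted(allowed_verbs))
    return (
        f"Please generate {n_questions} questions on '{topic}'. "
        f"Each question must begin with one of the verbs: {verb_list}. "
        f"Each question must explicitly assess outcome {outcome_id}."
    )

prompt = build_exam_prompt(
    n_questions=3,
    topic="HTML tables",
    outcome_id="ABET SO2.2",
    allowed_verbs={"implement", "construct", "design"},
)
print(prompt)
```

The verb list and outcome identifier are injected verbatim so the post-processing stage can later check the generated questions against the same constraint set.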

1.2 Clinical AI-OCT Decision Support

The clinical CA-GPT instantiation is architected as a layered, modular system:

  • Small-model layer (“AI-OCT Core”): 13 CNN/U-Net modules (3–5M parameters each) execute OCT image analysis (segmentation, classification, calcium scoring).
  • Large-model layer (“CA-GPT Decision Engine”): Foundation is DeepSeek-R1 (14B parameters) with injected LoRA-style domain adapters (~10M task-specific parameters).
  • RAG pipeline: Quantitative outputs from small models are structured, merged with top-k (k=5) retrieved guidelines/cases, and input to CA-GPT for decision generation.
  • End-to-end flow: Raw OCT → Small-model extraction → RAG retrieval → CA-GPT inference → output for pre/post-PCI planning and assessment (Fang et al., 11 Dec 2025).
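The end-to-end flow above can be sketched as a composition of three stages. Everything here is a structural sketch under stated assumptions: the field names on `OCTFindings` and the `retrieve`/`generate` callables are illustrative stand-ins for the small-model layer, the RAG store, and the CA-GPT decision engine:

```python
from dataclasses import dataclass

@dataclass
class OCTFindings:
    """Structured output of the small-model layer (fields illustrative)."""
    min_lumen_area_mm2: float
    calcium_arc_deg: float
    lesion_length_mm: float

def run_pipeline(findings, retrieve, generate, k=5):
    """Small-model extraction -> top-k retrieval -> LLM inference."""
    evidence = retrieve(findings, k=k)   # top-k retrieved guidelines/cases
    prompt = {"findings": findings, "evidence": evidence}
    return generate(prompt)              # pre/post-PCI recommendation

# Stub components, to show only the control flow.
findings = OCTFindings(min_lumen_area_mm2=2.1, calcium_arc_deg=90.0,
                       lesion_length_mm=18.5)
evidence_store = ["guideline-A", "case-B", "guideline-C", "case-D",
                  "consensus-E", "case-F"]
decision = run_pipeline(
    findings,
    retrieve=lambda f, k: evidence_store[:k],
    generate=lambda p: f"plan based on {len(p['evidence'])} evidence items",
)
print(decision)
```

The design point is modularity: the quantitative extractors, the retriever, and the decision engine are swappable components joined only by structured data.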

2. Domain Mapping and Control Methods

2.1 Verb-to-Outcome Mapping in Education

A formal mapping constrains question actions to guarantee accreditation validity:

  • Mapping function $V_{\text{map}} \colon V \rightarrow O$: each verb $v$ from Bloom’s taxonomy is assigned to a unique outcome $o$.
  • Outcome verb sets $V^{-1}(o)$: for each outcome $o$, the set of associated permitted verbs.
  • Validation metric, the “hit-rate”:

$$\text{score}(q, o) = \frac{\lvert \text{verbs}(q) \cap V^{-1}(o) \rvert}{\lvert \text{verbs}(q) \rvert}$$

  • Acceptance threshold: typically $\tau = 1.0$ for strict matches, relaxed to $\tau = 0.8$ for multi-verb questions. A candidate is accepted only when $\text{score}(q, o) \geq \tau$ (Aboalela, 2023).
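The hit-rate metric above is a simple set-intersection ratio; a minimal implementation, with function names of my own choosing, might look like:

```python
def hit_rate(question_verbs, allowed_verbs):
    """score(q, o) = |verbs(q) ∩ V⁻¹(o)| / |verbs(q)|."""
    qv = set(question_verbs)
    if not qv:
        return 0.0  # a question with no detected verbs cannot pass
    return len(qv & set(allowed_verbs)) / len(qv)

def accept(question_verbs, allowed_verbs, tau=1.0):
    """Accept the candidate only when score(q, o) >= tau."""
    return hit_rate(question_verbs, allowed_verbs) >= tau

# A single-verb question must match exactly (tau = 1.0) ...
print(accept(["implement"], {"implement", "construct"}))            # True
# ... while a multi-verb question may pass the relaxed threshold:
# 4 of 5 verbs allowed gives a score of 0.8 >= tau = 0.8.
print(accept(["implement", "design", "construct", "evaluate", "apply"],
             {"implement", "construct", "design", "apply"}, tau=0.8))  # True
```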

2.2 RAG and Parameter Control in Medicine

  • Embedded feature space: OCT outputs are vectorized and coupled to retrieved evidence.
  • “Retrieve–reason–generate” paradigm: Structured prompt concatenates evidence, parameters, and explicit clinical roles for the model.
  • Adapters and modularity: Domain knowledge is injected via adapter layers, maintaining foundational LLM consistency while imposing specialized decision logic for PCI stages (Fang et al., 11 Dec 2025).
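The “retrieve–reason–generate” prompt concatenation can be sketched as plain string assembly. The layout and field names below are assumptions, not the system’s actual prompt format:

```python
def assemble_prompt(role, parameters, evidence):
    """Concatenate clinical role, quantitative OCT parameters, and
    retrieved evidence into one structured prompt (layout illustrative)."""
    lines = [f"Role: {role}", "Quantitative parameters:"]
    lines += [f"  - {k}: {v}" for k, v in parameters.items()]
    lines.append("Retrieved evidence:")
    lines += [f"  [{i + 1}] {passage}" for i, passage in enumerate(evidence)]
    lines.append("Task: recommend a PCI strategy grounded in the evidence.")
    return "\n".join(lines)

prompt = assemble_prompt(
    role="Act as interventional cardiologist",
    parameters={"min_lumen_area_mm2": 2.1, "calcium_arc_deg": 90.0},
    evidence=["Guideline excerpt on lesion preparation",
              "Matched prior case with similar calcium burden"],
)
print(prompt)
```

Numbering the evidence passages gives the model stable handles to cite, which supports the documentation-trail role described in §3.2.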

3. Prompt Engineering Strategies

3.1 Structured Templates for Assessment

  • Full generation: Prompts specify topic, outcomes, and verb constraints, e.g., “Each question must begin with one verb from the list, and must explicitly assess students’ ability to implement HTML table code.”
  • Editing/validation: Prompts review existing drafts, enforcing verb compliance and proposing substitutions when the main verb fails validation (Aboalela, 2023).

3.2 Clinical System Prompts

  • Guideline-guided prompts: Combine retrieved passages, extracted physiology, and explicit operating instructions (e.g., “Act as interventional cardiologist…”).
  • Evidence reinforcement: Each CA-GPT decision is anchored by retrieved clinical precedent and consensus, serving as both a control and a documentation trail (Fang et al., 11 Dec 2025).

4. Evaluation Metrics and Quantitative Results

4.1 Faculty Acceptance in Assessment Engineering

  • Participants: 120 faculty members polled across various Saudi universities.
  • Outcomes:
    • Support for full AI exam generation: 85%
    • Support for AI-assisted editing/correction: 98%
  • Statistical notes: Results are descriptive; margin of error at 95% confidence is ±8.8% (Aboalela, 2023).

4.2 Clinical Decision Agreement Metrics

Agreement scores by phase (median [IQR]):

Phase    | CA-GPT     | ChatGPT-5 | Junior MDs | P (overall) | CA-GPT vs. ChatGPT-5 | CA-GPT vs. Juniors
---------|------------|-----------|------------|-------------|----------------------|-------------------
Pre-PCI  | 5 [3.75–5] | 3 [2–4]   | 4 [3–4]    | <0.001      | <0.001               | <0.001
Post-PCI | 5 [4.75–5] | 4 [4–5]   | 5 [4–5]    | <0.001      | <0.001               | 0.015

Metric highlights:

  • Stent diameter selection: CA-GPT 90.3%, ChatGPT-5 63.9%, Junior MDs 72.2%
  • Stent length: CA-GPT 80.6%, ChatGPT-5 54.2%, Junior MDs 52.8%
  • Stent expansion (post-PCI): CA-GPT 78.4%, ChatGPT-5 33.0%, Junior MDs 84.1%
  • Stent apposition: CA-GPT 93.2%, ChatGPT-5 88.6%, Junior MDs 76.1%
  • Subgroup: Most pronounced advantages seen in complex vessel (LCx/RCA), low OCT-FFR, ACS presentation, and mild calcification scenarios (Fang et al., 11 Dec 2025).

5. Workflow, Validation, and Feedback Loops

5.1 Educational Generation-Validation Loop

  • Cycle: generate → NLP validation → scoring → (if needed) “revise to match” → faculty review.
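The cycle can be sketched as a bounded retry loop; `generate` and `extract_verbs` are hypothetical stand-ins for the ChatGPT API call and the NLP verb extractor, and the bound of three revisions is my assumption, not a figure from the paper:

```python
def generation_validation_loop(generate, extract_verbs, allowed_verbs,
                               tau=1.0, max_revisions=3):
    """generate -> NLP validation -> scoring -> revise-to-match.
    Returns the first candidate that clears the threshold, or None
    (leaving the case to faculty review)."""
    feedback = None
    for _ in range(max_revisions):
        q = generate(feedback)
        verbs = set(extract_verbs(q))
        score = len(verbs & allowed_verbs) / len(verbs) if verbs else 0.0
        if score >= tau:
            return q  # forward to faculty for approval or minor editing
        feedback = f"revise to use only these verbs: {sorted(allowed_verbs)}"
    return None

# Stub generator that complies only after receiving revision feedback.
result = generation_validation_loop(
    generate=lambda fb: "Implement a table." if fb else "Discuss tables.",
    extract_verbs=lambda q: [q.split()[0].lower()],
    allowed_verbs={"implement", "construct"},
)
print(result)  # "Implement a table."
```

Keeping the loop bounded and logging each attempt matches the audit-trail emphasis noted above.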
  • Emphasis: Integration into LMS, logkeeping for audit, and explicit mapping to accreditation outcomes.
  • Guidance: Stronger faculty trust when AI supports rather than replaces manual question writing, despite high acceptance for both uses (Aboalela, 2023).

5.2 Clinical Modular Pipeline

  • Processing: Raw data is systematically transformed (image analysis → parameter extraction → text retrieval → LLM inference).
  • Continuous learning: Adapter fine-tuning on new reports; explicit phase-wise metrics to monitor agreement with expert consensus.
  • Evidence traceability: Every advice generation step is grounded in retrievable documentation, both for transparency and audit (Fang et al., 11 Dec 2025).

6. Limitations and Best Practices

6.1 Educational Use

  • Verb control is essential: Prevents misalignment with accreditation metrics.
  • Prompt specificity: Mandatory inclusion of outcome identifiers.
  • Recommended logging: Verb mappings and AI review steps must be documented; LMS integration is crucial for workflow adoption (Aboalela, 2023).

6.2 Medical Deployment

  • Single-center scope: Findings from retrospective studies with proprietary hardware (Vivolight P80) may not generalize.
  • Data domain bias: Knowledge cutoff (October 2023), regional/linguistic composition of datasets, and system modularity may impact cross-platform portability and up-to-dateness.
  • Long-term impact: No reported MACE/mortality endpoints to date; improvements are quantitative (agreement, standardization) rather than direct outcome-based (Fang et al., 11 Dec 2025).

7. Context and Implications

CA-GPT systems represent a new class of vertically specialized generative AI, characterized by the integration of strict domain-mapping (verb-outcome or parameter-guideline), tightly constructed prompts, automated validation, and human-in-the-loop review. Their adoption is driven by the need for rigorous output alignment—whether to accreditation standards in education or clinical guidelines and expert consensus in medicine. The dual-domain evidence demonstrates the effectiveness of this paradigm in reducing operator-dependent variability, increasing the validity of automated output, and elevating the overall standardization of high-stakes decision-making or assessment tasks (Aboalela, 2023, Fang et al., 11 Dec 2025).