Cognitive Surgery (CoSur)
- Cognitive Surgery (CoSur) is a paradigm that integrates real-time biosignal monitoring, semantic intent analysis, and adaptive robotic support to enhance surgical operations.
- It employs multimodal data like EEG, fNIRS, and eye tracking with deep learning models to accurately infer cognitive workload and trigger context-sensitive adjustments.
- The approach improves intraoperative decision-making and team coordination, optimizing surgical outcomes through a closed-loop system of human-AI collaboration.
Cognitive Surgery (CoSur) is a paradigm in surgical intelligence that unites real-time monitoring, multimodal reasoning, and adaptive, autonomous assistance to optimize surgical and cognitive performance. Unlike conventional robotic or AI systems that focus solely on action recognition or technical automation, CoSur fundamentally integrates the surgeon’s mental state, physiological stressors, and semantic intent with the orchestration of surgical actions, decision support, and team coordination. The foundational vision of CoSur is an operating room in which the surgeon’s cognitive workload (CWL) and intent are continuously inferred from biosignals and behavioral data, enabling the surgical environment—including robots, displays, and information systems—to adapt dynamically and supportively to cognitive demands, thereby improving patient safety and surgical outcomes (Jin et al., 2022).
1. Foundational Concepts and Definitions
Cognitive Surgery is predicated on the hypothesis that intraoperative performance and patient safety can be maximized by embedding real-time cognitive state monitoring and context-aware adaptive assistance within the surgical workflow (Jin et al., 2022). CoSur distinguishes itself from standard robotic surgery through:
- Continuous CWL Monitoring: Biosensed signals—such as EEG, fNIRS, and eye tracking—are utilized to infer cognitive workload and identify periods of potential overload, which can trigger intraoperative adaptations.
- Hierarchical Decision Augmentation: Advanced AI models mimic or supplement the surgeon’s perception, interpretation, and planning by fusing multimodal sensory data and deploying chain-of-thought or context-dependent reasoning (Wang et al., 22 Apr 2026, Low et al., 13 Mar 2025).
- Real-Time Context-Sensitive Adaptation: The surgical suite may dynamically adjust its user interfaces, robotic autonomy, haptic feedback, or display overlays based on the inferred cognitive state.
The key objective of CoSur is to create a closed-loop system in which surgical robots, information displays, and decision-making aids function as genuine cognitive collaborators, not mere reactive executors (Jin et al., 2022, Wang et al., 21 May 2026).
2. Multimodal Monitoring and Cognitive Workload Inference
A central component of CoSur is the acquisition, fusion, and interpretation of physiological and behavioral data that reflect the surgeon's cognitive state. Jin et al. (2022) demonstrate a two-stage approach using multistream signals:
- Signal Acquisition: EEG (32 channels), fNIRS (22 channels over the prefrontal cortex), and eye pupil diameter (PE) are recorded and time-synchronized at high sampling rates (EEG and fNIRS at 1 kHz, PE at 120 Hz), then downsampled after artifact correction.
- Preprocessing Pipelines: EEG is high-pass filtered (0.5 Hz), fNIRS is normalized to baseline, and pupil data is median-filtered and standardized.
- Feature Construction: Each signal is windowed, and the windows are concatenated into a composite tensor that encodes both temporal structure and cross-channel (spatial) interactions.
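The three preprocessing steps and the windowing can be sketched with NumPy. This is a minimal sketch under stated assumptions, not Jin et al.'s pipeline: it assumes all streams were already artifact-corrected and resampled to a hypothetical common rate of 100 Hz, uses a brick-wall FFT filter as a stand-in for the 0.5 Hz high-pass, takes the first 200 samples as the fNIRS baseline, and uses 2 s windows and a (windows, channels, samples) layout. All of these constants are hypothetical.

```python
import numpy as np

FS = 100.0   # hypothetical common sampling rate (Hz) after artifact correction and downsampling
WIN_S = 2.0  # hypothetical window length (s)


def highpass_fft(x: np.ndarray, fs: float, cutoff: float = 0.5) -> np.ndarray:
    """Zero all spectral content below `cutoff` Hz along the last axis.

    A brick-wall FFT filter standing in for the 0.5 Hz EEG high-pass.
    """
    spec = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    spec[..., freqs < cutoff] = 0.0
    return np.fft.irfft(spec, n=x.shape[-1], axis=-1)


def baseline_normalize(x: np.ndarray, n_baseline: int) -> np.ndarray:
    """Z-score each channel against its own first `n_baseline` samples (rest period)."""
    base = x[..., :n_baseline]
    mu = base.mean(axis=-1, keepdims=True)
    sd = base.std(axis=-1, keepdims=True) + 1e-8
    return (x - mu) / sd


def median_filter_1d(x: np.ndarray, k: int = 5) -> np.ndarray:
    """Odd-length moving median with edge padding (suppresses short spikes)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    return np.median(np.lib.stride_tricks.sliding_window_view(xp, k), axis=-1)


def build_windows(eeg, fnirs, pupil, fs=FS, win_s=WIN_S, n_baseline=200):
    """Preprocess each stream and stack into a (n_windows, channels, samples) tensor."""
    if not (eeg.shape[-1] == fnirs.shape[-1] == pupil.shape[-1]):
        raise ValueError("streams must be resampled to a common length first")
    eeg_p = highpass_fft(eeg, fs)
    fnirs_p = baseline_normalize(fnirs, n_baseline)
    pup = median_filter_1d(pupil)
    pupil_p = ((pup - pup.mean()) / (pup.std() + 1e-8))[np.newaxis, :]
    stacked = np.concatenate([eeg_p, fnirs_p, pupil_p], axis=0)  # (32 + 22 + 1, T)
    w = int(win_s * fs)
    n_win = stacked.shape[1] // w
    trimmed = stacked[:, : n_win * w]  # drop the incomplete tail window
    return trimmed.reshape(stacked.shape[0], n_win, w).transpose(1, 0, 2)


rng = np.random.default_rng(0)
T = 1000  # 10 s at the assumed 100 Hz
X = build_windows(rng.standard_normal((32, T)), rng.standard_normal((22, T)), rng.standard_normal(T))
print(X.shape)  # (5, 55, 200)
```

Putting every stream on one time base before stacking keeps channel rows aligned, so a downstream network can learn cross-modal interactions within each window.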
A cascade of deep learning stages then detects CWL: first, transfer learning with AlexNet on time–frequency scalograms flags any elevation in workload (binary detection); then a 1D-CNN classifies the degree of workload associated with specific task conditions. The classifiers reach 93% test accuracy across workload levels, with precision up to 0.99 for binary detection (Jin et al., 2022). By embedding such workload detectors in the OR, CoSur systems can trigger context-sensitive assistive actions in near real time when cognitive overload is detected.
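The gating logic of a two-stage detector/grader cascade can be sketched in plain Python. This is a minimal sketch, not Jin et al.'s implementation: the AlexNet scalogram detector and the 1D-CNN grader are replaced by arbitrary callables, and the 0.5 threshold, the level names, and the toy models are hypothetical.

```python
from typing import Callable, Optional, Sequence

Window = Sequence[float]


def cascade_cwl(
    window: Window,
    detect: Callable[[Window], float],            # stage 1: score for "workload elevated"
    grade: Callable[[Window], Sequence[float]],   # stage 2: one score per workload level
    levels: Sequence[str],
    threshold: float = 0.5,                       # hypothetical decision threshold
) -> Optional[str]:
    """Two-stage cascade: a binary detector gates a multi-level grader.

    Returns None when the detector score stays below `threshold`; otherwise returns
    the workload level with the highest grader score. The grader is never evaluated
    for windows the detector does not flag.
    """
    if detect(window) < threshold:
        return None
    scores = grade(window)
    if len(scores) != len(levels):
        raise ValueError("grader must return one score per workload level")
    best = max(range(len(levels)), key=lambda i: scores[i])
    return levels[best]


# Toy stand-ins for the trained models (hypothetical, for illustration only).
def toy_detect(w: Window) -> float:
    return max(w)


def toy_grade(w: Window) -> Sequence[float]:
    return [1.0, 0.0] if max(w) < 0.8 else [0.0, 1.0]


LEVELS = ["moderate", "high"]  # hypothetical level names
print(cascade_cwl([0.1, 0.2], toy_detect, toy_grade, LEVELS))  # None
print(cascade_cwl([0.6, 0.1], toy_detect, toy_grade, LEVELS))  # moderate
print(cascade_cwl([0.9, 0.3], toy_detect, toy_grade, LEVELS))  # high
```

Gating the grader behind the detector keeps the common no-overload path cheap, since the more detailed classifier only runs on windows that already look elevated.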
3. Cognitive Reasoning, Intent Recognition, and Human–AI Collaboration
CoSur extends beyond passive workload recognition by enabling semantic understanding of surgical intent and chain-of-thought reasoning in AI copilot systems. Notable frameworks include:
- Vision–Language–Action (VLA) Models: These architectures fuse encoded endoscopic video (with uncertainty modeling), parsed natural language prompts conveying surgeon intent, and explicit logical reasoning modules. The reasoning module performs multi-step deduction to infer both low-level robotic motion goals and hidden tissue states (Wang et al., 21 May 2026).
- Reasoning Pipeline: Perceptual features and language are fused, followed by graph- or attention-based inference to determine optimal actions. Policy networks or RL-based experts then translate the inferred goals into robot commands within safety-constrained envelopes.
- Cognitive Collaboration: In this setting, the surgeon retains task-level authority, with the AI copilot offering maneuver suggestions, intent inference, and decision-support overlays. Cognitive load metrics (e.g., NASA-TLX) are evaluated in prospective user studies to assess the impact of reasoning-driven autonomy (Wang et al., 21 May 2026, Sharma et al., 3 Aug 2025).
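NASA-TLX, the workload questionnaire used in such user studies, rates six subscales on a 0 to 100 scale. The sketch below computes both the unweighted Raw TLX and the classic weighted score derived from 15 pairwise comparisons; which variant a given study reports is not stated here, and the example ratings and weights are invented for illustration.

```python
from typing import Dict

# The six standard NASA-TLX subscales, each rated on a 0-100 scale.
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def _check(ratings: Dict[str, float]) -> None:
    """Reject missing subscales and out-of-range ratings."""
    missing = set(SUBSCALES) - set(ratings)
    if missing:
        raise ValueError(f"missing subscales: {sorted(missing)}")
    for name in SUBSCALES:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} rating out of range: {ratings[name]}")


def raw_tlx(ratings: Dict[str, float]) -> float:
    """Raw TLX: unweighted mean of the six subscale ratings."""
    _check(ratings)
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings: Dict[str, float], weights: Dict[str, int]) -> float:
    """Weighted TLX: each weight counts how often a subscale won the 15 pairwise comparisons."""
    _check(ratings)
    if sum(weights.get(s, 0) for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights.get(s, 0) for s in SUBSCALES) / 15.0


# Invented example values.
ratings = {"mental": 80, "physical": 20, "temporal": 60,
           "performance": 40, "effort": 70, "frustration": 30}
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}
print(raw_tlx(ratings))                          # 50.0
print(round(weighted_tlx(ratings, weights), 2))  # 64.67
```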
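Adaptive control strategies blend human and autonomous actions in proportion to the inferred workload, e.g., $u(t) = \alpha(t)\,u_{\mathrm{auto}}(t) + \bigl(1-\alpha(t)\bigr)\,u_{\mathrm{human}}(t)$, where the blending weight $\alpha(t) \in [0, 1]$ is a function of the real-time cognitive workload index (Sharma et al., 3 Aug 2025).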
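One way to realize workload-proportional blending is u = α·u_auto + (1 − α)·u_human, with α derived from the workload index and the result saturated to a safety envelope. In the sketch below the linear workload-to-α mapping, the 0.8 cap on autonomy, and the per-axis velocity limit are all hypothetical choices, not values from Sharma et al.

```python
from typing import List, Sequence


def autonomy_share(cwl: float, max_autonomy: float = 0.8) -> float:
    """Map a workload index in [0, 1] to alpha, the autonomous policy's share of control.

    Hypothetical linear map, capped at `max_autonomy` so the surgeon always keeps
    at least (1 - max_autonomy) of the command authority.
    """
    cwl = min(max(cwl, 0.0), 1.0)
    return max_autonomy * cwl


def blend_command(u_human: Sequence[float], u_auto: Sequence[float],
                  cwl: float, v_max: float) -> List[float]:
    """u = alpha * u_auto + (1 - alpha) * u_human, clamped per axis to [-v_max, v_max]."""
    if len(u_human) != len(u_auto):
        raise ValueError("command vectors must have equal length")
    alpha = autonomy_share(cwl)
    blended = [alpha * a + (1.0 - alpha) * h for h, a in zip(u_human, u_auto)]
    # Safety envelope: saturate each velocity component (hypothetical box constraint).
    return [min(max(u, -v_max), v_max) for u in blended]


print(blend_command([1.0, -2.0], [0.0, 0.0], cwl=0.0, v_max=5.0))  # [1.0, -2.0]
print(blend_command([10.0], [10.0], cwl=0.5, v_max=5.0))           # [5.0]
```

Applying the envelope after blending means the limit holds regardless of how α is chosen, so the workload mapping can be tuned without weakening the safety constraint.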
4. Chain-of-Thought, Spatiotemporal Reasoning, and Multi-Agent Architectures
CoSur frameworks leverage multi-stage reasoning and collaborative AI agents to decompose and interpret complex intraoperative events:
- Chain-of-Thought (CoT) Benchmarks and Protocols: SurgCoT (Wang et al., 22 Apr 2026) provides a structured five-tuple annotation (Q, O, K, C, A) for surgical video reasoning, enabling the evaluation of AI systems against expert-level multi-step reasoning tasks (e.g., Causal Action Ordering, Affordance Mapping, Anomaly Tracking).
- Hierarchical, Multi-Agent Workflows: SurgRAW (Low et al., 13 Mar 2025) organizes agents into visual-semantic and cognitive-inference specialists, employing task-specific CoT prompts and a panel-discussion module to enforce consistency and mitigate hallucinations. Integrating retrieval-augmented generation (RAG) further narrows the domain-knowledge gap and grounds reasoning in retrieved surgical knowledge.
- Performance Metrics: Multi-agent CoT-RAG frameworks have shown up to 29.32% accuracy improvement in surgical scene understanding over baseline VLMs, with particularly strong gains in patient data extraction and action prediction (Low et al., 13 Mar 2025).
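A consistency check across several agents can be approximated as a quorum vote. This is a simplified stand-in, not SurgRAW's panel-discussion module: it assumes each agent is a function from a question to an answer string and uses a hypothetical 2/3 quorum, whereas the actual module may exchange rationales over several rounds rather than vote once.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Agent = Callable[[str], str]  # hypothetical: an agent maps a question to an answer string


@dataclass
class PanelResult:
    answer: Optional[str]   # consensus answer, or None when the panel is split
    agreement: float        # fraction of agents that gave the top answer
    votes: Dict[str, int]   # answer -> number of agents


def panel_consensus(question: str, agents: List[Agent], quorum: float = 2 / 3) -> PanelResult:
    """Query every agent and accept the most common answer only if it reaches `quorum`.

    Sub-quorum results return answer=None so a caller can escalate (another
    discussion round or human review) instead of acting on a contested answer.
    """
    if not agents:
        raise ValueError("panel needs at least one agent")
    answers = [agent(question) for agent in agents]
    counts = Counter(answers)
    top, n_top = counts.most_common(1)[0]
    agreement = n_top / len(answers)
    return PanelResult(top if agreement >= quorum else None, agreement, dict(counts))


panel = [lambda q: "clipping", lambda q: "clipping", lambda q: "cutting"]
print(panel_consensus("Next action?", panel).answer)  # clipping
```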
Spatiotemporal reasoning, multi-stage question scaffolding, and explicit evidence mining form the backbone of trustworthy, auditable CoSur decision support (Wang et al., 22 Apr 2026, Low et al., 13 Mar 2025).
5. Application Domains: Skill Assessment, Training, and Real-Time Intraoperative Support
CoSur systems have been deployed and validated in diverse domains:
- Cognitive-Motor Skill Assessment: Integrative DNNs that fuse video-based motor features with fNIRS-derived neural activation classify surgeon expertise and predict performance in laparoscopic tasks more accurately than single-modality metrics (Yanik et al., 2024).
- Real-Time Cognitive Load Biomarkers: EEG-based theta power and VC9 biomarkers enable continuous workload monitoring during surgical simulation, correlate with performance metrics, and support adaptive training feedback. For instance, decreases in VC9 correlated strongly with improvements in accuracy (Bez et al., 2020).
- Simulated Team Training: Agent-driven sandboxes (SurgBox) replicate OR teams with role-specific LLM actors, retrieval-augmented knowledge, and a Surgery Copilot equipped with long-short memory mechanisms. Deliberate practice in such environments demonstrates reduced cognitive load and increased decision-making accuracy (NASA-TLX drop from 65 to 53/100, surgical plan accuracy up to 88%) (Wu et al., 2024).
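Theta power of the kind used in these EEG biomarkers can be estimated from a Hann-windowed periodogram. The sketch assumes a 4–8 Hz theta band and a 1–40 Hz broadband reference for the relative measure; both band edges are conventional choices rather than values from Bez et al., and VC9 is not reproduced because its definition is not given here.

```python
import numpy as np


def band_power(x: np.ndarray, fs: float, lo: float, hi: float) -> float:
    """Summed (unnormalized) Hann-windowed periodogram power of a 1D signal in [lo, hi) Hz."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # remove DC before windowing
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= lo) & (freqs < hi)
    return float(spec[mask].sum())


def relative_theta(x: np.ndarray, fs: float,
                   theta=(4.0, 8.0), total=(1.0, 40.0)) -> float:
    """Theta-band power as a fraction of broadband power (band edges are assumptions)."""
    return band_power(x, fs, *theta) / band_power(x, fs, *total)


fs = 256.0
t = np.arange(512) / fs  # 2 s of signal
print(relative_theta(np.sin(2 * np.pi * 6 * t), fs) > 0.95)   # True: 6 Hz lies in theta
print(relative_theta(np.sin(2 * np.pi * 20 * t), fs) < 0.05)  # True: 20 Hz is beta
```

Using a relative rather than absolute measure makes the index insensitive to overall amplitude differences between electrodes and sessions, which is one reason band ratios are common in workload monitoring.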
Intraoperative deployment enables dynamic team-level adaptation—dimming video feeds, adjusting robotic compliance, triggering overlays, or offloading routine decisions to cognitive agents—whenever workload thresholds are breached (Jin et al., 2022, Wu et al., 2024).
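Threshold-triggered adaptation can toggle rapidly when the workload estimate hovers near the threshold; a hysteresis gate is one common remedy. In this sketch the 0.7 entry and 0.5 exit thresholds on a [0, 1] workload index and the action identifiers are hypothetical, loosely mirroring the adaptations listed above.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical action identifiers mirroring the adaptations described in the text.
ENTER_ACTIONS = ["dim_video_feed", "adjust_robot_compliance",
                 "show_overlay", "offload_routine_decisions"]
EXIT_ACTIONS = ["restore_defaults"]


@dataclass
class OverloadTrigger:
    """Hysteresis (Schmitt-trigger) gate on a workload index in [0, 1].

    Assistance switches on when the index reaches `on` and only switches off once
    it falls to `off`, so noise around a single threshold cannot toggle the OR setup.
    """
    on: float = 0.7    # hypothetical entry threshold
    off: float = 0.5   # hypothetical exit threshold
    active: bool = False

    def __post_init__(self) -> None:
        if not self.off < self.on:
            raise ValueError("exit threshold must be below entry threshold")

    def update(self, cwl: float) -> List[str]:
        """Feed one workload estimate; return the actions to issue (empty if none)."""
        if not self.active and cwl >= self.on:
            self.active = True
            return list(ENTER_ACTIONS)
        if self.active and cwl <= self.off:
            self.active = False
            return list(EXIT_ACTIONS)
        return []


trigger = OverloadTrigger()
for cwl in [0.6, 0.75, 0.65, 0.55, 0.45]:
    print(cwl, trigger.update(cwl))
```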
6. Technical Limitations, Safety, and Prospective Directions
Current CoSur systems face several technical and deployment challenges:
- Signal Robustness and Generalizability: Multimodal biosignal inference (EEG, fNIRS, eye tracking) must overcome artifact contamination, inter-user variability, and domain adaptation for real-world OR conditions (Bez et al., 2020, Jin et al., 2022).
- Reasoning Depth vs. Latency: Deep multi-stage reasoning introduces inference delay; balancing real-time responsiveness with reasoning fidelity is an open constraint (Wang et al., 21 May 2026, Low et al., 13 Mar 2025).
- Reward Engineering and Domain Coverage: Multi-agent RL approaches require dense, hand-tuned reward schemes that may not generalize. Most experimental validations are limited to benchtop, simulation, or restricted clinical specialties (Scheikl et al., 2021, Qin et al., 25 Feb 2026).
- Safety, Explainability, and Regulatory Pathways: Auditable chain-of-thought protocols, RLHF-optimized rationales, and consistency-check systems address clinical safety, but formal regulatory validation and prospective clinical trials remain scarce to date (Qin et al., 25 Feb 2026, Wu et al., 2024).
- Scalability and Extensibility: Modular APIs, adaptive function catalogs, and continual-learning LLM copilot architectures are advocated for rapid translation to new procedures, specialties, and real OR environments (Chen et al., 2024, Wu et al., 2024).
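For the reasoning-depth versus latency trade-off, one illustrative strategy is to run reasoning stages in priority order and stop before the cumulative estimated cost exceeds the per-decision budget. The stage names and millisecond costs below are hypothetical, and this anytime-style prefix selection is a sketch of one option rather than a method proposed in the cited work.

```python
from typing import List, Tuple


def select_reasoning_depth(stages: List[Tuple[str, float]], budget_ms: float) -> List[str]:
    """Return the longest prefix of stages whose summed estimated latency fits the budget.

    Stages are assumed to be ordered from most essential/cheapest to deepest, so
    truncating the list degrades reasoning depth gracefully instead of missing the deadline.
    """
    chosen: List[str] = []
    spent = 0.0
    for name, cost_ms in stages:
        if spent + cost_ms > budget_ms:
            break
        chosen.append(name)
        spent += cost_ms
    return chosen


# Hypothetical stage costs (ms).
STAGES = [("perceive", 20.0), ("intent", 40.0), ("cot_reasoning", 150.0), ("panel_check", 300.0)]
print(select_reasoning_depth(STAGES, budget_ms=100.0))  # ['perceive', 'intent']
print(select_reasoning_depth(STAGES, budget_ms=600.0))  # all four stages
```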
Ongoing research priorities include expansion to new surgical domains, integration of additional modalities (audio, haptics, kinematics), end-to-end human-in-the-loop reward models, and clinical outcome studies.
7. Synthesis and Future Perspective
Cognitive Surgery envisions a future where intraoperative care is defined by transparent, explainable, and robust human–AI collaboration. The core architectural and methodological innovations—real-time biosignal-driven workload inference, semantic reasoning over visual and linguistic data, multi-agent task decomposition, retrieval-augmented domain grounding, and closed-loop adaptation—collectively establish CoSur as a central pillar of next-generation surgical intelligence (Jin et al., 2022, Wang et al., 22 Apr 2026, Low et al., 13 Mar 2025). While the present literature demonstrates substantial gains over prior approaches in both technical and practical metrics, broadening clinical validation and ensuring real-time, fail-safe deployment remain at the forefront of translational efforts.