NOTSS: Non-Technical Skills for Surgeons

Updated 22 January 2026
  • NOTSS is a structured framework outlining essential behavioral, cognitive, and interpersonal skills necessary for surgical safety and effective teamwork.
  • Recent advances integrate computer vision, AI, and LLM analytics to quantitatively assess skills like situation awareness and decision making using kinematic and linguistic data.
  • Simulation platforms, such as VORTeX, leverage VR and LLM-driven feedback to enhance training, real-time monitoring, and formative debriefing in surgical education.

Non-Technical Skills for Surgeons (NOTSS) comprise a structured taxonomy and assessment framework for those behavioral, cognitive, and interpersonal competencies essential to surgical safety and team performance that are not strictly technical. Developed over the past two decades, NOTSS has become the reference framework for both research and training in surgical teamwork, leadership, situation awareness, decision making, and communication. Recently, computer vision (CV), AI, and LLM-based analytics have enabled quantitative and scalable measurement of NOTSS elements using both kinematic and linguistic data sources, with particular activity in specialties such as cardiothoracic and laparoscopic surgery (Constable et al., 2024).

1. Taxonomy and Components of NOTSS

The NOTSS framework, as adopted in current surgical safety research, consists of five principal categories (Constable et al., 2024):

  • Situation Awareness: Monitoring the operating environment, anticipating future states, recognizing and responding to cues. Supported by sub-skills such as anticipatory ability and cognitive flexibility.
  • Decision Making: Selecting and executing an appropriate course of action under time pressure or uncertainty. Key subcomponents include mental readiness and action planning.
  • Communication and Interaction: Clarity of speech, closed-loop communication, checking shared understandings.
  • Co-operation and Team Skills: Sharing information, mutual monitoring, supporting and backup behaviors.
  • Leadership and Managerial Skills: Setting and maintaining standards, resource management, assertiveness, and conflict resolution.

The NOTECHS scale, a sister instrument, extends the same principles and has been validated as a behavioral marker set [71, 72 in (Constable et al., 2024)].

Within simulation and training environments, current practice also identifies specific subskills (e.g., use of communicative gestures for structuring team action) as targets for assessment and feedback (Constable et al., 2024, Barker et al., 19 Jan 2026).

2. Assessment Methodologies: Classical and Algorithmic

2.1 Human-Rated NOTSS Scoring

Historically, NOTSS assessment has relied on direct observation or video review by trained clinical raters, using standardized behavioral marker forms. Reliability is quantified via inter-rater statistics, typically the Intraclass Correlation Coefficient (ICC) or Cohen’s κ:

ICC = (MS_B - MS_W) / (MS_B + (k - 1) MS_W)

where MS_B is the between-videos mean square, MS_W the within-videos mean square, and k the number of raters (Constable et al., 2024).

Yule et al. [72 in (Constable et al., 2024)] demonstrated the construct and criterion validity of human-rated NOTSS, reporting ICCs “in the good to excellent range,” that is, ICC ≥ 0.75.
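
As a concrete illustration, the one-way ICC above can be computed directly from a videos-by-raters score matrix. This is a minimal sketch; the function name and data layout are illustrative, not taken from the cited studies:

```python
import numpy as np

def icc_oneway(ratings: np.ndarray) -> float:
    """One-way random-effects ICC for an (n videos x k raters) matrix,
    matching ICC = (MS_B - MS_W) / (MS_B + (k - 1) MS_W)."""
    n, k = ratings.shape
    video_means = ratings.mean(axis=1)
    # Between-videos mean square: spread of per-video means around the grand mean
    ms_b = k * ((video_means - ratings.mean()) ** 2).sum() / (n - 1)
    # Within-videos mean square: rater disagreement inside each video
    ms_w = ((ratings - video_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_b - ms_w) / (ms_b + (k - 1) * ms_w)
```

With perfect agreement MS_W = 0 and the ICC is exactly 1; the ICC ≥ 0.75 threshold quoted above is the conventional lower bound of the "good to excellent" band.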

2.2 Computer Vision and AI Approaches

Recent work proposes a range of CV/AI model families for objective, automated NOTSS scoring. These may be grouped as follows (Constable et al., 2024):

Pose Estimation Frameworks:

  • Markerless 2D/3D pose estimation using deep networks: OpenPose/Part-Affinity Fields, DeeperCut, DeepLabCut (supporting multi-person and multi-instrument tracking).
  • RGB-D (depth sensor) systems for coarse 3D head and upper-body tracking.
  • Hybrid models combining pose skeletons with inertial sensor data under gloves.

Kinematic Indicators for NOTSS Domains:

  • Situation Awareness: head-pose orientation (yaw/pitch/roll) streams h(t); gaze proxy (face-ROI distance)
  • Communication/Interaction: gesture amplitude and velocity; response latency between gestures
  • Decision Making/Team Coordination: idle time in bimanual activity; hand-trajectory velocity cross-correlation ("synchrony")
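
As an illustrative sketch of how such proxies could be computed from pose-tracker output (the function names, 30 Hz sampling rate, and idle threshold are assumptions, not values from the cited work):

```python
import numpy as np

def hand_speeds(pos: np.ndarray, dt: float = 1 / 30) -> np.ndarray:
    """Frame-to-frame speed of one tracked hand; pos is (T, 2) coordinates."""
    return np.linalg.norm(np.diff(pos, axis=0), axis=1) / dt

def bimanual_synchrony(pos_l: np.ndarray, pos_r: np.ndarray,
                       dt: float = 1 / 30) -> float:
    """Zero-lag correlation of left/right hand speed traces (a 'synchrony' proxy)."""
    return float(np.corrcoef(hand_speeds(pos_l, dt), hand_speeds(pos_r, dt))[0, 1])

def idle_fraction(pos: np.ndarray, dt: float = 1 / 30,
                  thresh: float = 5.0) -> float:
    """Fraction of frames in which the hand moves slower than thresh (idle time)."""
    return float((hand_speeds(pos, dt) < thresh).mean())
```

A lagged cross-correlation would additionally capture leader/follower structure between the two hands; the zero-lag version is kept here for brevity.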

AI Classifier Architectures:

  • Skeleton-based graph convolutional networks (GCNs) or LSTMs: time sequences of joint coordinates X_t ∈ R^(J×2), velocities v_j(t), and joint angles θ_jk(t) as input, regressing to domain-specific NOTSS scores.
  • Two-stream convolutional networks: one stream ingests RGB, the other pose/optical flow, jointly classifying high/low performance per NOTSS category.
  • Explainable-AI: attention layers localize the spatiotemporal features driving each NOTSS score.
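
The skeleton-derived input features named above can be sketched in a few lines; joint indices and the sampling rate here are illustrative, and the downstream GCN/LSTM itself is omitted because no validated NOTSS architecture has been published:

```python
import numpy as np

def joint_velocities(X: np.ndarray, dt: float = 1 / 30) -> np.ndarray:
    """v_j(t): per-joint velocity vectors from coordinates X of shape (T, J, 2)."""
    return np.diff(X, axis=0) / dt

def joint_angle(X: np.ndarray, i: int, j: int, k: int) -> np.ndarray:
    """theta_jk(t): angle at joint j between the segments j->i and j->k."""
    a = X[:, i] - X[:, j]
    b = X[:, k] - X[:, j]
    cos = (a * b).sum(axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))
    return np.arccos(np.clip(cos, -1.0, 1.0))
```

Per frame, the concatenated (coordinate, velocity, angle) vector forms the sequence a temporal model would regress to domain-specific scores.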

No end-to-end, peer-reviewed, AI-driven NOTSS validation pipeline has yet been reported. Closest analogues are surgical technical skill classifiers with accuracies ≥80% [65 in (Constable et al., 2024)].

3. LLM and VR-Driven NOTSS Analytics

The emergence of LLM analytics and immersive simulation platforms such as the Virtual Operating Room Team Experience (VORTeX) enables team-based NOTSS evaluation using transcript analysis (Barker et al., 19 Jan 2026).

Prompt Engineering and Category Mapping: VORTeX uses a composite prompt embedding the four primary NOTSS categories—“Situational Awareness,” “Decision Making,” “Communication and Teamwork,” “Leadership”—as explicit definitions within the task for the LLM. The LLM is instructed to (a) tag dialogue evidence, (b) self-verify outputs, and (c) return a directed JSON interaction graph representing the session.
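
An illustrative sketch of this prompt pattern follows; the prompt wording and JSON schema are hypothetical stand-ins, not VORTeX's actual prompt or output format:

```python
import json

NOTSS_CATEGORIES = [
    "Situational Awareness", "Decision Making",
    "Communication and Teamwork", "Leadership",
]

def build_prompt(transcript: str) -> str:
    # Composite prompt: embed category definitions, demand evidence tags,
    # self-verification, and a directed JSON interaction graph.
    return (
        "Rate the operating-room transcript below against these NOTSS "
        f"categories: {', '.join(NOTSS_CATEGORIES)}.\n"
        "1. Tag each utterance with the category it evidences.\n"
        "2. Re-check every tag against the category definitions.\n"
        '3. Return JSON: {"nodes": [...], "edges": '
        '[{"source", "target", "category", "evidence"}]}.\n\n'
        f"Transcript:\n{transcript}"
    )

# Hypothetical shape of a returned interaction graph
example_graph = json.loads("""{
  "nodes": ["surgeon", "anaesthetist", "scrub_nurse"],
  "edges": [{"source": "surgeon", "target": "scrub_nurse",
             "category": "Communication and Teamwork",
             "evidence": "Clamp, please. Confirm you have it."}]
}""")
```

Parsing the response back through `json.loads` gives the self-verification step a hard failure mode: malformed output can be rejected and re-requested.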

Network Science Metrics:

For each participant node v_i, a directed edge e_ij is created whenever a NOTSS behavior of category c passes from i to j. Derived metrics include:

  • In-degree: d_in(i) = Σ_j A_ji
  • Out-degree: d_out(i) = Σ_j A_ij
  • Weighted degree: d_tot(i) = d_in(i) + d_out(i)
  • Clustering coefficient: C_i = 2 · (# triangles through i) / (k_i (k_i - 1))
  • Betweenness centrality: b(i) = Σ_{s≠i≠t} σ_st(i) / σ_st
  • Hierarchy index: H = 1 - 2 Σ_{i<j} min(A_ij, A_ji) / Σ_{i<j} (A_ij + A_ji)

These measures quantify operative hierarchy, interaction density, and role centrality.
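
Assuming a weighted adjacency matrix A where A_ij counts interactions from participant i to participant j, the degree and hierarchy metrics reduce to a few lines (a sketch under that assumption):

```python
import numpy as np

def degree_metrics(A: np.ndarray):
    """In-, out-, and total (weighted) degree per node."""
    d_in = A.sum(axis=0)    # column sums: interactions arriving at i
    d_out = A.sum(axis=1)   # row sums: interactions leaving i
    return d_in, d_out, d_in + d_out

def hierarchy_index(A: np.ndarray) -> float:
    """H = 1 - 2 * sum_{i<j} min(A_ij, A_ji) / sum_{i<j} (A_ij + A_ji).
    H = 1 for purely one-directional traffic, 0 for fully reciprocal exchange."""
    iu = np.triu_indices(A.shape[0], k=1)   # each unordered pair i < j once
    reciprocated = np.minimum(A[iu], A.T[iu]).sum()
    total = (A[iu] + A.T[iu]).sum()
    return float(1.0 - 2.0 * reciprocated / total)
```

Clustering coefficients and betweenness centrality are better taken from a graph library such as networkx than re-implemented by hand.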

Scenario Implementation: VORTeX constructs two clinically-scripted emergencies—pneumothorax and intra-abdominal bleeding—deliberately structured to demand authentic NOTSS behaviors. Realistic dialogue and temporal pressure are layered to elicit and evaluate all four NOTSS domains within a controlled, reproducible environment.

4. Quantitative Results, Validation, and Metrics

Human Annotation Reliability: High inter-rater ICCs (“good to excellent,” ICC ≥ 0.75) are established for expert NOTSS scoring using simulated video (Constable et al., 2024).

AI Metrics and Limitations: While AI frameworks for kinematic or transcript-based NOTSS scoring are described, published numerical results for end-to-end systems applied to NOTSS (as opposed to technical skills) remain lacking (Constable et al., 2024, Barker et al., 19 Jan 2026). VORTeX compared LLM outputs to expert annotations, calculating percent agreement and Cohen’s κ, but did not report specific values (Barker et al., 19 Jan 2026).

System Performance and Usability (VORTeX pilot, n = 12) (Barker et al., 19 Jan 2026):

  • Usability: mean ratings (Q1: M=3.50, Q3: M=3.75)
  • Perceived NTS improvement: Q11 (“Continued use would improve my NTS”) M=4.42
  • VR system: mean frame rate 73 fps, network latency 75 ms
  • Qualitative: participants highlighted the salience of debriefing graphs and realism of scenario-induced pressure

5. Integration into Training and Feedback Systems

NOTSS frameworks serve multiple roles in surgical training and credentialing:

  • Formative Feedback: Automated alerts on metrics such as head-pose scanning variance may supplement or structure coach-led debriefs (Constable et al., 2024).
  • Real-Time Monitoring: Fatigue detection using motion characteristics can trigger micro-breaks during high-cognitive-load phases.
  • Milestone Tracking: AI-generated NOTSS indices can be used for summative benchmarking in competency-based education.
  • Automated Debrief: VR/LLM systems enable directed feedback on communication/network structure, identifying bottlenecks or role misalignments (Barker et al., 19 Jan 2026).
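
The head-pose scanning alert mentioned above can be sketched as a sliding-window variance check; the window length and variance threshold here are hypothetical tuning values, not figures from the cited work:

```python
import numpy as np

def low_scanning_windows(yaw: np.ndarray, window: int = 150,
                         min_var: float = 0.02) -> np.ndarray:
    """Flag sliding windows whose head-yaw variance falls below min_var,
    a proxy for narrowed visual scanning (possible loss of situation awareness)."""
    variances = np.array([yaw[t:t + window].var()
                          for t in range(len(yaw) - window + 1)])
    return variances < min_var
```

In a formative setting, a run of flagged windows would annotate the debrief timeline rather than interrupt the procedure.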

A plausible implication is that, as data-driven approaches mature and are validated, large-scale registry mining connecting NOTSS indices to patient outcomes will support more precise, individualized surgical education.

6. Barriers, Challenges, and Prospects

Data, Privacy, and Explainability: The primary obstacles to robust AI/CV-driven NOTSS assessment include the lack of large, expertly annotated video datasets of clinical operations; privacy and consent challenges for 'operating-room black box' recordings; and the risk of algorithmic bias when training sets are limited (Constable et al., 2024).

Explainability remains critical: both pose-derived AI assessments and LLM-driven transcript assessments must supply interpretable rationales for the behaviors they score in order to secure user trust and acceptance [64 in (Constable et al., 2024); Barker et al., 19 Jan 2026].

Technical Alignment: Speech-to-text and diarization errors, ambiguity in utterance categorization, and lack of multimodal integration for nonverbal cues limit the completeness of current LLM-based approaches (Barker et al., 19 Jan 2026). Ongoing work is integrating gaze/motion embeddings and refining prompt strategies.

Institutional and Cultural Acceptance: Adoption depends on positioning technology as formative (not punitive), ensuring robust data governance, and designing user interfaces that are accessible to both trainees and experienced clinicians.

In summary, the NOTSS framework underpins the objective assessment of surgical team behaviors essential for patient safety. While established behavioral taxonomies and reliable human annotation tools exist, fully validated, scalable AI/CV/LLM platforms to automate such assessments are in development and require further data, transparency, and interdisciplinary collaboration for clinical deployment (Constable et al., 2024, Barker et al., 19 Jan 2026).
