NOTSS: Non-Technical Skills for Surgeons

Updated 22 January 2026
  • NOTSS is a structured framework outlining essential behavioral, cognitive, and interpersonal skills necessary for surgical safety and effective teamwork.
  • Recent advances integrate computer vision, AI, and LLM analytics to quantitatively assess skills like situation awareness and decision making using kinematic and linguistic data.
  • Simulation platforms, such as VORTeX, leverage VR and LLM-driven feedback to enhance training, real-time monitoring, and formative debriefing in surgical education.

Non-Technical Skills for Surgeons (NOTSS) comprise a structured taxonomy and assessment framework for those behavioral, cognitive, and interpersonal competencies essential to surgical safety and team performance that are not strictly technical. Developed over the past two decades, NOTSS has become the reference framework for both research and training in surgical teamwork, leadership, situation awareness, decision making, and communication. Recently, computer vision (CV), AI, and LLM-based analytics have enabled quantitative and scalable measurement of NOTSS elements using both kinematic and linguistic data sources, with particular activity in specialties such as cardiothoracic and laparoscopic surgery (Constable et al., 2024).

1. Taxonomy and Components of NOTSS

The NOTSS framework, as adopted in current surgical safety research, consists of five principal categories (Constable et al., 2024):

  • Situation Awareness: Monitoring the operating environment, anticipating future states, recognizing and responding to cues. Supported by sub-skills such as anticipatory ability and cognitive flexibility.
  • Decision Making: Selecting and executing an appropriate course of action under time pressure or uncertainty. Key subcomponents include mental readiness and action planning.
  • Communication and Interaction: Clarity of speech, closed-loop communication, checking shared understandings.
  • Co-operation and Team Skills: Sharing information, mutual monitoring, supporting and backup behaviors.
  • Leadership and Managerial Skills: Setting and maintaining standards, resource management, assertiveness, and conflict resolution.

The NOTECHS scale, a sister instrument, extends the same principles and has been validated as a behavioral marker set [71, 72 in (Constable et al., 2024)].

Within simulation and training environments, current practice also identifies specific subskills (e.g., use of communicative gestures for structuring team action) as targets for assessment and feedback (Constable et al., 2024, Barker et al., 19 Jan 2026).

2. Assessment Methodologies: Classical and Algorithmic

2.1 Human-Rated NOTSS Scoring

Historically, NOTSS assessment has relied on direct observation or video review by trained clinical raters, using standardized behavioral marker forms. Reliability is quantified via inter-rater statistics, typically the Intraclass Correlation Coefficient (ICC) or Cohen’s κ:

ICC = (MS_B - MS_W) / (MS_B + (k - 1) MS_W)

where MS_B is the between-videos mean square, MS_W the within-videos mean square, and k the number of raters (Constable et al., 2024).

Yule et al. [72 in (Constable et al., 2024)] demonstrated the construct and criterion validity of human-rated NOTSS, reporting ICCs “in the good to excellent range,” that is, ICC ≥ 0.75.
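
As a concrete illustration, the one-way ICC above can be computed directly from a videos-by-raters score matrix. This is a minimal sketch; the function name and data layout are illustrative, not taken from the cited studies:

```python
import numpy as np

def icc_oneway(ratings: np.ndarray) -> float:
    """One-way random-effects ICC for an (n videos x k raters) matrix,
    matching ICC = (MS_B - MS_W) / (MS_B + (k - 1) MS_W)."""
    n, k = ratings.shape
    video_means = ratings.mean(axis=1)
    # Between-videos mean square: spread of per-video means around the grand mean
    ms_b = k * ((video_means - ratings.mean()) ** 2).sum() / (n - 1)
    # Within-videos mean square: rater disagreement inside each video
    ms_w = ((ratings - video_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_b - ms_w) / (ms_b + (k - 1) * ms_w)
```

With perfect agreement MS_W = 0 and the ICC is exactly 1; the ICC ≥ 0.75 threshold quoted above is the conventional lower bound of the "good to excellent" band.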

2.2 Computer Vision and AI Approaches

Recent work proposes a range of CV/AI model families for objective, automated NOTSS scoring. These may be grouped as follows (Constable et al., 2024):

Pose Estimation Frameworks:

  • Markerless 2D/3D pose estimation using deep networks: OpenPose/Part-Affinity Fields, DeeperCut, DeepLabCut (supporting multi-person and multi-instrument tracking).
  • RGB-D (depth sensor) systems for coarse 3D head and upper-body tracking.
  • Hybrid models combining pose skeletons with inertial sensor data under gloves.

Kinematic Indicators for NOTSS Domains:

  • Situation Awareness: head-pose orientation (yaw/pitch/roll) streams h(t); gaze proxy (face-ROI distance)
  • Communication/Interaction: gesture amplitude and velocity; response latency between gestures
  • Decision Making/Team Coordination: idle time in bimanual activity; hand-trajectory velocity cross-correlation ("synchrony")
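
As an illustrative sketch of how such proxies could be computed from pose-tracker output (the function names, 30 Hz sampling rate, and idle threshold are assumptions, not values from the cited work):

```python
import numpy as np

def hand_speeds(pos: np.ndarray, dt: float = 1 / 30) -> np.ndarray:
    """Frame-to-frame speed of one tracked hand; pos is (T, 2) coordinates."""
    return np.linalg.norm(np.diff(pos, axis=0), axis=1) / dt

def bimanual_synchrony(pos_l: np.ndarray, pos_r: np.ndarray,
                       dt: float = 1 / 30) -> float:
    """Zero-lag correlation of left/right hand speed traces (a 'synchrony' proxy)."""
    return float(np.corrcoef(hand_speeds(pos_l, dt), hand_speeds(pos_r, dt))[0, 1])

def idle_fraction(pos: np.ndarray, dt: float = 1 / 30,
                  thresh: float = 5.0) -> float:
    """Fraction of frames in which the hand moves slower than thresh (idle time)."""
    return float((hand_speeds(pos, dt) < thresh).mean())
```

A lagged cross-correlation would additionally capture leader/follower structure between the two hands; the zero-lag version is kept here for brevity.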

AI Classifier Architectures:

  • Skeleton-based graph convolutional networks (GCNs) or LSTMs: time sequences of joint coordinates X_t ∈ R^(J×2), velocities v_j(t), and joint angles θ_jk(t) as input, regressing to domain-specific NOTSS scores.
  • Two-stream convolutional networks: one stream ingests RGB, the other pose/optical flow, jointly classifying high/low performance per NOTSS category.
  • Explainable-AI: attention layers localize the spatiotemporal features driving each NOTSS score.
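
The skeleton-derived input features named above can be sketched in a few lines; joint indices and the sampling rate here are illustrative, and the downstream GCN/LSTM itself is omitted because no validated NOTSS architecture has been published:

```python
import numpy as np

def joint_velocities(X: np.ndarray, dt: float = 1 / 30) -> np.ndarray:
    """v_j(t): per-joint velocity vectors from coordinates X of shape (T, J, 2)."""
    return np.diff(X, axis=0) / dt

def joint_angle(X: np.ndarray, i: int, j: int, k: int) -> np.ndarray:
    """theta_jk(t): angle at joint j between the segments j->i and j->k."""
    a = X[:, i] - X[:, j]
    b = X[:, k] - X[:, j]
    cos = (a * b).sum(axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))
    return np.arccos(np.clip(cos, -1.0, 1.0))
```

Per frame, the concatenated (coordinate, velocity, angle) vector forms the sequence a temporal model would regress to domain-specific scores.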

No end-to-end, peer-reviewed, AI-driven NOTSS validation pipeline has yet been reported. Closest analogues are surgical technical skill classifiers with accuracies ≥80% [65 in (Constable et al., 2024)].

3. LLM and VR-Driven NOTSS Analytics

The emergence of LLM analytics and immersive simulation platforms such as the Virtual Operating Room Team Experience (VORTeX) enables team-based NOTSS evaluation using transcript analysis (Barker et al., 19 Jan 2026).

Prompt Engineering and Category Mapping: VORTeX uses a composite prompt embedding the four primary NOTSS categories—“Situational Awareness,” “Decision Making,” “Communication and Teamwork,” “Leadership”—as explicit definitions within the task for the LLM. The LLM is instructed to (a) tag dialogue evidence, (b) self-verify outputs, and (c) return a directed JSON interaction graph representing the session.
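
An illustrative sketch of this prompt pattern follows; the prompt wording and JSON schema are hypothetical stand-ins, not VORTeX's actual prompt or output format:

```python
import json

NOTSS_CATEGORIES = [
    "Situational Awareness", "Decision Making",
    "Communication and Teamwork", "Leadership",
]

def build_prompt(transcript: str) -> str:
    # Composite prompt: embed category definitions, demand evidence tags,
    # self-verification, and a directed JSON interaction graph.
    return (
        "Rate the operating-room transcript below against these NOTSS "
        f"categories: {', '.join(NOTSS_CATEGORIES)}.\n"
        "1. Tag each utterance with the category it evidences.\n"
        "2. Re-check every tag against the category definitions.\n"
        '3. Return JSON: {"nodes": [...], "edges": '
        '[{"source", "target", "category", "evidence"}]}.\n\n'
        f"Transcript:\n{transcript}"
    )

# Hypothetical shape of a returned interaction graph
example_graph = json.loads("""{
  "nodes": ["surgeon", "anaesthetist", "scrub_nurse"],
  "edges": [{"source": "surgeon", "target": "scrub_nurse",
             "category": "Communication and Teamwork",
             "evidence": "Clamp, please. Confirm you have it."}]
}""")
```

Parsing the response back through `json.loads` gives the self-verification step a hard failure mode: malformed output can be rejected and re-requested.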

Network Science Metrics:

For each participant node v_i, a directed edge e_ij is created whenever a NOTSS behavior of category c passes from i to j. Derived metrics include:

  • In-degree: d_in(i) = Σ_j A_ji
  • Out-degree: d_out(i) = Σ_j A_ij
  • Weighted degree: d_tot(i) = d_in(i) + d_out(i)
  • Clustering coefficient: C_i = 2 · (# triangles through i) / (k_i (k_i - 1))
  • Betweenness centrality: b(i) = Σ_{s≠i≠t} σ_st(i) / σ_st
  • Hierarchy index: H = 1 - 2 Σ_{i<j} min(A_ij, A_ji) / Σ_{i<j} (A_ij + A_ji)

These measures quantify operative hierarchy, interaction density, and role centrality.
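
Assuming a weighted adjacency matrix A where A_ij counts interactions from participant i to participant j, the degree and hierarchy metrics reduce to a few lines (a sketch under that assumption):

```python
import numpy as np

def degree_metrics(A: np.ndarray):
    """In-, out-, and total (weighted) degree per node."""
    d_in = A.sum(axis=0)    # column sums: interactions arriving at i
    d_out = A.sum(axis=1)   # row sums: interactions leaving i
    return d_in, d_out, d_in + d_out

def hierarchy_index(A: np.ndarray) -> float:
    """H = 1 - 2 * sum_{i<j} min(A_ij, A_ji) / sum_{i<j} (A_ij + A_ji).
    H = 1 for purely one-directional traffic, 0 for fully reciprocal exchange."""
    iu = np.triu_indices(A.shape[0], k=1)   # each unordered pair i < j once
    reciprocated = np.minimum(A[iu], A.T[iu]).sum()
    total = (A[iu] + A.T[iu]).sum()
    return float(1.0 - 2.0 * reciprocated / total)
```

Clustering coefficients and betweenness centrality are better taken from a graph library such as networkx than re-implemented by hand.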

Scenario Implementation: VORTeX constructs two clinically-scripted emergencies—pneumothorax and intra-abdominal bleeding—deliberately structured to demand authentic NOTSS behaviors. Realistic dialogue and temporal pressure are layered to elicit and evaluate all four NOTSS domains within a controlled, reproducible environment.

4. Quantitative Results, Validation, and Metrics

Human Annotation Reliability: High inter-rater ICCs (“good to excellent,” ICC ≥ 0.75) are established for expert NOTSS scoring using simulated video (Constable et al., 2024).

AI Metrics and Limitations: While AI frameworks for kinematic or transcript-based NOTSS scoring are described, published numerical results for end-to-end systems applied to NOTSS (as opposed to technical skills) remain lacking (Constable et al., 2024, Barker et al., 19 Jan 2026). VORTeX compared LLM outputs to expert annotations, calculating percent agreement and Cohen’s κ, but did not report specific values (Barker et al., 19 Jan 2026).

System Performance and Usability (VORTeX pilot, n = 12) (Barker et al., 19 Jan 2026):

  • Usability: mean ratings (Q1: M=3.50, Q3: M=3.75)
  • Perceived NTS improvement: Q11 (“Continued use would improve my NTS”) M=4.42
  • VR system: mean frame rate 73 fps, network latency 75 ms
  • Qualitative: participants highlighted the salience of debriefing graphs and realism of scenario-induced pressure

5. Integration into Training and Feedback Systems

NOTSS frameworks serve multiple roles in surgical training and credentialing:

  • Formative Feedback: Automated alerts on metrics such as head-pose scanning variance may supplement or structure coach-led debriefs (Constable et al., 2024).
  • Real-Time Monitoring: Fatigue detection using motion characteristics can trigger micro-breaks during high-cognitive-load phases.
  • Milestone Tracking: AI-generated NOTSS indices can be used for summative benchmarking in competency-based education.
  • Automated Debrief: VR/LLM systems enable directed feedback on communication/network structure, identifying bottlenecks or role misalignments (Barker et al., 19 Jan 2026).
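
The head-pose scanning alert mentioned above can be sketched as a sliding-window variance check; the window length and variance threshold here are hypothetical tuning values, not figures from the cited work:

```python
import numpy as np

def low_scanning_windows(yaw: np.ndarray, window: int = 150,
                         min_var: float = 0.02) -> np.ndarray:
    """Flag sliding windows whose head-yaw variance falls below min_var,
    a proxy for narrowed visual scanning (possible loss of situation awareness)."""
    variances = np.array([yaw[t:t + window].var()
                          for t in range(len(yaw) - window + 1)])
    return variances < min_var
```

In a formative setting, a run of flagged windows would annotate the debrief timeline rather than interrupt the procedure.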

A plausible implication is that, as data-driven approaches mature and are validated, large-scale registry mining connecting NOTSS indices to patient outcomes will support more precise, individualized surgical education.

6. Barriers, Challenges, and Prospects

Data, Privacy, and Explainability: The primary obstacles to robust AI/CV-driven NOTSS assessment include the lack of large, expertly annotated video datasets of clinical operations; privacy and consent challenges for 'operating-room black box' recordings; and the risk of algorithmic bias when training sets are limited (Constable et al., 2024).

Explainability remains critical: both pose-derived AI assessments and LLM-driven transcript assessments must supply interpretable rationales for the behaviors they score in order to secure user trust and acceptance [64 in (Constable et al., 2024); Barker et al., 19 Jan 2026].

Technical Alignment: Speech-to-text and diarization errors, ambiguity in utterance categorization, and lack of multimodal integration for nonverbal cues limit the completeness of current LLM-based approaches (Barker et al., 19 Jan 2026). Ongoing work is integrating gaze/motion embeddings and refining prompt strategies.

Institutional and Cultural Acceptance: Adoption depends on positioning technology as formative (not punitive), ensuring robust data governance, and designing user interfaces that are accessible to both trainees and experienced clinicians.

In summary, the NOTSS framework underpins the objective assessment of surgical team behaviors essential for patient safety. While established behavioral taxonomies and reliable human annotation tools exist, fully validated, scalable AI/CV/LLM platforms to automate such assessments are in development and require further data, transparency, and interdisciplinary collaboration for clinical deployment (Constable et al., 2024, Barker et al., 19 Jan 2026).
