CholecSeg8k: Laparoscopic Segmentation Dataset
- CholecSeg8k is a multiclass pixel-wise segmentation dataset providing dense annotations for anatomical structures and surgical instruments in laparoscopic cholecystectomy videos.
- The dataset comprises 8,080 RGB frames from 17 videos with detailed pixel-level labels across 13 semantic classes, facilitating precise performance metrics like IoU and Dice.
- Applications include real-time surgical guidance, tool tracking, and algorithm benchmarking, while challenges such as class imbalance and annotation ambiguity highlight areas for improvement.
CholecSeg8k is a multiclass pixel-wise semantic segmentation dataset derived from laparoscopic cholecystectomy video sequences. Designed specifically for enhancing computer-assisted surgery research, CholecSeg8k provides dense annotations for both anatomical structures and surgical instruments, supporting the development and benchmarking of algorithms for scene understanding, tool tracking, and anatomical identification in minimally invasive surgery. The dataset is based on a subset of the widely-studied Cholec80 video corpus and is publicly released for non-commercial research and educational use (Hong et al., 2020).
1. Dataset Composition and Structure
CholecSeg8k consists of 8,080 RGB laparoscopic frames sampled from 17 video records of cholecystectomy procedures, captured at 25 frames per second. Each source video contributes several temporally contiguous 80-frame clips with phases selected to focus on gallbladder dissection, omitting preoperative and postoperative periods (Hong et al., 2020). All frames are provided at a spatial resolution of 854 × 480 pixels in lossless PNG format. For each frame, four corresponding PNGs are supplied: the raw RGB image, a color-coded segmentation mask, an RGBA annotation mask, and a single-channel watershed mask encoding class IDs.
The directory structure organizes data by video and clip, e.g., videoXX_startFrame/, containing the respective 80-frame sequences (Hong et al., 2020). The dataset totals approximately 3 GB on disk.
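The clip layout described above can be indexed with a short script. The sketch below assumes the common release layout in which each frame contributes files suffixed `_endo.png`, `_endo_color_mask.png`, and `_endo_watershed_mask.png` inside `videoXX_startFrame/` directories; verify the suffixes against your local copy before use.

```python
from pathlib import Path

def index_clips(root):
    """Group CholecSeg8k frame PNGs by clip directory (videoXX_startFrame/).

    The filename suffixes (_endo, _endo_color_mask, _endo_watershed_mask)
    are assumptions about the release layout; adjust them to your copy.
    """
    clips = {}
    for clip_dir in sorted(Path(root).glob("video*_*")):
        frames = sorted(clip_dir.glob("*_endo.png"))
        clips[clip_dir.name] = [
            {
                "image": f,
                "color_mask": f.with_name(f.stem + "_color_mask.png"),
                "watershed_mask": f.with_name(f.stem + "_watershed_mask.png"),
            }
            for f in frames
        ]
    return clips
```

Indexing by clip (rather than by frame) makes it easy to keep the 80-frame sequences intact when defining splits later.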
2. Annotation Protocol and Semantic Classes
Annotations were produced with the PixelAnnotationTool by trained graduate annotators following phase-specific guidelines. Each frame was cross-checked by a second annotator; consistency was prioritized for organ-class boundaries, and minor boundary disagreements were not always resolved (Hong et al., 2020).
Pixel-wise masks encode 13 semantic classes, comprising 10 anatomical entities, two surgical instrument types, and black background. The class taxonomy and interpretations are outlined below:
| ID | Short Name | Definition |
|---|---|---|
| 0 | Black background | Pixels outside circular field-of-view (padding) |
| 1 | Abdominal wall | Trocar boundaries, peritoneum |
| 2 | Liver | Hepatic parenchyma |
| 3 | GI tract | Stomach, small intestine, adjacent tissues |
| 4 | Fat | Fatty tissue |
| 5 | Grasper | Fenestrated-jaw surgical instrument |
| 6 | Connective tissue | Loose or dense fibrous tissue |
| 7 | Blood | Visible blood pools or stains |
| 8 | Cystic duct | Biliary structure |
| 9 | L-hook electrocautery | Hook-shaped surgical tool for dissection |
| 10 | Gallbladder | Primary organ under removal |
| 11 | Hepatic vein | Major vascular structure |
| 12 | Liver ligament | Coronary, triangular, falciform, and related ligaments |
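For use in code, the taxonomy above maps directly to a lookup table; the anatomy/instrument grouping follows the class descriptions in the table:

```python
# Class IDs and names exactly as defined in the CholecSeg8k taxonomy table.
CHOLECSEG8K_CLASSES = {
    0: "Black background",
    1: "Abdominal wall",
    2: "Liver",
    3: "GI tract",
    4: "Fat",
    5: "Grasper",
    6: "Connective tissue",
    7: "Blood",
    8: "Cystic duct",
    9: "L-hook electrocautery",
    10: "Gallbladder",
    11: "Hepatic vein",
    12: "Liver ligament",
}

# Convenience groupings: 10 anatomical classes, 2 instruments, 1 background.
ANATOMY_IDS = [1, 2, 3, 4, 6, 7, 8, 10, 11, 12]
INSTRUMENT_IDS = [5, 9]
BACKGROUND_ID = 0
```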
Class pixel frequencies are strongly imbalanced; major classes (liver, abdominal wall, gallbladder, GI tract, fat) account for the overwhelming majority of labeled pixels, while surgical tools and vessels constitute <1% each (Hong et al., 2020).
3. Data Splits, Preprocessing, and Usage
No single train/validation/test split is prescribed in the initial release; users are encouraged to adopt 80/10/10 or cross-validation strategies across the 101 clips (Hong et al., 2020). Later studies have implemented diverse split conventions by video or frame—for example, one scheme assigns 80% for training, 10% for validation, and 10% for testing, maintaining balance across instrument classes (Ali et al., 1 Dec 2025). Another approach splits by surgical case, e.g., 75% of cases (≈6,060 frames) for training and 25% (≈2,020 frames) for testing, omitting a separate validation set for improved sample efficiency (Grammatikopoulou et al., 2023).
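Since no canonical split is prescribed, an 80/10/10 split over the 101 clips can be sketched as below. Splitting at the clip level (not the frame level) avoids leaking near-duplicate temporally adjacent frames across subsets; the clip identifiers are whatever names your local directory layout uses.

```python
import random

def split_clips(clip_names, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle clip identifiers and split at the clip level, so frames
    from one temporally contiguous 80-frame clip never straddle subsets."""
    names = sorted(clip_names)          # deterministic base order
    random.Random(seed).shuffle(names)  # seeded shuffle for reproducibility
    n = len(names)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = names[:n_train]
    val = names[n_train:n_train + n_val]
    test = names[n_train + n_val:]
    return train, val, test
```

For video-level or case-level splits (as in Grammatikopoulou et al., 2023), group the clip names by their `videoXX` prefix first and split the groups instead.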
Standard training pipelines normalize per-channel mean and variance. Common augmentations include horizontal flip, random scaling, cropping, color jitter, and Gaussian blur, specifically tailored for domain generalization and feature augmentation tasks (Ali et al., 1 Dec 2025).
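A key detail in segmentation pipelines is that geometric augmentations (flip, crop, scaling) must be applied jointly to image and mask, while photometric ones (color jitter, blur) touch only the image. The numpy-only sketch below illustrates the joint geometric part plus per-channel normalization; it uses per-image statistics for self-containment, whereas real pipelines typically use dataset-wide mean and variance.

```python
import numpy as np

def augment(image, mask, crop=(448, 448), rng=None):
    """Joint flip/crop for image (H, W, 3) and mask (H, W).

    Geometric ops are applied identically to both tensors so pixel labels
    stay aligned; photometric ops would go on the image branch only.
    """
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:                       # horizontal flip, p = 0.5
        image, mask = image[:, ::-1], mask[:, ::-1]
    h, w = mask.shape
    ch, cw = crop                                # random crop window
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    image = image[y:y + ch, x:x + cw]
    mask = mask[y:y + ch, x:x + cw]
    img = image.astype(np.float32) / 255.0       # per-channel normalization
    img = (img - img.mean((0, 1))) / (img.std((0, 1)) + 1e-6)
    return img, mask
```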
4. Statistical Properties and Class Imbalance
CholecSeg8k exhibits substantial label imbalance. For example, grasper and L-hook instruments are present in only 0.6% and 0.3% of all labelled pixels, respectively, despite being critical for algorithmic tool segmentation (Hong et al., 2020). Within the instrument subset, the L-hook appears in 29% of frames and the grasper in 76%, varying by split (Fernández-Rodríguez et al., 15 Mar 2024). The majority of annotated pixels belong to liver (≈30%) and abdominal wall (≈28%) categories, while most vascular and ductal classes remain under 1% (Hong et al., 2020). Models may require class-reweighting or focal-loss schemes to offset this skew, although several benchmarks rely on default loss settings.
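Hong et al. do not prescribe a particular reweighting scheme; one common choice that fits the skew described above is median-frequency balancing, sketched here. Rare classes (tools, vessels) receive weights above 1 and dominant organs below 1, which can then be passed to a weighted cross-entropy loss.

```python
import numpy as np

def median_frequency_weights(pixel_counts):
    """Median-frequency balancing: w_c = median(freq) / freq_c.

    pixel_counts is a per-class total of labeled pixels; classes at the
    median frequency get weight 1, rarer classes get larger weights.
    """
    counts = np.asarray(pixel_counts, dtype=np.float64)
    freq = counts / counts.sum()
    return np.median(freq) / freq
```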
5. Evaluation Metrics and Baseline Results
Segmentation performance is consistently evaluated using per-class Intersection over Union (IoU) and Dice coefficient (DSC), along with mean IoU (mIoU), pixel accuracy, precision, and recall. For a class $c$, the standard formulas are:

$$\mathrm{IoU}_c = \frac{TP_c}{TP_c + FP_c + FN_c}, \qquad \mathrm{DSC}_c = \frac{2\,TP_c}{2\,TP_c + FP_c + FN_c}$$

where $TP_c$, $FP_c$, $FN_c$ denote true positives, false positives, and false negatives for class $c$, respectively (Hong et al., 2020, Ali et al., 1 Dec 2025).
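These metrics can be computed directly from a class confusion matrix, as in the following sketch (rows index ground truth, columns index predictions):

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes=13):
    """num_classes x num_classes count matrix from flat label arrays."""
    idx = gt.astype(np.int64) * num_classes + pred.astype(np.int64)
    return np.bincount(idx.ravel(), minlength=num_classes ** 2).reshape(
        num_classes, num_classes)

def iou_dice(cm):
    """Per-class IoU and Dice from a confusion matrix.

    TP is the diagonal; FP are off-diagonal column sums; FN are
    off-diagonal row sums. Absent classes are guarded against /0.
    """
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(0) - tp
    fn = cm.sum(1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)
    dice = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    return iou, dice
```

Accumulating one confusion matrix over the full test set (rather than averaging per-image scores) matches the usual dataset-level mIoU convention.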
Recent studies have reported state-of-the-art results on CholecSeg8k. For example, RobustSurg achieves a mean IoU of 89.32% and mean Dice of 93.64% across the 13 classes, compared to 87.67% and 92.45% for DeepLabv3+ (Ali et al., 1 Dec 2025). Class-wise IoU for large organs (liver, fat, gallbladder) may exceed 92%, whereas tool and vessel classes show lower, more variable performance.
In instrument segmentation tasks where class aggregation is used, inclusion of optical-flow maps as input (ARFlow-derived, t−1 and t−5 offsets, multiple representations) can substantially improve Dice and recall for underrepresented fast-moving instrument classes, without architecture modification (Fernández-Rodríguez et al., 15 Mar 2024). For instance, the L-hook class Dice improves from 31.97% (baseline RGB) to 49.68% (with OF), highlighting the benefit of explicit temporal cues.
6. Applications, Limitations, and Extensions
CholecSeg8k is extensively used for:
- Real-time semantic guidance and highlight overlays during laparoscopic surgery
- Development and benchmarking of tool segmentation, anatomical recognition, and video-based spatio-temporal models (Grammatikopoulou et al., 2023)
- Training detection and instance segmentation algorithms, and as a source domain for visual domain adaptation (Ali et al., 1 Dec 2025)
- Surgical SLAM and camera tracking using semantic landmarks
Limitations include dominant class imbalance, lack of instance masks, small-object ambiguity (especially for vessels and tools), and the absence of ground-truth depth or 3D pose annotations (Hong et al., 2020). Some annotations for minor classes were reviewed only once, contributing to potential boundary noise and label uncertainty. Appearance variability induced by lighting, blood, and surgical artifacts further challenges model generalization. Proposed extensions include additional cases, temporal labels, and richer instance annotation (Hong et al., 2020).
7. Access, Licensing, and Usage Example
CholecSeg8k is publicly hosted on Kaggle (https://www.kaggle.com/cshih/cholecseg8k) under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license, supporting redistribution and derivative use for non-commercial research and education (Hong et al., 2020).
A minimal PyTorch-compatible pipeline for dataset access and batched evaluation is provided with the data release, enabling direct loading of images and watershed masks, pixel-level label mapping, and computation of per-class IoU (see the code example in Hong et al., 2020).
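The label-mapping step can be sketched as follows. The watershed masks store per-pixel integer codes that must be remapped to the contiguous class IDs 0–12; the concrete code values in `WATERSHED_TO_ID` below are placeholders and should be taken from the release documentation, not from this sketch.

```python
import numpy as np

# Raw watershed-mask values -> contiguous class IDs 0..12.
# PLACEHOLDER values: fill this in from the CholecSeg8k release docs.
WATERSHED_TO_ID = {50: 0, 11: 1, 21: 2}

def remap(raw, value_map):
    """Map raw mask codes to class IDs; unmapped pixels become 255 (ignore)."""
    out = np.full(raw.shape, 255, np.uint8)
    for raw_val, cls in value_map.items():
        out[raw == raw_val] = cls
    return out

def load_label_map(path, value_map=WATERSHED_TO_ID):
    """Load a watershed-mask PNG and remap it to class IDs."""
    from PIL import Image  # Pillow; imported lazily as an optional dependency
    raw = np.array(Image.open(path))
    if raw.ndim == 3:      # some tools save single-channel PNGs as RGB
        raw = raw[..., 0]
    return remap(raw, value_map)
```

The ignore value 255 lets unmapped or ambiguous pixels be masked out of loss and metric computation.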
References:
- “CholecSeg8k: A Semantic Segmentation Dataset for Laparoscopic Cholecystectomy Based on Cholec80” (Hong et al., 2020)
- “RobustSurg: Tackling domain generalisation for out-of-distribution surgical scene segmentation” (Ali et al., 1 Dec 2025)
- “Exploring Optical Flow Inclusion into nnU-Net Framework for Surgical Instrument Segmentation” (Fernández-Rodríguez et al., 15 Mar 2024)
- “A spatio-temporal network for video semantic segmentation in surgical videos” (Grammatikopoulou et al., 2023)
- “See in Depth: Training-Free Surgical Scene Segmentation with Monocular Depth Priors” (Yang et al., 5 Dec 2025)