Kvasir-SEG Dataset Overview

Updated 7 June 2026

Kvasir-SEG is an open-access, pixel-wise annotated dataset of 1,000 colonoscopic images designed to benchmark GI polyp detection and segmentation algorithms.
The dataset features variable image resolutions and diverse clinical conditions, with annotations rigorously reviewed by medical experts.
Baseline results from ResUNet and a traditional fuzzy C-means method accompany the dataset and provide reference points for evaluating newer segmentation approaches.

Kvasir-SEG is an open-access, pixel-wise annotated dataset of gastrointestinal (GI) polyp images for benchmarking automatic detection and segmentation algorithms in computer vision and medical image analysis. Compiled to address the paucity of high-quality, precisely segmented endoscopic image data, Kvasir-SEG functions as a reference standard for evaluating deep learning and traditional segmentation methods in colonoscopic polyp analysis (Jha et al., 2019).

1. Dataset Construction and Annotation

Kvasir-SEG consists of 1,000 colonoscopic still frames, each paired with a binary segmentation mask outlining one or more polyp regions. Source images cover a range of native spatial resolutions typical of real-world colonoscopy (from 720×576 up to 1920×1080 pixels) and include clinically representative variability in polyp shape/size, lighting, and tissue background (Jha et al., 2019, Asare et al., 17 Sep 2025). Images are encoded as JPEG files, while masks are saved as lossless 1-bit PNGs, mapped so that background pixels equal 0 and polyp pixels equal 255, supporting precise pixel-wise delineation.

Annotation followed a two-stage process. First, frames were manually outlined by an engineer and a medical doctor using the Labelbox platform. Second, all masks were reviewed for clinical validity and completeness by an experienced gastroenterologist. No inter-observer agreement metrics (e.g., Cohen’s $\kappa$) were reported, but the process was designed to ensure medical-grade annotation consistency. In addition to masks, per-frame bounding boxes are provided in a JSON file (“polyp_bboxes.json”), generated by extracting the minimal and maximal $x$/$y$ coordinates from each binary mask, thus facilitating direct use in object detection pipelines (Jha et al., 2019).
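The min/max coordinate extraction described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' script: the function name `bbox_from_mask` and the `(xmin, ymin, xmax, ymax)` output order are assumptions, and the mask is represented as a plain nested list rather than a decoded PNG.

```python
from typing import List, Optional, Tuple


def bbox_from_mask(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (xmin, ymin, xmax, ymax) over all nonzero pixels, or None if empty.

    Hypothetical helper: the output layout is an assumption, not the
    schema of the dataset's JSON file.
    """
    # Column indices of every foreground pixel.
    xs = [x for row in mask for x, value in enumerate(row) if value > 0]
    # Row indices of every row that contains at least one foreground pixel.
    ys = [y for y, row in enumerate(mask) if any(value > 0 for value in row)]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


# Toy 4x5 mask with one blob (255 = polyp, 0 = background).
toy_mask = [
    [0,   0,   0,   0, 0],
    [0, 255, 255,   0, 0],
    [0, 255, 255, 255, 0],
    [0,   0,   0,   0, 0],
]
print(bbox_from_mask(toy_mask))  # (1, 1, 3, 2)
```

Note that this yields a single box around all foreground pixels. A mask containing several separate polyps would need connected-component labeling first to obtain one box per polyp.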

2. Access, Format, and Recommended Data Splits

The dataset is freely downloadable at https://datasets.simula.no/kvasir-seg/ under an open-access license (CC BY-NC or similar—exact terms detailed at the source site) for academic and non-commercial purposes (Jha et al., 2019). Image files are stored in an “images/” directory, masks in “masks/”, with bounding boxes in the accompanying JSON.

For experimental reproducibility, the official recommendation is an 80/10/10 split into training, validation, and test sets, i.e., 800/100/100 images, respectively. Standard $k$-fold cross-validation (e.g., five folds of 200 frames each) is also endorsed, provided that no frames from the same colonoscopic sequence are distributed across both training and test splits, to avoid data leakage. Other works using the dataset report minor variations (e.g., holding out 120 images for the test set), but all emphasize the importance of unbiased evaluation and partitioning (Asare et al., 17 Sep 2025).
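A minimal sketch of both partitioning schemes follows. The function names, the fixed seed, and especially the per-frame sequence identifiers passed to `group_folds` are assumptions for illustration; the text above does not state that the dataset ships sequence labels, so these would have to come from elsewhere.

```python
import random
from typing import Dict, List


def split_indices(n: int = 1000, seed: int = 0) -> Dict[str, List[int]]:
    """Shuffle frame indices reproducibly and cut them 80/10/10."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = n * 8 // 10
    n_val = n // 10
    return {
        "train": indices[:n_train],
        "val": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],
    }


def group_folds(groups: List[str], k: int = 5, seed: int = 0) -> List[int]:
    """Assign a fold id per frame so frames sharing a group id share a fold.

    `groups` holds one (hypothetical) sequence id per frame.
    """
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    fold_of_group = {group: i % k for i, group in enumerate(unique)}
    return [fold_of_group[group] for group in groups]


splits = split_indices()
print({name: len(ids) for name, ids in splits.items()})
# {'train': 800, 'val': 100, 'test': 100}
```

Assigning whole groups to folds, rather than individual frames, is what keeps near-duplicate frames from the same sequence out of both the training and test sides.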

3. Canonical Preprocessing and Augmentation Protocols

Raw images are typically resized or padded to fixed computational resolutions (e.g., 320×320, 256×256, or 512×512) while maintaining aspect-ratio integrity. Input normalization to $[0,1]$ or $[-1,1]$ is standard. Data augmentation schemes include horizontal/vertical flips, elastic deformations, random cropping, rotations, brightness/contrast modification, cutout, random erasing, and color jitter (Jha et al., 2019, Asare et al., 17 Sep 2025). Mask preprocessing may involve one-hot encoding for compatibility with softmax segmentation outputs (Asare et al., 17 Sep 2025).
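The aspect-preserving resize and the two normalization ranges mentioned above come down to simple arithmetic. The sketch below only computes the geometry (scaled size and padding) and the per-pixel scaling; the helper names and the choice of centered padding are assumptions, and the actual resampling would be done by an image library.

```python
from typing import Tuple


def letterbox_geometry(width: int, height: int, target: int) -> Tuple[int, int, int, int]:
    """Return (new_w, new_h, pad_x, pad_y) for fitting an image into a
    target x target canvas without distorting its aspect ratio.

    pad_x / pad_y are the left / top padding; any remainder goes to the
    right / bottom. Centered padding is an assumption of this sketch.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return new_w, new_h, pad_x, pad_y


def normalize(pixel: int, signed: bool = False) -> float:
    """Map an 8-bit intensity to [0, 1], or to [-1, 1] when signed=True."""
    unit = pixel / 255.0
    return 2.0 * unit - 1.0 if signed else unit


print(letterbox_geometry(1920, 1080, 320))        # (320, 180, 0, 70)
print(normalize(255), normalize(0, signed=True))  # 1.0 -1.0
```

Whatever geometry is applied to an image must be applied identically to its mask, with nearest-neighbor resampling for the mask so that it stays binary.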

The frame-to-frame variation in polyp morphology and imaging artifacts (e.g., specular highlights, stool occlusion, blurred boundaries) necessitates robust augmentation to avoid overfitting and to improve generalization across the diverse anatomical landscape covered by Kvasir-SEG (Asare et al., 17 Sep 2025, Tomar, 2021).

4. Baseline Methods, Metrics, and Benchmark Results

Kvasir-SEG was released with baseline segmentation results from both traditional clustering and modern deep neural architectures.

Fuzzy C-means (FCM) clustering: Involves grayscale conversion, median blurring, Otsu thresholding, morphological operations, and 1D feature flattening, yielding binary segmentation via membership thresholding. On the held-out test set, FCM achieved Dice $=0.2390$ and IoU $=0.3142$ (Jha et al., 2019).
ResUNet: A 5-level encoder–decoder with residual blocks, optimized with Nadam ($\mathrm{lr}=10^{-4}$, $\beta_1=0.9$, with the remaining optimizer settings as given in the source), batch size 8, and Dice loss. Data were resized to 320×320 with extensive augmentation. ResUNet clearly outperformed the FCM baseline on both Dice and IoU; the exact scores are reported in Jha et al. (2019).
Recent approaches (e.g., PolypSeg-GradCAM, 2025): U-Net-based architectures, trained and evaluated exclusively on Kvasir-SEG using a 792/88/120 train/val/test split, report mean IoU and Dice (F-score) on the held-out test set; the values are given in Asare et al. (17 Sep 2025).
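The Dice loss used to train the ResUNet baseline is not spelled out in the text above. One common soft formulation is sketched below, purely as an illustration: the smoothing constant `eps`, the flattened-sequence inputs, and the function name are assumptions, and the original implementation may differ (for example, in using squared terms in the denominator).

```python
from typing import Sequence


def soft_dice_loss(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-6) -> float:
    """One common soft Dice loss: 1 - (2*sum(p*t) + eps) / (sum(p) + sum(t) + eps).

    probs:   predicted foreground probabilities in [0, 1], flattened.
    targets: ground-truth labels in {0, 1}, flattened, same length.
    eps:     smoothing term (an assumption) that avoids division by zero.
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


# A perfect prediction gives a loss near 0; a fully wrong one gives a loss near 1.
print(soft_dice_loss([1.0, 0.0, 1.0], [1, 0, 1]))
print(soft_dice_loss([1.0, 0.0], [0, 1]))
```

Because the loss is driven by overlap rather than per-pixel accuracy, it is less sensitive to the large background area than plain cross-entropy, which is one reason it is popular for polyp segmentation.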

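Standard evaluation metrics include the Dice coefficient,

$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN}$$

and Intersection-over-Union,

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$$

Precision and recall are also reported: $\mathrm{Precision} = \frac{TP}{TP + FP}$ and $\mathrm{Recall} = \frac{TP}{TP + FN}$, where $TP$, $FP$, and $FN$ denote true positive, false positive, and false negative pixels, respectively (Jha et al., 2019, Asare et al., 17 Sep 2025).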
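As a sketch of how these pixel-level metrics can be computed from a predicted and a ground-truth binary mask, consider the following. The function names are hypothetical, masks are plain nested lists of 0/1 values, and returning 0.0 when a denominator is zero is a convention chosen here, not one taken from the cited papers.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count true positive, false positive, and false negative pixels."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def _safe_div(numerator: float, denominator: float) -> float:
    # Assumed convention: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    tp, fp, fn = confusion_counts(pred, truth)
    return {
        "dice": _safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": _safe_div(tp, tp + fp + fn),
        "precision": _safe_div(tp, tp + fp),
        "recall": _safe_div(tp, tp + fn),
    }


pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
print(confusion_counts(pred, truth))  # (2, 1, 1)
print(segmentation_metrics(pred, truth))
```

For a single prediction the two overlap metrics are tied by $\mathrm{Dice} = 2\,\mathrm{IoU}/(1+\mathrm{IoU})$, so per-image Dice is never below per-image IoU. Scores averaged over images or classes need not obey this relation.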

Further, FPS (frames-per-second) evaluation is included in challenge settings, with real-time architectures reporting up to 80.6 FPS on 512×512 frames (Tomar, 2021).

5. Challenges, Variability, and Dataset Characteristics

Polyp segmentation in Kvasir-SEG is challenged by high intra- and inter-lesion variability, the presence of cryptic or flat lesions, artifact contamination (e.g., stool, mucosal folds), and pronounced class imbalance, since polyps may occupy only a small fraction of the frame (Asare et al., 17 Sep 2025, Tomar, 2021). Annotation strategies and network architectures (e.g., skip connections, residual blocks, attention modules) must compensate for this complexity. The dataset encompasses a range of protuberant and flat polyps, supporting investigation of algorithmic robustness to diverse clinical presentations (Asare et al., 17 Sep 2025).
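The class imbalance noted above can be quantified per frame as the fraction of foreground pixels in the mask. The sketch below is illustrative: the helper names, the binarization threshold of 127, and the inverse-frequency weight are assumptions (the weight is a common heuristic, not something prescribed by the cited works).

```python
from typing import List


def foreground_fraction(mask: List[List[int]], threshold: int = 127) -> float:
    """Fraction of pixels whose value exceeds `threshold` (treated as polyp)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    foreground = sum(1 for row in mask for value in row if value > threshold)
    return foreground / total


def positive_class_weight(fraction: float) -> float:
    """Background-to-foreground ratio, a common heuristic loss weight.

    Returns 1.0 for an empty mask (an arbitrary fallback for this sketch).
    """
    if fraction <= 0.0:
        return 1.0
    return (1.0 - fraction) / fraction


# Toy 4x5 mask: 5 polyp pixels out of 20.
toy_mask = [
    [0,   0,   0,   0, 0],
    [0, 255, 255,   0, 0],
    [0, 255, 255, 255, 0],
    [0,   0,   0,   0, 0],
]
fraction = foreground_fraction(toy_mask)
print(fraction, positive_class_weight(fraction))  # 0.25 3.0
```

Inspecting the distribution of this fraction over the whole dataset is a quick way to decide whether overlap-based losses or class weighting are needed.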

Masks accurately delineate irregular polyp margins. Bounding boxes—derivable directly from masks—facilitate object detection and weakly supervised training approaches. Absence of inter-observer statistics leaves open the question of annotation reproducibility, though expert verification provides a level of clinical reliability (Jha et al., 2019).

6. Applications, Limitations, and Research Directions

Kvasir-SEG serves multiple research and clinical development objectives:

Semantic segmentation: Direct benchmarking of pixel-accurate polyp segmentation, a prerequisite for AI-driven computer-aided diagnosis (CAD) and risk stratification (Jha et al., 2019, Asare et al., 17 Sep 2025).
Polyp detection/localization: Via provided bounding boxes, supports object detection and region proposal schemes (Jha et al., 2019).
Morphometry and growth modeling: Enables quantitative analysis of polyp morphology, aiding efforts in progression-risk prediction.
Multi-task learning: Supports architectures jointly trained for segmentation and classification of polyp subtype or malignancy risk (Jha et al., 2019).
Explainability: Utilized in work coupling U-Net segmentation with Grad-CAM to visualize model attention and support clinical trust in algorithmic outputs (Asare et al., 17 Sep 2025).

The dataset enables comparative work in architecture advancement (e.g., attention U-Nets, transformer-based models, multi-scale analysis), weakly-supervised/semi-supervised learning, domain adaptation, temporal modeling for video stabilization, and data-centric expansion (including stratification by polyp subtype and multi-center acquisition) (Jha et al., 2019).

Key limitations include lack of formal inter-annotator reliability data and finite scale (1,000 frames), though the diversity of acquisition conditions partly mitigates the latter. A plausible implication is that future releases would benefit from larger, multicenter collections and protocolized annotation for further benchmarking utility.

7. Impact on the Field and Benchmark Status

Kvasir-SEG is widely recognized as the first openly available, pixel-wise segmented GI polyp dataset for algorithmic evaluation (Jha et al., 2019). It has become a central resource in both traditional and deep learning-based polyp segmentation research, baseline challenge competitions (e.g., Medico 2020) (Tomar, 2021), and real-world algorithm development for clinical automation. Performance gains on Kvasir-SEG have tracked the evolution of model architectures—enabling robust, explainable, and real-time segmentation pipelines—and have established the dataset as a standard benchmark for reproducible results and fair comparison across the biomedical image analysis literature (Jha et al., 2019, Asare et al., 17 Sep 2025, Tomar, 2021).


References

Jha et al. (2019). Kvasir-SEG: A Segmented Polyp Dataset.

Asare et al. (17 Sep 2025). PolypSeg-GradCAM: Towards Explainable Computer-Aided Gastrointestinal Disease Detection Using U-Net Based Segmentation and Grad-CAM Visualization on the Kvasir Dataset.

Tomar (2021). Automatic Polyp Segmentation using Fully Convolutional Neural Network.
