
Weed Segmentation Datasets

Updated 11 December 2025
  • Weed segmentation datasets are specialized, annotated corpora that precisely differentiate weeds, crops, and background in varied agricultural imagery.
  • They incorporate multiple imaging modalities—including RGB, multispectral, and synthetic—with detailed annotations like semantic, instance, and organ-level labeling.
  • These datasets are crucial for advancing precision agriculture, enhancing robotic weeding, yield forecasting, and adaptive management practices.

Weed segmentation datasets are specialized, pixel- or instance-labeled corpora designed to enable the development, benchmarking, and operational deployment of computer vision models for discriminating weeds from crops and background in agricultural imagery. These datasets are foundational for precision agriculture, robotic weeding, yield forecasting, and site-specific weed management, providing both real and synthetic imagery annotated at the plant or organ level. Today’s weed segmentation datasets encompass a range of imaging modalities (RGB, multispectral, NIR), annotation granularity (semantic, instance, multi-label), and ecological scopes (monocultures, mixed fields, multi-stage temporal series).

1. Dataset Modalities and Acquisition Protocols

Weed segmentation datasets are acquired under diverse conditions reflecting agricultural variability, spanning RGB, multispectral, and NIR imagery captured from handheld, robotic, and UAV platforms.
Acquisition protocols commonly document:

  • Imaging platforms and sensor models (e.g., MicaSense Sequoia, DJI Phantom 4 Multispectral)
  • Camera deployment (height, nadir/oblique, robotic/UAV/handheld)
  • Environmental conditions (greenhouse vs. field, lighting variability, soil backgrounds)
  • Sampling strategy (fixed timepoints for growth series, grid sampling across field blocks, frame capture intervals for video-derived datasets)

Most datasets implement either single or multitemporal acquisition, with time-series offering phenological context and robustness evaluation (e.g., 11-week growth in WeedSense (Sarker et al., 20 Aug 2025), full crop cycle in GrowingSoy (Steinmetz et al., 1 Jun 2024), and four-date sampling in WeedsGalore (Celikkan et al., 18 Feb 2025)).
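The acquisition-protocol fields above can be captured in a simple structured record. The schema below is illustrative only; the field names are hypothetical and not taken from any published dataset:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative record for the acquisition-protocol fields discussed above.
# Field names are hypothetical, not a published dataset schema.
@dataclass
class AcquisitionRecord:
    platform: str               # e.g. "UAV", "ground robot", "handheld"
    sensor: str                 # e.g. "MicaSense Sequoia"
    height_m: float             # camera height above canopy, metres
    view: str                   # "nadir" or "oblique"
    environment: str            # "field" or "greenhouse"
    capture_dates: List[str] = field(default_factory=list)  # ISO dates

    def is_multitemporal(self) -> bool:
        """More than one capture date enables phenological analysis."""
        return len(self.capture_dates) > 1

# Example: a four-date UAV campaign (cf. WeedsGalore's four-date sampling)
rec = AcquisitionRecord("UAV", "MicaSense Sequoia", 10.0, "nadir", "field",
                        ["2024-05-02", "2024-05-16", "2024-05-30", "2024-06-13"])
print(rec.is_multitemporal())  # → True
```

Recording these fields per image (rather than per dataset) is what makes cross-condition robustness evaluation possible downstream.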

2. Annotation Schemes, Classes, and Quality Control

Annotation granularity and schema vary along several axes: class taxonomy (binary crop/weed versus species-level labels), label type (semantic masks, instance masks, organ-level parts), and the quality-control procedures applied during labeling.
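The distinction between semantic and instance labeling can be made concrete on a toy mask. The class ids and coordinates below are made up for illustration:

```python
from collections import Counter

# Toy 4x4 semantic mask. Class ids are illustrative: 0 = background,
# 1 = crop, 2 = weed.
semantic = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 0, 0],
    [0, 2, 0, 0],
]

# Instance labels additionally pair each foreground pixel with an instance id;
# here the four weed pixels split into two distinct weed plants.
instance = {
    (1, 1): (2, 1), (2, 0): (2, 1), (2, 1): (2, 1),   # weed plant #1
    (3, 1): (2, 2),                                   # weed plant #2
}

# Per-class pixel frequencies show the background dominance that motivates
# frequency-weighted losses.
freq = Counter(c for row in semantic for c in row)
print(freq)  # pixel counts per class
```

Semantic masks suffice for herbicide spot-spraying maps, while instance ids are needed when individual plants must be counted or targeted.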

3. Dataset Composition, Task Focus, and Evaluation Metrics

A survey of leading datasets illustrates task scope, scale, and typical evaluation practices:

| Name (arXiv id) | Modality | Classes* | Images/Instances | Tasks | Metrics |
|---|---|---|---|---|---|
| WeedSense (Sarker et al., 20 Aug 2025) | RGB, time-series | 16 weed spp. + bg (17-class); + height, growth | 120,341 frames, 32 plants | Segm., height, stage | mIoU, MAE, accuracy |
| Sugarbeet/Corn-Weed (Marrewijk et al., 3 Apr 2024) | RGB | crop, weed, bg (3-class) | ~9,287 images | Segmentation | mIoU |
| GrowingSoy (Steinmetz et al., 1 Jun 2024) | RGB, instance | soy, caruru weed, grassy weed (3-class) | 1,000 images, ~11k inst. | Inst. segm. | mAP-50, recall |
| WeedNet (Sa et al., 2017) | Multispectral | crop, weed, bg (3-class) | 465 images | Sem. segm. | F1, AUC, precision/recall |
| WeedMap (Sa et al., 2018) | Multispectral UAV | crop, weed, bg (3-class) | 1,026 tiles | Sem. segm. | AUC |
| RiceSEG (Zhou et al., 2 Apr 2025) | RGB | 6 classes (incl. weed, organs) | 3,078 images | Sem. segm. | IoU, recall, Dice |
| WeedsGalore (Celikkan et al., 18 Feb 2025) | MSI, instance, time-series | maize, 4 weed spp. + bg/other | 156 images, 10k inst. | Sem./Inst. segm. | mIoU, mAP, ECE |
| Synthetic Crop-Weed (Boyadjian et al., 4 Nov 2025) | RGB, synthetic | maize, weed, bg (3-class) | 1,500 synthetic | Sem. segm. | mIoU, IoU |
| KWD-2023/MSCD-2023 (Asad et al., 2023) | RGB | kochia, canola (binary + bg), field-mixed | 99/305 images | Segmentation | mIoU, fwIoU |

*Excludes non-weed classes for brevity.

Evaluation almost universally reports mean Intersection-over-Union (mIoU) and class-specific IoU, with mean Average Precision at IoU = 0.5 (mAP-50) for instance segmentation. Some datasets report auxiliary regression/classification metrics, e.g., WeedSense's 1.67 cm MAE for height estimation and 99.99% growth-stage classification accuracy (Sarker et al., 20 Aug 2025). Balancing protocols (e.g., frequency-of-appearance weighted loss) address the class imbalance caused by the typically dominant background class (Sa et al., 2017, Sa et al., 2018).
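The headline metrics can be computed from a pixel-level confusion matrix. The sketch below implements per-class IoU, mIoU, and frequency-weighted IoU (fwIoU); the 3-class confusion matrix is a toy example, not real benchmark data:

```python
# Segmentation metrics from a confusion matrix: per-class IoU, mean IoU (mIoU),
# and frequency-weighted IoU (fwIoU).
def iou_metrics(conf):
    """conf[i][j] = number of pixels of true class i predicted as class j."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    ious, weights = [], []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                           # missed pixels of class c
        fp = sum(conf[r][c] for r in range(n)) - tp      # pixels wrongly labeled c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
        weights.append(sum(conf[c]) / total)             # frequency of true class c
    miou = sum(ious) / n
    fwiou = sum(w * i for w, i in zip(weights, ious))
    return ious, miou, fwiou

# Toy 3-class example (background, crop, weed). Background dominates, so fwIoU
# is pulled toward the background IoU while mIoU weights all classes equally.
conf = [
    [90, 5, 5],   # true background
    [2, 16, 2],   # true crop
    [3, 1, 6],    # true weed
]
ious, miou, fwiou = iou_metrics(conf)
```

With the imbalanced matrix above, fwIoU exceeds mIoU because the easy, frequent background class dominates the weighted average; mIoU exposes the poor weed IoU that fwIoU hides.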

4. Domain Coverage, Imbalance, and Environmental Complexity

Weed segmentation datasets differ markedly in environmental scope and annotation difficulty, ranging from controlled greenhouse monocultures to mixed field scenes with variable lighting and soil backgrounds.

Pitfalls include high image redundancy (e.g., Sugarbeet), which limits the efficacy of image-level active learning, and domain gaps in synthetic-to-real transfer, as evidenced in sim-to-real benchmarks (Marrewijk et al., 3 Apr 2024, Boyadjian et al., 4 Nov 2025). Recommended countermeasures are increased scenario diversity, hard-example mining, and hybrid labeling.
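One of the countermeasures above, hard-mining, amounts to ranking images by how poorly the current model scores on them and routing the worst to annotation. A minimal sketch, with made-up per-image scores:

```python
# Hard-example mining sketch: pick the k images the current model scores worst
# on, so annotation effort concentrates on the hardest scenes.
def mine_hard_examples(scores, k):
    """scores: {image_id: per-image IoU of the current model}. Returns the k worst ids."""
    return sorted(scores, key=scores.get)[:k]

# Hypothetical per-image IoU values from a validation pass.
per_image_iou = {"frame_001": 0.91, "frame_002": 0.44,
                 "frame_003": 0.78, "frame_004": 0.39}
print(mine_hard_examples(per_image_iou, 2))  # → ['frame_004', 'frame_002']
```

For highly redundant datasets, the same ranking is typically combined with a deduplication step so the mined set does not consist of near-identical frames.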

5. Synthetic versus Real-World Datasets and Sim-to-Real Transfer

Synthetic datasets generated via procedural modeling (Blender, CropCraft) and physically based rendering enable rapid annotation and controlled diversity. These sets simulate conditions such as plant architecture, lighting, soil texture, and camera angle (Cicco et al., 2016, Boyadjian et al., 4 Nov 2025). When benchmarked, models trained on synthetic images achieve competitive mIoU on synthetic test sets (e.g., 96.1% (Boyadjian et al., 4 Nov 2025)) but exhibit a sim-to-real gap of ≈10%, an improvement over earlier reports of ≈20%. Fine-tuning with even a small real dataset (<5% of the real set) closes much of the domain gap for weeds (ΔIoU ≈ +12% on Montoldre) (Boyadjian et al., 4 Nov 2025). Synthetic datasets can outperform small real sets in cross-domain robustness, though they remain limited by the realism of the plant models and textures.

Integration strategies include “real-augmented” regimes, where synthetic images are supplemented with a modest number of labeled real frames to boost generalization, with some protocols surpassing “real-only” performance on weeds (Cicco et al., 2016, Boyadjian et al., 4 Nov 2025).
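A "real-augmented" split can be sketched as a large synthetic pool plus a small random sample of labeled real frames. The function and dataset sizes below are illustrative; the 5% fraction echoes the <5% figure cited above:

```python
import random

# Sketch of a "real-augmented" training regime: all synthetic images plus a
# small labeled real subset (here 5% of the real pool). Sizes are illustrative.
def build_training_set(synthetic_ids, real_ids, real_fraction=0.05, seed=0):
    """Return synthetic ids plus a small random sample of real ids."""
    rng = random.Random(seed)
    n_real = max(1, int(real_fraction * len(real_ids)))
    return list(synthetic_ids) + rng.sample(list(real_ids), n_real)

synthetic = [f"synth_{i:04d}" for i in range(1500)]   # e.g. 1,500 synthetic frames
real = [f"real_{i:04d}" for i in range(400)]          # hypothetical real pool
train = build_training_set(synthetic, real)
print(len(train))  # 1500 synthetic + 20 real = 1520
```

Fixing the sampling seed keeps the real subset reproducible across fine-tuning runs, which matters when reporting how much real data was needed to close the domain gap.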

6. Major Open Datasets and Benchmarking Resources

Several of the datasets surveyed above (Section 3) are released as major public benchmarks for weed segmentation research.

Access protocols typically involve academic or CC BY-style licensing, though some industrial/developmental sets (e.g., Corn-Weed) are restricted (Marrewijk et al., 3 Apr 2024). Formats span JPEG/PNG for images/masks, GeoTIFF for radiometric and georeferenced tiles, and JSON/Pickle for instance annotations.

7. Research Challenges, Best Practices, and Future Directions

Key technical challenges include:

  • Severe class imbalance from the dominant background/soil class
  • Sim-to-real domain gaps when transferring from synthetic to field imagery
  • High image redundancy within datasets, which limits image-level active learning
  • The cost of pixel- and instance-level annotation at scale
  • Quantifying prediction uncertainty reliably enough to drive autonomous interventions

Best practices include:

  • Frequency-weighted losses to counter background dominance
  • Increased scenario diversity during acquisition
  • Hard-example mining and hybrid labeling to control annotation cost
  • Supplementing synthetic training sets with small labeled real subsets
  • Reporting class-specific IoU, and calibration where available (e.g., ECE), alongside mIoU

A plausible implication is that future research will emphasize hybrid datasets (synthetic + real), richer organ- and instance-level labels (supporting simultaneous segmentation and trait analysis), and improved quantification of prediction uncertainty to enable active learning and autonomous agricultural interventions. Multispectral and time-series modalities, as exemplified by WeedsGalore and WeedSense, are likely to become increasingly central to both weed segmentation research and its operational translation.
