Weed Segmentation Datasets
- Weed segmentation datasets are specialized, annotated corpora that precisely differentiate weeds, crops, and background in varied agricultural imagery.
- They incorporate multiple imaging modalities—including RGB, multispectral, and synthetic—with detailed annotations like semantic, instance, and organ-level labeling.
- These datasets are crucial for advancing precision agriculture, enhancing robotic weeding, yield forecasting, and adaptive management practices.
Weed segmentation datasets are specialized, pixel- or instance-labeled corpora designed to enable the development, benchmarking, and operational deployment of computer vision models for discriminating weeds from crops and background in agricultural imagery. These datasets are foundational for precision agriculture, robotic weeding, yield forecasting, and site-specific weed management, providing both real and synthetic imagery annotated at the plant or organ level. Today’s weed segmentation datasets encompass a range of imaging modalities (RGB, multispectral, NIR), annotation granularities (semantic, instance, multi-label), and ecological scopes (monocultures, mixed fields, multi-stage temporal series).
1. Dataset Modalities and Acquisition Protocols
Weed segmentation datasets are acquired under diverse conditions reflecting agricultural variability. The most prevalent formats include:
- RGB datasets: Acquired via DSLR, smartphone, UAV, or ground robot cameras; typical resolution ranges from 512×512 px (RiceSEG) to 4K (GrowingSoy, downsampled in practice) (Zhou et al., 2 Apr 2025, Steinmetz et al., 1 Jun 2024).
- Multispectral and NIR datasets: Extend standard RGB with red-edge, NIR, or other spectral bands, often captured via UAV-mounted sensors (weedNet, WeedMap, WeedsGalore) (Sa et al., 2017, Sa et al., 2018, Celikkan et al., 18 Feb 2025).
- Synthetic datasets: Generated procedurally in 3D simulators (Blender, CropCraft) with precise ground-truth masks and controlled variations in plant morphology, density, lighting, and camera perspective (Cicco et al., 2016, Boyadjian et al., 4 Nov 2025).
Acquisition protocols commonly document:
- Imaging platforms and sensor models (e.g., MicaSense Sequoia, DJI Phantom 4 Multispectral)
- Camera deployment (height, nadir/oblique, robotic/UAV/handheld)
- Environmental conditions (greenhouse vs. field, lighting variability, soil backgrounds)
- Sampling strategy (fixed timepoints for growth series, grid sampling across field blocks, frame capture intervals for video-derived datasets)
Most datasets implement either single or multitemporal acquisition, with time-series offering phenological context and robustness evaluation (e.g., 11-week growth in WeedSense (Sarker et al., 20 Aug 2025), full crop cycle in GrowingSoy (Steinmetz et al., 1 Jun 2024), and four-date sampling in WeedsGalore (Celikkan et al., 18 Feb 2025)).
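Acquisition height and sensor choice jointly fix the ground sampling distance (GSD), the on-ground footprint of one pixel. The sketch below applies the standard photogrammetric relation with hypothetical UAV parameters; none of the values are drawn from the cited datasets:

```python
def ground_sampling_distance(pixel_pitch_um: float,
                             focal_length_mm: float,
                             height_m: float) -> float:
    """Ground sampling distance (mm/px) for a nadir-pointing camera.

    Standard photogrammetric relation: GSD = pixel_pitch * height / focal_length.
    """
    pixel_pitch_mm = pixel_pitch_um / 1000.0
    height_mm = height_m * 1000.0
    return pixel_pitch_mm * height_mm / focal_length_mm

# Hypothetical UAV setup: 3.75 um pixels, 5.5 mm lens, 10 m altitude
# -> ~6.8 mm/px, far coarser than the sub-millimeter GSD recommended
# later in this article for resolving fine weed structure.
print(ground_sampling_distance(3.75, 5.5, 10.0))
```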
2. Annotation Schemes, Classes, and Quality Control
Annotation granularity and schema vary along several axes:
- Task/label structure
- Semantic segmentation: Per-pixel labeling; the most common classes are background, crop, and weed (with either a single weed class or a multiclass weed taxonomy).
- Instance segmentation: Polygon- or mask-wise annotation of each plant, supporting counting and per-plant trait extraction (Steinmetz et al., 1 Jun 2024, Celikkan et al., 18 Feb 2025).
- Multi-task sets: Datasets like WeedSense include auxiliary measurements (plant height, growth stage classification) aligned with masks (Sarker et al., 20 Aug 2025).
- Label class definitions
- Species-level (WeedSense: 16 weed species + background) (Sarker et al., 20 Aug 2025)
- Coarse weed/crop (WeedNet, most public UAV datasets)
- Organ-level (RiceSEG: six classes including background, green vegetation, senescent vegetation, panicle, weed, duckweed) (Zhou et al., 2 Apr 2025)
- Multi-weed-class (WeedsGalore: amaranth, barnyard grass, quickweed, “weed_other”) (Celikkan et al., 18 Feb 2025)
- Annotation workflow
- Manual annotation dominates, with tools including polygon drawing, superpixel brushing (RiceSEG), and commercial platforms (Roboflow: GrowingSoy) (Zhou et al., 2 Apr 2025, Steinmetz et al., 1 Jun 2024).
- Automated proposals (SAM2-Hiera-L in WeedSense) can accelerate labeling (Sarker et al., 20 Aug 2025).
- Quality control comprises cross-checking (10–20% mask re-inspection, inter-annotator overlap, predefined per-pixel IoU acceptance thresholds) (Steinmetz et al., 1 Jun 2024, Sarker et al., 20 Aug 2025, Zhou et al., 2 Apr 2025); a minimal agreement check is sketched after this list.
- Synthetic sets generate masks automatically during rendering, eliminating human bias at the expense of domain realism (Cicco et al., 2016, Boyadjian et al., 4 Nov 2025).
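A minimal sketch of such an inter-annotator agreement check, assuming boolean NumPy masks and a hypothetical acceptance threshold of 0.9 (the cited datasets do not all publish their exact thresholds):

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two binary masks (H x W boolean arrays)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 1.0  # both empty: perfect agreement

def flag_for_reinspection(ref_mask: np.ndarray,
                          check_mask: np.ndarray,
                          threshold: float = 0.9) -> bool:
    """Flag a re-annotated sample whose overlap with the reference
    annotation falls below the acceptance threshold."""
    return mask_iou(ref_mask.astype(bool), check_mask.astype(bool)) < threshold
```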
3. Dataset Composition, Task Focus, and Evaluation Metrics
A broad survey of leading datasets illustrates task scope, scale, and typical evaluation practices:
| Name (arXiv id) | Modality | Classes* | Images/Instances | Tasks | Metrics |
|---|---|---|---|---|---|
| WeedSense (Sarker et al., 20 Aug 2025) | RGB, time-series | 16 weed spp. + bg (17-class); + height, growth | 120,341 frames, 32 plants | Segm, height, stage | mIoU, MAE, accuracy |
| Sugarbeet/Corn-Weed (Marrewijk et al., 3 Apr 2024) | RGB | crop, weed, bg (3-class) | ~9,287 images | Segmentation | mIoU |
| GrowingSoy (Steinmetz et al., 1 Jun 2024) | RGB, instance | soy, caruru weed, grassy weed (3-class) | 1,000 images, ~11k inst. | Inst. segm. | mAP-50, recall |
| WeedNet (Sa et al., 2017) | Multispectral | crop, weed, bg (3-class) | 465 images | Sem. segm. | F1, AUC, precision/recall |
| WeedMap (Sa et al., 2018) | Multispectral UAV | crop, weed, bg (3-class) | 1,026 tiles | Sem. segm. | AUC |
| RiceSEG (Zhou et al., 2 Apr 2025) | RGB | 6 classes (incl. weed, organs) | 3,078 images | Sem. segm. | IoU, recall, Dice |
| WeedsGalore (Celikkan et al., 18 Feb 2025) | MSI, instance, time | maize, 4 weed spp. + bg/other | 156 images, 10k inst. | Sem/Inst. segm. | mIoU, mAP, ECE |
| Synthetic Crop-Weed (Boyadjian et al., 4 Nov 2025) | RGB synth | maize, weed, bg (3-class) | 1,500 synthetic | Sem. segm. | mIoU, IoU |
| KWD-2023/MSCD-2023 (Asad et al., 2023) | RGB | kochia, canola (binary + bg), field-mixed | 99/305 images | Segmentation | mIoU, fwIoU |
*Excludes non-weed classes for brevity.
Evaluation almost universally reports mean Intersection-over-Union (mIoU), class-specific IoU, and, for instance segmentation, mean Average Precision at IoU = 0.5 (mAP-50). Some datasets report auxiliary regression/classification metrics, e.g., WeedSense’s 1.67 cm MAE in height estimation and 99.99% growth-stage classification accuracy (Sarker et al., 20 Aug 2025). Balancing protocols (e.g., frequency-of-appearance loss) address the class imbalance caused by the typically dominant background class (Sa et al., 2017, Sa et al., 2018).
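For reference, the mIoU these benchmarks report reduces to per-class TP/(TP + FP + FN), averaged over classes. A minimal NumPy implementation, not taken from any of the cited codebases:

```python
import numpy as np

def confusion_matrix(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-pixel confusion matrix; rows = ground truth, cols = prediction."""
    idx = gt.astype(np.int64) * num_classes + pred.astype(np.int64)
    return np.bincount(idx.ravel(), minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(cm: np.ndarray) -> float:
    """mIoU: mean over classes of IoU = TP / (TP + FP + FN)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)  # guard against absent classes
    return float(iou.mean())
```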
4. Domain Coverage, Imbalance, and Environmental Complexity
Weed segmentation datasets differ markedly in environmental scope and annotation difficulty:
- Species and organ-level diversity: E.g., WeedSense (16 weed species), RiceSEG (comprehensive rice genotypes and weed classes, multiple growth stages across Asia and Africa) (Sarker et al., 20 Aug 2025, Zhou et al., 2 Apr 2025).
- Background/clutter and occlusion: Field datasets (KWD-2023, MSCD-2023) capture uncontrolled lighting, shadows, and overlapping vegetation, whereas greenhouse or synthetic sets offer controlled conditions (Asad et al., 2023, Sarker et al., 20 Aug 2025).
- Class imbalance: Background pixels typically account for >90% of dataset area (Sugarbeet: 98.5%, Corn-Weed: 89.8%; RiceSEG: 1.6% weed pixels) (Marrewijk et al., 3 Apr 2024, Zhou et al., 2 Apr 2025).
- Temporal and morphological variation: Datasets capture entire phenological cycles or stage-stratified sampling (WeedSense, GrowingSoy, WeedsGalore), enabling evaluation under plant growth, size, and background color variation (Sarker et al., 20 Aug 2025, Steinmetz et al., 1 Jun 2024, Celikkan et al., 18 Feb 2025).
Pitfalls include high image redundancy (e.g., Sugarbeet), which limits the efficacy of image-level active learning, and domain gaps in synthetic-to-real transfer, as evidenced in sim-to-real benchmarks (Marrewijk et al., 3 Apr 2024, Boyadjian et al., 4 Nov 2025). Recommended countermeasures are increased scenario diversity, hard-example mining, and hybrid labeling.
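Given imbalance ratios like those above (background >90% of pixels), a standard countermeasure is frequency-based class weighting in the loss. The median-frequency scheme below is one common instantiation, sketched for illustration; the exact weighting used in the cited works may differ:

```python
import numpy as np

def class_pixel_frequencies(masks, num_classes: int) -> np.ndarray:
    """Fraction of pixels per class across an iterable of integer label masks."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for m in masks:
        counts += np.bincount(m.ravel(), minlength=num_classes)
    return counts / counts.sum()

def median_frequency_weights(freq: np.ndarray) -> np.ndarray:
    """Median-frequency balancing: weight_c = median(freq) / freq_c.
    Down-weights the dominant background class; up-weights rare weed pixels."""
    freq = np.maximum(freq, 1e-12)  # avoid division by zero for absent classes
    return np.median(freq) / freq
```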
5. Synthetic versus Real-World Datasets and Sim-to-Real Transfer
Synthetic datasets generated via procedural modeling (Blender, CropCraft) and physically based rendering enable rapid annotation and controlled diversity. These sets simulate conditions such as plant architecture, lighting, soil texture, and camera angle (Cicco et al., 2016, Boyadjian et al., 4 Nov 2025). When benchmarked, models trained on synthetic images achieve competitive mIoU on synthetic test sets (e.g., 96.1% (Boyadjian et al., 4 Nov 2025)) but exhibit a sim-to-real gap of ≈10%, an improvement over earlier reports of ≈20%. Fine-tuning with even a small real dataset (<5% of the real set) recovers much of the domain gap for weeds (ΔIoU ≈ +12% on Montoldre) (Boyadjian et al., 4 Nov 2025). Synthetic datasets can outperform small real sets in cross-domain robustness, though they remain limited by the realism of plant models and textures.
Integration strategies include “real-augmented” regimes, where synthetic images are supplemented with a modest number of labeled real frames to boost generalization, with some protocols surpassing “real-only” performance on weeds (Cicco et al., 2016, Boyadjian et al., 4 Nov 2025).
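A minimal PyTorch sketch of such a real-augmented fine-tuning stage, assuming `model` is a segmentation network already trained on synthetic data and `real_loader` yields the small labeled real subset; all names are illustrative and not taken from the cited codebases:

```python
import torch
from torch import nn

def finetune_on_real(model: nn.Module, real_loader, epochs: int = 10,
                     lr: float = 1e-4, device: str = "cuda") -> nn.Module:
    """Fine-tune a synthetic-pretrained segmentation model on a small real set."""
    model.to(device).train()
    # Low learning rate: adapt to real textures without forgetting synthetic pretraining.
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in real_loader:
            opt.zero_grad()
            logits = model(images.to(device))          # (B, C, H, W) class logits
            loss = loss_fn(logits, labels.to(device))  # labels: (B, H, W) class indices
            loss.backward()
            opt.step()
    return model
```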
6. Major Open Datasets and Benchmarking Resources
The following resources represent major public benchmarks for weed segmentation research:
- WeedSense (https://weedsense.github.io): 120,341 images, 16 weed species, 11 weeks, semantic masks, heights, growth stages, CC BY 4.0 (Sarker et al., 20 Aug 2025).
- GrowingSoy (https://github.com/raulsteinmetz/soy-segmentation-ds): 1,000 frames, soybean + caruru + grassy weeds, polygonal masks, full lifecycle, CC BY 4.0 (Steinmetz et al., 1 Jun 2024).
- RiceSEG (http://www.global-rice.com): 3,078 images × 512² px, 6-class semantic with weed label, multi-country, all growth stages, open-access (Zhou et al., 2 Apr 2025).
- WeedsGalore (https://github.com/GFZ/weedsgalore): 156 multispectral/UAV images with maize and 4 weed classes, instance and semantic masks, multitemporal (Celikkan et al., 18 Feb 2025).
- WeedNet (https://goo.gl/UK2pZq): Multispectral drone images in sugar beet fields, NDVI-based labels, supporting “crop-only”/“weed-only”/mixed plots (see the NDVI sketch after this list) (Sa et al., 2017).
- WeedMap: Multispectral sugar beet UAV orthomosaics, radiometrically calibrated, expert-verified ground truth (Sa et al., 2018).
- Sugarbeet/Corn-Weed: Extensive public pixel-wise labels for crops and weeds in field and industrial scenarios (see (Marrewijk et al., 3 Apr 2024); Chebrolu et al. 2017 for Sugarbeet).
Access protocols typically involve academic or CC BY-style licensing, though some industrial/developmental sets (e.g., Corn-Weed) are restricted (Marrewijk et al., 3 Apr 2024). Formats span JPEG/PNG for images/masks, GeoTIFF for radiometric and georeferenced tiles, and JSON/Pickle for instance annotations.
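To illustrate NDVI-based labeling of the kind WeedNet describes, the sketch below thresholds NDVI to produce a vegetation mask; on single-species (“crop-only”/“weed-only”) plots that mask can be assigned wholesale to the crop or weed class. The 0.4 threshold is a hypothetical placeholder, not the value used by the dataset authors:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), in [-1, 1]."""
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    return (nir - red) / np.maximum(nir + red, 1e-6)

def vegetation_mask(nir: np.ndarray, red: np.ndarray,
                    threshold: float = 0.4) -> np.ndarray:
    """Binary vegetation mask by NDVI thresholding."""
    return ndvi(nir, red) > threshold
```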
7. Research Challenges, Best Practices, and Future Directions
Key technical challenges include:
- Severe class imbalance (background dominance (Marrewijk et al., 3 Apr 2024, Zhou et al., 2 Apr 2025)), structural similarity between crops and weeds (e.g., juvenile rice vs. grass weeds (Zhou et al., 2 Apr 2025)), and label ambiguities in complex canopies or aquatic fields.
- Domain adaptation and generalization: Sim-to-real transfer, field-to-field drift, and adaptation to rare or new weed species remain limitations.
- Need for high annotation quality: Superpixel tools, consensus-based multi-annotator QC, and hybrid active learning (targeting rare/uncertain classes) are increasingly adopted (Zhou et al., 2 Apr 2025, Marrewijk et al., 3 Apr 2024).
Best practices include:
- Employing transformer-based backbones (Mask2Former, SegFormer) for improved weed recall in cluttered, low-prevalence settings (Zhou et al., 2 Apr 2025); a minimal inference sketch follows this list.
- Prioritizing sub-millimeter GSD for fine weed structure capture, and carefully balancing phenological, lighting, and spatial variation during data sampling (Zhou et al., 2 Apr 2025, Steinmetz et al., 1 Jun 2024).
- Augmenting real datasets with synthetic diversity, moderate hard-example mining, and domain adaptation pipelines for robust generalization (Boyadjian et al., 4 Nov 2025, Cicco et al., 2016).
- Public release of annotation code, scripts, and human-in-the-loop correction tools to standardize benchmarking and reproducibility (Celikkan et al., 18 Feb 2025).
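As a starting point for the transformer-backbone practice above, the sketch below runs SegFormer inference via Hugging Face Transformers. The ADE20K checkpoint and file name are generic placeholders; in practice the model would be fine-tuned on a weed dataset with the appropriate label set:

```python
import torch
from PIL import Image
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

# Generic ADE20K-pretrained checkpoint; swap in a weed-finetuned model for real use.
ckpt = "nvidia/segformer-b0-finetuned-ade-512-512"
processor = SegformerImageProcessor.from_pretrained(ckpt)
model = SegformerForSemanticSegmentation.from_pretrained(ckpt)

image = Image.open("field_plot.jpg")  # hypothetical field image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # (1, num_labels, H/4, W/4)

# Upsample logits to the original image size and take the per-pixel argmax.
logits = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False)
pred = logits.argmax(dim=1)[0]        # (H, W) class-index map
```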
A plausible implication is that future research will emphasize hybrid datasets (synthetic + real), richer organ- and instance-level labels (supporting simultaneous segmentation and trait analysis), and improved quantification of prediction uncertainty to enable active learning and autonomous agricultural interventions. Multispectral and time-series modalities, as exemplified by WeedsGalore and WeedSense, are likely to become increasingly central to both weed segmentation research and its operational translation.