Sorghum Aphid Cluster Dataset Benchmark
- The Sorghum Aphid Cluster Dataset is a large-scale, field-collected benchmark with expert-annotated images and multi-scale patches for accurate aphid cluster detection.
- Robust methodology includes rigorous manual annotations, quality assurance measures, and multi-scale patching to capture diverse lighting and canopy conditions.
- Baseline evaluations demonstrate improved AP and recall through cluster merging and class-balancing techniques, enhancing precision in automated pest management.
The Sorghum Aphid Cluster Dataset is a large-scale, field-collected benchmark designed to enable the detection and segmentation of aphid clusters on sorghum crops under realistic environmental conditions. The dataset principally targets precise, localized pest management through data-driven machine learning approaches. It provides detailed, expert-annotated images and multi-scale patches capturing the diverse phenotypes and distribution of aphid clusters, facilitating robust training and evaluation of object detection and semantic segmentation models for automated crop monitoring applications (Zhang et al., 2023, Rahman et al., 2024, Zhang et al., 2023, Rahman et al., 2023).
1. Data Acquisition and Composition
The core of the Sorghum Aphid Cluster Dataset consists of 5,447 high-resolution (3648×2736 px) RGB images, collected over two growing seasons in commercial and research sorghum fields in Kansas. Imaging was conducted using a fixed three-GoPro (Hero 5 or similar) rig, which simultaneously sampled from three vertical positions—top, middle, and bottom of the sorghum canopy—enabling the dataset to capture substantial heterogeneity in leaf orientation, cluster scale, and lighting (from direct sun to overcast) (Zhang et al., 2023, Rahman et al., 2024, Zhang et al., 2023). The original image pool comprised several million candidates, manually screened by trained researchers to retain only those with at least one qualifying aphid cluster. The final images are approximately evenly distributed across the three viewpoints.
For downstream machine learning, full images were systematically cropped into overlapping patches:
- Fixed-size patching (earlier releases): 400×400 px patches with a 200 px stride (50% overlap)
- Multi-scale patching (introduced in later versions): patches of size 0.132W × 0.132H, 0.263W × 0.263H, and 0.525W × 0.525H with 10% overlap, spanning leaf-level to panicle-level contexts (Rahman et al., 2024, Rahman et al., 2023)
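As an illustration, the multi-scale patching scheme above can be sketched as follows; the function and variable names, and the use of NumPy arrays for images, are assumptions for illustration rather than the dataset authors' actual pipeline:

```python
# Sketch of multi-scale patch extraction: patch side lengths are fixed
# fractions of the image width/height, with 10% overlap between neighbors.
from typing import Iterator, Tuple

import numpy as np

SCALES = (0.132, 0.263, 0.525)  # patch side as a fraction of W and H
OVERLAP = 0.10                  # 10% overlap between neighboring patches

def iter_patches(img: np.ndarray) -> Iterator[Tuple[float, int, int, np.ndarray]]:
    """Yield (scale, row, col, patch) tuples over a full-resolution image."""
    H, W = img.shape[:2]
    for s in SCALES:
        ph, pw = int(round(s * H)), int(round(s * W))
        stride_y = max(1, int(round(ph * (1 - OVERLAP))))
        stride_x = max(1, int(round(pw * (1 - OVERLAP))))
        for r, y in enumerate(range(0, H - ph + 1, stride_y)):
            for c, x in enumerate(range(0, W - pw + 1, stride_x)):
                yield s, r, c, img[y:y + ph, x:x + pw]

# Example on a dummy frame at the dataset's native resolution
frame = np.zeros((2736, 3648, 3), dtype=np.uint8)
n = sum(1 for _ in iter_patches(frame))
```

In a real pipeline the empty-patch filtering described below would be applied to the yielded patches before writing them to disk.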
Resulting datasets include:
- Patch-based detection corpus: 151,380 patches (after filtering empties) (Zhang et al., 2023, Zhang et al., 2023)
- Multi-scale segmentation corpus: 54,742 image-mask pairs (Rahman et al., 2024, Rahman et al., 2023)
These patches are systematically stratified by scale, view, and cluster presence to facilitate robust cross-validation and reproducible experimentation.
2. Annotation Schema, Quality Assurance, and Preprocessing
The dataset operationally defines an “aphid cluster” as any contiguous region containing at least six aphids, based on expert consensus that clusters below this threshold lack economic impact (Zhang et al., 2023, Zhang et al., 2023, Rahman et al., 2024). Annotation was performed primarily using polygon masks tightly circumscribing each aphid aggregation, with derived bounding boxes automatically extracted for object detection. In semantic segmentation versions, mask pixels are labeled as “aphid cluster” or “background” (Rahman et al., 2024, Rahman et al., 2023).
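Deriving detection boxes from the annotated masks, as described above, amounts to taking the extent of the foreground pixels; a minimal sketch follows (the helper name and synthetic mask are hypothetical, not the annotators' tooling):

```python
# Sketch: derive a detection bounding box from a binary cluster mask,
# mirroring the mask-to-box step described above.
import numpy as np

def mask_to_bbox(mask: np.ndarray):
    """Return (xmin, ymin, xmax, ymax) of the foreground pixels, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

mask = np.zeros((100, 100), dtype=bool)
mask[20:40, 30:70] = True         # a synthetic "cluster"
box = mask_to_bbox(mask)          # -> (30, 20, 69, 39)
```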
Key annotation highlights:
- Manual annotation conducted in Labelbox or CVAT by teams of trained research assistants, with spot-checking and inter-annotator IoU > 90% (Zhang et al., 2023, Rahman et al., 2023)
- Masks rasterized or polygonized at full original resolution, then patch indices are assigned
- Bounding-box (Pascal VOC/COCO) and pixel-wise (PNG mask) formats are both provided (Rahman et al., 2024, Zhang et al., 2023)
- Quality assurance includes regular expert cross-checks, enforced annotation criteria, and thresholding on inter-annotator agreement
- Patches containing no cluster pixels are discarded; clusters occupying <1% of patch area are optionally filtered during training set preparation (Zhang et al., 2023)
Additionally, post-processing merges close bounding boxes (within 10 px) to address annotation fragmentation and removes tiny clusters near patch borders. This workflow reduces label noise and augments the training signal for dense cluster aggregations (Zhang et al., 2023).
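The merge-and-filter post-processing can be sketched as follows for axis-aligned boxes; the greedy pairwise-union strategy and the function names are illustrative assumptions, not the authors' exact implementation:

```python
# Sketch: merge boxes whose gap is within eps pixels, then drop boxes
# below a minimum fraction of the patch area.
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)

def _gap(a: Box, b: Box) -> int:
    """Smallest axis-aligned gap between two boxes (0 if they overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return max(dx, dy)

def merge_close_boxes(boxes: List[Box], eps: int = 10) -> List[Box]:
    """Greedily union pairs of boxes closer than eps until none remain mergeable."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if _gap(boxes[i], boxes[j]) <= eps:
                    a, b = boxes[i], boxes[j]
                    boxes[j] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[i]
                    merged = True
                    break
            if merged:
                break
    return boxes

def drop_small(boxes: List[Box], patch_area: int, min_frac: float = 0.01) -> List[Box]:
    """Discard boxes covering less than min_frac of the patch area."""
    return [b for b in boxes
            if (b[2] - b[0]) * (b[3] - b[1]) >= min_frac * patch_area]

merged = merge_close_boxes([(0, 0, 10, 10), (15, 0, 25, 10)])  # gap of 5 px -> one box
```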
3. Statistical Properties and Dataset Organization
The dataset exhibits the following empirical distributions:
- Clusters per original image: mean ≈ 11, std ≈ 4; mask area: median 1,442 px², mean 7,867 px² (highly right-skewed) (Zhang et al., 2023, Zhang et al., 2023)
- After patching, most patches contain a single cluster, but the distribution is long-tailed, with some patches containing five or more clusters (Zhang et al., 2023)
- For multi-scale segmentation, over 85% of patches have <10% of area labeled as cluster, and most have 0–2% coverage (Rahman et al., 2024, Rahman et al., 2023)
Identity and split management:
- 10-fold cross-validation is standard, with stratified assignment ensuring balanced representation of scales and viewpoints per fold (Zhang et al., 2023, Rahman et al., 2024)
- Directory structures are strictly organized by fold, patch scale, and annotation format, with matched image–mask (segmentation) and image–XML (detection) pairs. Metadata on camera height, scale index, and row/column indices is encoded in filenames and supplementary txt/csv split files (Zhang et al., 2023, Rahman et al., 2023)
- The dataset is available via Harvard Dataverse and project-specific URLs, under CC BY-NC-4.0 or CC-BY 4.0 licensing depending on corpus version (Rahman et al., 2024, Zhang et al., 2023, Zhang et al., 2023)
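A leakage-free, stratified fold assignment of the kind described above might look like the sketch below; the metadata fields and the round-robin strategy are illustrative assumptions, not the published split procedure:

```python
# Sketch: assign source images (not patches) to 10 folds so that all
# patches from one image share a fold, distributing images round-robin
# within each (viewpoint, scale) stratum for balance.
import itertools
from collections import defaultdict

def assign_folds(images, n_folds=10):
    """images: list of dicts with 'id', 'viewpoint', 'scale' keys.
    Returns {image_id: fold_index}."""
    strata = defaultdict(list)
    for im in sorted(images, key=lambda im: im["id"]):
        strata[(im["viewpoint"], im["scale"])].append(im["id"])
    folds = {}
    for _, ids in sorted(strata.items()):
        for fold, image_id in zip(itertools.cycle(range(n_folds)), ids):
            folds[image_id] = fold
    return folds

# Toy example: 30 images spread over 3 strata
imgs = [{"id": i, "viewpoint": i % 3, "scale": i % 3} for i in range(30)]
fold_of = assign_folds(imgs)
```

Assigning whole images rather than patches is what prevents near-duplicate overlapping patches from leaking across folds.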
4. Baseline Model Benchmarks and Quantitative Evaluations
Multiple baseline benchmarks have been established for both object detection and semantic segmentation:
Object Detection (10-fold cross-validation, mAP@0.5 IoU unless stated otherwise; paired columns report results before and after cluster merging/filtering)
| Model | mAP % (raw) | mAP % (merged) | Recall % (raw) | Recall % (merged) |
|---|---|---|---|---|
| VFNet | 41.9 ± 1.9 | 58.3 ± 1.9 | 80.4 ± 0.9 | 96.8 ± 0.4 |
| GFLV2 | 41.6 ± 1.8 | 58.3 ± 1.9 | 79.2 ± 1.3 | 96.2 ± 0.5 |
| PAA | 41.2 ± 1.7 | 58.7 ± 1.9 | 84.1 ± 0.9 | 98.4 ± 0.2 |
| ATSS | 41.8 ± 1.7 | 59.0 ± 1.8 | 80.0 ± 1.0 | 97.0 ± 0.3 |
- Merging nearby clusters (ε = 10 px) and eliminating boxes smaller than 1% of patch area increases mAP by ≈17 points and recall by ≈16 points (Zhang et al., 2023, Zhang et al., 2023)
Multi-scale Detection Benchmarks (Rahman et al., 2024):
| Model | mAP [%] | Recall [%] | FPS (V100 GPU) |
|---|---|---|---|
| RT-DETR | 61.63 | 92.60 | 72.55 |
| YOLOv7 | 57.33 | 56.43 | 113.64 |
| Faster R-CNN | 57.83 | 78.40 | 48.03 |
Semantic Segmentation (10-fold cross-validation, mean ± std, patch-based; two-class aphid/background mIoU and Dice; V100 or P100 GPU)
| Model | mIoU [%] | mDice [%] | Precision [%] | Recall [%] | FPS |
|---|---|---|---|---|---|
| Fast-SCNN | 71.25±0.59 | 80.87±0.50 | 80.46±1.47 | 81.21±0.67 | 91.66 |
| Small HRNet | 71.62±0.47 | 81.15±0.36 | 80.82±1.20 | 81.64±0.65 | 31.57 |
| BiSeNetV2 | 65.72±0.53 | 75.58±0.55 | 77.47±1.34 | 74.06±1.05 | 53.70 |
| BiSeNetV1 | 59.94±0.54 | 69.22±0.70 | 72.39±1.55 | 67.12±1.40 | 56.19 |
- Fast-SCNN yields the fastest and most accurate segmentation among real-time models (Rahman et al., 2024, Rahman et al., 2023)
- Non-real-time architectures such as DeepLabV3 or PSPNet achieve modestly higher mIoU and mDice but with a substantial reduction in throughput (Rahman et al., 2023)
- Semantic segmentation is empirically found to yield more reliable infestation assessment, as cluster masks provide better exclusion of plant background than bounding boxes—especially for elongated or irregular aphid aggregations (Rahman et al., 2024)
5. Evaluation Metrics and Methodological Recommendations
The dataset and accompanying literature standardize and recommend the following evaluation protocols:
- Object Detection: Average Precision (AP) at IoU thresholds (usually 0.5). Cross-validation is always 10-fold by image, with no leakage between folds (Zhang et al., 2023, Zhang et al., 2023).
- Semantic Segmentation: Mean Intersection over Union (mIoU), mean Dice, class-wise Precision/Recall
- Cluster Density: For each annotated cluster, a density metric ρ = N / A is defined, where N is the estimated aphid count (≥6) and A is the mask area in px², providing a per-cluster proxy for infestation severity (Zhang et al., 2023).
- Inference Speed: Reported as frames per second (batch size 1, on NVIDIA V100 or P100 GPU) (Rahman et al., 2024, Rahman et al., 2023).
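The two-class mIoU and Dice metrics can be computed from per-class pixel counts; the following is a minimal NumPy sketch, not the benchmark's exact evaluation code:

```python
# Sketch: per-class IoU and Dice from true/false positive/negative pixel
# counts, averaged over the classes present in prediction or ground truth.
import numpy as np

def seg_metrics(pred: np.ndarray, gt: np.ndarray, n_classes: int = 2):
    """Return (mIoU, mDice) over classes present in pred or gt."""
    ious, dices = [], []
    for c in range(n_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        if tp + fp + fn == 0:
            continue  # class absent in both; skip it
        ious.append(tp / (tp + fp + fn))
        dices.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(ious)), float(np.mean(dices))

gt = np.zeros((4, 4), dtype=int); gt[:2, :2] = 1      # 4 aphid pixels
pred = np.zeros((4, 4), dtype=int); pred[:2, :3] = 1  # 6 predicted pixels
miou, mdice = seg_metrics(pred, gt)  # mIoU = 0.75
```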
Key methodological recommendations include multi-scale training, class-weighted loss functions for segmentation to address class imbalance (as positive pixels are typically <3% of the area), and comprehensive data augmentation (random flips, jitter, crops) to improve model robustness to real-world variations (Rahman et al., 2023, Rahman et al., 2024). Consistent use of cluster-level annotation (≥6 aphids) and 10-fold cross-validation is advised to fairly benchmark new approaches.
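The class-weighted loss recommendation can be illustrated with simple inverse-frequency weighting; this NumPy stand-in assumes a ~3% positive-pixel rate and is not any specific framework's loss function:

```python
# Sketch: binary cross-entropy with inverse-frequency class weights, so
# sparse positive (cluster) pixels contribute as much as the background.
import numpy as np

def weighted_bce(prob: np.ndarray, gt: np.ndarray, pos_frac: float = 0.03):
    """Per-pixel BCE with inverse-frequency class weights, averaged.

    prob: predicted foreground probabilities in (0, 1); gt: 0/1 labels;
    pos_frac: assumed fraction of positive pixels (illustrative value).
    """
    w_pos = 0.5 / pos_frac          # ~3% positives -> weight ~16.7
    w_neg = 0.5 / (1.0 - pos_frac)  # background weight ~0.52
    eps = 1e-7
    p = np.clip(prob, eps, 1 - eps)
    loss = -(w_pos * gt * np.log(p) + w_neg * (1 - gt) * np.log(1 - p))
    return float(loss.mean())
```

With this weighting, missing a cluster pixel costs roughly 32× more than misclassifying a background pixel, counteracting the <3% positive-pixel imbalance noted above.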
6. Utility, Limitations, and Accessibility
The dataset establishes a reliable, open-access foundation for research in fine-grained and small-object detection and segmentation in precision agriculture. It is directly applicable to tasks such as:
- Automated targeted pesticide application (variable-rate spraying)
- Ecological monitoring of aphid population dynamics
- Benchmarking of small-object and dense-object detection architectures
Reported limitations include:
- Occlusion: aphid clusters are often shielded by overlapping leaves, leading to missed annotations
- Class imbalance: positive (cluster) pixels are highly sparse
- Lighting: high solar variability, strong shadows, and leaf background reduce visual contrast
- Morphology: elongated, linear, or irregularly shaped clusters challenge bounding-box-based models (Rahman et al., 2024)
- Domain transfer: the dataset covers only sorghum and the sorghum aphid (*Melanaphis sacchari*); generalization to other crops or species has not been evaluated
Data is available for non-commercial research use (CC BY-NC-4.0 or CC BY 4.0) via Harvard Dataverse and project-specific repositories:
- https://doi.org/10.7910/DVN/N3YJXG
- https://www3.cs.torontomu.ca/~wang/aphid_cluster_dataset.zip

Researchers are expected to cite the relevant dataset publication in derived works.
7. Impact in Automatic Pest Monitoring and Future Directions
The Sorghum Aphid Cluster Dataset has catalyzed comparative benchmarking of both conventional and transformer-based object detectors, as well as real-time semantic segmentation architectures, providing reproducible baselines for both detection and infestation quantification. Results have established that semantic segmentation is more effective than detection for quantifying true cluster area, as bounding boxes often overestimate coverage due to unavoidable inclusion of leaf background, especially for slender, edge-aligned clusters (Rahman et al., 2024).
A plausible implication is that future work should continue to exploit and extend the semantic segmentation paradigm, leveraging multi-scale, class-balancing, and occlusion-robust strategies to enable deployment in other crops and environments. Additionally, as pest pressure evolves and autonomous intervention platforms mature, the dataset structure and annotation guidance documented here will remain directly relevant for new taxa and agricultural deployments.
References: