Global Wheat Head Detection (GWHD) dataset: a large and diverse dataset of high resolution RGB labelled images to develop and benchmark wheat head detection methods (2005.02162v2)

Published 25 Apr 2020 in cs.CV, cs.LG, and stat.ML

Abstract: Detection of wheat heads is an important task that allows estimation of pertinent traits, including head population density and head characteristics such as sanitary state, size, maturity stage and the presence of awns. Several studies have developed methods for wheat head detection from high-resolution RGB imagery. They are based on computer vision and machine learning and are generally calibrated and validated on limited datasets. However, variability in observational conditions, genotypic differences, development stages and head orientation represent a challenge for computer vision. Further, possible blurring due to motion or wind, and overlap between heads in dense populations, make this task even more complex. Through a joint international collaborative effort, we have built a large, diverse and well-labelled dataset, the Global Wheat Head Detection (GWHD) dataset. It contains 4,700 high-resolution RGB images and 190,000 labelled wheat heads collected from several countries around the world at different growth stages with a wide range of genotypes. Guidelines are proposed for developing new head detection datasets, covering image acquisition, the minimum metadata needed to respect FAIR principles, and consistent head labelling methods. The GWHD is publicly available at http://www.global-wheat.com/ and is aimed at developing and benchmarking methods for wheat head detection.

Citations (181)

Summary

  • The paper introduces the Global Wheat Head Detection (GWHD) dataset, the largest open-source labeled dataset for field plant phenotyping, containing 4,700 images and 190,000 labeled wheat heads.
  • Collected from nine institutions across seven countries, the dataset harmonizes images with diverse genotypes, growth stages, and environmental conditions for robust model training.
  • This large-scale dataset facilitates standardized training processes, aids comparative analysis of detection methods, and promotes adherence to FAIR principles for broad research use.

A Comprehensive Overview of the Global Wheat Head Detection Dataset

The paper presents the Global Wheat Head Detection (GWHD) dataset, a significant resource for developing and benchmarking methods for wheat head detection from high-resolution RGB images. The dataset comprises 4,700 images with 190,000 labeled wheat heads, collected in multiple countries at different growth stages and covering diverse genotypes. It targets the factors that make wheat head detection difficult: variability across genotypes and development stages, environmental conditions, and head orientation. It also aims to overcome the limitations of previous studies, which relied on sparse and localized data sources.

Deep learning methods have become the cornerstone of computer vision, particularly for object detection and semantic segmentation, and are increasingly applied to plant phenotyping. However, their robustness is often questioned when models trained on limited datasets are applied to new conditions. A large dataset like GWHD supports standardized training and enables comparative analyses across diverse wheat phenotyping conditions, spanning genotypic differences and regional variations.

Key Features and Dataset Composition

The GWHD dataset combines contributions from nine institutions spread across seven countries, each with its own experimental setup and pedoclimatic conditions. Each sub-dataset has its own parameters, such as row spacing, sowing density, and the targeted wheat growth stages. Image acquisition settings also vary: cameras were operated from various platforms and captured images at different ground sampling distances (GSDs).
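The GSD ties camera geometry to the size of one pixel on the ground, which is why it differs between sub-datasets acquired with different cameras and platforms. A minimal sketch, assuming an ideal pinhole camera looking straight down at the canopy; the function names `gsd_mm` and `rescale_factor` and all camera values are hypothetical illustrations, not the paper's procedure:

```python
"""Ground sampling distance (GSD) of a nadir-looking camera: a sketch.

Assumes an ideal pinhole camera with the sensor parallel to the canopy.
The camera values in the demo are hypothetical, not taken from GWHD.
"""


def gsd_mm(pixel_pitch_um: float, focal_length_mm: float, distance_m: float) -> float:
    """Ground footprint of one pixel, in millimetres.

    distance_m is taken as the camera-to-canopy distance, since the heads
    sit at the top of the canopy rather than on the soil.
    """
    pixel_pitch_mm = pixel_pitch_um / 1000.0
    distance_mm = distance_m * 1000.0
    # Similar triangles: pixel_pitch / focal_length == GSD / distance.
    return pixel_pitch_mm * distance_mm / focal_length_mm


def rescale_factor(source_gsd_mm: float, target_gsd_mm: float) -> float:
    """Resize factor that brings an image from source_gsd_mm to target_gsd_mm.

    The ground extent (pixels * GSD) is fixed, so the pixel count scales by
    source / target: above 1 means upsampling, below 1 means downsampling.
    """
    return source_gsd_mm / target_gsd_mm


if __name__ == "__main__":
    # Hypothetical set-ups: a short lens close to the canopy, a longer lens higher up.
    near = gsd_mm(pixel_pitch_um=4.0, focal_length_mm=16.0, distance_m=2.0)
    far = gsd_mm(pixel_pitch_um=4.0, focal_length_mm=35.0, distance_m=5.0)
    print(f"near: {near:.3f} mm/px, far: {far:.3f} mm/px")
    print(f"resize 'far' by {rescale_factor(far, near):.3f} to match 'near'")
```

With these demo values the longer lens at 5 m yields a slightly coarser GSD than the short lens at 2 m, so its images would be upsampled to match. Real lenses add distortion, and oblique views make the GSD vary across the frame; this model ignores both.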

Harmonizing these sub-datasets involved manually inspecting the images to ensure they could be interpreted reliably, rescaling them so that wheat heads appear at a consistent resolution, and cropping them into square patches. Labelling combined manual and semi-automatic techniques, and a weakly supervised deep learning framework notably increased labelling throughput. This labelling process supports consistency and accuracy across the entire dataset.
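Cropping images into square patches also means carrying the bounding boxes along with them. The sketch below assumes boxes stored as pixel corners, non-overlapping tiles, and a hypothetical rule that keeps a box only if at least half of its area survives the crop. The names `Box`, `patch_origins` and `crop_boxes`, the 1024-pixel tile size in the demo, and the 0.5 threshold are illustrative assumptions rather than the paper's documented settings:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (x_min, y_min, x_max, y_max)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Degenerate or inverted boxes (e.g. after clipping) have zero area.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def patch_origins(width: int, height: int, size: int) -> list[tuple[int, int]]:
    """Top-left corners of non-overlapping size x size patches.

    Strips narrower than `size` at the right and bottom edges are discarded.
    """
    return [(x, y)
            for y in range(0, height - size + 1, size)
            for x in range(0, width - size + 1, size)]


def crop_boxes(boxes: list[Box], x0: int, y0: int, size: int,
               min_visible: float = 0.5) -> list[Box]:
    """Clip boxes to the patch at (x0, y0) and shift them into patch coordinates.

    A box is kept only if at least `min_visible` of its original area lies
    inside the patch (hypothetical threshold).
    """
    kept = []
    for b in boxes:
        clipped = Box(max(b.x_min, x0), max(b.y_min, y0),
                      min(b.x_max, x0 + size), min(b.y_max, y0 + size))
        if b.area() > 0 and clipped.area() / b.area() >= min_visible:
            kept.append(Box(clipped.x_min - x0, clipped.y_min - y0,
                            clipped.x_max - x0, clipped.y_max - y0))
    return kept


if __name__ == "__main__":
    # Hypothetical 2048 x 1024 image with two labelled heads.
    heads = [Box(100, 100, 180, 160), Box(1000, 50, 1080, 110)]
    for x0, y0 in patch_origins(2048, 1024, 1024):
        print((x0, y0), crop_boxes(heads, x0, y0, 1024))
```

In the demo the second head straddles the tile border: 30% of its area falls in the first tile and 70% in the second, so it is kept only in the second tile. Overlapping tiles are a common alternative that avoids cutting heads, at the cost of duplicated labels.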

Statistical Evaluation and Dataset Comparison

The GWHD dataset is the largest open-source labeled dataset currently available for field plant phenotyping, focused on wheat head detection. A statistical evaluation shows that bounding box dimensions follow a skewed Gaussian distribution, and the variation across sub-datasets reflects real-world diversity. Compared with existing datasets such as MinneApple and MS COCO, GWHD poses a distinct challenge: overlapping and occluded wheat heads occur frequently.
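Statistics of this kind can be computed from box coordinates alone. A minimal sketch, assuming boxes given as (x_min, y_min, x_max, y_max) tuples: it reports the Fisher-Pearson skewness of a list of box dimensions (positive values indicate a long right tail, negative a long left tail) and the share of boxes that overlap at least one other box. The function names and the overlap definition (any IoU above zero) are assumptions, not necessarily the statistics used in the paper:

```python
from itertools import combinations
from statistics import fmean
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def skewness(values: List[float]) -> float:
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 using population moments.

    Returns 0.0 for constant data; raises StatisticsError on an empty list.
    """
    mu = fmean(values)
    m2 = fmean((v - mu) ** 2 for v in values)
    m3 = fmean((v - mu) ** 3 for v in values)
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def overlap_fraction(boxes: List[Box], min_iou: float = 0.0) -> float:
    """Share of boxes whose IoU with at least one other box exceeds min_iou."""
    overlapping = set()
    for i, j in combinations(range(len(boxes)), 2):
        if iou(boxes[i], boxes[j]) > min_iou:
            overlapping.update((i, j))
    return len(overlapping) / len(boxes) if boxes else 0.0


if __name__ == "__main__":
    # Hypothetical boxes from one image patch.
    boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (40, 40, 50, 60), (100, 100, 104, 104)]
    widths = [b[2] - b[0] for b in boxes]
    heights = [b[3] - b[1] for b in boxes]
    print("width skewness:", round(skewness(widths), 3))
    print("height skewness:", round(skewness(heights), 3))
    print("overlapping share:", overlap_fraction(boxes))
```

The pairwise loop is quadratic in the number of boxes, which is acceptable within a single patch. Comparing sub-datasets then amounts to grouping boxes by their source before calling these functions.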

Practical Implications for Phenotyping and Image Acquisition

The paper provides guidelines for image acquisition, recommending stages at which wheat heads have fully emerged and are upright. For accurate estimation of wheat head density, a near-nadir viewing direction and an appropriate camera height are recommended, to limit overlap between heads and to increase the sampled area. Following these guidelines minimizes errors in phenotypic evaluation and helps ensure that the collected data are robust for phenotyping research.
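The link between camera height and sampled area follows from simple geometry. A sketch assuming a nadir view, a flat canopy surface, no lens distortion, and hypothetical field-of-view values; the names `nadir_footprint_m2` and `head_density_per_m2` are illustrative. It computes the imaged canopy area and converts a head count into a density per square metre:

```python
import math


def nadir_footprint_m2(height_m: float, hfov_deg: float, vfov_deg: float) -> float:
    """Canopy area imaged by a camera looking straight down from height_m.

    hfov_deg and vfov_deg are the horizontal and vertical fields of view.
    Assumes a flat canopy and no lens distortion.
    """
    width = 2.0 * height_m * math.tan(math.radians(hfov_deg) / 2.0)
    length = 2.0 * height_m * math.tan(math.radians(vfov_deg) / 2.0)
    return width * length


def head_density_per_m2(head_count: int, area_m2: float) -> float:
    """Heads per square metre for a count made over area_m2."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return head_count / area_m2


if __name__ == "__main__":
    # Hypothetical lens with a 60 x 45 degree field of view.
    for height in (1.0, 2.0, 3.0):
        area = nadir_footprint_m2(height, 60.0, 45.0)
        print(f"height {height:.1f} m -> footprint {area:.2f} m^2")
    # Hypothetical count: 200 heads labelled in a 0.5 m^2 footprint.
    print(head_density_per_m2(200, 0.5), "heads/m^2")
```

Because both footprint dimensions grow linearly with height, the sampled area grows with its square. The trade-off is a coarser ground sampling distance, so each head covers fewer pixels.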

FAIR Principles and Future Dataset Expansion

The paper treats adherence to the FAIR principles (Findable, Accessible, Interoperable, and Reusable) as essential. It outlines a standardized set of metadata needed to harmonize sub-datasets and apply the data broadly, and advocates thorough documentation of the context in which the data were acquired. Future expansion of the GWHD dataset is encouraged to increase genetic and geographic diversity and to fill current gaps in regional representation. Such expansion would enhance its utility for wheat breeding and for classification and segmentation tasks.
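A minimum metadata record can be made machine-checkable. The sketch below uses a hypothetical schema: the class `ImageMetadata`, its field names, the required set, and the checks are illustrative assumptions built from parameters mentioned in this summary (growth stage, GSD, row spacing, sowing density), not the paper's official metadata list:

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional

# Hypothetical required text fields; the paper's own minimum list may differ.
REQUIRED_TEXT_FIELDS = ("sub_dataset", "institution", "country",
                        "acquisition_date", "development_stage")


@dataclass
class ImageMetadata:
    """Hypothetical per-image metadata record (illustrative field names)."""
    sub_dataset: str
    institution: str
    country: str
    acquisition_date: str          # ISO 8601 calendar date, e.g. "2019-06-12"
    development_stage: str         # free text, e.g. "post-flowering"
    camera_model: str
    gsd_mm: float                  # ground sampling distance, mm per pixel
    row_spacing_cm: Optional[float] = None
    sowing_density_seeds_m2: Optional[float] = None

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the record passes."""
        issues = [f"missing {name}" for name in REQUIRED_TEXT_FIELDS
                  if not getattr(self, name).strip()]
        try:
            date.fromisoformat(self.acquisition_date)
        except ValueError:
            issues.append("acquisition_date is not an ISO 8601 date")
        if self.gsd_mm <= 0:
            issues.append("gsd_mm must be positive")
        return issues

    def to_json(self) -> str:
        """Serialize to JSON so the record can travel with the images."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    record = ImageMetadata("site_a_2019", "Example Institute", "France",
                           "2019-06-12", "post-flowering", "generic RGB", 0.4)
    print(record.problems())
    print(record.to_json())
```

Keeping validation next to the schema means a contributor can check a new sub-dataset before submission, which is where harmonization problems are cheapest to fix.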

Concluding Thoughts and Contributions

The GWHD dataset serves as a foundation for improved wheat head detection methods, with potential impact on various agronomic practices. Its large-scale collaborative nature and commitment to FAIR principles mark significant progress toward standardized phenotyping methods. The paper anticipates advances spurred by an open machine learning competition designed to benchmark detection methods, fostering an inclusive environment for innovation and collaboration in plant phenotyping.
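Benchmarking detectors on such data requires matching predicted boxes to labelled heads. A minimal sketch of one common scheme, assuming greedy one-to-one matching in order of confidence and a per-image score TP / (TP + FP + FN) averaged over IoU thresholds from 0.5 to 0.75. The threshold range, the scoring rule, and the names `match_counts` and `image_score` are assumptions here, not a statement of the competition's official metric:

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Prediction = Tuple[float, Box]           # (confidence, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_counts(preds: Sequence[Prediction], truths: Sequence[Box],
                 threshold: float) -> Tuple[int, int, int]:
    """Greedily match predictions (highest confidence first) to unmatched
    ground-truth boxes; a match needs IoU strictly above threshold.
    Returns (true positives, false positives, false negatives)."""
    unmatched: List[int] = list(range(len(truths)))
    tp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, threshold
        for j in unmatched:
            overlap = iou(box, truths[j])
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            unmatched.remove(best_j)
            tp += 1
    return tp, len(preds) - tp, len(truths) - tp


def image_score(preds: Sequence[Prediction], truths: Sequence[Box],
                thresholds=(0.5, 0.55, 0.6, 0.65, 0.7, 0.75)) -> float:
    """Mean of TP / (TP + FP + FN) over the IoU thresholds.
    An image with no predictions and no labels scores 1.0 (a convention)."""
    scores = []
    for t in thresholds:
        tp, fp, fn = match_counts(preds, truths, t)
        total = tp + fp + fn
        scores.append(tp / total if total else 1.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0.9, (0, 0, 10, 10)), (0.8, (21, 21, 31, 31)),
             (0.3, (50, 50, 60, 60))]
    print(round(image_score(preds, truths), 4))
```

In the demo the second prediction overlaps its target with an IoU of about 0.68, so it counts as a match at the lower thresholds but not at 0.7 and above; averaging over thresholds rewards tighter localization, which matters when heads are small and densely packed.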