PlantDoc: A Dataset for Visual Plant Disease Detection (1911.10317v1)

Published 23 Nov 2019 in cs.CV and eess.IV

Abstract: India loses 35% of the annual crop yield due to plant diseases. Early detection of plant diseases remains difficult due to the lack of lab infrastructure and expertise. In this paper, we explore the possibility of computer vision approaches for scalable and early plant disease detection. The lack of availability of sufficiently large-scale non-lab data set remains a major challenge for enabling vision based plant disease detection. Against this background, we present PlantDoc: a dataset for visual plant disease detection. Our dataset contains 2,598 data points in total across 13 plant species and up to 17 classes of diseases, involving approximately 300 human hours of effort in annotating internet scraped images. To show the efficacy of our dataset, we learn 3 models for the task of plant disease classification. Our results show that modelling using our dataset can increase the classification accuracy by up to 31%. We believe that our dataset can help reduce the entry barrier of computer vision techniques in plant disease detection.



Summary

The paper introduces PlantDoc, a dataset of plant disease images collected in real-world (non-lab) settings; modelling with it raises classification accuracy by up to 31%.
It employs both cropped and uncropped image experiments to isolate leaf features and validate model performance in complex natural environments.
The study leverages advanced models like Faster R-CNN with InceptionResNetV2, underscoring practical improvements for mobile and in-field disease diagnostics.

Overview of "PlantDoc: A Dataset for Visual Plant Disease Detection"

The paper "PlantDoc: A Dataset for Visual Plant Disease Detection" addresses a significant challenge faced by the agricultural sector: the early detection of plant diseases. The authors aim to harness computer vision techniques to develop scalable solutions for plant disease detection, particularly in resource-constrained environments. They introduce "PlantDoc," a dataset comprising 2,598 images across 13 plant species and up to 17 classes of diseases, drawn from real-world conditions rather than lab environments. Annotating the internet-scraped images took approximately 300 human hours.

Dataset Contribution

The paper emphasizes the lack of existing large-scale datasets that reflect field conditions and the challenge this poses for deploying computer vision models in practical scenarios. Existing datasets, like PlantVillage, generally consist of images obtained in lab environments, limiting their applicability in real-world conditions. "PlantDoc" fills this gap by providing a dataset obtained from natural settings, with images that include the background noise typical of real environments.

Experimental Evaluation

Classification Accuracy: The authors benchmark the dataset using three models and report an increase in classification accuracy of up to 31% compared to models trained solely on pre-existing datasets. The gain is notable because the new dataset contains real-world variation that images taken under controlled conditions lack.
Cropped vs. Uncropped Dataset: The paper includes experiments using cropped variants of the dataset to isolate leaf features, demonstrating that fine-tuning models with PlantDoc images significantly improves accuracy over training on traditional datasets alone.
Leaf Detection: The paper evaluates state-of-the-art detection models, such as Faster R-CNN with InceptionResNetV2, for detecting diseased regions in the images. These experiments illustrate the value of pre-training on datasets with real-world variability, reinforcing the importance of a dataset like PlantDoc for this detection task.
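To make the accuracy comparison above concrete, here is a minimal sketch of how a gain between two classifiers could be computed. The labels and predictions are invented toy data, not PlantDoc results, and the function names are hypothetical. The summary does not state whether the reported 31% is an absolute (percentage-point) or relative gain, so the sketch computes both.

```python
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def accuracy_gain(baseline: float, improved: float) -> Tuple[float, float]:
    """Return (absolute gain, relative gain) of `improved` over `baseline`."""
    absolute = improved - baseline
    relative = absolute / baseline if baseline > 0 else float("inf")
    return absolute, relative


# Toy data (hypothetical class names, NOT taken from the paper).
y_true = ["healthy", "rust", "blight", "rust", "healthy"]
pred_lab_only = ["healthy", "blight", "blight", "healthy", "rust"]
pred_finetuned = ["healthy", "rust", "blight", "healthy", "healthy"]

acc_lab = accuracy(y_true, pred_lab_only)
acc_finetuned = accuracy(y_true, pred_finetuned)
abs_gain, rel_gain = accuracy_gain(acc_lab, acc_finetuned)

print(f"lab-only accuracy:   {acc_lab:.2f}")
print(f"fine-tuned accuracy: {acc_finetuned:.2f}")
print(f"absolute gain: {abs_gain:.2f}, relative gain: {rel_gain:.2f}")
```

Reporting both figures avoids ambiguity: a jump from 0.40 to 0.80 is a 40-point absolute gain but a 100% relative gain.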

Implications and Future Directions

The dataset not only provides an effective resource for training computer vision models but also sets a new benchmark for plant disease detection using visual data. The authors propose that the lower entry barrier enabled by PlantDoc could facilitate the development of mobile applications capable of robust in-situ disease identification, benefiting farmers globally, particularly those in regions lacking laboratory diagnostic facilities.

The authors acknowledge the limitations posed by the relatively small class sizes and potential misclassifications due to the visual similarity between certain diseases. Future work could focus on expanding the dataset and incorporating advanced techniques like deep segmentation for even more precise leaf and disease characterization.
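The small-class limitation mentioned above can be checked for any labelled dataset with a quick count. The following is a minimal sketch, assuming the labels are available as a flat list of class names; the class names, counts, and the `min_count` threshold are hypothetical and do not reflect PlantDoc's actual distribution.

```python
from collections import Counter
from typing import Dict, Iterable


def small_classes(labels: Iterable[str], min_count: int) -> Dict[str, int]:
    """Return classes with fewer than `min_count` examples, with their counts."""
    counts = Counter(labels)
    return {cls: n for cls, n in counts.items() if n < min_count}


# Hypothetical labels for illustration only.
labels = ["tomato_blight"] * 5 + ["apple_scab"] * 2 + ["corn_rust"] * 1

underrepresented = small_classes(labels, min_count=3)
print(underrepresented)
```

Classes flagged this way are natural candidates for further data collection or augmentation when a dataset is expanded.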

In conclusion, PlantDoc represents a significant stride forward in equipping computer vision systems for practical agricultural applications, bridging the gap between theoretical model capabilities and their real-world deployment. This dataset lays the groundwork for exploration into more efficient disease detection methodologies and could stimulate further research into robust AI-driven solutions within agriculture.
