Grape detection, segmentation and tracking using deep neural networks and three-dimensional association (1907.11819v3)

Published 26 Jul 2019 in cs.CV

Abstract: Agricultural applications such as yield prediction, precision agriculture and automated harvesting need systems able to infer the crop state from low-cost sensing devices. Proximal sensing using affordable cameras combined with computer vision has become a promising alternative, strengthened after the advent of convolutional neural networks (CNNs) as an alternative for challenging pattern recognition problems in natural images. Considering fruit growing monitoring and automation, a fundamental problem is the detection, segmentation and counting of individual fruits in orchards. Here we show that for wine grapes, a crop presenting large variability in shape, color, size and compactness, grape clusters can be successfully detected, segmented and tracked using state-of-the-art CNNs. In a test set containing 408 grape clusters from images taken on a trellis-system based vineyard, we have reached an F1-score up to 0.91 for instance segmentation, a fine separation of each cluster from other structures in the image that allows a more accurate assessment of fruit size and shape. We have also shown how clusters can be identified and tracked along video sequences recording orchard rows. We also present a public dataset containing grape clusters properly annotated in 300 images and a novel annotation methodology for segmentation of complex objects in natural images. The presented pipeline for annotation, training, evaluation and tracking of agricultural patterns in images can be replicated for different crops and production systems. It can be employed in the development of sensing components for several agricultural and environmental applications.



Summary

The paper demonstrates that Mask R-CNN reaches an F1-score of up to 0.91 in grape cluster instance segmentation, within a complete deep learning pipeline.
It uses an interactive graph matching technique for efficient annotation and a 3-D association step for tracking clusters across video frames.
The study introduces the WGISD dataset and compares Mask R-CNN with YOLOv2 and YOLOv3, a step toward scalable yield prediction in viticulture.

Analyzing Grape Detection, Segmentation, and Tracking with Deep Neural Networks

The paper by Santos et al. addresses grape detection, segmentation, and tracking in vineyards using deep neural networks. Automating agricultural tasks is harder than automating industrial ones because field conditions vary and crops are heterogeneous.

The researchers use deep convolutional neural networks (CNNs), namely Mask R-CNN and YOLO variants, to detect grape clusters in vineyard images. Mask R-CNN additionally segments them, and the resulting detections are then tracked across video frames. Mask R-CNN was favored for its instance segmentation capability. It reached an $F_1$-score of up to 0.91 on a test set of 408 grape clusters imaged in a trellis-system vineyard. Instance segmentation separates each cluster from surrounding structures at the pixel level, which allows more accurate assessment of fruit size and shape across grape varieties with different morphologies.
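For reference, the $F_1$-score is the harmonic mean of precision and recall. It is computed from three counts: matched clusters (true positives), spurious detections (false positives) and missed clusters (false negatives). The minimal Python sketch below uses made-up counts, not the paper's results. It assumes the matching of detections to ground truth has already been done.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from detection counts.

    tp: detections matched to a ground-truth cluster
    fp: detections with no matching ground-truth cluster
    fn: ground-truth clusters that no detection matched
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a 408-cluster test set (NOT the paper's numbers):
# 370 clusters matched, 30 spurious detections, 38 clusters missed.
print(round(f1_score(tp=370, fp=30, fn=38), 3))  # 0.916
```

Equivalently, $F_1 = 2TP / (2TP + FP + FN)$. True negatives do not enter the formula, which suits object detection, where they are not defined.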

The methodology covers the entire pipeline from annotation to tracking, addressing pattern recognition in outdoor settings end to end. The authors introduce the Embrapa Wine Grape Instance Segmentation Dataset (WGISD), whose images and annotations support object detection and instance segmentation. The dataset comprises 300 images with 4,432 annotated grape clusters and provides both bounding boxes and pixel-level segmentation masks.
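Bounding-box annotations of this kind are often stored in the YOLO/darknet text format. Each line holds one object: a class id, then a box center, width and height normalized to [0, 1]. The sketch below assumes WGISD's box files follow that layout; check the dataset's own documentation before relying on it. `parse_yolo_boxes` is a hypothetical helper, not part of any dataset tooling.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def parse_yolo_boxes(text: str, img_w: int, img_h: int) -> List[Box]:
    """Convert YOLO-style lines 'cls xc yc w h' (normalized) to pixel corners.

    ASSUMPTION: one object per line, coordinates normalized by image size.
    """
    boxes: List[Box] = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        _cls, xc, yc, w, h = line.split()
        # Scale normalized center and size back to pixels.
        cx, cy = float(xc) * img_w, float(yc) * img_h
        bw, bh = float(w) * img_w, float(h) * img_h
        boxes.append((cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2))
    return boxes


example = "0 0.5 0.5 0.25 0.5\n"
print(parse_yolo_boxes(example, img_w=1000, img_h=800))
# [(375.0, 200.0, 625.0, 600.0)]
```

Pixel-level masks would need a separate loader. Their storage format is not described here, so none is sketched.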

A novel aspect of the paper is the annotation methodology that incorporates interactive image segmentation using graph matching. This technique efficiently generates object masks, thus simplifying the labor-intensive annotation process for complex natural scenes.

The research evaluates three CNN architectures: Mask R-CNN, YOLOv2, and YOLOv3. In object detection, Mask R-CNN outperformed the YOLO networks, particularly at higher intersection over union (IoU) thresholds. This means its detections agree more closely with the ground-truth cluster extents, which is important for accurate yield prediction.
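The sketch below shows why the IoU threshold matters. It computes box IoU and counts true positives, false positives and false negatives at a given threshold. It uses greedy one-to-one matching, a common convention assumed here; the paper's exact evaluation protocol may differ. For masks, the same ratio is computed over pixel sets instead of rectangles.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(preds: List[Box], gts: List[Box], thr: float) -> Tuple[int, int, int]:
    """Greedy one-to-one matching; returns (TP, FP, FN).

    ASSUMPTION: preds are already sorted by descending confidence.
    Each prediction claims the unmatched ground truth with the highest IoU,
    provided that IoU is at least thr.
    """
    unmatched = set(range(len(gts)))
    tp = 0
    for p in preds:
        best, best_iou = None, thr
        for g in unmatched:
            v = iou(p, gts[g])
            if v >= best_iou:
                best, best_iou = g, v
        if best is not None:
            unmatched.remove(best)
            tp += 1
    return tp, len(preds) - tp, len(unmatched)


gts = [(0, 0, 10, 10)]
preds = [(1, 1, 11, 11)]  # slightly offset prediction, IoU = 81/119 (about 0.68)
for thr in (0.5, 0.8):
    print(thr, count_matches(preds, gts, thr))
# 0.5 (1, 0, 0)
# 0.8 (0, 1, 1)
```

A slightly misaligned box counts as a hit at IoU 0.5. At 0.8 the same box counts as both a false positive and a miss. Tighter localization therefore shows up as a wider performance gap at higher thresholds.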

A key component of the paper is the spatial integration of detection results using three-dimensional (3-D) association. Structure-from-motion (SfM) recovers camera poses and 3-D points from the video. Detections from different camera viewpoints can then be related to the same physical cluster, supporting localization and fruit counting. The researchers build a directed graph whose nodes are the instances detected in each video frame and link instances that correspond to the same cluster. This lets them track grape clusters through the sequence, avoiding double-counting and mitigating the effect of occlusions.
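The association idea can be illustrated with a simplified stand-in. The paper builds a directed graph over instances in successive frames. The sketch below instead links detections in consecutive frames that contain enough of the same SfM 3-D points, then treats each connected component (found with union-find) as one physical cluster. Several pieces are hypothetical: the input layout (a set of 3-D point ids per detection), the `min_shared` threshold and the consecutive-frame rule.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Detection = Tuple[int, int]  # (frame index, instance index within that frame)


def associate(dets: Dict[Detection, FrozenSet[int]],
              min_shared: int = 3) -> List[List[Detection]]:
    """Group detections that observe the same 3-D points into tracks.

    dets maps each detection to the ids of SfM 3-D points whose image
    projections fall inside it (HYPOTHETICAL input layout). Detections in
    consecutive frames sharing >= min_shared points are linked; each
    connected component is treated as one physical cluster.
    """
    parent = {d: d for d in dets}

    def find(d: Detection) -> Detection:
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    by_frame: Dict[int, List[Detection]] = defaultdict(list)
    for d in dets:
        by_frame[d[0]].append(d)

    for f in sorted(by_frame):
        for a in by_frame[f]:
            for b in by_frame.get(f + 1, []):
                if len(dets[a] & dets[b]) >= min_shared:
                    parent[find(a)] = find(b)  # merge the two tracks

    groups: Dict[Detection, List[Detection]] = defaultdict(list)
    for d in dets:
        groups[find(d)].append(d)
    return sorted(sorted(g) for g in groups.values())


dets = {
    (0, 0): frozenset({1, 2, 3, 4}),
    (1, 0): frozenset({2, 3, 4, 9}),
    (1, 1): frozenset({20, 21, 22}),
    (2, 0): frozenset({3, 4, 9, 10}),
    (2, 1): frozenset({21, 22, 23}),
}
print(len(associate(dets)))                # 3 clusters
print(len(associate(dets, min_shared=2)))  # 2 clusters
```

Here the cluster seen in frames 0, 1 and 2 is counted once rather than three times. The threshold decides whether the weakly overlapping pair in frames 1 and 2 merges into one track.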

Beyond these mature deep learning techniques for detection and segmentation, the paper suggests enhancements such as simultaneous localization and mapping (SLAM) algorithms for real-time applications. Because the approach relies on ordinary panoramic RGB cameras, it is low-cost and could be applied to other trellis-based cropping systems. Future extensions could estimate yield with regression models that relate visible fruit counts to actual yield, adapted to different agricultural contexts, varieties, or climatic conditions.
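A yield model of the suggested kind could start as a least-squares line relating visible cluster counts to measured yield. This sketch is illustrative only: the data are invented, and the linear form is an assumption rather than a model from the paper.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a*x + b; returns (a, b).

    Requires at least two distinct x values (otherwise sxx is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


counts = [10, 20, 30, 40]        # HYPOTHETICAL visible clusters per row segment
yield_kg = [2.5, 4.5, 6.5, 8.5]  # HYPOTHETICAL harvested mass per segment (kg)
a, b = fit_line(counts, yield_kg)
print(f"yield = {a:.2f} * count + {b:.2f}")  # yield = 0.20 * count + 0.50
```

In practice the coefficients would have to be recalibrated per variety, site, or season. The intercept and slope absorb effects such as clusters hidden by foliage.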

In conclusion, the paper contributes a public dataset, an annotation methodology, a comparison of CNN detectors, and a 3-D association method for tracking. Together these form a pipeline for automated crop monitoring. They have practical potential in viticulture and beyond, for crop monitoring, yield estimation, and possibly breeding programs, and provide a foundation for further work in precision farming technologies.
