Robust Fruit Counting: Combining Deep Learning, Tracking, and Structure from Motion (1804.00307v2)

Published 1 Apr 2018 in cs.CV

Abstract: We present a novel fruit counting pipeline that combines deep segmentation, frame to frame tracking, and 3D localization to accurately count visible fruits across a sequence of images. Our pipeline works on image streams from a monocular camera, both in natural light, as well as with controlled illumination at night. We first train a Fully Convolutional Network (FCN) and segment video frame images into fruit and non-fruit pixels. We then track fruits across frames using the Hungarian Algorithm where the objective cost is determined from a Kalman Filter corrected Kanade-Lucas-Tomasi (KLT) Tracker. In order to correct the estimated count from tracking process, we combine tracking results with a Structure from Motion (SfM) algorithm to calculate relative 3D locations and size estimates to reject outliers and double counted fruit tracks. We evaluate our algorithm by comparing with ground-truth human-annotated visual counts. Our results demonstrate that our pipeline is able to accurately and reliably count fruits across image sequences, and the correction step can significantly improve the counting accuracy and robustness. Although discussed in the context of fruit counting, our work can extend to detection, tracking, and counting of a variety of other stationary features of interest such as leaf-spots, wilt, and blossom.

Summary

The paper presents a multi-stage pipeline that combines deep segmentation, tracking, and 3D localization to accurately count fruits in image sequences.
The methodology employs a Fully Convolutional Network for segmentation, a Hungarian algorithm with Kalman filtering for tracking, and SfM for spatial accuracy.
Experimental results on apple and orange datasets show substantial error reduction and improved counting reliability under diverse orchard conditions.

Overview of Robust Fruit Counting: Combining Deep Learning, Tracking, and Structure from Motion

The paper "Robust Fruit Counting: Combining Deep Learning, Tracking, and Structure from Motion" presents a pipeline for counting visible fruits across a sequence of images from a monocular camera. It integrates deep-learning segmentation with frame-to-frame tracking and 3D localization to improve counting accuracy and reliability, even in complex and unstructured farm environments.

Technical Contributions

This work introduces a fruit counting pipeline consisting of three major stages: deep segmentation, frame-to-frame tracking, and 3D localization.

Deep Segmentation:
- The pipeline begins with a Fully Convolutional Network (FCN), which segments video frame images into fruit and non-fruit pixels. This stage is crucial for identifying candidate fruit regions and facilitates reliable object detection across different illumination conditions and varied orchard environments.
Tracking:
- The paper uses the Hungarian Algorithm to associate detected fruits across consecutive frames, with the assignment cost derived from a Kalman Filter corrected Kanade-Lucas-Tomasi (KLT) tracker. This approach addresses common issues in video-based counting, such as occlusions and variations in motion.
3D Localization:
- Using Structure from Motion (SfM), the pipeline corrects the tracking-based count by estimating the relative 3D locations and sizes of the fruits. This step rejects outliers and double-counted fruit tracks, which the paper reports significantly improves counting accuracy.
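
The frame-to-frame association in the tracking stage can be sketched as a minimum-cost assignment problem. The sketch below is only illustrative and is not the authors' implementation: it assumes the cost is the Euclidean distance between a track's predicted position (for example, from a Kalman-corrected KLT estimate) and a detection centroid, and the names `hungarian`, `associate`, and `max_dist` are hypothetical. Dummy columns priced at `max_dist` let a track stay unmatched rather than pair with a distant detection.

```python
import math

def hungarian(cost):
    """Min-cost assignment for an n x m matrix with n <= m.
    Returns, for each row, the index of its assigned column."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (m + 1)          # column potentials
    p = [0] * (m + 1)            # p[j]: row matched to column j (1-based, 0 = free)
    way = [0] * (m + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:       # reached a free column: augment
                break
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment

def associate(predicted, detections, max_dist=20.0):
    """Match predicted track positions to detection centroids.
    Returns (matches, new_detections); max_dist is an assumed gate in pixels."""
    n, m = len(predicted), len(detections)
    if n == 0:
        return [], list(range(m))
    # One dummy column per track, so every track may remain unmatched.
    cost = [[math.dist(t, d) for d in detections] + [max_dist] * n
            for t in predicted]
    assignment = hungarian(cost)
    matches = [(i, j) for i, j in enumerate(assignment) if j < m]
    matched = {j for _, j in matches}
    new_detections = [j for j in range(m) if j not in matched]
    return matches, new_detections

tracks = [(10, 10), (50, 50)]                    # hypothetical predicted positions
detections = [(52, 49), (11, 9), (200, 200)]     # hypothetical centroids
print(associate(tracks, detections))             # ([(0, 1), (1, 0)], [2])
```

In this toy input the third detection is far from every track, so it is returned as unmatched and would start a new fruit track.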

Evaluation and Results

The researchers tested the algorithm on two datasets, orange and apple sequences, that vary in brightness, occlusion, and the structural arrangement of trees. The experimental results show substantial improvements in accuracy after the 3D localization correction. For instance, the mean error for oranges fell from 17.2% (uncorrected) to -0.2% (corrected), indicating the effectiveness of the 3D correction step.
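
To make the error figures concrete, the sketch below computes a mean percentage error over several sequences. It assumes the error is signed (estimated minus ground truth, relative to ground truth), which the negative corrected value suggests but this summary does not state. The counts are made up for illustration and are not taken from the paper.

```python
from statistics import mean

def percent_errors(estimated, ground_truth):
    """Signed percentage error per sequence (positive = overcount)."""
    return [100.0 * (e - g) / g for e, g in zip(estimated, ground_truth)]

# Hypothetical per-sequence counts, not data from the paper.
ground_truth = [100, 200, 50]
estimated = [110, 190, 50]

errors = percent_errors(estimated, ground_truth)
print(errors)                                  # [10.0, -5.0, 0.0]
print(round(mean(errors), 2))                  # 1.67
print(mean(abs(e) for e in errors))            # 5.0
```

Signed errors can cancel across sequences, so a mean near zero is best read together with the spread or the mean absolute error.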

Contributions and Implications

Generalizability and Robustness: The incorporation of deep learning models enhances the pipeline’s ability to generalize across different fruit types and orchard layouts, making it adaptable for various agricultural applications beyond fruit counting.
3D Spatial Analysis: SfM provides spatial context that mitigates typical pitfalls such as interference from background trees, a common issue in monocular setups.
Practical Applications: The achieved counting accuracy can significantly aid in optimizing agricultural management decisions ranging from labor allocation to yield estimation.
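
The 3D spatial analysis point can be illustrated with a simple filter over fruit landmarks. This is a simplified sketch under stated assumptions, not the paper's procedure: each tracked fruit is assumed to have one 3D point in camera-relative SfM units (which are only defined up to scale), and `max_depth` and `merge_radius` are hypothetical thresholds. Points beyond the depth threshold are treated as background, and points close to an already kept fruit are treated as double counts.

```python
import math

def filter_and_merge(landmarks, max_depth, merge_radius):
    """Drop far-away landmarks and greedily merge near-duplicates.
    landmarks: list of (x, y, z) points, z = depth from the camera."""
    kept = []
    for point in landmarks:
        if point[2] > max_depth:
            continue  # too deep: likely fruit on a background tree
        if any(math.dist(point, other) < merge_radius for other in kept):
            continue  # same fruit tracked twice: count it once
        kept.append(point)
    return kept

# Hypothetical landmarks in relative SfM units.
landmarks = [
    (0.00, 0.0, 1.0),
    (0.01, 0.0, 1.0),   # near-duplicate of the first point
    (0.50, 0.2, 1.2),
    (0.10, 0.3, 4.0),   # far behind the target row
]
kept = filter_and_merge(landmarks, max_depth=2.0, merge_radius=0.05)
print(len(kept))        # 2
```

The greedy merge depends on input order and is chosen here for brevity; a clustering step would be a more robust alternative.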

Future Directions

Potential areas for future research include extending the technique to count other stationary agricultural features, refining absolute size estimation by incorporating IMU data or using known fruit dimensions, and testing the pipeline on a broader range of objects and environmental conditions. Such advances would strengthen the applicability of this approach in agricultural practice, especially in low-resource settings where sensors may be limited to basic cameras.
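
One way the "known fruit dimensions" idea could work is sketched below. Monocular SfM recovers geometry only up to an unknown scale, so a typical real fruit diameter can be used to convert relative sizes to metric ones. This is an assumption about how such a refinement might look, not something the paper implements; `metric_scale` and the sample values are hypothetical.

```python
from statistics import median

def metric_scale(relative_diameters, known_diameter_m):
    """Scale factor mapping relative SfM units to metres, assuming the
    median fruit has the known typical diameter."""
    return known_diameter_m / median(relative_diameters)

# Hypothetical relative diameters from SfM and an assumed 7 cm typical fruit.
relative = [0.9, 1.0, 1.1]
scale = metric_scale(relative, known_diameter_m=0.07)
metric = [d * scale for d in relative]
print([round(d, 3) for d in metric])   # [0.063, 0.07, 0.077]
```

Using the median rather than the mean keeps a few badly reconstructed fruits from skewing the scale.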

This paper contributes a viable solution in the field of agricultural automation, demonstrating a fusion of machine learning and traditional computer vision methods to solve a fundamental counting challenge with promising accuracy and consistency across varied environments.
