A comparison between single-stage and two-stage 3D tracking algorithms for greenhouse robotics (2404.12963v1)
Abstract: With the current demand for automation in the agro-food industry, accurately detecting and localizing relevant objects in 3D is essential for successful robotic operation. However, this is challenging due to the presence of occlusions. Multi-view perception approaches allow robots to overcome occlusions, but a tracking component is needed to associate the objects detected by the robot across multiple viewpoints. Multi-object tracking (MOT) algorithms can be categorized into two-stage and single-stage methods. Two-stage methods tend to be simpler to adapt and implement for custom applications, while single-stage methods are more complex end-to-end approaches that can yield better results in occluded situations, at the cost of requiring more training data. The potential advantage of single-stage over two-stage methods depends on the complexity of the sequence of viewpoints that a robot needs to process. In this work, we compare a 3D two-stage MOT algorithm, 3D-SORT, against a 3D single-stage MOT algorithm, MOT-DETR, on three types of sequences with varying levels of complexity. The sequences represent simpler and more complex motions that a robot arm can perform in a tomato greenhouse. Our experiments in a tomato greenhouse show that the single-stage algorithm consistently yields better tracking accuracy, especially in the more challenging sequences where objects are fully occluded or out of view for several viewpoints.
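To make the two-stage idea concrete: SORT-style trackers first run a detector per viewpoint and then associate the new detections with existing tracks, typically by matching 3D positions. The sketch below is a deliberately simplified, hypothetical illustration of that association step, using greedy nearest-neighbour matching on 3D centroids instead of the Kalman prediction and Hungarian assignment used by actual SORT variants; all names and the distance threshold are assumptions, not taken from the paper.

```python
import math

def associate(tracks, detections, max_dist=0.05):
    """Greedy nearest-neighbour association of 3D detections to tracks.

    A simplified stand-in for the assignment stage of a SORT-style
    two-stage tracker. `tracks` maps track id -> (x, y, z) centroid in
    metres; `detections` is a list of (x, y, z) centroids from the
    current viewpoint. Returns (matches, unmatched_detection_indices);
    unmatched detections would spawn new tracks.
    """
    # Collect all track/detection pairs within the gating distance.
    pairs = []
    for t_id, t_pos in tracks.items():
        for d_idx, d_pos in enumerate(detections):
            dist = math.dist(t_pos, d_pos)
            if dist <= max_dist:
                pairs.append((dist, t_id, d_idx))
    # Greedily accept the closest remaining pair first.
    pairs.sort()
    matched_t, matched_d, matches = set(), set(), {}
    for dist, t_id, d_idx in pairs:
        if t_id in matched_t or d_idx in matched_d:
            continue
        matches[t_id] = d_idx
        matched_t.add(t_id)
        matched_d.add(d_idx)
    unmatched = [i for i in range(len(detections)) if i not in matched_d]
    return matches, unmatched

# Example: two known fruit tracks, three detections in the new viewpoint.
tracks = {0: (0.10, 0.20, 1.00), 1: (0.30, 0.25, 1.10)}
dets = [(0.31, 0.24, 1.11), (0.11, 0.21, 1.00), (0.50, 0.50, 1.50)]
matches, unmatched = associate(tracks, dets)
# matches == {0: 1, 1: 0}; detection 2 is unmatched and starts a new track.
```

The greedy loop is where such a two-stage approach is brittle: when a fruit is occluded for several viewpoints, its stale position may fall outside the gating distance and the association fails, which is exactly the failure mode the end-to-end single-stage method is meant to mitigate.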
- G. Kootstra, X. Wang, P. M. Blok, J. Hemming, and E. van Henten, “Selective Harvesting Robotics: Current Research, Trends, and Future Directions,” Current Robotics Reports, vol. 2, no. 1, pp. 95–104, Mar. 2021. [Online]. Available: https://doi.org/10.1007/s43154-020-00034-1
- J. Crowley, “Dynamic world modeling for an intelligent mobile robot using a rotating ultra-sonic ranging device,” in Proceedings. 1985 IEEE International Conference on Robotics and Automation, vol. 2. St. Louis, MO, USA: Institute of Electrical and Electronics Engineers, 1985, pp. 128–135. [Online]. Available: http://ieeexplore.ieee.org/document/1087380/
- J. Elfring, S. van den Dries, M. van de Molengraft, and M. Steinbuch, “Semantic world modeling using probabilistic multiple hypothesis anchoring,” Robotics and Autonomous Systems, vol. 61, no. 2, pp. 95–105, Feb. 2013. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0921889012002163
- B. Arad, J. Balendonck, R. Barth, O. Ben‐Shahar, Y. Edan, T. Hellström, J. Hemming, P. Kurtser, O. Ringdahl, T. Tielen, and B. v. Tuijl, “Development of a sweet pepper harvesting robot,” Journal of Field Robotics, 2020. [Online]. Available: https://www.onlinelibrary.wiley.com/doi/abs/10.1002/rob.21937
- A. K. Burusa, J. Scholten, D. R. Rincon, X. Wang, E. J. van Henten, and G. Kootstra, “Efficient Search and Detection of Relevant Plant Parts using Semantics-Aware Active Vision,” June 2023, arXiv:2306.09801 [cs]. [Online]. Available: http://arxiv.org/abs/2306.09801
- A. Persson, P. Z. D. Martires, A. Loutfi, and L. De Raedt, “Semantic Relational Object Tracking,” IEEE Transactions on Cognitive and Developmental Systems, vol. 12, no. 1, pp. 84–97, Mar. 2020, arXiv: 1902.09937. [Online]. Available: http://arxiv.org/abs/1902.09937
- D. Rapado-Rincón, E. J. van Henten, and G. Kootstra, “Development and evaluation of automated localisation and reconstruction of all fruits on tomato plants in a greenhouse based on multi-view perception and 3D multi-object tracking,” Biosystems Engineering, vol. 231, pp. 78–91, July 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1537511023001162
- A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, “Simple Online and Realtime Tracking,” 2016 IEEE International Conference on Image Processing (ICIP), pp. 3464–3468, Sept. 2016, arXiv: 1602.00763. [Online]. Available: http://arxiv.org/abs/1602.00763
- N. Wojke, A. Bewley, and D. Paulus, “Simple online and realtime tracking with a deep association metric,” in 2017 IEEE International Conference on Image Processing (ICIP), Sept. 2017, pp. 3645–3649, ISSN: 2381-8549.
- Y. Zhang, C. Wang, X. Wang, W. Zeng, and W. Liu, “FairMOT: On the Fairness of Detection and Re-identification in Multiple Object Tracking,” International Journal of Computer Vision, vol. 129, no. 11, pp. 3069–3087, Nov. 2021. [Online]. Available: https://doi.org/10.1007/s11263-021-01513-4
- M. Halstead, C. McCool, S. Denman, T. Perez, and C. Fookes, “Fruit Quantity and Ripeness Estimation Using a Robotic Vision System,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 2995–3002, Oct. 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8392450/
- R. Kirk, M. Mangan, and G. Cielniak, “Robust Counting of Soft Fruit Through Occlusions with Re-identification,” in Computer Vision Systems, ser. Lecture Notes in Computer Science, M. Vincze, T. Patten, H. I. Christensen, L. Nalpantidis, and M. Liu, Eds. Cham: Springer International Publishing, 2021, pp. 211–222.
- M. Halstead, A. Ahmadi, C. Smitt, O. Schmittmann, and C. McCool, “Crop Agnostic Monitoring Driven by Deep Learning,” Frontiers in Plant Science, vol. 12, 2021. [Online]. Available: https://www.frontiersin.org/article/10.3389/fpls.2021.786702
- N. Hu, D. Su, S. Wang, P. Nyamsuren, and Y. Qiao, “LettuceTrack: Detection and tracking of lettuce for robotic precision spray in agriculture,” Frontiers in Plant Science, vol. 13, Sept. 2022. [Online]. Available: https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2022.1003243/full
- J. Villacrés, M. Viscaino, J. Delpiano, S. Vougioukas, and F. Auat Cheein, “Apple orchard production estimation using deep learning strategies: A comparison of tracking-by-detection algorithms,” Computers and Electronics in Agriculture, vol. 204, p. 107513, Jan. 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168169922008213
- D. Rapado-Rincón, E. J. van Henten, and G. Kootstra, “MinkSORT: A 3D deep feature extractor using sparse convolutions to improve 3D multi-object tracking in greenhouse tomato plants,” July 2023, arXiv:2307.05219 [cs]. [Online]. Available: http://arxiv.org/abs/2307.05219
- D. Rapado-Rincon, H. Nap, K. Smolenova, E. J. van Henten, and G. Kootstra, “MOT-DETR: 3D Single Shot Detection and Tracking with Transformers to build 3D representations for Agro-Food Robots,” Feb. 2024, arXiv:2311.15674 [cs]. [Online]. Available: http://arxiv.org/abs/2311.15674
- Ultralytics, “ultralytics/ultralytics: YOLOv8.” [Online]. Available: https://github.com/ultralytics/ultralytics
- N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-End Object Detection with Transformers,” May 2020, arXiv:2005.12872 [cs]. [Online]. Available: http://arxiv.org/abs/2005.12872