CountingFruit: Real-Time 3D Fruit Counting with Language-Guided Semantic Gaussian Splatting

Published 1 Jun 2025 in cs.CV, cs.AI, and cs.MM | (2506.01109v1)

Abstract: Accurate fruit counting in real-world agricultural environments is a longstanding challenge due to visual occlusions, semantic ambiguity, and the high computational demands of 3D reconstruction. Existing methods based on neural radiance fields suffer from low inference speed, limited generalization, and lack support for open-set semantic control. This paper presents FruitLangGS, a real-time 3D fruit counting framework that addresses these limitations through spatial reconstruction, semantic embedding, and language-guided instance estimation. FruitLangGS first reconstructs orchard-scale scenes using an adaptive Gaussian splatting pipeline with radius-aware pruning and tile-based rasterization for efficient rendering. To enable semantic control, each Gaussian encodes a compressed CLIP-aligned language embedding, forming a compact and queryable 3D representation. At inference time, prompt-based semantic filtering is applied directly in 3D space, without relying on image-space segmentation or view-level fusion. The selected Gaussians are then converted into dense point clouds via distribution-aware sampling and clustered to estimate fruit counts. Experimental results on real orchard data demonstrate that FruitLangGS achieves higher rendering speed, semantic flexibility, and counting accuracy compared to prior approaches, offering a new perspective for language-driven, real-time neural rendering across open-world scenarios.


Summary

Real-Time 3D Fruit Counting with Language-Guided Semantic Gaussian Splatting

The paper "CountingFruit: Real-Time 3D Fruit Counting with Language-Guided Semantic Gaussian Splatting" introduces FruitLangGS, an advanced framework for fruit counting in agricultural environments through real-time 3D scene reconstruction and semantic filtering. The focus is on overcoming existing limitations in neural radiance field techniques, which suffer from lengthy inference times, poor generalization, and inadequate support for open-vocabulary semantic control.

FruitLangGS tackles these challenges with a three-part pipeline: adaptive Gaussian splatting for efficient scene reconstruction, language-guided semantic embedding for flexible querying, and filtering directly in 3D space for instance estimation. The method reports a rendering speed of over 300 FPS, which makes it suitable for real-time agricultural robotics operations.

FruitLangGS begins by reconstructing orchard-scale scenes using an adaptive Gaussian splatting pipeline. This pipeline incorporates opacity-aware pruning and load-aware tile scheduling, both of which improve rendering efficiency. For semantic embedding, each Gaussian stores a compressed language feature aligned with CLIP vectors, which allows prompt-driven selection. At inference time, semantic filtering is applied directly in 3D space using these compressed embeddings, without relying on conventional 2D segmentation masks or view-level fusion.
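The prompt-driven selection step can be illustrated with a minimal sketch. It assumes each Gaussian carries a feature vector in the same space as the text prompt's embedding and that selection is a cosine-similarity threshold; the function name `filter_gaussians_by_prompt`, the array layout, and the threshold value are hypothetical, and the toy vectors stand in for real CLIP-aligned features. It is not the paper's implementation.

```python
import numpy as np

def filter_gaussians_by_prompt(embeddings, prompt_embedding, threshold=0.5):
    """Return indices of Gaussians whose feature matches the prompt.

    embeddings: (N, D) array, one language feature per Gaussian (assumed layout).
    prompt_embedding: (D,) text feature in the same space (assumed).
    threshold: cosine-similarity cutoff (illustrative value, not from the paper).
    """
    # Normalize both sides so the dot product equals cosine similarity.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = prompt_embedding / np.linalg.norm(prompt_embedding)
    similarity = emb @ query
    return np.flatnonzero(similarity >= threshold)

# Toy stand-in data: 3 "fruit-like" and 5 "leaf-like" Gaussians in a 4-D space.
rng = np.random.default_rng(0)
fruit_direction = np.array([1.0, 0.0, 0.0, 0.0])
leaf_direction = np.array([0.0, 1.0, 0.0, 0.0])
fruit_like = fruit_direction + 0.05 * rng.normal(size=(3, 4))
leaf_like = leaf_direction + 0.05 * rng.normal(size=(5, 4))
embeddings = np.vstack([fruit_like, leaf_like])

# The prompt embedding here is a synthetic vector, not a real text-encoder output.
selected = filter_gaussians_by_prompt(embeddings, fruit_direction, threshold=0.5)
print(selected)
```

Because the filter operates on per-Gaussian features, no 2D masks or per-view fusion are involved in this sketch; only the Gaussians whose indices are returned would be passed on to counting.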

The experimental evaluations indicate that FruitLangGS outperforms prior methods such as FruitNeRF and 3DGS in both rendering speed and fruit counting accuracy. Notably, FruitLangGS achieved high recall, with a reported 98.6% accuracy in instance-level fruit counting across multiple tree datasets, reflecting its semantic filtering and clustering strategies.
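The counting stage described in the abstract (selected Gaussians are sampled into a dense point cloud, then clustered) can be sketched as follows. This is a sketch under stated assumptions: "distribution-aware sampling" is interpreted here as drawing points from each Gaussian's own mean and covariance, and the clustering is a simple distance-threshold connected-components method swapped in for illustration, since the summary does not name the paper's algorithm. The names `sample_points`, `count_clusters`, `link_distance`, and `min_points` are hypothetical.

```python
import numpy as np

def sample_points(means, covariances, points_per_gaussian, rng):
    """Draw points from each 3D Gaussian (assumed reading of the sampling step)."""
    samples = [
        rng.multivariate_normal(mean, cov, size=points_per_gaussian)
        for mean, cov in zip(means, covariances)
    ]
    return np.vstack(samples)

def count_clusters(points, link_distance, min_points):
    """Count connected components of points linked when closer than link_distance.

    Components with fewer than min_points points are ignored as noise.
    Uses an O(N^2) distance matrix, so it is only suitable for toy inputs.
    """
    n = len(points)
    parent = np.arange(n)

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    close_pairs = np.nonzero(np.triu(dist < link_distance, k=1))
    for i, j in zip(*close_pairs):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    roots = np.array([find(i) for i in range(n)])
    _, sizes = np.unique(roots, return_counts=True)
    return int(np.sum(sizes >= min_points))

# Toy scene: two fruits one unit apart, each covered by four small Gaussians.
offsets = np.array([[0.02, 0, 0], [-0.02, 0, 0], [0, 0.02, 0], [0, -0.02, 0]])
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
means = np.vstack([center + offsets for center in centers])
covariances = np.repeat((0.01 ** 2 * np.eye(3))[None], len(means), axis=0)

rng = np.random.default_rng(0)
cloud = sample_points(means, covariances, points_per_gaussian=20, rng=rng)
fruit_count = count_clusters(cloud, link_distance=0.05, min_points=10)
print(fruit_count)
```

In this toy setup the two groups are far apart relative to `link_distance`, so each fruit forms one component. Real orchard data would need a spatial index instead of the full distance matrix and thresholds tuned to fruit size.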

Ablation studies confirm that language-conditioned semantic embedding plays a significant role in fruit counting accuracy, with performance degrading when this module is altered or removed. The semantic embedding enables open-vocabulary filtering, which allows the method to adapt to different fruit species and opens the possibility of multi-class separation and object recognition.
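One way multi-class separation could work on top of per-Gaussian language features is to compare every Gaussian against several prompt embeddings and keep the best match. The sketch below assumes cosine similarity with a rejection threshold; `assign_classes`, `min_similarity`, and the hand-written toy vectors are hypothetical and do not come from the paper.

```python
import numpy as np

def assign_classes(embeddings, prompt_embeddings, min_similarity=0.5):
    """Label each Gaussian with its best-matching prompt index, or -1 if none fits.

    embeddings: (N, D) per-Gaussian features (assumed layout).
    prompt_embeddings: (P, D) one feature per text prompt (assumed layout).
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    prompts = prompt_embeddings / np.linalg.norm(
        prompt_embeddings, axis=1, keepdims=True
    )
    similarity = emb @ prompts.T  # shape (N, P), cosine similarities
    labels = similarity.argmax(axis=1)
    # Reject Gaussians that do not resemble any prompt closely enough.
    labels[similarity.max(axis=1) < min_similarity] = -1
    return labels

# Synthetic stand-ins for two prompt embeddings (say, "apple" and "pear").
prompt_embeddings = np.array([[1.0, 0.0, 0.0],
                              [0.0, 1.0, 0.0]])
embeddings = np.array([[0.9, 0.1, 0.0],   # close to prompt 0
                       [0.1, 0.9, 0.0],   # close to prompt 1
                       [0.0, 0.0, 1.0],   # matches neither prompt
                       [0.8, 0.3, 0.1]])  # close to prompt 0
labels = assign_classes(embeddings, prompt_embeddings)
print(labels)
```

Each labeled group could then be counted separately, so a single reconstruction would serve several fruit types without retraining, provided the stored features really separate those classes.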

The implications of this research lie mainly in smart agriculture and precision farming. By providing an efficient and adaptable method for fruit counting, FruitLangGS supports yield prediction and harvest planning, two inputs to decision-making in precision agriculture. Its ability to generalize across fruit scenes without scene-specific retraining makes it relevant to diverse agricultural scenarios.

Future directions may include adapting the framework to support embodied AI for agricultural simulation and extending vocabulary support to a wider range of crops and field objects. Such extensions could broaden the framework's applications in precision agriculture, offering crop management tools that are both scalable and more fine-grained. Overall, the paper represents a step toward integrating language-driven 3D perception into agricultural practice in an efficient and practical way.