Botany-Bot: Integrated Robotic Plant Phenotyping

Updated 3 July 2026

Botany-Bot is an integrated system combining robotics, machine learning, plant sciences, and multimodal data fusion to advance plant phenotyping, taxonomy, and habitat monitoring.
It leverages multimodal deep networks, 3D digital twins, and autonomous surveyors to enhance species identification, plant counting, and digital reconstruction with significant accuracy gains.
The system supports bio-hybrid growth control through adaptive LED arrays, modular robotic actuation, and ontology-driven knowledge, enabling dynamic plant–robot interaction in diverse environments.

A Botany-Bot is an integrated system at the intersection of robotics, machine learning, plant sciences, and multimodal data fusion, engineered for advanced plant phenotyping, taxonomy, habitat monitoring, and bio-hybrid growth control. Architectures under this umbrella range from digital twin creation via 3D vision and robot-manipulated imaging to field-deployable autonomous surveyors, ontology-driven knowledge systems, and bio-hybrid assemblies coupling plant tropisms with distributed robotic actuation. Botany-Bots leverage modular perception pipelines, learning-based control, and botanical domain knowledge to enable precise plant observation, classification, counting, and environment adaptation across laboratory and ecological field scenarios.

1. Multimodal Classification and Taxonomy Embedding

Botany-Bots implement high-accuracy species identification by integrating visual, spatial, temporal, and ecological context through multimodal deep networks. Architectural blueprints draw from models such as "Digital Taxonomist," which uses a late-fusion classifier. Each modality (raw RGB image $\mathbf{I} \in \mathbb{R}^{3\times H \times W}$, geolocation $(\text{longitude}, \text{latitude}, \text{altitude})$, day-of-year $t$, and optionally a multispectral satellite patch $S \in \mathbb{R}^{B \times H' \times W'}$) is encoded into a shared feature space by a specialized branch (a CNN for images, small MLPs for context), and the branch outputs are fused at the logit level. Biological taxonomy is respected by computing posterior distributions at the species level and successively marginalizing over genus, family, and higher taxonomic ranks:

$p_i^{(l)} = \sum_{j \in \mathrm{children}(i)} p_j^{(l-1)},$

with a global loss

$\mathcal{L}_{\mathrm{hier}} = \sum_{l=0}^L \mathcal{L}_{\mathrm{CE}}(p^{(l)}, y^{(l)}).$

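Here $p^{(0)}$ is the species-level posterior, $p^{(l)}$ is its marginal at taxonomic rank $l$, and $y^{(l)}$ is the ground-truth label at that rank. This structure supports cross-level query answering and penalizes errors at every rank, so a confusion between two species of the same genus costs less than a prediction in the wrong family.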
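The marginalization and summed loss above can be illustrated with a minimal sketch. The two-level toy taxonomy (`SPECIES_TO_GENUS`), the example logits, and the choice to fuse modalities by adding their logits are all assumptions made for illustration; the source states only that fusion happens at the logit level.

```python
import math

# Hypothetical toy taxonomy: species index -> genus index (illustration only).
SPECIES_TO_GENUS = [0, 0, 1, 1, 1]
NUM_GENERA = 2


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(image_logits, context_logits):
    """Fuse two modalities by adding their logits (an assumed fusion rule)."""
    return [a + b for a, b in zip(image_logits, context_logits)]


def marginalize(child_probs, child_to_parent, num_parents):
    """p_i^(l) = sum of p_j^(l-1) over the children j of parent i."""
    parent_probs = [0.0] * num_parents
    for child, parent in enumerate(child_to_parent):
        parent_probs[parent] += child_probs[child]
    return parent_probs


def cross_entropy(probs, target):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[target])


def hierarchical_loss(species_logits, species_label):
    """Sum of cross-entropy terms at the species (l=0) and genus (l=1) ranks."""
    p_species = softmax(species_logits)
    p_genus = marginalize(p_species, SPECIES_TO_GENUS, NUM_GENERA)
    genus_label = SPECIES_TO_GENUS[species_label]
    return cross_entropy(p_species, species_label) + cross_entropy(p_genus, genus_label)


fused = late_fusion([2.0, 1.0, 0.5, 0.1, -1.0], [0.3, 0.0, 0.2, -0.1, 0.0])
p_species = softmax(fused)
p_genus = marginalize(p_species, SPECIES_TO_GENUS, NUM_GENERA)
loss = hierarchical_loss(fused, species_label=0)
```

Because the genus posterior is a sum of species posteriors, it remains a valid distribution without a separate genus classifier head.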

Empirically, this strategy achieves a top-1 per-species accuracy of 69.8% (vs. 62.5% for image-only), with similar gains at top-3/5 and in macro-averaged precision/recall, when evaluated on nearly 1,000 Swiss species ($C=977$, 56,608 images) in stratified 5-fold cross-validation (Lutio et al., 2021). Incorporating geo-temporal context and hierarchy-based marginalization each yields a substantial accuracy improvement, particularly for long-tailed data distributions.

2. 3D Digital Twins and Active Inspection

For high-resolution reconstruction and inspection of occluded plant organs, Botany-Bots combine multi-view imaging, 3D Gaussian Splat models, and manipulation via robot arms. System implementation comprises a lightbox-turntable for controlled illumination; stereo RGB-D cameras (e.g., ZED 2) at varying elevations; precision turntable with ArUco calibration; and a 7-DOF robot arm (e.g., ABB YuMi) equipped with a ring-shaped end-effector. Acquisition proceeds in discrete angular increments, capturing synchronized images whose camera poses are rigorously registered ( $\pm0.1^\circ$ via ArUco tags).

Dense 3D models are parameterized as ellipsoidal Gaussian splats $(\mu_i, \Sigma_i, w_i)$ composited into RGB-D views. An L1 alpha loss penalizes opacity mismatch with learned foreground masks to avoid “floaters” under multiview lighting:

$L_\alpha = \lambda \sum_u | \mathrm{Accum}(u) - M(u) |,\ \mathrm{Accum}(u) = \sum_i \alpha_i(u).$
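As a rough illustration of the formula above, the sketch below evaluates the alpha loss on a handful of pixels. The per-pixel lists of Gaussian contributions, the mask values, and the weight `lam` are hypothetical; a real splatting renderer would produce the accumulated opacity directly.

```python
def alpha_loss(alphas_per_pixel, mask, lam=0.5):
    """L1 penalty between accumulated opacity and the foreground mask.

    alphas_per_pixel[u] lists the contributions alpha_i(u) of each Gaussian
    at pixel u; mask[u] is the foreground mask value M(u), 0 or 1.
    """
    total = 0.0
    for alphas, m in zip(alphas_per_pixel, mask):
        accum = sum(alphas)          # Accum(u) = sum_i alpha_i(u)
        total += abs(accum - m)      # |Accum(u) - M(u)|
    return lam * total


# Three pixels: mostly opaque foreground, a background "floater", empty background.
alphas = [[0.6, 0.3], [0.2], []]
mask = [1, 0, 0]
loss_value = alpha_loss(alphas, mask)
```

The second pixel shows the intended effect: opacity accumulated where the mask says background contributes directly to the loss, which discourages floaters.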

Multi-view 2D SAM-2 masks are back-projected and agglomerated for 3D segmentation (GARField), achieving a leaf segmentation accuracy of 90.8% and leaf detection of 86.2% (mean IoU metric), with leaf length and width estimated at 2.0 cm MAE (23% relative error) (Adebola et al., 20 Oct 2025).

Robotic leaf manipulation exploits learned 3D centroids and principal axes, planning rotation and gentle lift/push motions to reveal underleaf structures. Manipulation success exceeds 77%, and the system captures high-resolution, leaf-indexed images—extending digital twin annotation to otherwise hidden plant features.
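The centroid and principal axis of a segmented leaf can be estimated from its 3D points, as in the minimal sketch below. It uses power iteration on the 3x3 covariance matrix, which is one simple way to obtain the dominant axis and is not necessarily what the cited system does; the synthetic leaf points are hypothetical.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def covariance(points, center):
    """3x3 covariance matrix of the points about the given center."""
    n = len(points)
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[k] - center[k] for k in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov


def principal_axis(cov, iterations=100):
    """Dominant eigenvector of cov by power iteration (unit length).

    Assumes the start vector is not orthogonal to the dominant axis.
    """
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Hypothetical flat leaf, elongated along x, centered at (1.0, 2.0, 0.3).
leaf = [(1.0 + x, 2.0 + y, 0.3) for x in (-2, -1, 0, 1, 2) for y in (-0.5, 0.0, 0.5)]
c = centroid(leaf)
axis = principal_axis(covariance(leaf, c))
```

A planner could then rotate the end-effector about `axis` or push perpendicular to it; the sign of the returned vector is arbitrary.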

3. Taxonomy-Aware Plant Counting

Botany-Bots deployed for quantitative ecological monitoring integrate fine-grained, taxonomy-resolved instance counting over diverse botanical scales. "TPC-268" serves as a reference dataset: 10,473 images, over 375,000 annotations, with full Linnaean strings (kingdom $\to$ species) and organ-level labels (leaf, flower, fruit, stoma) (Xu et al., 22 Mar 2026). Evaluation tasks implement class-agnostic and taxonomy-aware counting, with scale- and rank-consistent train/val/test splits to rigorously assess both intraspecific and cross-taxa generalization.

Benchmark approaches include density regression (CSRNet: MAE 22.1; DM-Count: MAE 20.3), local-exemplar matching (LOCA: MAE 16.8, best overall), detection-based summing (Faster R-CNN: MAE 40.9), and transformer variants (CACViT, CountTR). Incorporation of taxonomy via text-prompted features yields a further MAE reduction (~2 points). These benchmarks reveal that hierarchical context and phylotaxa-aware architectures enhance model robustness over species, genera, and families, though Val $\to$ Test transfer remains challenging at higher taxonomic novelty and in out-of-distribution scenarios.
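For readers unfamiliar with the metric, the sketch below computes the mean absolute error (MAE) quoted in these benchmarks and rolls species-level counts up to genus level. The counts and the species-to-genus mapping are made-up illustration values, not entries from TPC-268.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def roll_up(species_counts, species_to_genus):
    """Aggregate per-species counts into per-genus counts."""
    genus_counts = {}
    for species, count in species_counts.items():
        genus = species_to_genus[species]
        genus_counts[genus] = genus_counts.get(genus, 0) + count
    return genus_counts


# Hypothetical per-image total counts for three images.
mae = mean_absolute_error(predicted=[12, 30, 7], actual=[10, 33, 7])

# Hypothetical per-species counts for one image.
species_to_genus = {
    "Salix herbacea": "Salix",
    "Salix retusa": "Salix",
    "Silene acaulis": "Silene",
}
genus_counts = roll_up(
    {"Salix herbacea": 4, "Salix retusa": 3, "Silene acaulis": 5},
    species_to_genus,
)
```

Summing child counts in this way is the simplest form of rank consistency: a genus-level count derived by roll-up can never disagree with the species-level counts beneath it.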

4. Field Robotics and Autonomous Habitat Monitoring

Botany-Bots extend to mobile platforms equipped for in-situ habitat assessment. Systems such as the ANYmal C quadruped (50 kg, 12-DOF, 2–3 hour runtime, integrated NVIDIA GPU) autonomously traverse challenging alpine scree using lidar/GPS SLAM, executing predefined Hamiltonian grid traversals and capturing synchronized RGB-D data from multiple on-board cameras. Object detection uses YOLOv9 “gelan-c” (10M params, CSP backbone), trained on local scree flora datasets with on-the-fly augmentation. Detection across six species in Valfurva, Italy achieves an mAP@0.5 of 0.726 (macro-F1 = 0.702), with per-class recall ranging from 0.429 to 0.755 (Benedittis et al., 16 Nov 2025).
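A Hamiltonian traversal of a survey grid visits every cell exactly once. The source does not say which path the robot follows; the sketch below assumes a serpentine (boustrophedon) pattern, which is one simple Hamiltonian path on a rectangular grid. The grid dimensions are hypothetical.

```python
def serpentine_path(rows, cols):
    """Visit every cell of a rows x cols grid once, reversing direction each row."""
    path = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


def is_hamiltonian(path, rows, cols):
    """Check that the path covers all cells once and moves only between neighbors."""
    covers_all = len(path) == rows * cols and len(set(path)) == rows * cols
    steps_adjacent = all(
        abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1 for a, b in zip(path, path[1:])
    )
    return covers_all and steps_adjacent


waypoints = serpentine_path(rows=3, cols=4)
```

Each grid cell would then be mapped to a GPS or odometry waypoint for the navigation stack.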

Field campaigns demonstrate ~60% reduction in survey time per plot versus manual survey (25–35 min vs 60–90 min), with robot–botanist workflows synchronized via unique mission IDs, GPS/odometry logs, and integrated vegetation-cover mask computation. This protocol allows both traditional phytosociological assessments and high-throughput, reproducible data capture suitable for long-term monitoring and rapid data acquisition under hazardous conditions.
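The quoted reduction can be checked against the reported time ranges. The short calculation below assumes that the midpoints of the two ranges are representative; the helper name is illustrative.

```python
def time_reduction(robot_minutes, manual_minutes):
    """Fractional reduction in survey time relative to the manual survey."""
    return 1.0 - robot_minutes / manual_minutes


ROBOT_RANGE = (25.0, 35.0)    # minutes per plot, robot-assisted
MANUAL_RANGE = (60.0, 90.0)   # minutes per plot, manual survey

# Midpoint comparison: 30 min vs 75 min.
midpoint = time_reduction(sum(ROBOT_RANGE) / 2, sum(MANUAL_RANGE) / 2)

# Least and most favorable pairings of the range endpoints.
smallest = time_reduction(ROBOT_RANGE[1], MANUAL_RANGE[0])   # 35 vs 60
largest = time_reduction(ROBOT_RANGE[0], MANUAL_RANGE[1])    # 25 vs 90
```

The midpoints give a 60% reduction, consistent with the figure in the text, while the endpoint pairings bracket it between roughly 42% and 72%.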

5. Bio-Hybrid Systems and Growth Control

Beyond passive observation, Botany-Bots encompass distributed plant–robot hybrid assemblies. In "flora robotica," architectural artifacts grow via interactions between plants (e.g., climbing beans, poplars, Dracaena) and modular robotic braiding units. The robotic units fabricate filament scaffolds (driver + switch modules), networked with sensor-actuator nodes that steer plant development through spatially distributed blue/far-red LED arrays and hormone dispensers. Feedback loops running on decentralized mesh buses execute adaptive tip-steering: local proximity triggers phototropic attraction with blue LEDs, while user-defined repulsion is enforced with far-red LEDs.

Scaffold growth is driven by a Vascular Morphogenesis Controller, in which branching probabilities are computed by a softmax over occupancy, environmental favorability, and user maps. Plant response models parametrize stem growth and curvature as explicit functions of local stimuli, enabling dynamic adaptation, material accumulation, self-repair, and emergent architecture (Hamann et al., 2017).
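The softmax-based branching rule described above can be sketched as follows. This is not the published Vascular Morphogenesis Controller: the linear scoring of occupancy, favorability, and user preference, the weights, and the candidate values are all assumptions chosen to show how a softmax turns map values into branching probabilities.

```python
import math


def branching_probabilities(candidates, w_occupancy=1.0, w_favorability=1.0, w_user=1.0):
    """Softmax over per-candidate scores built from three map values.

    Higher favorability and user preference raise the score; higher
    occupancy lowers it (an assumed scoring rule).
    """
    scores = [
        w_favorability * c["favorability"] + w_user * c["user"] - w_occupancy * c["occupancy"]
        for c in candidates
    ]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate growth directions at one scaffold node.
candidates = [
    {"occupancy": 0.9, "favorability": 0.5, "user": 0.0},   # crowded region
    {"occupancy": 0.1, "favorability": 0.8, "user": 1.0},   # free, bright, user-preferred
    {"occupancy": 0.1, "favorability": 0.2, "user": 0.0},   # free but unfavorable
]
probs = branching_probabilities(candidates)
```

Sampling the next branch from `probs` keeps growth stochastic while biasing it toward free, favorable, user-preferred regions.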

LSTM-driven plant models further enable evolved controllers for bio-hybrid shape modulation. Robot controllers, evolved via NEAT, drive programmable LED arrays steering bean growth to avoid obstacles and reach targets according to fitness criteria integrating tip location and obstacle avoidance. Validated in closed-loop experiments, fitness exceeds 87% in real-world guides, confirming the hybrid synergy (Wahby et al., 2018).

6. Botany-Aware Foundation Models and Knowledge Systems

Botany-Bots benefit from ontology-driven architectures and foundation-model adaptation to infuse domain knowledge. BotaCLIP aligns pretrained Earth Observation (EO) Vision Transformers (DOFA) with ecological relevés via contrastive learning, using frozen backbones and lightweight linear adapters regularized to preserve semantic structure. Downstream predictors (e.g., species distribution, butterfly occurrence, soil group abundance) exhibit consistent gains: plant presence modeling TSS increases from 0.42 to 0.49; butterfly occurrence Boyce Index from 0.66 to 0.70; soil eDNA group Spearman $\rho$ from 0.40 to 0.41 (Cerna et al., 26 Nov 2025).
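Contrastive alignment of this kind is commonly trained with an InfoNCE-style objective. The sketch below shows a one-directional InfoNCE loss applied to the output of a linear adapter on frozen features. It is a generic illustration, not BotaCLIP's actual loss or regularizer; the feature vectors, adapter weights, and temperature are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def linear_adapter(weights, features):
    """Apply a linear map (list of rows) to frozen backbone features."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def info_nce(eo_embeddings, releve_embeddings, temperature=0.1):
    """Mean cross-entropy of matching each EO embedding to its paired relevé."""
    eo = [normalize(v) for v in eo_embeddings]
    rv = [normalize(v) for v in releve_embeddings]
    loss = 0.0
    for i, a in enumerate(eo):
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in rv]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]   # -log softmax at the true pair
    return loss / len(eo)


# Hypothetical frozen EO features (3-d) and a 2x3 trainable adapter.
frozen_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
adapter = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
eo_embeddings = [linear_adapter(adapter, f) for f in frozen_features]
releve_embeddings = [[0.9, 0.1], [0.2, 0.8]]

aligned_loss = info_nce(eo_embeddings, releve_embeddings)
shuffled_loss = info_nce(eo_embeddings, releve_embeddings[::-1])
```

Only the adapter weights would be updated during training; correctly paired embeddings give a much lower loss than shuffled pairs, which is the signal the adapter learns from.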

Ontology development using OWL (Protégé) encodes plant anatomy, development, and gene–trait relationships, supporting advanced SPARQL queries for automated inference in “Botany-Bot” systems. Formal class axioms (e.g., transcription factors as proteins regulating genes), property constraints (hasPart, growsIn), and instance data (Arabidopsis Col‐0, phenological stages) ensure semantic consistency and enable compatibility with robotic controllers and plant science databases (Kassani et al., 2018).
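The kind of inference such an ontology supports can be illustrated with a toy triple store. This is a pure-Python stand-in, not OWL or SPARQL: the class names, the `instanceOf` and `subClassOf` predicates, and the `Molecule` superclass are hypothetical, loosely modeled on the axioms mentioned above.

```python
# Hypothetical (subject, predicate, object) triples.
TRIPLES = {
    ("TranscriptionFactor", "subClassOf", "Protein"),
    ("Protein", "subClassOf", "Molecule"),
    ("Leaf", "subClassOf", "PlantOrgan"),
    ("Plant", "hasPart", "Leaf"),
    ("Col-0", "instanceOf", "ArabidopsisThaliana"),
}


def match(pattern, triples=TRIPLES):
    """Return triples matching an (s, p, o) pattern; None acts as a wildcard."""
    return [t for t in triples if all(q is None or q == v for q, v in zip(pattern, t))]


def superclasses(cls, triples=TRIPLES):
    """All classes reachable from cls by following subClassOf transitively."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match((current, "subClassOf", None), triples):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


tf_ancestors = superclasses("TranscriptionFactor")
plant_parts = [o for _, _, o in match(("Plant", "hasPart", None))]
```

A reasoner over a real OWL ontology performs the same transitive closure, alongside property constraints and consistency checks that this sketch omits.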

7. Limitations and Future Directions

Botany-Bot architectures currently face challenges with dense or clustered foliage (segmentation failures), open-loop manipulation, dataset imbalances in both counting and classification, and domain gaps in cross-region or cross-taxon transfer. Field robotics encounter hardware thermal limits and detection ambiguity in low-contrast species. Literature highlights the need for closed-loop, contact-aware manipulation, learned taxonomy-aware regularizers (for hierarchical count consistency), richer multimodal input (multispectral, depth, text), and expanded coverage of taxa (woody plants, non-angiosperms). Integration of real-time feedback, citizen-science data fusion, and ecological model-embedding in foundation representations remains an active area of research across laboratory and field-deployed Botany-Bots (Lutio et al., 2021, Adebola et al., 20 Oct 2025, Xu et al., 22 Mar 2026, Cerna et al., 26 Nov 2025, Hamann et al., 2017, Wahby et al., 2018, Kassani et al., 2018, Benedittis et al., 16 Nov 2025).