PlantPose: Universal Plant Skeleton Estimation via Tree-constrained Graph Generation

Published 18 May 2026 in cs.CV | (2605.17773v1)

Abstract: Accurate estimation of plant skeletal structures (e.g., branching structures) from images is essential for smart agriculture and plant science. Unlike human skeletons with fixed topology, plant skeleton estimation presents a unique challenge, i.e., estimating arbitrary tree graphs from images. To address this problem, we introduce PlantPose, a universal plant skeleton estimator via tree-constrained graph generation. PlantPose combines learning-based graph generation with traditional graph algorithms to enforce tree constraints during the training loop. To enhance the model's generalization capability, we curate a large and diverse dataset comprising real-world and synthetic plant images, along with simplified representations (e.g., sketches and abstract drawings). This dataset enables the generalized model to adapt to diverse input styles and categories of plant images while preserving topological consistency. Our approach demonstrates robust and accurate plant skeleton estimation across multiple domains, including previously unseen out-of-domain scenarios. Further analyses highlight the method's strengths and limitations in handling complex, heterogeneous data distributions. All implementations and datasets are available at https://github.com/huntorochi/PlantPose/.

Summary

  • The paper introduces a novel Selective Feature Suppression (SFS) layer that enforces tree constraints directly in the network, ensuring biologically plausible plant skeleton predictions.
  • It employs a transformer-based RelationFormer with integrated MST-based projection, achieving superior spatial (SMD) and topological (TOPO) metrics compared to baselines.
  • The study demonstrates robust generalization across diverse real-world and synthetic datasets, enabling effective plant phenotyping in varied imaging scenarios.

Introduction and Problem Statement

"PlantPose: Universal Plant Skeleton Estimation via Tree-constrained Graph Generation" (2605.17773) addresses the challenge of estimating arbitrary tree-structured plant skeletons directly from single images. Unlike human pose estimation, which relies on a fixed underlying topology, the topology of plant skeletons is highly variable and unconstrained, presenting fundamental challenges for robust image-to-graph pipelines. Existing methods—such as those derived from human pose estimation or geometric structure extraction—either do not support arbitrary topologies or do not enforce global constraints during learning, resulting in inconsistent or physically implausible skeletons.

The core insight of this work is the necessity of integrating strict global tree constraints into the graph generation pipeline to robustly model plant skeletons. Instead of relying on post-hoc corrections (e.g., MST-based pruning after unconstrained predictions), the PlantPose framework injects the tree constraint directly into the learning objective and network architecture, ensuring that predictions inherently conform to biologically plausible tree structures.

Figure 1: Overview of PlantPose demonstrating the effect of unconstrained (a), naive tree-constrained (b), and the proposed constraint-aware approach (c) for plant graph extraction, along with applications (d) and generalization capability (e).

Methodology: Tree-Constrained Graph Generation with the SFS Layer

To guarantee that the generated graphs are valid trees, PlantPose introduces a Selective Feature Suppression (SFS) layer, a differentiable reparameterization module that enforces tree constraints within the network during training. The unconstrained graph generator produces edge existence probabilities for all node pairs. These are projected, via a non-differentiable minimum spanning tree (MST) algorithm, to their closest valid tree. The SFS layer then selectively suppresses features associated with edges inconsistent with the MST, imposing the desired graph prior in a way compatible with end-to-end training.

Figure 2: The SFS reparameterization layer integrates MST-based constraints into training by modifying edge features between unconstrained and projected graphs.

Formally, given unconstrained edge predictions, the SFS layer modifies the logits of edges that disagree with the MST projection so that the softmax output favors the tree-constrained topology. The operation is differentiable with respect to the unconstrained node and edge features, so gradients continue to propagate through the entries that conform to the MST projection.
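The summary does not give the exact form of the suppression, so the following is only a minimal sketch of one plausible reading. It assumes each candidate edge carries a two-class logit vector `[no_edge, edge]` and that the MST projection is available as a set of kept pairs. The names `suppress_inconsistent` and `NEG` and the masking rule are illustrative assumptions, not the authors' implementation.

```python
import math

# Hypothetical stand-in for a "suppressed" logit (an assumption of this sketch).
NEG = -1e9


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def suppress_inconsistent(edge_logits, tree_edges):
    """Suppress the logit that disagrees with the tree projection.

    edge_logits: dict mapping a node pair (i, j) to [no_edge_logit, edge_logit]
    tree_edges:  set of node pairs kept by the MST projection
    Pairs whose unconstrained decision already matches the projection are
    returned unchanged, so their logits are passed through as-is.
    """
    out = {}
    for pair, (l_no, l_yes) in edge_logits.items():
        predicted_edge = l_yes > l_no
        wanted_edge = pair in tree_edges
        if predicted_edge == wanted_edge:
            out[pair] = [l_no, l_yes]   # consistent: leave untouched
        elif wanted_edge:
            out[pair] = [NEG, l_yes]    # tree needs this edge: suppress "no edge"
        else:
            out[pair] = [l_no, NEG]     # tree rejects this edge: suppress "edge"
    return out


logits = {(0, 1): [0.2, 2.0], (1, 2): [1.0, 0.5], (0, 2): [0.1, 1.5]}
constrained = suppress_inconsistent(logits, {(0, 1), (1, 2)})
probs = {pair: softmax(v) for pair, v in constrained.items()}
```

In this toy case the pair `(0, 2)` would close a cycle, so its "edge" logit is masked and the softmax assigns it to "no edge", while `(0, 1)` keeps its original logits. In an autograd framework, masking of this kind would block gradients only at the suppressed entries.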

Implementation with RelationFormer and Dataset Construction

PlantPose is implemented by embedding the SFS layer into the RelationFormer model—a non-autoregressive transformer-based graph generator designed for efficient image-to-graph tasks. The SFS layer sits atop the relation prediction head. The MST projection is realized using Kruskal’s algorithm, with edge costs derived from the negative probability of edge existence.
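As a rough illustration of the projection step, the sketch below runs Kruskal's algorithm with a union-find structure, using the negated edge probability as the cost so that the most probable edges are accepted first. The function name, the dictionary input format, and the toy probabilities are assumptions made for this example; the paper's actual code may differ.

```python
def max_probability_spanning_tree(num_nodes, edge_probs):
    """Kruskal's algorithm with cost = -probability.

    num_nodes:  number of nodes, labelled 0 .. num_nodes - 1
    edge_probs: dict mapping a node pair (i, j) to its edge probability
    Returns the list of accepted edges in the order they were added.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    costs = {pair: -p for pair, p in edge_probs.items()}
    tree = []
    # Ascending cost is the same as descending probability.
    for (i, j) in sorted(costs, key=costs.get):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:            # accepting (i, j) creates no cycle
            parent[root_i] = root_j
            tree.append((i, j))
            if len(tree) == num_nodes - 1:
                break
    return tree


probs = {
    (0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.7,
    (2, 3): 0.6, (1, 3): 0.2, (0, 3): 0.1,
}
tree = max_probability_spanning_tree(4, probs)
```

Here the pair `(0, 2)` is skipped despite its fairly high probability because nodes 0 and 2 are already connected through node 1. The sort and the discrete accept/reject decisions are what make this step non-differentiable.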

A major obstacle to robust generalization is the standard practice of species- or context-specific dataset assembly. PlantPose overcomes this by assembling a large, heterogeneously sourced dataset. Six sources spanning real-world, synthetic, and web-collected images make up the training pool, and four out-of-domain sources provide strong tests of robustness and universality.

Figure 3: Domain-specific dataset examples with overlays indicating annotated graph nodes (yellow) and edges (red).

Figure 4: Broad diversity of plant structures and backgrounds in the composite training and out-of-domain test datasets, covering both real and synthetic examples.

Experimental Evaluation and Quantitative Results

Baseline Comparisons

Three strong baselines are selected: a two-stage skeletonization and graph optimization pipeline, the unconstrained RelationFormer, and a test-time MST-only constraint. The baseline two-stage approach leverages vector field regression followed by graph optimization, which, while effective for certain regular morphologies, fails to capture the joint distribution of nodes and edges in a single feedforward pass and is not directly end-to-end trainable for the global skeleton structure.

Domain-Specific Performance

PlantPose outperforms all baselines on synthetic and real datasets, including grapevine and root architectures. Numerically, PlantPose consistently achieves the lowest Street Mover’s Distance (SMD), which measures spatial alignment between predicted and ground-truth skeletons, and the highest topological F1 scores (TOPO), which reflect structural correctness at the node level. Notably, unconstrained models produce invalid (non-tree) or fragmented skeletons, even when trained entirely on tree-structured targets, with tree rates frequently below 40%.
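The tree rate is not defined in this summary. The sketch below assumes it is the fraction of predicted graphs that are valid trees, meaning connected, acyclic, and therefore having exactly one fewer edge than nodes. The function names and the toy predictions are hypothetical.

```python
def is_tree(num_nodes, edges):
    """Return True if the undirected graph is a single connected, acyclic tree."""
    if num_nodes < 1 or len(edges) != num_nodes - 1:
        return False
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            return False                # this edge closes a cycle (or is a self-loop)
        parent[root_i] = root_j
    # n - 1 edges with no cycle implies the graph is connected.
    return True


def tree_rate(predictions):
    """Fraction of (num_nodes, edges) predictions that are valid trees."""
    if not predictions:
        return 0.0
    valid = sum(1 for n, edges in predictions if is_tree(n, edges))
    return valid / len(predictions)


predictions = [
    (3, [(0, 1), (1, 2)]),              # valid tree
    (3, [(0, 1)]),                      # fragmented: node 2 is disconnected
    (3, [(0, 1), (1, 2), (0, 2)]),      # contains a cycle
]
rate = tree_rate(predictions)
```

Only the first of the three toy predictions passes the check, which mirrors the two failure modes named above: fragmented outputs and non-tree outputs.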

Figure 5: Synthetic data: PlantPose (bottom) produces superior edge prediction and structural coherence relative to all baselines, with improved spatial and topological fidelity.

Figure 6: Real data: PlantPose recovers detailed skeletons for both root and grapevine images, surpassing the two-stage and unconstrained baselines.

Generalization and Out-of-Domain Robustness

PlantPose demonstrates superior robustness to distributional shift. On out-of-domain datasets (e.g., unseen tree types, variable backgrounds, hand-drawn or abstract images), it maintains high tree rates, spatial fidelity, and balanced topology precision/recall. In contrast, methods that do not integrate the constraint during training degrade rapidly, producing either disconnected, keypoint-sparse outputs (unconstrained) or structurally noisy graphs with poor spatial alignment (test-time constraint only).

Figure 7: Domain-specific model deployed on strongly shifted domains, maintaining consistent skeleton extraction across varied species and environments.

Figure 8: Example output graphs on diverse in-domain (top) and out-of-domain (bottom) benchmarks. PlantPose maintains structure, connectivity, and plausible topology.

Figure 9: Robustness on out-of-domain scenarios—PlantPose preserves tree consistency and geometry where baselines fail or fragment.

Figure 10: PlantPose generalizes to web-sourced and abstract domains, including flowers, rare plants, and stylized/tree-drawing imagery.

Comparative Metrics

Across all metrics and domains, PlantPose either matches or exceeds the state-of-the-art. Test-time MST achieves valid tree rates but with higher SMD and lower F1, highlighting the loss of geometric and topological accuracy when constraints are not integrated during learning. The two-stage baseline achieves high TOPO at the cost of excessive short/redundant edge predictions and higher SMD. PlantPose’s joint node/edge constraint integration leads to balanced TOPO scores and the lowest SMD values.

Theoretical and Practical Implications

This work indicates that global, non-differentiable constraints such as the tree property can be enforced in deep neural architectures through differentiable reparameterization. The SFS strategy can be embedded with minimal engineering into existing graph generators and does not require specialized differentiable combinatorial solvers. Practically, this delivers a marked improvement in plant phenotyping technologies, supporting applications ranging from high-throughput crop screening to automated root trait extraction. A universal, robust plant skeleton estimator enables cross-species or uncurated-domain deployment with less retraining and annotation overhead.

Theoretically, PlantPose’s methodology is extensible to other domains requiring globally consistent graph-structured predictions, such as road network extraction or biomedical vessel tree analysis, providing a general blueprint for integrating global combinatorial structure into end-to-end deep pipelines.

Limitations and Future Directions

Limitations are primarily computational: running the MST-based constraint projection at every training iteration slows training, though future GPU-accelerated MST implementations could alleviate this. Accuracy is also bottlenecked by the underlying node detection: in very challenging images, missed keypoints lead to reduced skeleton quality. The current work is restricted to tree-structured topologies, and extension to more general structures or hybrid multi-constraint graph types is an area for future research.

Advancements in memory-efficient graph attention and pruning strategies should further scale the model to enable even more complex plant or multi-entity phenotyping settings. The learnable reparameterization approach presents opportunities for general constrained optimization (e.g., enforcing acyclicity, connectivity) within deep graph generation.

Conclusion

PlantPose defines a rigorous, universal solution for plant skeleton estimation, delivering state-of-the-art spatial and topological fidelity across a wide array of plant forms and imaging modalities. By directly integrating global tree constraints into the learning objective via differentiable reparameterization, PlantPose provides robust, scalable, and practically valuable plant phenotyping models. Its theoretical generality and well-characterized empirical advantages establish a new standard for end-to-end structure discovery in visual graph domains.
