- The paper introduces HEIR, a GNN-based framework that learns hierarchical motion structures by inferring interpretable parent-child relationships directly from data.
- The paper decomposes motion into parent-inherited and local residual components, reconstructing 73% of hierarchies on a synthetic 1D dataset and all hierarchies in a noise-free synthetic planetary system, with 73.6% retained under moderate Gaussian noise.
- The paper demonstrates HEIR's versatility by extending it to rotational motion and 3D scene deformation, where it outperforms SC-GS on the D-NeRF dataset in preserving structural integrity.
HEIR: Learning Graph-Based Motion Hierarchies
Introduction and Motivation
The paper presents HEIR, a general-purpose framework for learning hierarchical motion structures directly from data, leveraging graph neural networks (GNNs) to infer interpretable parent-child relationships among motion elements. The motivation stems from the ubiquity of hierarchical motion in domains such as computer vision, graphics, and robotics, where complex dynamics are often composed of coordinated, simpler motion primitives. Existing approaches typically rely on manually defined or heuristic hierarchies, which restrict generalizability and interpretability. HEIR addresses these limitations by formulating hierarchy inference as a differentiable graph learning problem, enabling automatic discovery of motion hierarchies without domain-specific priors.
Figure 1: The HEIR pipeline learns motion hierarchies by inferring parent-child relationships from observed positions $X_t$ over time, using a GNN to predict edge weights and sample a hierarchy matrix $H$ for motion decomposition.
Methodology
HEIR models observed absolute motions Δt as a sum of parent-inherited motion and local residuals, structured by a learnable directed acyclic graph (DAG) hierarchy matrix H. Each motion element is assigned a single parent, and cycles are prohibited, ensuring a valid hierarchical structure. The core decomposition is:
$$\Delta_t = H\Delta_t + \delta_t$$
where $H_{ij} = 1$ indicates that element $j$ is the parent of element $i$, and $\delta_t$ encodes the local residual motion.
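Because $H$ has at most one nonzero entry per row and contains no cycles, the decomposition can be solved in closed form. A short derivation, assuming root elements (those without a parent) have an all-zero row in $H$ and the deepest parent chain has $L$ links:

```latex
\begin{aligned}
\Delta_t &= H\Delta_t + \delta_t
  \;\Longrightarrow\; (I - H)\,\Delta_t = \delta_t, \\
(H^k)_{ij} &= 1 \iff j \text{ is the } k\text{-th ancestor of } i,
  \quad\text{so } H^{L+1} = 0, \\
(I - H)\sum_{k=0}^{L} H^k &= I - H^{L+1} = I
  \;\Longrightarrow\; \Delta_t = \sum_{k=0}^{L} H^k \delta_t .
\end{aligned}
```

In words, each element's absolute motion is its own residual plus the residuals of all of its ancestors, which is what recursive aggregation along the hierarchy computes.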
Graph Construction and Learning
A proximity graph $G_0$ is constructed, with vertices representing motion elements and edges connecting each node to its $k$ nearest neighbors. Edge weights $w_{ij}$, representing parent probabilities, are learned via a graph attention mechanism. The encoder G computes relative motions by aggregating the velocities of candidate parents, while the decoder D reconstructs absolute motion by recursively aggregating relative velocities along the hierarchy.
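The graph construction and the encoder/decoder pair can be sketched in plain Python. This is a minimal illustration under stated assumptions, not HEIR's implementation: velocities are 1D scalars, the learned graph-attention weights are replaced by caller-supplied weights, roots are marked with `None` as their parent, and all function names (`knn_graph`, `is_valid_hierarchy`, `encode_soft`, `encode`, `decode`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Parent = Sequence[Optional[int]]  # parent[i] is the index of i's parent, or None for a root


def knn_graph(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Proximity graph G0: brute-force k nearest neighbours (candidate parents) per element."""
    graph = []
    for i, p in enumerate(points):
        by_dist = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in by_dist[:k]])
    return graph


def is_valid_hierarchy(parent: Parent) -> bool:
    """The array form already gives one parent per element; check no parent chain cycles."""
    for start in range(len(parent)):
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                return False
            seen.add(node)
            node = parent[node]
    return True


def encode_soft(vel: Sequence[float], graph: List[List[int]], weights: List[List[float]]) -> List[float]:
    """Soft encoder: own velocity minus the weighted sum of candidate parents' velocities."""
    return [
        vel[i] - sum(w * vel[j] for j, w in zip(graph[i], weights[i]))
        for i in range(len(vel))
    ]


def encode(vel: Sequence[float], parent: Parent) -> List[float]:
    """Hard encoder, delta = (I - H) Delta: subtract the chosen parent's velocity."""
    return [v - (vel[p] if p is not None else 0.0) for v, p in zip(vel, parent)]


def decode(delta: Sequence[float], parent: Parent) -> List[float]:
    """Decoder: absolute velocity = own residual + parent's absolute velocity, recursively."""
    assert is_valid_hierarchy(parent), "hierarchy must be acyclic"
    absolute: List[Optional[float]] = [None] * len(delta)

    def resolve(i: int) -> float:
        if absolute[i] is None:
            p = parent[i]
            absolute[i] = delta[i] + (resolve(p) if p is not None else 0.0)
        return absolute[i]

    return [resolve(i) for i in range(len(delta))]
```

For example, with `parent = [None, 0, 1]` and velocities `[1.0, 1.5, 1.25]`, `encode` yields residuals `[1.0, 0.5, -0.25]`, and `decode` maps those residuals back to the original velocities.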
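Hierarchies are sampled from these edge weights using the Gumbel-Softmax trick, which makes the discrete choice of parent differentiable. The training objective combines a reconstruction loss with a regularization term on the magnitude of local velocities, which encourages the model to explain as much motion as possible through inheritance from parents rather than through residuals.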
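Gumbel-Softmax sampling over candidate parents and the two-term objective can be sketched as follows. This is a plain-Python illustration under assumptions: one logit per candidate parent (for instance, one per k-NN neighbour), a squared-magnitude penalty on local velocities, and a weight `lam` whose value is hypothetical. The paper's exact norm and loss weighting may differ, and real gradients would come from an autodiff framework rather than from this code.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits: Sequence[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + g) / tau) with g ~ Gumbel(0, 1)."""
    eps = 1e-12
    noise = []
    for _ in logits:
        u = min(max(rng.random(), eps), 1.0 - eps)  # keep both logarithms finite
        noise.append(-math.log(-math.log(u)))
    return softmax([(logit + g) / tau for logit, g in zip(logits, noise)])


def sample_parent(logits: Sequence[float], tau: float, rng: random.Random) -> Tuple[int, List[float]]:
    """Hard parent index for the forward pass plus the soft probabilities.

    With a straight-through estimator in an autodiff framework, gradients would flow
    through `soft`; plain Python only shows the two quantities side by side.
    """
    soft = gumbel_softmax(logits, tau, rng)
    hard = max(range(len(soft)), key=soft.__getitem__)
    return hard, soft


def objective(pred_vel: Sequence[float], true_vel: Sequence[float],
              local_vel: Sequence[float], lam: float = 0.1) -> float:
    """Reconstruction MSE plus lam times the mean squared local (residual) velocity."""
    recon = sum((p - t) ** 2 for p, t in zip(pred_vel, true_vel)) / len(true_vel)
    reg = sum(d ** 2 for d in local_vel) / len(local_vel)
    return recon + lam * reg
```

Lowering `tau` pushes `soft` toward a one-hot vector, which is why the temperature schedule matters during training.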
Rotational Motion Extension
HEIR is extended to rotational motion by decomposing velocities into radial and angular components in polar coordinates. The encoder predicts these components, and the loss function is augmented with regularization terms for radial and angular velocities, as well as a Laplacian-based connectivity prior to enforce well-formed hierarchies.
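A minimal 2D sketch of the polar split, assuming the decomposition is taken relative to the parent's position (child minus parent) and that a child never sits exactly on its parent. The names `to_polar_velocity` and `from_polar_velocity` are hypothetical, and the sketch covers only the kinematics, not HEIR's encoder or loss terms.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def to_polar_velocity(rel_pos: Vec2, rel_vel: Vec2) -> Tuple[float, float]:
    """Split a parent-relative velocity into radial speed and angular velocity."""
    x, y = rel_pos
    vx, vy = rel_vel
    r_sq = x * x + y * y  # assumed non-zero: child not on top of its parent
    radial = (x * vx + y * vy) / math.sqrt(r_sq)  # v . r_hat
    angular = (x * vy - y * vx) / r_sq  # (r x v)_z / |r|^2
    return radial, angular


def from_polar_velocity(rel_pos: Vec2, radial: float, angular: float) -> Vec2:
    """Inverse map: v = radial * r_hat + angular * (z_hat x r), where z_hat x r = (-y, x)."""
    x, y = rel_pos
    r = math.hypot(x, y)
    return (radial * x / r - angular * y, radial * y / r + angular * x)
```

For example, a moon at offset `(2.0, 0.0)` from its planet moving with relative velocity `(0.0, 4.0)` has zero radial speed and angular velocity `2.0`, i.e. pure orbiting, and `from_polar_velocity` recovers `(0.0, 4.0)`.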
Figure 2: HEIR successfully reconstructs hierarchical relations in a 1D trajectory, correctly identifying parent-child clusters and disentangling nested motion patterns.
Application to 3D Gaussian Splatting
For dynamic 3D scene deformation, each Gaussian splat is treated as a graph vertex. The learned hierarchy enables interpretable scene editing: user-specified handles propagate deformations to descendants via an as-rigid-as-possible (ARAP) solver, preserving local structure and rigidity. The hierarchy is learned on downsampled Gaussians, with skinning weights used for full-resolution deformation.
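The handle-to-descendant propagation and the transfer from downsampled to full-resolution Gaussians can be sketched as below. This is deliberately simplified and hypothetical: it rigidly translates the handle's whole subtree instead of running an ARAP solve, and it uses inverse-distance weights over the k nearest control nodes as a stand-in for the skinning weights, whose exact form is not specified here. All function names are illustrative.

```python
import math
from collections import deque
from typing import Dict, List, Optional, Sequence

Point = Sequence[float]


def descendants(parent: Sequence[Optional[int]], handle: int) -> List[int]:
    """All elements below `handle` in the hierarchy (breadth-first over children)."""
    children: Dict[int, List[int]] = {}
    for i, p in enumerate(parent):
        if p is not None:
            children.setdefault(p, []).append(i)
    found: List[int] = []
    queue = deque(children.get(handle, []))
    while queue:
        node = queue.popleft()
        found.append(node)
        queue.extend(children.get(node, []))
    return found


def propagate_translation(points: Sequence[Point], parent: Sequence[Optional[int]],
                          handle: int, offset: Point) -> List[List[float]]:
    """Stand-in for the ARAP step: translate the handle and its subtree rigidly."""
    moved = set(descendants(parent, handle)) | {handle}
    return [
        [c + o for c, o in zip(p, offset)] if i in moved else list(p)
        for i, p in enumerate(points)
    ]


def skinning_weights(point: Point, controls: Sequence[Point],
                     k: int = 2, eps: float = 1e-8) -> Dict[int, float]:
    """Normalised inverse-distance weights over the k nearest control (downsampled) nodes."""
    nearest = sorted(range(len(controls)), key=lambda j: math.dist(point, controls[j]))[:k]
    raw = {j: 1.0 / (math.dist(point, controls[j]) + eps) for j in nearest}
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()}


def skin(point: Point, controls: Sequence[Point],
         control_disp: Sequence[Point], k: int = 2) -> List[float]:
    """Full-resolution point moved by the weighted blend of its controls' displacements."""
    weights = skinning_weights(point, controls, k)
    return [
        c + sum(w * control_disp[j][d] for j, w in weights.items())
        for d, c in enumerate(point)
    ]
```

Because the weights are normalised, a full-resolution point whose nearby control nodes all move by the same offset moves by exactly that offset, which is the minimum one would expect from any skinning scheme.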
Figure 3: HEIR reconstructs rotational hierarchies in a synthetic planetary system, correctly assigning moons to their planetary parents and capturing inherited motion.
Experimental Results
1D Hierarchical Motion
On a synthetic 1D dataset with known hierarchical structure, HEIR achieves a 73% success rate in reconstructing valid hierarchies, far exceeding random chance ($5 \times 10^{-7}$). The method reliably disentangles nested motions and identifies correct parent clusters, validating its expressiveness and robustness to noise.
Rotational Hierarchies
In a synthetic planetary system, HEIR reconstructs 100% of hierarchies in noise-free settings and 73.6% under moderate Gaussian noise. The Laplacian connectivity prior is critical for maintaining well-formed, non-fragmented hierarchies.
Dynamic 3D Gaussian Splatting
On the D-NeRF dataset, HEIR outperforms SC-GS on all four reported metrics (PSNR, SSIM, CLIP-I, and LPIPS), producing more realistic and physically coherent deformations. Qualitative results show better preservation of structural relationships and fewer unnatural distortions, especially in articulated and rigid-body scenarios.
Figure 4: HEIR yields more realistic and structurally faithful deformations in dynamic Gaussian splatting scenes, outperforming SC-GS in perceptual quality and physical plausibility.
Implementation Considerations
HEIR's graph-based approach scales to large numbers of motion elements: a k-nearest-neighbor proximity graph over N elements has only N·k candidate parent edges, so the cost of GNN message passing grows linearly in N for fixed k rather than quadratically as in a fully connected graph. Gumbel-Softmax sampling enables end-to-end differentiability, but the temperature parameter must be annealed carefully to avoid premature collapse of parent probabilities. The method assumes each element has a single parent, which may limit expressiveness in systems with multi-source influences; future work could relax this constraint via multi-parent or hypergraph formulations.
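One common way to anneal the Gumbel-Softmax temperature, shown here as an assumption rather than the paper's schedule, is exponential decay with a floor. The function name `gumbel_temperature` and the defaults `tau0`, `rate`, and `tau_min` are hypothetical.

```python
import math


def gumbel_temperature(step: int, tau0: float = 1.0, rate: float = 1e-3,
                       tau_min: float = 0.1) -> float:
    """Exponential decay from tau0, clamped at tau_min so the relaxation never becomes fully hard."""
    return max(tau_min, tau0 * math.exp(-rate * step))
```

With these defaults, the temperature falls to about 0.37 after 1,000 steps and reaches the 0.1 floor after roughly 2,300 steps (since ln 10 / 10⁻³ ≈ 2303). A smaller `rate` keeps parent probabilities soft for longer, which is the lever for avoiding premature collapse.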
For 3D scene deformation, the ARAP solver ensures local rigidity, but the quality of deformation depends on the accuracy of the learned hierarchy and the selection of user handles. Downsampling and skinning strategies are necessary for tractable training and inference in high-resolution scenes.
Implications and Future Directions
HEIR provides a unified, interpretable framework for hierarchical motion modeling, applicable across domains and dimensionalities. Its data-driven approach enables transferability and adaptability to diverse motion-centric tasks, including action recognition, pose estimation, and scene editing. The explicit hierarchy facilitates controllable and semantically meaningful deformations, with potential for integration into interactive graphics and robotics pipelines.
Theoretically, HEIR bridges the gap between fixed kinematic models and unsupervised relational inference, offering a principled mechanism for discovering multi-scale dependencies. Future research may explore extensions to multi-parent hierarchies, global attention mechanisms for long-range dependencies, and integration with semantic or task-driven priors for enhanced generalization.
Conclusion
HEIR introduces a robust, generalizable method for learning graph-based motion hierarchies from data, leveraging GNNs and differentiable sampling to infer interpretable parent-child relationships. Empirical results demonstrate strong performance in reconstructing hierarchical motion structures and enabling realistic 3D scene deformations. While limited by single-parent assumptions and reliance on observable motion, HEIR lays a foundation for structured, data-driven motion modeling with broad applicability and extensibility.