
HEIR: Learning Graph-Based Motion Hierarchies (2510.26786v1)

Published 30 Oct 2025 in cs.CV, cs.GR, and cs.LG

Abstract: Hierarchical structures of motion exist across research fields, including computer vision, graphics, and robotics, where complex dynamics typically arise from coordinated interactions among simpler motion components. Existing methods to model such dynamics typically rely on manually-defined or heuristic hierarchies with fixed motion primitives, limiting their generalizability across different tasks. In this work, we propose a general hierarchical motion modeling method that learns structured, interpretable motion relationships directly from data. Our method represents observed motions using graph-based hierarchies, explicitly decomposing global absolute motions into parent-inherited patterns and local motion residuals. We formulate hierarchy inference as a differentiable graph learning problem, where vertices represent elemental motions and directed edges capture learned parent-child dependencies through graph neural networks. We evaluate our hierarchical reconstruction approach on three examples: 1D translational motion, 2D rotational motion, and dynamic 3D scene deformation via Gaussian splatting. Experimental results show that our method reconstructs the intrinsic motion hierarchy in 1D and 2D cases, and produces more realistic and interpretable deformations compared to the baseline on dynamic 3D Gaussian splatting scenes. By providing an adaptable, data-driven hierarchical modeling paradigm, our method offers a formulation applicable to a broad range of motion-centric tasks. Project Page: https://light.princeton.edu/HEIR/

Summary

  • The paper introduces HEIR, a GNN-based framework that learns hierarchical motion structures by inferring interpretable parent-child relationships directly from data.
  • The paper decomposes motion into parent-inherited and local residual components, achieving a 73% success rate in reconstructing hierarchies on a synthetic 1D dataset, and 100% (noise-free) and 73.6% (moderate Gaussian noise) on a synthetic planetary system.
  • The paper demonstrates HEIR's versatility by extending to rotational motions and 3D scene deformations, outperforming existing methods in preserving structural integrity.

HEIR: Learning Graph-Based Motion Hierarchies

Introduction and Motivation

The paper presents HEIR, a general-purpose framework for learning hierarchical motion structures directly from data, leveraging graph neural networks (GNNs) to infer interpretable parent-child relationships among motion elements. The motivation stems from the ubiquity of hierarchical motion in domains such as computer vision, graphics, and robotics, where complex dynamics are often composed of coordinated, simpler motion primitives. Existing approaches typically rely on manually defined or heuristic hierarchies, which restrict generalizability and interpretability. HEIR addresses these limitations by formulating hierarchy inference as a differentiable graph learning problem, enabling automatic discovery of motion hierarchies without domain-specific priors.

Figure 1: The HEIR pipeline learns motion hierarchies by inferring parent-child relationships from observed positions $\mathbf{X}^t$ over time, using a GNN to predict edge weights and sample a hierarchy matrix $H$ for motion decomposition.

Methodology

Problem Formulation

HEIR models observed absolute motions $\mathbf{\Delta}^t$ as a sum of parent-inherited motion and local residuals, structured by a learnable hierarchy matrix $H$ that encodes a directed acyclic graph (DAG). Each motion element is assigned at most one parent (root elements have none), and cycles are prohibited, ensuring a valid hierarchical structure. The core decomposition is:

$$\mathbf{\Delta}^t = H \mathbf{\Delta}^t + \boldsymbol{\delta}^t$$

where $H_{ij}=1$ indicates that element $j$ is the parent of element $i$, and $\boldsymbol{\delta}^t$ encodes the local residual motion.
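
The decomposition can be checked by hand on a tiny example. The sketch below is illustrative only: the three-element chain, the 1D velocities, and the function names `residuals` and `reconstruct` are assumptions made for this example, not taken from the paper. It computes the residuals as the absolute motion minus the parent's absolute motion, then rebuilds the absolute motion by summing residuals along each element's chain of ancestors.

```python
def residuals(H, delta_abs):
    """Local residuals: delta_i = Delta_i - sum_j H[i][j] * Delta_j."""
    n = len(H)
    return [delta_abs[i] - sum(H[i][j] * delta_abs[j] for j in range(n))
            for i in range(n)]


def reconstruct(H, delta_loc):
    """Rebuild absolute motion by summing residuals from each element up to its root."""
    n = len(H)
    # parent[i] is the column j with H[i][j] == 1, or None for a root.
    parent = [next((j for j in range(n) if H[i][j] == 1), None) for i in range(n)]
    out = []
    for i in range(n):
        total, node = 0.0, i
        while node is not None:
            total += delta_loc[node]
            node = parent[node]
        out.append(total)
    return out


# Hypothetical chain: element 0 is the root, 1 is a child of 0, 2 is a child of 1.
H = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]
delta_abs = [1.0, 1.5, 1.25]            # absolute 1D velocities at one time step
delta_loc = residuals(H, delta_abs)     # [1.0, 0.5, -0.25]
delta_back = reconstruct(H, delta_loc)  # recovers delta_abs
```

Because the hierarchy is acyclic, the walk toward the root always terminates, which is why the recursion in the decomposition has a unique solution.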

Graph Construction and Learning

A proximity graph $G_0$ is constructed, with vertices representing motion elements and edges connecting each node to its $k$ nearest neighbors. Edge weights $w_{ij}$, representing parent probabilities, are learned via a graph attention mechanism. The encoder $\mathcal{G}$ computes relative motions by aggregating parent candidate velocities, while the decoder $\mathcal{D}$ reconstructs absolute motion by recursively aggregating relative velocities along the hierarchy.
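
As a rough illustration of the graph construction step, the sketch below builds a k-nearest-neighbour candidate list for each element and normalises a set of per-candidate scores into parent probabilities with a softmax. The function names, the example points, and the use of a plain softmax over fixed scores are assumptions for illustration; in the method described above the scores come from a learned graph attention mechanism.

```python
import math


def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbours (Euclidean)."""
    nbrs = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        nbrs.append(others[:k])
    return nbrs


def parent_probabilities(scores):
    """Softmax over one element's candidate-parent scores (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical 2D positions of four motion elements.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
neighbours = knn_graph(points, k=2)       # candidate parents per element
probs = parent_probabilities([0.2, 1.5])  # placeholder scores for one element
```

Restricting candidate parents to the k nearest neighbours keeps the number of edges linear in the number of elements, rather than quadratic.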

Hierarchies are sampled using the Gumbel-Softmax trick for differentiable discrete selection, and the training objective combines reconstruction loss and regularization on the magnitude of local velocities to encourage meaningful hierarchical structure.
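
A minimal sketch of Gumbel-Softmax sampling for a single element's parent choice follows. The function name, the logits, and the temperature value are placeholders chosen for this example, and the sketch covers only the forward sampling step, not gradient propagation or the paper's training loop.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = rng.random()
        while u == 0.0:          # avoid log(0)
            u = rng.random()
        g = -math.log(-math.log(u))   # standard Gumbel(0, 1) noise
        noisy.append((logit + g) / tau)
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    z = sum(exps)
    return [e / z for e in exps]


rng = random.Random(0)
# Hypothetical logits over three candidate parents of one element.
sample = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5, rng=rng)
chosen_parent = sample.index(max(sample))
```

Lower temperatures push the sample closer to a one-hot vector, while higher temperatures keep it smooth, which is what makes the discrete parent choice usable inside gradient-based training.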

Rotational Motion Extension

HEIR is extended to rotational motion by decomposing velocities into radial and angular components in polar coordinates. The encoder predicts these components, and the loss function is augmented with regularization terms for radial and angular velocities, as well as a Laplacian-based connectivity prior to enforce well-formed hierarchies.
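
To make the radial and angular split concrete, the sketch below decomposes a 2D velocity in a polar frame centred on a candidate parent. Taking the parent as the origin of the polar frame, the function name `polar_velocity`, and the example numbers are assumptions for illustration; the summary above does not specify these details.

```python
import math


def polar_velocity(child_pos, parent_pos, child_vel, parent_vel):
    """Split the child's velocity relative to its parent into (radial, angular)."""
    rx, ry = child_pos[0] - parent_pos[0], child_pos[1] - parent_pos[1]
    vx, vy = child_vel[0] - parent_vel[0], child_vel[1] - parent_vel[1]
    r = math.hypot(rx, ry)                    # must be non-zero
    radial = (rx * vx + ry * vy) / r          # dr/dt: motion along the radius
    angular = (rx * vy - ry * vx) / (r * r)   # dtheta/dt: rotation about the parent
    return radial, angular


# Hypothetical moon at (2, 0) orbiting a planet at the origin that drifts with velocity (1, 0).
radial, angular = polar_velocity((2.0, 0.0), (0.0, 0.0), (1.0, 3.0), (1.0, 0.0))
# radial == 0.0 (pure orbit), angular == 1.5 rad per unit time
```

In this example the moon's absolute velocity mixes the planet's drift with its own orbit; once the parent's motion is subtracted, only a constant angular rate remains.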

Figure 2: HEIR successfully reconstructs hierarchical relations in a 1D trajectory, correctly identifying parent-child clusters and disentangling nested motion patterns.

Application to 3D Gaussian Splatting

For dynamic 3D scene deformation, each Gaussian splat is treated as a graph vertex. The learned hierarchy enables interpretable scene editing: user-specified handles propagate deformations to descendants via an as-rigid-as-possible (ARAP) solver, preserving local structure and rigidity. The hierarchy is learned on downsampled Gaussians, with skinning weights used for full-resolution deformation.
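
The idea of propagating an edit from a handle to its descendants can be sketched as follows. This is a deliberately simplified stand-in: it applies a rigid translation to the handle's subtree and does not implement an ARAP solver or skinning weights. The parent array, positions, and function names are hypothetical.

```python
def descendants(parent, handle):
    """Indices of all elements whose ancestor chain passes through `handle`."""
    found = []
    for i in range(len(parent)):
        node = parent[i]
        while node is not None:
            if node == handle:
                found.append(i)
                break
            node = parent[node]
    return found


def translate_subtree(positions, parent, handle, offset):
    """Translate the handle and all of its descendants; leave other elements fixed."""
    moved = set(descendants(parent, handle)) | {handle}
    return [tuple(c + d for c, d in zip(p, offset)) if i in moved else p
            for i, p in enumerate(positions)]


# Hypothetical hierarchy: 0 -> 1 -> 2 and 0 -> 3 (parent[i] is the parent of i).
parent = [None, 0, 1, 0]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
edited = translate_subtree(positions, parent, handle=1, offset=(0.0, 0.5, 0.0))
```

Moving element 1 carries element 2 along with it, while the root and the sibling branch stay where they were.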

Figure 3: HEIR reconstructs rotational hierarchies in a synthetic planetary system, correctly assigning moons to their planetary parents and capturing inherited motion.

Experimental Results

1D Hierarchical Motion

On a synthetic 1D dataset with known hierarchical structure, HEIR achieves a 73% success rate in reconstructing valid hierarchies, far exceeding random chance ($5 \times 10^{-7}$). The method reliably disentangles nested motions and identifies correct parent clusters, validating its expressiveness and robustness to noise.

Rotational Hierarchies

In a synthetic planetary system, HEIR reconstructs 100% of hierarchies in noise-free settings and 73.6% under moderate Gaussian noise. The Laplacian connectivity prior is critical for maintaining well-formed, non-fragmented hierarchies.

3D Scene Deformation

On the D-NeRF dataset, HEIR outperforms SC-GS across all perceptual and structural metrics (PSNR, SSIM, CLIP-I, LPIPS), producing more realistic and physically coherent deformations. Qualitative results demonstrate superior preservation of structural relationships and avoidance of unnatural distortions, especially in articulated and rigid body scenarios.

Figure 4: HEIR yields more realistic and structurally faithful deformations in dynamic Gaussian splatting scenes, outperforming SC-GS in perceptual quality and physical plausibility.

Implementation Considerations

HEIR's graph-based approach is scalable to large numbers of motion elements, with computational complexity governed by the sparsity of the proximity graph and the efficiency of GNN message passing. The Gumbel-Softmax sampling enables end-to-end differentiability, but careful annealing of the temperature parameter is required to avoid premature collapse of parent probabilities. The method assumes each element has a single parent, which may limit expressiveness in systems with multi-source influences; future work could relax this constraint via multi-parent or hypergraph formulations.
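
One common way to anneal the Gumbel-Softmax temperature is an exponential decay from a high starting value to a low final value. The sketch below shows such a schedule; the function name, the start and end temperatures, and the exponential form are illustrative assumptions, not the schedule used in the paper.

```python
def annealed_temperature(step, total_steps, tau_start=1.0, tau_end=0.1):
    """Exponentially interpolate tau from tau_start to tau_end over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)   # clamp progress to [0, 1]
    return tau_start * (tau_end / tau_start) ** frac


# Temperature at the start, midpoint, and end of a hypothetical 100-step run.
schedule = [annealed_temperature(s, 100) for s in (0, 50, 100)]
```

Starting warm keeps several parent candidates in play early in training, and cooling gradually lets the parent probabilities commit only after the reconstruction loss has had a chance to shape them.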

For 3D scene deformation, the ARAP solver ensures local rigidity, but the quality of deformation depends on the accuracy of the learned hierarchy and the selection of user handles. Downsampling and skinning strategies are necessary for tractable training and inference in high-resolution scenes.

Implications and Future Directions

HEIR provides a unified, interpretable framework for hierarchical motion modeling, applicable across domains and dimensionalities. Its data-driven approach enables transferability and adaptability to diverse motion-centric tasks, including action recognition, pose estimation, and scene editing. The explicit hierarchy facilitates controllable and semantically meaningful deformations, with potential for integration into interactive graphics and robotics pipelines.

Theoretically, HEIR bridges the gap between fixed kinematic models and unsupervised relational inference, offering a principled mechanism for discovering multi-scale dependencies. Future research may explore extensions to multi-parent hierarchies, global attention mechanisms for long-range dependencies, and integration with semantic or task-driven priors for enhanced generalization.

Conclusion

HEIR introduces a robust, generalizable method for learning graph-based motion hierarchies from data, leveraging GNNs and differentiable sampling to infer interpretable parent-child relationships. Empirical results demonstrate strong performance in reconstructing hierarchical motion structures and enabling realistic 3D scene deformations. While limited by single-parent assumptions and reliance on observable motion, HEIR lays a foundation for structured, data-driven motion modeling with broad applicability and extensibility.
