FIGNet: Face Interaction Graph Networks
- The paper introduces FIGNet as a learned rigid-body simulator that models face–face interactions to capture detailed contact dynamics.
- FIGNet uses a heterogeneous graph structure with mesh nodes, object nodes, and hyper-edges to improve collision detection and computational efficiency.
- Extensions like FIGNet* and Act-FIGNet enhance scalability and allow action-conditioned predictions for applications in robotics, graphics, and design.
Face Interaction Graph Networks (FIGNet) are a class of learned rigid-body simulators that utilize graph neural network (GNN) architectures to predict the dynamics of objects undergoing contact and collision. Unlike prior simulation frameworks based on node- or particle-level interactions, FIGNet explicitly models interactions between mesh faces, providing increased accuracy and efficiency for systems with complex geometries and contact-rich interactions. This approach has been extended to handle action-conditioned scenarios and scaled to real-world scenes, enabling robust simulation in robotics, graphics, and engineering contexts (Allen et al., 2022, Lopez-Guevara et al., 2024, Yi et al., 15 Sep 2025).
1. Motivation and Foundational Principles
The main challenge in simulating rigid-body dynamics, especially with arbitrary shapes, arises from the difficulty of accurately capturing contacts that occur away from mesh nodes or in regions inadequately sampled by particles. Traditional GNN-based simulators—based on node proximity—tend to miss collisions when the closest mesh nodes are farther apart than the contact distance, and dense particle-based representations are computationally infeasible for high-resolution shapes due to quadratic cost scaling.
FIGNet addresses this by shifting collision detection and interaction modeling from nodes or particles to mesh faces. By constructing hyper-edges that connect the vertices of contacting faces (typically triangles), FIGNet ensures that face-level contacts are explicitly represented. This approach guarantees detection of face–face, edge–edge, and vertex–face contacts without the need for high-density sampling or global distance thresholds, resulting in both higher accuracy and improved computational efficiency (Allen et al., 2022).
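The difference between node-proximity and face-level contact detection can be illustrated with a small, self-contained sketch. It is not FIGNet's implementation: the closest-point routine follows the standard region-based point-triangle test, and the scene, the `radius` value, and all helper names are assumptions chosen for illustration.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))

def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on triangle (a, b, c), via Voronoi-region tests."""
    ab, ac, ap = sub(b, a), sub(c, a), sub(p, a)
    d1, d2 = dot(ab, ap), dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a                                  # vertex region a
    bp = sub(p, b)
    d3, d4 = dot(ab, bp), dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b                                  # vertex region b
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return add(a, scale(ab, d1 / (d1 - d3)))  # edge region ab
    cp = sub(p, c)
    d5, d6 = dot(ab, cp), dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c                                  # vertex region c
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return add(a, scale(ac, d2 / (d2 - d6)))  # edge region ac
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return add(b, scale(sub(c, b), w))        # edge region bc
    denom = 1.0 / (va + vb + vc)
    return add(a, add(scale(ab, vb * denom), scale(ac, vc * denom)))  # interior

# Hypothetical scene: a large floor triangle and one corner of a small object
# hovering just above the middle of the floor.
floor = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0))
corner = (3.0, 3.0, 0.05)
radius = 0.1  # assumed collision radius

vertex_dist = min(norm(sub(corner, v)) for v in floor)
face_dist = norm(sub(corner, closest_point_on_triangle(corner, *floor)))

node_based_hit = vertex_dist <= radius   # misses the contact
face_based_hit = face_dist <= radius     # detects the contact
```

In this toy case the nearest floor vertex is more than four units away, so a node-proximity test with the same radius reports no contact, while the face-level distance is 0.05 and the contact is found.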
2. Graph Construction and Feature Encoding
At each simulation step, FIGNet represents the physical scene as a heterogeneous graph:
- Mesh nodes: Each vertex of an object's triangle mesh is a node. Features include finite-difference velocity history, mass, friction, and restitution.
- Object node: One per rigid body, located at the body's center of mass, facilitating rapid transmission of global impulses.
- Mesh–mesh edges: Connect mesh vertices that share a triangle, encoding local geometry via relative positions (in both the deformed and undeformed reference frames).
- Object–mesh and mesh–object edges: Link each object node to its mesh vertices for distributing global forces.
- Face–face hyper-edges: Created between triangles on distinct bodies that are found, via a bounding volume hierarchy (BVH), to lie within a collision radius. Each such connection encodes:
  - the closest points on the two faces,
  - the separating vector,
  - per-vertex span vectors from the vertices to the closest points,
  - the face normals.
The full face–face interaction feature vector is 17-dimensional.
This representation enables the model to accurately encode the local geometry and kinematics pertinent to contact dynamics (Allen et al., 2022).
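As a rough illustration of how candidate face pairs within a collision radius might be gathered, the sketch below uses a brute-force overlap test on inflated axis-aligned bounding boxes. This is a deliberately simple stand-in for the BVH query, not the method used in the paper; the function names and the toy geometry are assumptions.

```python
from itertools import product

def inflated_aabb(tri, margin):
    """Axis-aligned bounding box of a triangle, grown by `margin` on every side."""
    lo = tuple(min(v[k] for v in tri) - margin for k in range(3))
    hi = tuple(max(v[k] for v in tri) + margin for k in range(3))
    return lo, hi

def boxes_overlap(box_a, box_b):
    (alo, ahi), (blo, bhi) = box_a, box_b
    return all(alo[k] <= bhi[k] and blo[k] <= ahi[k] for k in range(3))

def candidate_face_pairs(faces_a, faces_b, radius):
    """Conservative broad phase: every pair of triangles closer than `radius`
    is returned (possibly with false positives that a narrow phase would
    discard). Each box is inflated by radius / 2, so two triangles within
    `radius` of each other always have overlapping boxes."""
    boxes_a = [inflated_aabb(t, radius / 2) for t in faces_a]
    boxes_b = [inflated_aabb(t, radius / 2) for t in faces_b]
    return [
        (i, j)
        for i, j in product(range(len(faces_a)), range(len(faces_b)))
        if boxes_overlap(boxes_a[i], boxes_b[j])
    ]

# Hypothetical bodies: two floor triangles and one small triangle hovering
# 0.05 above the first floor triangle.
body_a = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
    ((5.0, 5.0, 0.0), (6.0, 5.0, 0.0), (5.0, 6.0, 0.0)),
]
body_b = [((0.2, 0.2, 0.05), (0.4, 0.2, 0.05), (0.2, 0.4, 0.05))]

pairs = candidate_face_pairs(body_a, body_b, radius=0.1)
```

A BVH would return the same candidates without testing every pair; the exhaustive loop here only keeps the example short.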
3. Message Passing and Dynamics Prediction
FIGNet employs an encode–process–decode GNN architecture, with 10 unshared message-passing layers:
- Edge updates: Standard edges are updated by MLPs that take the edge latent and the latents of the sender and receiver nodes as input.
- Face–face hyper-edge updates: MLPs jointly update messages from sender triangle to each receiver vertex, propagating detailed contact information.
- Node updates: Each mesh node aggregates incoming messages from mesh–mesh edges, face–face hyper-edges corresponding to that vertex, and object–mesh connections.
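The edge-update and node-update pattern described above can be sketched for ordinary (two-node) edges as follows. This is a toy under stated assumptions: `toy_mlp` is a fixed, untrained stand-in for a learned MLP, the residual connections and sum aggregation are assumed rather than taken from the source, and the face–face hyper-edge update is omitted.

```python
import math

def toy_mlp(inputs, out_dim):
    """Placeholder for a learned MLP: a fixed nonlinearity of the summed inputs."""
    s = sum(inputs)
    return [math.tanh(s + k) for k in range(out_dim)]

def message_passing_step(node_latents, edges, edge_latents):
    """One step over directed edges given as (sender, receiver) index pairs."""
    dim = len(node_latents[0])

    # 1) Edge update: each edge sees its own latent and both endpoint latents.
    new_edge_latents = []
    for (s, r), e in zip(edges, edge_latents):
        update = toy_mlp(e + node_latents[s] + node_latents[r], dim)
        new_edge_latents.append([a + b for a, b in zip(e, update)])  # residual (assumed)

    # 2) Aggregation: sum the updated edge latents arriving at each receiver.
    aggregated = [[0.0] * dim for _ in node_latents]
    for (s, r), e in zip(edges, new_edge_latents):
        for k in range(dim):
            aggregated[r][k] += e[k]

    # 3) Node update: each node sees its own latent and its aggregated messages.
    new_node_latents = []
    for v, m in zip(node_latents, aggregated):
        update = toy_mlp(v + m, dim)
        new_node_latents.append([a + b for a, b in zip(v, update)])  # residual (assumed)

    return new_node_latents, new_edge_latents

# Three nodes; node 1 receives messages from nodes 0 and 2.
nodes = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.3]]
edges = [(0, 1), (2, 1)]
edge_latents = [[0.0, 0.1], [0.2, 0.0]]
new_nodes, new_edges = message_passing_step(nodes, edges, edge_latents)
```

Stacking several such steps, each with its own parameters, corresponds to the unshared layers mentioned above.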
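After message passing, the latent at each mesh node is decoded to predict the next-step acceleration:
$$\hat{a}_i^{t} = \mathrm{MLP}_{\text{dec}}\big(v_i^{(L)}\big)$$
where $v_i^{(L)}$ is the latent of mesh node $i$ after the final message-passing layer. Positions are integrated using a second-order Euler scheme (with the time step absorbed into the acceleration):
$$\hat{x}_i^{t+1} = \hat{a}_i^{t} + 2x_i^{t} - x_i^{t-1}$$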
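A minimal worked sketch of a second-order Euler update, assuming a unit time step so that the predicted acceleration is expressed directly as a second finite difference of positions. The helper name `integrate` and the numbers are illustrative only.

```python
def integrate(x_prev, x_curr, accel):
    """Next position from the two most recent positions and an acceleration:
    x_next = accel + 2 * x_curr - x_prev (unit time step)."""
    return tuple(a + 2.0 * c - p for p, c, a in zip(x_prev, x_curr, accel))

# A vertex falling along z: it moved from z = 1.0 to z = 0.9 in the last step,
# and the predicted acceleration is -0.1 per step squared.
x_prev = (0.0, 0.0, 1.0)
x_curr = (0.0, 0.0, 0.9)
accel = (0.0, 0.0, -0.1)

x_next = integrate(x_prev, x_curr, accel)

# Equivalent two-stage form: update the finite-difference velocity first,
# then advance the position with the new velocity.
velocity = tuple(c - p for p, c in zip(x_prev, x_curr))
velocity_next = tuple(v + a for v, a in zip(velocity, accel))
x_next_via_velocity = tuple(c + v for c, v in zip(x_curr, velocity_next))
```

Both forms give a new height of about 0.7: the previous displacement of -0.1 grows to -0.2 under the predicted acceleration.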
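A per-node mean squared error (MSE) loss is used during training:
$$\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}\left\lVert \hat{a}_i^{t} - a_i^{t} \right\rVert_2^2$$
where $\hat{a}_i^{t}$ and $a_i^{t}$ are the predicted and ground-truth accelerations of mesh node $i$, and $N$ is the number of mesh nodes. Regularization involves random-walk noise and random rotations about the world $z$-axis to stabilize multi-step rollouts and encourage robust generalization (Allen et al., 2022).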
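The training objective and the noise regularizer can be sketched as below. Both functions are illustrative assumptions: the loss is taken as the mean over nodes of the squared Euclidean error, and the random-walk noise is modeled as an accumulated Gaussian offset along a node's position history, which is one common way to implement it but is not confirmed by the source.

```python
import random

def per_node_mse(predicted, target):
    """Mean over nodes of the squared Euclidean distance between 3-vectors."""
    total = 0.0
    for p, t in zip(predicted, target):
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
    return total / len(predicted)

def add_random_walk_noise(history, sigma, rng):
    """Perturb one node's position history with an accumulating Gaussian offset,
    so later time steps drift further from the clean trajectory."""
    noisy, offset = [], [0.0, 0.0, 0.0]
    for position in history:
        offset = [o + rng.gauss(0.0, sigma) for o in offset]
        noisy.append(tuple(p + o for p, o in zip(position, offset)))
    return noisy

predicted = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
loss = per_node_mse(predicted, target)  # (1 + 4) / 2 = 2.5

history = [(0.0, 0.0, 1.0), (0.0, 0.0, 0.9), (0.0, 0.0, 0.7)]
rng = random.Random(0)  # fixed seed for reproducibility
noisy_history = add_random_walk_noise(history, sigma=0.01, rng=rng)
```

Training on perturbed histories exposes the model to the kind of accumulated error it will see in its own multi-step rollouts.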
4. Scaling, Action Conditioning, and Perception Integration
Memory-Efficient Simulation (FIGNet*)
Scaling to real-world scenes with many objects and high-resolution meshes is challenging for the original FIGNet due to memory requirements dominated by node–node mesh edges. FIGNet* is a variant that omits within-mesh adjacency edges, relying entirely on object–mesh and face–face edges for rigid-body dynamics. Empirically, this yields a 2–3× reduction in memory consumption and 20–30% speedup during simulation, with minimal impact on predictive accuracy for translation and rotation:
- Peak per-step memory: approximately 63 MiB (FIGNet) versus approximately 50 MiB (FIGNet*)
- Edge count reduced by 50–70%
- Training on large-scale scenes (e.g., Kubric MOVi-C) becomes feasible, as out-of-memory failures are avoided (Lopez-Guevara et al., 2024).
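To make the effect of dropping within-mesh edges concrete, the following sketch counts directed intra-object edges for a single toy mesh with and without mesh–mesh connectivity. It assumes one directed edge per direction for each connection and ignores face–face edges, so the resulting ratio is only illustrative and is not a reproduction of the reported figures.

```python
def mesh_mesh_edges(faces):
    """Directed edges between vertices that share a triangle."""
    edges = set()
    for a, b, c in faces:
        for s, r in ((a, b), (b, c), (c, a)):
            edges.add((s, r))
            edges.add((r, s))
    return edges

def object_mesh_edge_count(num_vertices):
    """One object-to-mesh and one mesh-to-object edge per vertex (assumed)."""
    return 2 * num_vertices

# A tetrahedron: 4 vertices, 4 triangular faces.
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
num_vertices = 4

with_mesh_edges = len(mesh_mesh_edges(tetra_faces)) + object_mesh_edge_count(num_vertices)
without_mesh_edges = object_mesh_edge_count(num_vertices)
fraction_removed = 1.0 - without_mesh_edges / with_mesh_edges
```

For this toy mesh, 12 of the 20 intra-object edges are mesh–mesh edges, so removing them leaves only the 8 edges that connect the vertices to the object node.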
Perception-Driven Interfaces
Integration with Neural Radiance Fields (NeRFs) enables FIGNet* to operate directly on real-world RGB data. Objects are segmented from multiview images, lifted to 3D via depth estimation, and meshed from density fields. The resulting mesh is used as the active object in the FIGNet* graph. After simulating dynamics, the trajectory is rendered by editing the NeRF with simulated object poses. Zero-shot transfer from synthetic data to real scenes is observed, as the model is robust to noise and errors introduced by the perception pipeline (Lopez-Guevara et al., 2024).
Action-Conditioned Extensions
Act-FIGNet extends the FIGNet paradigm to predict the outcome of explicit control actions for contact-rich manipulation tasks. The input graph is augmented with "world" nodes and additional edge types to incorporate external force and torque (wrench) information. The action signal is injected via these nodes and bidirectional edges, allowing the GNN to condition its predictions on control inputs without architectural changes in the message passing procedure (Yi et al., 15 Sep 2025).
Act-FIGNet's graph contains:
- Mesh nodes: per-vertex historical positions and static attributes (mass, friction, dynamic/static flag)
- Object nodes: per-rigid-body, for global attributes/communications
- World nodes: encode external force and torque applied to each rigid body
- Interaction edges: object–mesh, mesh–mesh (face–face), world–mesh
Prediction heads output next-step vertex accelerations and (for tool bodies) aggregate force/torque estimates at the end effector. The model is trained to jointly minimize error in vertex positions, forces, and torques under an MSE loss. This explicit action conditioning enables its use in model-predictive control, contact-aided state estimation, and force-feedback robot policy learning (Yi et al., 15 Sep 2025).
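The notion of an aggregate force/torque (a wrench) can be illustrated by summing per-vertex contact forces and their moments about a reference point. The source does not specify how Act-FIGNet's prediction head forms this estimate, so the sketch below shows only the underlying rigid-body quantity; the function name and example values are hypothetical.

```python
def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def aggregate_wrench(positions, forces, reference):
    """Net force and net torque about `reference` from per-vertex forces."""
    net_force = [0.0, 0.0, 0.0]
    net_torque = [0.0, 0.0, 0.0]
    for p, f in zip(positions, forces):
        lever = tuple(pi - ri for pi, ri in zip(p, reference))
        moment = cross(lever, f)
        for k in range(3):
            net_force[k] += f[k]
            net_torque[k] += moment[k]
    return tuple(net_force), tuple(net_torque)

# A force couple: equal and opposite forces at two vertices on either side of
# the reference point produce zero net force and a pure torque about z.
positions = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
forces = [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
force, torque = aggregate_wrench(positions, forces, reference=(0.0, 0.0, 0.0))
```

Here the two forces cancel while their moments add, giving a torque of 2 about the z-axis.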
5. Empirical Performance and Evaluation
Synthetic Benchmarks
On standard synthetic contact datasets (Kubric MOVi-A, MOVi-B):
- Translation RMSE @ 50 steps: on MOVi-A, FIGNet achieves lower error than MGN-LargeRadius and DPI
- Rotation RMSE: markedly lower for FIGNet than for DPI on MOVi-A
- Collision edge count: orders-of-magnitude fewer collision edges for FIGNet than for MGN-LargeRadius on MOVi-A
- Efficiency: faster per-step runtime on CPU than state-of-the-art node-based mesh GNNs
For complex meshes (MOVi-B):
- Translation and rotation RMSE for FIGNet are lower than those of prior learned methods
- When scaling using FIGNet*, translation RMSE remains equivalent or slightly improves, while memory savings enable simulation with tens of thousands of faces per object (Allen et al., 2022, Lopez-Guevara et al., 2024)
Real-World Experiments
On MIT planar pushing and robotic manipulation tasks:
- Median translation error: FIGNet is compared against tuned MuJoCo, PyBullet, and the Lynch analytic pushing model
- Force/torque estimation (Act-FIGNet): reduced position error and reduced force/torque RMSE compared to state-of-the-art analytic simulators, with robust zero-shot transfer to novel tasks (Yi et al., 15 Sep 2025).
Act-FIGNet matches ground-truth simulation in MPC success rate for peg-in-hole insertion, with generalization to unseen geometry and control distributions. Fine-tuning on task-specific data yields only marginal improvements, indicating strong model robustness.
6. Applications, Limitations, and Outlook
Applications
- Differentiable robotics planning: Enabling model-based control and contact-rich manipulation with learned dynamics and force feedback.
- High-fidelity animation: Supporting interactive simulation and animation in computer graphics and virtual environments.
- Mechanical design: Facilitating differentiable design optimization by efficiently simulating detailed contact events among complex geometries.
- Perception-based simulation: Bridging RGB-based scene understanding and mesh-based physics, with the potential for fully end-to-end learned pipelines (Allen et al., 2022, Lopez-Guevara et al., 2024, Yi et al., 15 Sep 2025).
Limitations
- State requirements: Current FIGNet and FIGNet* models require full 3D vertex information at each timestep and do not consume raw pixel data or depth images directly.
- Scalability: At extremely high mesh resolutions (1 million faces per object), graph memory requirements may become prohibitive, suggesting the need for adaptive or hybrid representations.
- Stochasticity: FIGNet is trained with deterministic MSE loss; highly chaotic or stochastic interactions may call for probabilistic latent-variable extensions.
- Runtime on massive scenes: For extremely large, multi-object scenes, further optimization may be needed despite the memory benefits of FIGNet*.
Future Directions
Promising directions include tighter integration with perception modules (e.g., direct image-to-mesh pipelines), runtime optimization (including GPU/TPU accelerators), and incorporation of predictive uncertainty to handle ambiguous or stochastic contact events. Extensions to deformable objects, fluid–rigid coupling, and articulated multibody contact dynamics are also plausible (Allen et al., 2022, Lopez-Guevara et al., 2024, Yi et al., 15 Sep 2025).
7. Comparative Table: FIGNet and Recent Variants
| Model | Key Distinction | Memory Use (MOVi-B) | Translation RMSE | Notable Features |
|---|---|---|---|---|
| FIGNet | Face–face hyper-edges, full mesh | 63.4 MiB | 0.14 m | Accurate, but memory-bound |
| FIGNet* | Drops mesh node adjacency edges | 50.1 MiB | 0.13 m | Efficient, scalable |
| Act-FIGNet | Action, force/torque inputs | Task-dependent | Task-dependent | Control-conditioned, force/torque outputs |
This systematic development marks FIGNet and its variants as a versatile, high-accuracy framework for learned rigid-body simulation, scaling from synthetic benchmarks to complex real-world scenarios, and supporting advanced robotics and perception-driven applications (Allen et al., 2022, Lopez-Guevara et al., 2024, Yi et al., 15 Sep 2025).