MLP-Based Deformation Fields

Updated 17 December 2025

The paper introduces MLP-based deformation fields that map spatial coordinates to transformation parameters, enabling high-fidelity dynamic modeling.
It details network architectures with SIREN activations and local Jacobian estimators for continuous, differentiable deformations of explicit and implicit representations.
Applications include dynamic avatar synthesis, shape interpolation, and registration, with performance metrics highlighted in rendering and tracking tasks.

Multilayer perceptron (MLP)-based deformation fields constitute a foundational approach to modeling geometric transformations in both explicit (point-based, mesh-based, or Gaussian-based) and implicit (SDF or radiance field) neural representations of 2D and 3D shapes. These methods employ compact coordinate-based neural networks to construct continuous, differentiable, and often physically or geometrically regularized mappings between source and deformed configurations, enabling high-fidelity dynamic modeling, shape interpolation, motion synthesis, and registration. This article systematically reviews key construction principles, mathematical formulations, network architectures, supervision strategies, and comparative advantages of MLP-based deformation fields in the context of contemporary research.

1. Mathematical Formulation of MLP-Based Deformation Fields

MLP-based deformation fields map spatial or parametric coordinates of geometric primitives to translation, rotation, flow, or more general transformation parameters via neural networks. Standard formulations fall into five main categories:

Directly parameterized deformation fields: Given a canonical point cloud or mesh $X^{\mathcal{C}} = \{x_i^{\mathcal{C}}\}_{i=1}^N$, a per-frame or per-parameter MLP $g_\theta^{(t)}: \mathbb{R}^3 \to \mathbb{R}^3$ predicts a translation $\Delta x_i^{(t)} = g_\theta^{(t)}(x_i^{\mathcal{C}})$ and forms the deformed shape $x_i^{(t)} = x_i^{\mathcal{C}} + \Delta x_i^{(t)}$ (Yu et al., 2023, Prokudin et al., 2023).
Forward-warping fields for explicit surface elements: Embeddings of discrete graphics primitives, such as 3D Gaussians, are deformed by an MLP that predicts position, scale, and orientation deltas, $f_\theta: (\gamma(x), \gamma(t)) \mapsto (\Delta x^{(t)}, \Delta s^{(t)}, \Delta q^{(t)})$, where $\gamma$ denotes a positional encoding. The dynamic configuration is computed as $x^{(t)} = x + \Delta x^{(t)}$, and similarly for scales and rotations (Liang et al., 2023).
Implicit deformation fields in canonical domains: Given a base domain $D \subset \mathbb{R}^3$ (e.g., unit sphere, shell parameter domain), MLPs $f:\mathbb{R}^3 \to \mathbb{R}^3$ (or hierarchically stacked MLPs $f_0, f_1$) apply residual deformations to $D$ to generate the target embedding $S = \{ f(x) \mid x \in D \}$ (Walker et al., 2023, Kairanda et al., 2023).
Neural flow models: Structure-preserving morphing is achieved by constructing a vector flow field $\mathbf{v}(x, t)$ modeled by a SIREN MLP, which is then integrated via an ODE to produce a one-parameter family of diffeomorphisms $\Phi(x, t)$ (Bizzi et al., 10 Oct 2025).
Local Jacobian-based deformation: The Local Jacobian Network (LJN) predicts per-vertex Jacobians from coarse one-ring neighborhood estimates using per-point MLPs and spectral smoothing, with the global embedding recovered by solving a Poisson equation (Sundararaman et al., 26 Sep 2024).
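As an illustration of the neural flow formulation, the following minimal sketch integrates a velocity field with a classical fourth-order Runge-Kutta scheme to obtain $\Phi(x, t)$. The velocity here is a closed-form stand-in (a rotation about the z-axis) rather than a trained SIREN MLP, so that the result can be checked by hand; the names `velocity` and `integrate_flow`, the angular speed, and the step count are all assumptions made for this example.

```python
import numpy as np

def velocity(x, t):
    """Stand-in for a learned field v(x, t): rotation about the z-axis.

    A trained SIREN MLP would replace this function (assumption for illustration).
    """
    omega = np.pi / 2  # hypothetical angular speed
    return omega * np.stack([-x[:, 1], x[:, 0], np.zeros(len(x))], axis=1)

def integrate_flow(x0, t0, t1, steps=64, v=velocity):
    """Integrate dx/dt = v(x, t) from t0 to t1 with classical RK4."""
    x = np.asarray(x0, dtype=float).copy()
    h = (t1 - t0) / steps  # negative when integrating backward in time
    t = t0
    for _ in range(steps):
        k1 = v(x, t)
        k2 = v(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = v(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = v(x + h * k3, t + h)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

points = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.5]])
warped = integrate_flow(points, 0.0, 1.0)     # Phi(x, 1)
recovered = integrate_flow(warped, 1.0, 0.0)  # inverse map via backward integration
```

Integrating the same field backward in time approximately recovers the starting points, which is the practical sense in which a flow-generated map is invertible.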

Across these settings, the spatial input to the MLP may represent 3D position, 2D parametric location, or mesh/graph features, and may be augmented by time, pose, latent codes, or local geometric context.
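A minimal sketch of a time-conditioned residual field, $x^{(t)} = x + \Delta x^{(t)}$, may help make the input augmentation concrete. It assumes a small ReLU MLP with hypothetical layer sizes, a trigonometric positional encoding with arbitrarily chosen frequency counts, and a zero-initialized output layer so that the field starts as the identity map; none of these choices is taken from a specific cited method.

```python
import numpy as np

def positional_encoding(v, num_freqs):
    """Append sin(2^k v) and cos(2^k v), k < num_freqs, to each coordinate of v."""
    v = np.asarray(v, dtype=float)
    freqs = 2.0 ** np.arange(num_freqs)
    angles = v[..., None] * freqs                      # shape (N, D, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return np.concatenate([v, enc.reshape(len(v), -1)], axis=-1)

def init_mlp(sizes, rng):
    """He-initialized hidden layers; zero output layer (assumed identity start)."""
    params = []
    for m, n in zip(sizes[:-1], sizes[1:]):
        params.append((rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n)))
    w_out, b_out = params[-1]
    params[-1] = (np.zeros_like(w_out), b_out)
    return params

def deformation_field(params, x, t, x_freqs=4, t_freqs=2):
    """Return x + delta_x, with delta_x predicted from encoded position and time."""
    t_col = np.full((len(x), 1), float(t))
    h = np.concatenate(
        [positional_encoding(x, x_freqs), positional_encoding(t_col, t_freqs)], axis=1
    )
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)                 # ReLU hidden layers
    w, b = params[-1]
    return x + (h @ w + b)

rng = np.random.default_rng(0)
# Input width: 3 + 3*2*4 = 27 for position, 1 + 1*2*2 = 5 for time.
params = init_mlp([32, 64, 64, 3], rng)
x = rng.uniform(-1.0, 1.0, (5, 3))
deformed = deformation_field(params, x, t=0.25)
```

Pose or latent codes could be concatenated to the input in the same way as the time encoding.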

2. Network Architectures and Conditioning Mechanisms

MLPs for deformation fields are designed for expressive modeling while maintaining computational efficiency:

Depth and width: Architectures range from single-layer, wide (400-unit) residual blocks for mesh surface deformation (Walker et al., 2023), through three hidden layers (width 128, SIREN activation) for per-frame surface deformations (Prokudin et al., 2023), to 8-layer, 256-unit MLPs (ReLU or SIREN) for Gaussian deformation fields (Liang et al., 2023).
Activation functions: Sinusoidal (SIREN) activations are used for high-frequency, detail-preserving deformations (Prokudin et al., 2023, Kairanda et al., 2023, Bizzi et al., 10 Oct 2025). ReLU and SoftPlus activations support general point-based models (Liang et al., 2023, Walker et al., 2023, Atzmon et al., 2021).
Input encoding: Spatial or temporal coordinates may be encoded by random Fourier features (Walker et al., 2023), trigonometric positional encodings (Liang et al., 2023), Laplace-Beltrami eigenfunctions for intrinsic geometry (Walker et al., 2023), or remain unencoded (if geometry-aligned features suffice) (Merrouche et al., 11 Dec 2024).
Conditioning: Temporal and pose dependencies are handled by explicit per-frame MLPs, concatenation of time/pose/latent codes, or attention-masked inputs in facial expression synthesis (Chen et al., 2023). Locality is imposed by Gaussian spatial kernels or per-landmark MLP ensembles for regional control (Chen et al., 2023). In medical registration, hybrid latent codes, combining global and locally interpolated vectors, efficiently encode spatial variability (Tian et al., 2023).
Hybrid graph–MLP decoders: Deformation of structured patches or segments exploits GCNs followed by MLP regressors for per-patch affine transformations, as in joint SDF-deformation approaches for motion tracking (Merrouche et al., 11 Dec 2024).
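For the sinusoidal case, a compact sketch of a SIREN-style forward pass with three hidden layers of width 128 is given below. The weight initialization follows the scheme commonly used with SIREN (uniform in $\pm 1/n$ for the first layer and $\pm\sqrt{6/n}/\omega_0$ for later layers, with $\omega_0 = 30$); zero biases and the function names are simplifying assumptions for this example, not details taken from the cited works.

```python
import numpy as np

def init_siren(sizes, rng, w0=30.0):
    """Initialize weights with the bounds commonly used for sinusoidal networks."""
    params = []
    for i, (m, n) in enumerate(zip(sizes[:-1], sizes[1:])):
        bound = 1.0 / m if i == 0 else np.sqrt(6.0 / m) / w0
        # Zero biases are a simplification made for this sketch.
        params.append((rng.uniform(-bound, bound, (m, n)), np.zeros(n)))
    return params

def siren_forward(params, x, w0=30.0):
    """Sine-activated hidden layers followed by a linear output layer."""
    h = x
    for w, b in params[:-1]:
        h = np.sin(w0 * (h @ w + b))
    w, b = params[-1]
    return h @ w + b

rng = np.random.default_rng(0)
params = init_siren([3, 128, 128, 128, 3], rng)  # three hidden layers, width 128
x = rng.uniform(-1.0, 1.0, (4, 3))
out = siren_forward(params, x)                    # per-point 3D displacement
```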

3. Supervision, Constraints, and Regularization

Supervision strategies for MLP-based deformation fields combine geometric, physical, and task-based losses and regularizers:

Keypoint or correspondence supervision: The field is fit to known correspondences, typically via $\ell_2$ losses between predicted deformed positions and ground-truth or tracked keypoints (Yu et al., 2023, Prokudin et al., 2023). For local field control (facial avatars), attention-masked latent variables selectively drive landmark-centric deformations (Chen et al., 2023).
Physically inspired regularization: Thin shell energies (membrane and bending) derived from the Kirchhoff–Love shell theory are directly encoded as loss terms for cloth simulation, enabling the recovery of physically plausible equilibria (Kairanda et al., 2023).
As-rigid-as-possible (ARAP) or rigidity energies: Regularizers such as the Killing energy or as-isometric-as-possible penalties are imposed over local neighborhoods or affine part decompositions, enforcing near-rigidity or preventing part popping in piecewise-rigid settings (Prokudin et al., 2023, Atzmon et al., 2021, Merrouche et al., 11 Dec 2024).
Curvature and smoothness: Thin-plate regularization penalizing the Frobenius norm of the velocity Jacobian at $t=0$ ensures structure-preserving, minimal-energy trajectories in flow-based morphing (Bizzi et al., 10 Oct 2025).
Cycle and inverse-consistency: Cycle-consistency losses and gradient consistency (e.g., GradICON) bolster invertibility and regularity in volumetric or image deformation tasks (Tian et al., 2023, Merrouche et al., 11 Dec 2024).
Weak or no explicit regularizers: Some frameworks rely on the inductive bias and capacity limitations of MLPs, with local keypoint or correspondence losses, to yield sufficiently smooth and plausible deformations without auxiliary terms (Yu et al., 2023).
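One generic way to write an isometry-style penalty is $\lVert J^\top J - I \rVert_F^2$ averaged over sample points, where $J$ is the Jacobian of the deformation. The sketch below estimates $J$ by central finite differences for an arbitrary callable `f`; in practice automatic differentiation would normally be used, and the exact energies in the cited works may differ from this form. The function names and test maps are assumptions for illustration.

```python
import numpy as np

def jacobians(f, x, eps=1e-4):
    """Central-difference Jacobian of f at each row of x; shape (N, D, D)."""
    n, d = x.shape
    jac = np.empty((n, d, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps
        jac[:, :, j] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return jac

def isometry_penalty(f, x):
    """Mean squared Frobenius norm of J^T J - I over the sample points."""
    jac = jacobians(f, x)
    gram = np.einsum("nki,nkj->nij", jac, jac)  # J^T J per point
    return float(np.mean(np.sum((gram - np.eye(x.shape[1])) ** 2, axis=(1, 2))))

# Hypothetical test maps: a rigid motion and a uniform scaling by 2.
c, s = np.cos(0.4), np.sin(0.4)
rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
rigid = lambda p: p @ rot.T + np.array([0.1, -0.2, 0.3])
scale = lambda p: 2.0 * p

x = np.random.default_rng(0).uniform(-1.0, 1.0, (16, 3))
rigid_cost = isometry_penalty(rigid, x)  # near zero: rigid maps are isometries
scale_cost = isometry_penalty(scale, x)  # J^T J = 4I, so the cost is 3 * 3^2 = 27
```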

4. Integration with Rendering, Tracking, and Surface Extraction

MLP-based deformation fields are tightly coupled with differentiable renderers, surface extractors, or downstream task networks:

Radiance field deformation: Deformations are applied element-wise to canonical point clouds before querying view-dependent radiance functions. Local rotations are estimated via SVD between neighborhoods and quaternion interpolations along rays ("ray bending") are used to ensure coherency in view-dependent appearance (Yu et al., 2023, Liang et al., 2023).
Mesh recovery via Poisson solve: For Jacobian-based deformation fields, global vertex embeddings are reconstructed from predicted local Jacobians by solving discrete Poisson equations; this allows detail-preserving, invertible global shape updates while learning remains local (Sundararaman et al., 26 Sep 2024).
Temporal correspondence and tracking: In neural field plus mesh-deformation hybrids, patchwise deformations (rotation plus translation per patch) are blended across the surface, and cycle/matching losses ensure temporally coherent tracking over partial or unaligned observations (Merrouche et al., 11 Dec 2024).
Real-time dynamic rendering: In explicit Gaussian splatting or point-based NeRFs, MLP-predicted deformations allow per-frame updates at real-time (30–96 FPS) rates by compactly warping a set of static primitives, with static/dynamic segmentation for efficient computation on complex scenes (Liang et al., 2023).
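The SVD-based estimation of local rotations between neighborhoods can be sketched with the standard Kabsch procedure, shown here for a single neighborhood. How neighborhoods are selected and how the resulting rotations are interpolated along rays are not covered; the function name and the synthetic data are assumptions for this example.

```python
import numpy as np

def local_rotation(src, dst):
    """Best-fit rotation R such that R @ (src_i - mean) ~ dst_i - mean (Kabsch)."""
    p = src - src.mean(axis=0)
    q = dst - dst.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)          # cross-covariance H = P^T Q
    # Flip the last axis if needed so that det(R) = +1 (no reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

# Synthetic neighborhood: canonical points and their rigidly moved copies.
rng = np.random.default_rng(0)
cz, sz = np.cos(0.7), np.sin(0.7)
cx, sx = np.cos(0.3), np.sin(0.3)
rot_z = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
r_true = rot_x @ rot_z
canonical = rng.normal(size=(8, 3))
deformed = canonical @ r_true.T + np.array([0.5, -0.1, 0.2])

r_est = local_rotation(canonical, deformed)
```

With noisy or non-rigid neighborhoods the estimate is only a least-squares fit, which is consistent with the noise sensitivity noted for SVD-based schemes.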

5. Comparative Advantages, Limitations, and Applications

MLP-based deformation fields, as opposed to traditional grid/mesh-based or dense voxel methods, show distinct strengths and trade-offs:

Approach/Task	Key Benefit	Limitation
Point-based radiance fields	Fine-level deformation, fast surface update	Local MLP per pose; SVD may introduce noise
Forward-warped explicit Gaussians	Real-time, memory efficient, scene decomposition	Struggles with topology changes
Implicit meshless fields	Continuous, high fidelity, adapts to detail	Requires enforcing global consistency/integrability
Jacobian-local field (LJN)	Category-agnostic, local supervision, fast inference	Limited global context, potential volume shrinkage
Structure-preserving flow fields	Guaranteed invertibility, low distortion	Fails on topology changes, requires dense features
Hybrid neural field + mesh	High temporal coherence, geometric fidelity	Complexity in patching, dependency on association
Deformation-based registration	Memory savings, smooth diffeomorphisms	Less flexible than dense displacement fields for sharp transitions

Applications span controllable avatar synthesis, dynamic non-rigid reconstruction, morphing and registration (medical and graphics), detail-preserving surface mapping, adaptive physics-informed simulation, and robust out-of-distribution pose generalization (Yu et al., 2023, Prokudin et al., 2023, Liang et al., 2023, Walker et al., 2023, Kairanda et al., 2023, Atzmon et al., 2021, Tian et al., 2023, Chen et al., 2023, Sundararaman et al., 26 Sep 2024, Merrouche et al., 11 Dec 2024, Bizzi et al., 10 Oct 2025).

6. Evaluation Metrics and Empirical Performance

Across the referenced works, evaluation targets high-fidelity reconstruction, motion/scripted pose adaptation, and structure-preserving transformations in dynamic and static settings. Common metrics:

PSNR, LPIPS, SSIM: Used for rendering tasks and novel view synthesis; human/character and dynamic sequences reach PSNR of roughly 10–25 dB and LPIPS as low as 0.0465 under fine-grained, locally controlled deformations (Yu et al., 2023, Chen et al., 2023).
Chamfer-L1 and correspondence accuracy: In shape mapping and 3D reconstruction settings, e.g., 1.22 mm Chamfer-L1 on DTU for ENS, with structure-preserving local Jacobian approaches yielding geodesic errors improved to 1.5 cm on FAUST (Walker et al., 2023, Sundararaman et al., 26 Sep 2024).
Registration error (mTRE, Dice, % folds): In medical registration, MLP fields match or outperform dense-grid approaches, with perfect invertibility (0% foldings) and competitive accuracy at reduced memory footprints (Tian et al., 2023).
Motion coherence and tracking error: Temporal coherence is quantified via surface-to-field, matching, and rigidity losses, with IoU of 80–90% and correspondence errors of $\sim$0.012 for non-rigid 3D motion (Merrouche et al., 11 Dec 2024).
Speed and memory: Point-based and explicit approaches offer orders-of-magnitude faster training and inference times than volumetric SDF/NeRF baselines (ENS: 5 min vs. 5 h) and require only per-frame or per-patch MLPs, key for interactive/editable systems (Walker et al., 2023, Liang et al., 2023, Sundararaman et al., 26 Sep 2024).
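Three of the metrics above admit short reference implementations, sketched here under stated conventions. PSNR is computed for images scaled to `[0, max_val]`; Chamfer-L1 is taken as the mean of the two directional average nearest-neighbor Euclidean distances (conventions vary between papers); and the fold percentage is the share of sample points whose deformation Jacobian has a non-positive determinant. The function names are assumptions for this example, and the brute-force Chamfer computation is only suitable for small point sets.

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((np.asarray(img) - np.asarray(ref)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))

def chamfer_l1(a, b):
    """Symmetric Chamfer distance with unsquared Euclidean distances (one convention)."""
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (Na, Nb)
    return float(0.5 * (dists.min(axis=1).mean() + dists.min(axis=0).mean()))

def fold_percentage(jac):
    """Percentage of points where det(J) <= 0, i.e. the map locally folds."""
    return float(100.0 * np.mean(np.linalg.det(jac) <= 0.0))

ref = np.zeros((4, 4))
psnr_db = psnr(ref + 0.1, ref)                   # MSE = 0.01, so 20 dB

a = np.array([[0.0, 0.0, 0.0]])
b = np.array([[1.0, 0.0, 0.0]])
cd = chamfer_l1(a, b)                            # both directions give distance 1

identity = np.tile(np.eye(3), (5, 1, 1))
mirrored = np.tile(np.diag([-1.0, 1.0, 1.0]), (5, 1, 1))
folds_id = fold_percentage(identity)             # no folds
folds_mirror = fold_percentage(mirrored)         # every point is flipped
```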

7. Synthesis and Open Research Directions

MLP-based deformation fields, through their differentiable, resolution-independent, and regularizable structure, provide a flexible and extensible foundation for dynamic 3D scene synthesis, physically plausible simulation, detailed mapping, and high-quality registration. Notable emerging directions include:

Scalable local/global hybrids: Integrating global shape/context in addition to per-point or per-patch locality for improved global deformation modeling, as alluded to in LJN (Sundararaman et al., 26 Sep 2024).
Robust topology handling: Extending MLP-based deformation fields to explicitly address topological changes and large-scale discontinuities remains a challenge, with forward-warped Gaussian and morphing flows highlighting these limitations (Liang et al., 2023, Bizzi et al., 10 Oct 2025).
Physical/semantic regularizers: Deeper integration of physics-driven constraints, learned material models, and semantic part priors to enable robust and interpretable deformation for simulation and control (Kairanda et al., 2023, Merrouche et al., 11 Dec 2024).
Data-efficient and category-agnostic learning: LJN-style approaches demonstrate that extreme data efficiency and cross-category generalization are possible when leveraging local neighborhood statistics, offering a promising direction for generalized deformation models (Sundararaman et al., 26 Sep 2024).

MLP-based deformation fields continue to bridge the strengths of explicit geometry, differentiable implicit representations, and learned coordinate transformation, underpinning advances across computer graphics, vision, and geometric deep learning.