
MeshRet: Unified Motion Retargeting Framework

Updated 22 June 2026
  • The paper presents an end-to-end framework that retargets skinned character motions while optimizing dense geometric interactions.
  • It replaces separate skeletal semantics and geometry correction with a unified sensor-based approach that minimizes self-interpenetration and contact errors.
  • Empirical results demonstrate state-of-the-art performance on synthetic and real-scan datasets with significant improvements in contact accuracy and penetration metrics.

MeshRet is a unified framework for retargeting skinned character motion that directly models dense geometric interactions between body parts through a spatio-temporal field representation. Standard two-stage pipelines handle skeletal semantics and geometry correction separately, which often leads to conflicts that manifest as jitter, interpenetration, and contact errors. MeshRet instead performs motion retargeting and geometric relationship preservation end to end by learning to align dense interaction statistics. The approach supports retargeting across diverse mesh topologies while minimizing both self-interpenetration and contact mismatch, and it reports state-of-the-art results on both synthetic (Mixamo) and real-scan (ScanRet) datasets (Ye et al., 2024).

1. Pipeline and Input Representation

The MeshRet pipeline takes as input a source motion sequence $\mathbf{m}_A=(\mathbf{X}_A, \mathbf{Q}_A)$, where $\mathbf{X}_A$ represents root joint translations and $\mathbf{Q}_A$ represents 6D joint rotations over $T$ frames. Source and target template geometries are denoted $\mathbf{G}_A=(\mathbf{O}_A, \mathbf{J}_A)$ and $\mathbf{G}_B=(\mathbf{O}_B, \mathbf{J}_B)$, encoding mesh vertices and rest-pose skeletons. The core stages are:

  • Semantically Consistent Sensor (SCS) extraction: $\mathbf{S}_A = \mathcal{F}_s(\mathbf{G}_A) \in \mathbb{R}^{S \times 4 \times 3}$, with $\mathbf{S}_B$ extracted likewise from $\mathbf{G}_B$.
  • Sensor Forward Kinematics and Dense Mesh Interaction (DMI) field construction: The function $\mathcal{F}_d$ maps the source motion and SCS locations to a field $\mathbf{D}_A \in \mathbb{R}^{T \times K \times L \times P}$ that describes spatio-temporal interactions.
  • Retargeting Network: A transformer encoder/decoder predicts the target motion sequence. Unlike prior methods, it optimizes motion semantics and geometric interactions simultaneously.

This framework obviates the need for post-hoc collision or contact correction by redefining the retargeting target: the preservation of the dense geometric interaction field itself.
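The 6D joint rotations in the input can be converted to rotation matrices with a Gram-Schmidt step. The sketch below assumes the common convention in which the six numbers hold the first two columns of the rotation matrix; the source does not state MeshRet's exact layout, and `rotation_from_6d` is a hypothetical helper name.

```python
import math

def rotation_from_6d(r6):
    """Map a 6D rotation (assumed: first two matrix columns) to a 3x3 matrix."""
    a1, a2 = list(r6[:3]), list(r6[3:])
    # First column: normalize a1.
    n1 = math.sqrt(sum(x * x for x in a1))
    b1 = [x / n1 for x in a1]
    # Second column: remove the component of a2 along b1, then normalize.
    proj = sum(x * y for x, y in zip(b1, a2))
    u2 = [y - proj * x for x, y in zip(b1, a2)]
    n2 = math.sqrt(sum(x * x for x in u2))
    b2 = [x / n2 for x in u2]
    # Third column: cross product b1 x b2 completes a right-handed frame.
    b3 = [
        b1[1] * b2[2] - b1[2] * b2[1],
        b1[2] * b2[0] - b1[0] * b2[2],
        b1[0] * b2[1] - b1[1] * b2[0],
    ]
    # Assemble rows so that b1, b2, b3 are the matrix columns.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]
```

Because the second vector is re-orthogonalized, any six numbers with linearly independent halves yield a valid rotation, which is why this representation is convenient as a network output.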

2. Semantically Consistent Sensors (SCS)

SCS are sets of dense, taxonomy-aligned sample points established on both source and target meshes regardless of surface topology. Each sensor is parameterized by a semantic triplet:

  • Bone index, referencing the skeleton's medial axis.
  • Normalized offset along that bone.
  • Ray angle within the local plane orthogonal to that bone.

Positions are determined by casting a ray from the point at the given offset along the bone, in the direction given by the ray angle, and recording its intersection with the mesh surface associated with that bone. Each sensor yields a tuple consisting of the 3D intersection point and the tangent-space basis at that point. Because the same triplet indexing is used on both characters, semantically aligned, dense mesh correspondences are maintained even for dissimilar topologies.
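The ray-casting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `up` reference direction that fixes where angle zero points, and the use of the Möller-Trumbore ray-triangle test are all assumptions. It returns only the intersection point and omits the tangent-space basis.

```python
import math

def sub(a, b): return [a[i] - b[i] for i in range(3)]
def add(a, b): return [a[i] + b[i] for i in range(3)]
def scale(a, s): return [x * s for x in a]
def dot(a, b): return sum(a[i] * b[i] for i in range(3))
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]
def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def ray_triangle(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test; returns the distance along the ray, or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:           # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0 or u > 1:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0 or u + v > 1:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def place_sensor(head, tail, offset, angle, triangles, up=(0.0, 0.0, 1.0)):
    """Cast a ray from a point on the bone, orthogonal to it, onto the mesh."""
    axis = normalize(sub(tail, head))
    ref = cross(axis, up)                     # hypothetical angle-zero direction
    if dot(ref, ref) < 1e-12:                 # bone parallel to `up`: pick another
        ref = cross(axis, (1.0, 0.0, 0.0))
    ref = normalize(ref)
    binormal = cross(axis, ref)
    origin = add(head, scale(sub(tail, head), offset))
    direction = add(scale(ref, math.cos(angle)), scale(binormal, math.sin(angle)))
    hits = [ray_triangle(origin, direction, tri) for tri in triangles]
    hits = [t for t in hits if t is not None]
    if not hits:
        return None                           # ray missed the mesh
    return add(origin, scale(direction, min(hits)))
```

Calling `place_sensor` with the same offset and angle on two characters produces one sensor per character at the same semantic location, which is the property the triplet indexing is meant to provide.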

3. Dense Mesh Interaction Field and Sparsification

The DMI field encodes relative geometric relationships between SCS points over time:

  • For each pair of sensors at a given frame, a pairwise feature vector encodes their relative geometric relationship.
  • The full DMI collects these features over all sensor pairs and all $T$ frames.
  • For computational tractability, pairs are sparsified: $K$ observation sensors are selected, each referencing $L$ associated target sensors (half nearest, half furthest), yielding a final representation of

$$\mathbf{D}_A \in \mathbb{R}^{T \times K \times L \times P}$$

  • The field can be conceptualized in continuous fashion via weighted aggregation over SCS.

Sparsification preserves key contact and farfield interactions, ensuring critical relationships (e.g., hand-to-body, foot-to-ground) are prioritized in the retargeting process.
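The half-nearest, half-furthest selection can be sketched for a single frame as below. The pairwise feature used here (the target sensor's position expressed in the observation sensor's local frame) is an assumption chosen for illustration; the source does not spell out the feature contents, and all function names are hypothetical.

```python
import math

def select_targets(positions, k, num_targets):
    """Indices of the num_targets/2 nearest and num_targets/2 furthest sensors to sensor k.

    Assumes there are at least num_targets other sensors, so the halves do not overlap.
    """
    others = [j for j in range(len(positions)) if j != k]
    others.sort(key=lambda j: math.dist(positions[k], positions[j]))
    half = num_targets // 2
    return others[:half] + others[-half:]

def to_local(frame, origin, point):
    """Express `point` in a local frame given by three axis rows and an origin."""
    rel = [point[i] - origin[i] for i in range(3)]
    return [sum(axis[i] * rel[i] for i in range(3)) for axis in frame]

def dmi_frame(positions, frames, observers, num_targets):
    """One frame of a sparsified interaction field with shape K x L x 3."""
    field = []
    for k in observers:
        targets = select_targets(positions, k, num_targets)
        field.append([to_local(frames[k], positions[k], positions[j])
                      for j in targets])
    return field
```

Stacking the result of `dmi_frame` over all frames gives an array laid out as frames, observation sensors, target sensors, features, matching the four axes of the sparsified field.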

4. Loss Functions and Training Objectives

MeshRet is trained without reference to ground-truth target motions (unsupervised for target), using the following loss terms:

  • Pose regularization (reconstruction).
  • DMI consistency (geometry interaction preservation).
  • Adversarial loss (motion plausibility).
  • End-effector orientation loss.
  • Total loss, combining the four terms above.

All losses, except the adversarial and end-effector terms, are computed through DMI or the original source motion, allowing the framework to learn to synthesize plausible target motions that preserve fine-grained geometric relationships.
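As a rough illustration of how such terms fit together, the sketch below uses a mean squared difference for DMI consistency and a weighted sum for the total. Both forms, the term names, and the weights are assumptions made for this example; they are not the formulas from the paper.

```python
def dmi_consistency(d_src, d_tgt):
    """Mean squared difference between two flattened DMI fields (assumed form)."""
    if len(d_src) != len(d_tgt):
        raise ValueError("source and target fields must have the same size")
    return sum((a - b) ** 2 for a, b in zip(d_src, d_tgt)) / len(d_src)

def total_loss(terms, weights):
    """Weighted sum of named loss terms; the weights are hypothetical hyperparameters."""
    return sum(weights[name] * value for name, value in terms.items())

# Example with made-up values for the four loss terms.
terms = {
    "reconstruction": 1.0,
    "dmi": dmi_consistency([0.0, 2.0], [0.0, 0.0]),
    "adversarial": 0.5,
    "end_effector": 0.25,
}
weights = {"reconstruction": 1.0, "dmi": 2.0, "adversarial": 0.5, "end_effector": 1.0}
loss = total_loss(terms, weights)
```

Note that none of these terms compares the prediction against a ground-truth target motion, which is consistent with the unsupervised setup described above.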

5. Network Architecture

MeshRet employs a combination of PointNet and Transformer modules:

  • SCS Geometry Encoder: PointNet-style aggregation over sensor features produces a global geometry encoding.
  • DMI Encoder: A two-stage PointNet pipeline:
    • Per-sensor: The point cloud of pairwise features belonging to each sensor is encoded into a per-sensor feature.
    • Per-frame: Aggregation across sensors yields a per-frame DMI feature.
  • Transformer Retargeting Network: The encoder receives DMI and target geometry, and the decoder receives source joint pose and geometry. Configuration: 8 transformer layers, 4 attention heads, feed-forward size 256. A specialized attention mask couples each target frame prediction to its corresponding DMI representation, ensuring temporal and spatial coherence.
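PointNet-style aggregation applies the same small network to every point and then pools with a symmetric function, so the result does not depend on point order. The sketch below shows that idea with a single shared linear layer, ReLU, and max pooling; the layer sizes and function names are illustrative assumptions, not MeshRet's actual encoder.

```python
def shared_layer(point, weights, bias):
    """Apply one linear map plus ReLU to a single point feature."""
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(weights, bias)]

def pointnet_pool(points, weights, bias):
    """Encode every point with the shared layer, then max-pool per channel."""
    encoded = [shared_layer(p, weights, bias) for p in points]
    return [max(channel) for channel in zip(*encoded)]

# Toy example: two 2D points, identity weights, zero bias.
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
points = [[1.0, -2.0], [3.0, 4.0]]
code = pointnet_pool(points, weights, bias)
```

Applying the same pooling twice, first over the features of each sensor and then over the sensors of a frame, mirrors the two-stage structure of the DMI encoder described above.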

6. Contact Preservation and Self-Interpenetration Avoidance

The DMI-driven loss terms enforce fidelity in both contact and near-contact relationships. For any sensor pair, large changes in vector direction or shrinkage (potentially reflecting mesh collision or loss of contact) are penalized via the DMI consistency loss. For contact pairs (determined by a distance threshold defined relative to arm diameter), this forces the model to preserve semantic and physical plausibility. No explicit mesh collision or contact penalty is required; preservation of the learned DMI field is sufficient to implicitly avoid most self-intersections and maintain correct contacts.
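Threshold-based contact detection of the kind mentioned above can be sketched as follows. The threshold value, the use of one threshold for both characters, and the function names are assumptions for illustration; the source only states that the threshold is tied to arm diameter.

```python
import math
from itertools import combinations

def contact_pairs(positions, threshold):
    """Sensor index pairs whose Euclidean distance is below the threshold."""
    return [(i, j) for i, j in combinations(range(len(positions)), 2)
            if math.dist(positions[i], positions[j]) < threshold]

def lost_contacts(src_positions, tgt_positions, threshold):
    """Pairs in contact on the source that are no longer in contact on the target."""
    src = set(contact_pairs(src_positions, threshold))
    tgt = set(contact_pairs(tgt_positions, threshold))
    return sorted(src - tgt)
```

A check like this is useful for evaluation, whereas during training the contact behavior is meant to emerge from matching the interaction field itself.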

A plausible implication is that, due to the explicit modeling of pairwise geometric relationships in sensor-tangent space, MeshRet generalizes robustly across both synthetic and real-world mesh scanning domains, where geometry topology can be highly variable.

7. Empirical Results and Benchmarks

MeshRet was evaluated quantitatively and qualitatively on the Mixamo+ (cartoon+ScanRet) and ScanRet datasets. Key metrics include contact error, penetration percentage, and joint mean squared error (MSE). Table 1 summarizes core results.

Table 1. Quantitative comparison (lower is better for all metrics).

Metric                     Ours     PMnet    SAN      R²ET
Contact Error (Mixamo+)    0.772    2.716    2.432    2.209
Contact Error (ScanRet)    0.284    0.890    0.627    0.589
Penetration % (Mixamo+)    3.45     5.23     4.95     4.21
Penetration % (ScanRet)    1.59     2.23     1.72     2.01
Joint MSE (ScanRet)        0.047    0.130    0.049    0.063
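To put the table in relative terms, the short script below computes how much lower MeshRet's value is than the best baseline in each row. The numbers are copied from Table 1; the helper name and the choice of "reduction versus best baseline" as a summary are assumptions of this example.

```python
RESULTS = {
    "Contact Error (Mixamo+)": {"Ours": 0.772, "PMnet": 2.716, "SAN": 2.432, "R2ET": 2.209},
    "Contact Error (ScanRet)": {"Ours": 0.284, "PMnet": 0.890, "SAN": 0.627, "R2ET": 0.589},
    "Penetration % (Mixamo+)": {"Ours": 3.45, "PMnet": 5.23, "SAN": 4.95, "R2ET": 4.21},
    "Penetration % (ScanRet)": {"Ours": 1.59, "PMnet": 2.23, "SAN": 1.72, "R2ET": 2.01},
    "Joint MSE (ScanRet)": {"Ours": 0.047, "PMnet": 0.130, "SAN": 0.049, "R2ET": 0.063},
}

def reduction_vs_best_baseline(row):
    """Fractional reduction of 'Ours' relative to the lowest baseline value."""
    best = min(value for name, value in row.items() if name != "Ours")
    return (best - row["Ours"]) / best

for metric, row in RESULTS.items():
    print(f"{metric}: {100 * reduction_vs_best_baseline(row):.1f}% lower than best baseline")
```

For contact error this works out to roughly a 65% reduction on Mixamo+ and roughly 52% on ScanRet, while the margins on ScanRet penetration and joint MSE are much smaller.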

User studies (600 comparisons) reported approximately 81% preference for MeshRet across overall motion quality, contact accuracy, and semantics preservation.

References

"Skinned Motion Retargeting with Dense Geometric Interaction Perception" (Ye et al., 2024)
