Category-Free 3D Pose Transfer

Updated 30 November 2025
  • The paper introduces a framework that uses implicit representations to transfer 3D poses across structurally distinct meshes without relying on shared skeletal rigs.
  • It leverages data-driven shape understanding, soft keypoints, and optimal transport methods to achieve robust cross-category deformation.
  • Empirical evaluations demonstrate improved metrics like PMD, CD, and EMD with strong generalization across diverse geometries and stylized characters.

Category-free 3D pose transfer encompasses methods that transfer the pose from a source 3D shape to a structurally distinct target, without dependence on shared skeletal rigs, explicit correspondences, or constrained class-specific priors. This capability is fundamental for animation, shape deformation, style transfer, and embodied agent control across diverse object categories, from humans and animals to stylized or synthetic 3D characters. Contemporary frameworks achieve strong generalization by leveraging data-driven shape understanding, soft or learned part correspondences, implicit or hybrid deformation models, and robust loss formulations that disentangle pose from identity. The following sections synthesize the core principles and methodologies underlying leading approaches to category-free pose transfer.

1. Formal Problem Definition and Scope

In category-free 3D pose transfer, the objective is to deform a target mesh $\mathbf{S} = \{x_i\}$ (in rest pose or canonical configuration) such that it acquires the pose $P$ of a source instance (e.g., via joint rotations, spatial deformations, or latent pose codes) yet fully preserves its own identity and geometry. The absence of semantic, topological, or anatomical correspondence assumptions distinguishes this setting from classical skeleton-based or identity-constrained methods. Key inputs and outputs are typically:

  • Inputs:
    • Stylized or arbitrary target mesh/point-cloud in rest configuration.
    • Reference pose information (e.g., via source mesh in desired pose, joint parameters, or implicit pose code).
  • Output:
    • The target mesh, articulated into the reference pose, maintaining its intrinsic shape identity.

Leading works formulate the deformation mapping as an implicit or explicit function $f_\theta : (\mathbf{S}, P) \mapsto \hat{\mathbf{S}}$, where $\hat{\mathbf{S}}$ preserves the target's identity but adopts the source pose (Wang et al., 2023, Yoo et al., 14 Jun 2024).
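
As a shape-agnostic illustration of this mapping, the sketch below builds an untrained per-point displacement field $f_\theta(\mathbf{S}, P) = \mathbf{S} + \mathrm{MLP}([\mathbf{S}; P])$ in NumPy. Everything here (network sizes, random weights, function names) is a placeholder for exposition; only the interface reflects the formulation above: rest-pose vertices in, pose-conditioned offsets out, identity carried by the vertices themselves.

```python
import numpy as np

def make_toy_deformation_field(pose_dim=16, hidden=64, seed=0):
    """Construct an untrained f_theta(S, P) = S + MLP([S; P]).

    Weights are random placeholders; a real model would be trained with
    the disentanglement and consistency objectives discussed below.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.1, size=(3 + pose_dim, hidden))
    w2 = rng.normal(scale=0.1, size=(hidden, 3))

    def f_theta(verts, pose_code):
        # verts: (N, 3) rest-pose target vertices; pose_code: (pose_dim,)
        pose_tiled = np.broadcast_to(pose_code, (verts.shape[0], pose_code.size))
        feats = np.concatenate([verts, pose_tiled], axis=1)  # (N, 3 + pose_dim)
        return verts + np.tanh(feats @ w1) @ w2              # per-point offsets

    return f_theta

f = make_toy_deformation_field()
posed = f(np.random.rand(100, 3), np.zeros(16))  # (100, 3) deformed vertices
```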

2. Disentanglement of Shape and Pose

A foundational aspect for category-free transfer is the disentanglement of shape (identity-specific details) and pose (deformation or articulation):

  • Hybrid Keypoint Representations: Neural frameworks extract a set of $K$ sparse, spatially distributed keypoints (e.g., by farthest point sampling; a minimal sampling sketch follows this list) with accompanying learned descriptors, capturing both global pose layout and localized deformation cues. This hybrid representation compresses pose information and enables transfer across diverse identities (Yoo et al., 14 Jun 2024).
  • Implicit Deformation Fields & Face Jacobian Decoding: Rather than conditioning on explicit skeletons or rig-specific features, implicit fields parameterize mesh deformation per vertex or per face, with the face Jacobian approach yielding a locally linearized mapping suitable for reconstructing the deformed mesh via Poisson-based solvers (Yoo et al., 14 Jun 2024).
  • Per-point Feature Embeddings: Local point-wise features, extracted in a semi-supervised or self-supervised manner, feed into deformation modules conditioned on low-dimensional pose codes (such as those produced by pretrained VPoser networks). These approaches are agnostic to mesh topology and class structure (Wang et al., 2023).
  • Adversarial and Latent Space Approaches: GAN-based models encode pose and shape as discrete latent codes (pose code $z_e$, shape code $z_i$), with pose transfer realized by code swapping and generative mapping, enforcing a structured latent space for disentanglement (Chen et al., 2021).
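
A minimal NumPy version of the farthest point sampling step referenced in the first bullet; the greedy routine below is the standard algorithm, while the function name and choice of $K$ are illustrative. Real pipelines attach learned descriptors to the sampled points.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedily select k spatially spread indices from an (N, 3) point set.

    Each iteration picks the point farthest from everything selected so
    far, yielding keypoints that cover the shape's extremities.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    selected = [int(rng.integers(n))]        # random initial seed point
    dist = np.full(n, np.inf)                # distance to nearest selected point
    for _ in range(k - 1):
        gap = np.linalg.norm(points - points[selected[-1]], axis=1)
        dist = np.minimum(dist, gap)
        selected.append(int(dist.argmax()))  # farthest remaining point
    return np.asarray(selected)

pts = np.random.rand(2048, 3)
keypoints = pts[farthest_point_sampling(pts, k=32)]  # (32, 3)
```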

3. Learning Correspondences and Soft Part-based Articulation

A central challenge in the category-free setting is the absence of hand-annotated correspondences or shared skeletons. Recent methodologies address this via:

  • Soft and Many-to-many Correspondence: Transformer architectures or Sinkhorn-normalized similarity matrices learn soft, doubly stochastic assignments between sets of keypoints, enabling many-to-many (source-to-target) matching without explicit one-to-one constraints (Chai et al., 23 Nov 2025); a minimal Sinkhorn sketch follows this list. Semantic keypoint labels and text encoders (e.g., CLIP) further support matching across semantically similar functional regions (e.g., human arms to bird wings).
  • Optimal Transport-based Dense Correspondence: Multiple works formulate dense, geometry-aware correspondences as entropic optimal transport problems, computing minimal-cost mappings between source and target vertices or features. These plans are then used for mesh warping (Song et al., 2022, Song et al., 2021).
  • Part Segmentation via Semi/Self-supervision: Semi-supervised segmentation modules output soft partitionings (skin weights) over a fixed number of parts, enabling rest pose and deformed pose correspondence via a shared part index; rigid or locally-parameterized transforms are then estimated and applied per part (Liao et al., 2022).
  • Topology-agnostic Keypoint Detection: Weakly supervised methods use point cloud detectors (e.g., PointNet) to obtain semantically meaningful keypoints on arbitrary mesh connectivity, supporting explicit inverse kinematics (IK) or forward kinematics (FK) based transfer without paired meshes (Chen et al., 2023).
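
The first two bullets in this list hinge on the same primitive: Sinkhorn normalization, which produces (near) doubly stochastic soft assignments and is also the standard solver for entropic optimal transport. Below is a minimal NumPy sketch assuming a square similarity matrix and uniform marginals; the function name and hyperparameters are illustrative, not any specific paper's implementation.

```python
import numpy as np

def sinkhorn_soft_correspondence(sim, tau=0.1, n_iters=100):
    """Turn raw similarities into a soft many-to-many correspondence.

    sim: (K, K) keypoint-to-keypoint similarity scores. Alternating
    row/column normalization converges toward a doubly stochastic
    matrix, i.e. an entropy-regularized transport plan.
    """
    p = np.exp(sim / tau)                     # temperature-scaled scores
    for _ in range(n_iters):
        p = p / p.sum(axis=1, keepdims=True)  # rows sum to 1
        p = p / p.sum(axis=0, keepdims=True)  # columns sum to 1
    return p

corr = sinkhorn_soft_correspondence(np.random.rand(8, 8))
assert np.allclose(corr.sum(axis=0), 1.0)     # approximately doubly stochastic
```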

4. Deformation and Pose Transfer Mechanisms

Techniques for producing deformations from matched correspondences or learned representations vary:

  • Implicit Deformation Networks: Networks process per-point or per-part feature embeddings and pose codes to generate displacement vectors (offsets), yielding point-wise deformations applicable to surfaces of arbitrary topology (Wang et al., 2023, Yoo et al., 14 Jun 2024).
  • Blend-Skinning and Rigid-Part Models: In skeleton-free or learned-articulation frameworks, target meshes are decomposed into $K$ soft parts, each transformed by a rigid transform predicted by a learned decoder and combined using linear blend skinning (LBS) (Liao et al., 2022); a minimal LBS sketch follows this list.
  • Conditional Refinement: After coarse warping or deformation, conditional normalization layers such as Elastic Instance Normalization (ElaIN) blend pose-dependent and identity-dependent statistics, refining outputs to maintain target identity and smoothness (Song et al., 2022, Song et al., 2021).
  • ARAP & Mesh Regularization: As-rigid-as-possible (ARAP) post-processing or regularization is used at training or inference to preserve local geometric consistency and minimize mesh artifacts (Chai et al., 23 Nov 2025, Song et al., 2022).
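
The combination step in the blend-skinning bullet above is standard linear blend skinning. The NumPy sketch below shows that step in isolation; array names and shapes are illustrative, and in the frameworks above the soft weights and per-part transforms would come from learned modules.

```python
import numpy as np

def linear_blend_skinning(verts, weights, transforms):
    """Blend per-part rigid transforms with soft skinning weights.

    verts:      (N, 3) rest-pose vertices
    weights:    (N, K) soft part assignments, each row summing to 1
    transforms: (K, 4, 4) homogeneous rigid transform per part
    Returns (N, 3) deformed vertices.
    """
    n = verts.shape[0]
    homog = np.concatenate([verts, np.ones((n, 1))], axis=1)  # (N, 4)
    per_part = np.einsum('kij,nj->nki', transforms, homog)    # (N, K, 4)
    blended = np.einsum('nk,nki->ni', weights, per_part)      # (N, 4)
    return blended[:, :3] / blended[:, 3:4]

verts = np.random.rand(500, 3)
weights = np.random.dirichlet(np.ones(4), size=500)  # (500, 4), rows sum to 1
transforms = np.tile(np.eye(4), (4, 1, 1))           # identity transform per part
assert np.allclose(linear_blend_skinning(verts, weights, transforms), verts)
```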

5. Training Paradigms and Supervision

Category-free approaches mitigate supervision constraints by combining multiple supervision levels and self-supervision objectives:

| Method | Data Requirement | Supervision |
|---|---|---|
| ZPT (Wang et al., 2023) | unpaired/unrigged stylized meshes; paired non-stylized avatars | volume-based test-time optimization; occupancy and parts (semi-supervised) |
| Neural Pose RL (Yoo et al., 14 Jun 2024) | template + posed source meshes | self-supervised adaptation |
| IEP-GAN (Chen et al., 2021) | unpaired meshes | unsupervised (GAN, co-occurrence) |
| MimiCAT (Chai et al., 23 Nov 2025) | large-scale paired keypoints | semantic labels, cycle-consistency |
| X-DualNet (Song et al., 2022) | unpaired meshes | cross-consistency, ARAP |
| MAPConNet (Sun et al., 2023) | any (with or without labels) | mesh/point contrastive |
| Weakly-supervised IK (Chen et al., 2023) | sparse keypoints | cycle/self-reconstruction, pseudo-labels |
| Skeleton-free (Liao et al., 2022) | rest-pose + any-pose mesh | semi-supervised, cyclical |

Training regimes exploit cycle and self-reconstruction losses, cross-domain consistency, and unsupervised or semi-supervised part segmentation to enable transfer without explicit ground truth for the target category or pose.
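
To make the cycle-reconstruction idea concrete, the sketch below states it against a generic transfer function. The signature transfer_fn(target_verts, pose_source_verts) is an assumption standing in for any of the surveyed models; no ground-truth posed target is needed.

```python
import numpy as np

def cycle_reconstruction_loss(transfer_fn, src_verts, tgt_verts):
    """Round-trip consistency: tgt -> src's pose -> back to tgt's pose.

    If the model disentangles pose from identity, re-posing the deformed
    target with its own original pose should reproduce it exactly.
    """
    tgt_in_src_pose = transfer_fn(tgt_verts, src_verts)   # forward transfer
    tgt_cycled = transfer_fn(tgt_in_src_pose, tgt_verts)  # transfer back
    return float(np.mean(np.linalg.norm(tgt_cycled - tgt_verts, axis=1)))

identity_fn = lambda target, pose_src: target             # trivial sanity check
assert cycle_reconstruction_loss(identity_fn, np.zeros((4, 3)), np.ones((4, 3))) == 0.0
```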

6. Quantitative Evaluation and Empirical Performance

Evaluation relies on mesh distance metrics, e.g., Pointwise Mesh Distance (PMD), Chamfer Distance (CD), and Earth Mover's Distance (EMD), together with the qualitative fidelity of articulation.
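
For concreteness, here is a minimal NumPy implementation of the symmetric Chamfer Distance; PMD is typically the mean per-vertex distance under a known correspondence, and EMD additionally requires solving an optimal assignment. This sketch is illustrative rather than any paper's evaluation code.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Averages squared nearest-neighbor distances in both directions.
    O(N*M) memory via a dense pairwise matrix; fine at evaluation sizes.
    """
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

assert chamfer_distance(np.zeros((10, 3)), np.zeros((5, 3))) == 0.0
```

Key empirical findings: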

  • ZPT (Wang et al., 2023): Demonstrates effective transfer to stylized characters, outperforming state-of-the-art on unseen bipeds/quadrupeds, with no supervision on stylized poses.
  • Neural Pose RL (Yoo et al., 14 Jun 2024): Outperforms prior NJF, ZPT, SPT baselines on FID, KID, PMD, and classification metrics, with preserved detail in challenging animal-to-animal and Mixamo transfer.
  • IEP-GAN (Chen et al., 2021): Achieves significantly lower pose disentanglement error (0.19 cm) and strong generalization to hands, animals, humans—demonstrating category robustness.
  • MimiCAT (Chai et al., 23 Nov 2025): Excels in both intra-category and cross-category transfer (PMD = 3.570 and 4.264, respectively), with soft matching enabling “arms-to-wings” transfers.
  • MAPConNet (Sun et al., 2023): Delivers superior PMD/CD/EMD under all supervision modes, with strong generalization to topologically mismatched or real-scan data.

A common observation is that models leveraging soft or implicit correspondences, part segmentation, and robust disentanglement consistently generalize better to new categories and extreme stylizations. Volume constraints, local Jacobian preservation, and ARAP refinement are critical to minimizing deformation artifacts and maintaining plausible geometry.

7. Open Challenges, Limitations, and Future Directions

Several limitations remain for fully category-free, production-grade pose transfer:

  • Fine-grained and Unseen Semantics: Methods may falter on topologies with unique appendages (e.g., tails, tentacles), out-of-distribution body plans, or mesh regions missing key semantic signals.
  • Temporal Coherence: Frame-wise transfer may induce temporal artifacts or interpenetration; sequence-level or motion-consistent models are needed (Chai et al., 23 Nov 2025, Liao et al., 2022).
  • Lack of Explicit Rigging: Most approaches still benefit from at least sparse semantic keypoints or rest-pose templates; inference without any semantic or geometric priors remains a major research direction (Chen et al., 2023).
  • Computational Scalability: Vanilla attention and dense transport solvers present bottlenecks on high-resolution meshes or extremely large datasets (Chai et al., 23 Nov 2025).
  • Integration of Texture and Non-manifold Structures: Current methods focus primarily on geometry; robust texture and topological transfer remain less explored.

Anticipated advances include unsupervised or equivariant keypoint discovery, learned bone hierarchies, temporally aware transformers, and improved geometric priors integrating ARAP, Poisson, or data-driven deformation models.


References:

  • "Zero-shot Pose Transfer for Unrigged Stylized 3D Characters" (Wang et al., 2023)
  • "Neural Pose Representation Learning for Generating and Transferring Non-Rigid Object Poses" (Yoo et al., 14 Jun 2024)
  • "Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer" (Chen et al., 2021)
  • "MimiCAT: Mimic with Correspondence-Aware Cascade-Transformer for Category-Free 3D Pose Transfer" (Chai et al., 23 Nov 2025)
  • "Unsupervised 3D Pose Transfer with Cross Consistency and Dual Reconstruction" (Song et al., 2022)
  • "MAPConNet: Self-supervised 3D Pose Transfer with Mesh and Point Contrastive Learning" (Sun et al., 2023)
  • "Weakly-supervised 3D Pose Transfer with Keypoints" (Chen et al., 2023)
  • "Skeleton-free Pose Transfer for Stylized 3D Characters" (Liao et al., 2022)
  • "3D Pose Transfer with Correspondence Learning and Mesh Refinement" (Song et al., 2021)
