3D-CODED : 3D Correspondences by Deep Deformation (1806.05228v2)

Published 13 Jun 2018 in cs.CV

Abstract: We present a new deep learning approach for matching deformable shapes by introducing Shape Deformation Networks, which jointly encode 3D shapes and correspondences. This is achieved by factoring the surface representation into (i) a template, that parameterizes the surface, and (ii) a learnt global feature vector that parameterizes the transformation of the template into the input surface. By predicting this feature for a new shape, we implicitly predict correspondences between this shape and the template. We show that these correspondences can be improved by an additional step which improves the shape feature by minimizing the Chamfer distance between the input and transformed template. We demonstrate that our simple approach improves on state-of-the-art results on the difficult FAUST-inter challenge, with an average correspondence error of 2.88cm. We show, on the TOSCA dataset, that our method is robust to many types of perturbations, and generalizes to non-human shapes. This robustness allows it to perform well on real, unclean meshes from the SCAPE dataset.

Summary

The paper introduces Shape Deformation Networks that learn 3D correspondences by transforming a template to match input shapes.
An encoder-decoder architecture predicts the deformation, and a refinement step that minimizes the Chamfer distance improves the reconstruction; the method reaches an average correspondence error of 2.88 cm on the FAUST-inter challenge.
The approach generalizes across varied datasets and deformations, paving the way for enhanced applications in animation, VR, and AR.

3D-CODED: 3D Correspondences by Deep Deformation

The paper "3D-CODED: 3D Correspondences by Deep Deformation" introduces a deep learning approach to matching deformable 3D shapes. Its central contribution is the Shape Deformation Network, which jointly encodes 3D shapes and correspondences by learning to transform a template into the input surface. Predicting the transformation feature for a new shape therefore implicitly yields correspondences between that shape and the template.

Approach and Methodology

The research leverages an encoder-decoder architecture where the encoder extracts global features from the input shape, and the decoder uses these features to deform a common template to match input shapes. The authors propose a strategy to refine these correspondences by minimizing the Chamfer distance between the input shape and the deformed template, ensuring closer alignment during reconstruction.
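As a rough illustration of the quantity being minimized, the sketch below computes a symmetric Chamfer distance between two small point sets in plain Python. The function name `chamfer_distance`, the use of squared distances averaged in both directions, and the toy point sets are assumptions made for this example; the paper's implementation may normalize or weight the terms differently.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two lists of (x, y, z) tuples.

    Assumed convention: mean squared nearest-neighbour distance from A to B,
    plus the same quantity from B to A.
    """
    def one_sided(src, dst):
        total = 0.0
        for p in src:
            # Squared distance from p to its nearest neighbour in dst.
            total += min(sum((pi - qi) ** 2 for pi, qi in zip(p, q)) for q in dst)
        return total / len(src)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)


# Toy data (hypothetical): a three-point "template" and a slightly shifted copy.
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 0.1, y, z) for x, y, z in template]

print(chamfer_distance(template, template))  # 0.0
print(chamfer_distance(template, shifted))   # close to 0.02
```

This brute-force version compares every pair of points, which is fine for a toy example but too slow for dense scans; a spatial index or a batched GPU implementation would be the usual choice at scale.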

The methodology is articulated in three stages:

Network Training: The encoder network computes a global shape feature from the input 3D points. This feature is the input to the decoder, which deforms the template to match the input shape.
Shape Reconstruction Optimization: The second stage starts from the feature predicted by the encoder and optimizes it further by minimizing the Chamfer distance between the input and the deformed template. This improves reconstruction quality and is critical for precise, robust correspondences, especially under challenging deformations.
3D Shape Correspondence Mapping: Finally, to find correspondences between two 3D shapes, the template is deformed into each of them using the two learned transformations. Each point of the first shape is mapped to its closest point on the first deformed template, and the same template point is then located on the second deformed template and matched to the closest point of the second shape.
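The correspondence stage can be sketched with nearest-neighbour lookups that pass through the shared template. This is a minimal sketch under stated assumptions: both deformed templates are given as point lists in which index `i` refers to the same template point, and the helper names `nearest_index` and `match_through_template` and the toy coordinates are hypothetical, not taken from the paper.

```python
def nearest_index(point, candidates):
    """Index of the candidate closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(candidates)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, candidates[i])),
    )


def match_through_template(shape_a, deformed_a, deformed_b, shape_b):
    """For each point of shape_a, return the index of its match in shape_b.

    Assumption: deformed_a[i] and deformed_b[i] are the same template point,
    deformed towards shape A and shape B respectively.
    """
    matches = []
    for p in shape_a:
        i = nearest_index(p, deformed_a)           # point on A -> template point i
        q = deformed_b[i]                          # same template point, deformed towards B
        matches.append(nearest_index(q, shape_b))  # template point -> point on B
    return matches


# Toy data (hypothetical): a three-point template deformed along x for A, along y for B.
deformed_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
deformed_b = [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 4.0, 0.0)]
shape_a = [(2.1, 0.0, 0.0), (0.1, 0.0, 0.0), (0.9, 0.0, 0.0)]
shape_b = [(0.0, 2.1, 0.0), (0.0, 3.9, 0.0), (0.0, 0.1, 0.0)]

matches = match_through_template(shape_a, deformed_a, deformed_b, shape_b)
print(matches)  # [1, 2, 0]
```

Because the two shapes never have to be compared directly, the quality of the matches depends on how closely each deformed template reconstructs its shape, which is why the Chamfer refinement stage matters.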

Experimental Validation

The experimental results demonstrate the effectiveness and robustness of the presented approach across varied datasets, including FAUST, TOSCA, and SCAPE, showing resilience to non-rigid deformations and perturbations like noise and holes. Notably, the method achieved an average correspondence error of 2.88 cm on the FAUST-inter challenge, outperforming prior state-of-the-art techniques.
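For readers unfamiliar with the metric, the sketch below shows one plausible way to compute an average correspondence error, assuming it is the mean Euclidean distance between each predicted match and its ground-truth position. The function name and the coordinates are illustrative assumptions; the values are made up and are not results from the paper.

```python
import math


def mean_correspondence_error(predicted, ground_truth):
    """Mean Euclidean distance between predicted and ground-truth matched points."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    return sum(math.dist(p, g) for p, g in zip(predicted, ground_truth)) / len(predicted)


# Made-up coordinates in centimetres, for illustration only.
predicted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]

print(mean_correspondence_error(predicted, ground_truth))  # 2.5
```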

The paper further applies the method to non-human shapes, such as animals modeled using SMAL, and to real, unclean meshes. This suggests potential for cross-category generalization without extensive category-specific feature engineering. The learning does, however, rely heavily on large and representative training sets: the synthetic datasets used for training played a fundamental role in the reported results.

Future Implications and Considerations

The implications of the research are manifold. Practically, this refined method for 3D shape matching holds promise for applications in animation, virtual reality, and augmented reality, where accurate model alignment is paramount. Theoretically, the paper suggests a shift towards implicit learning of transformation parameters through encoder networks, providing an alternative to traditional hand-crafted template methods that often entail higher computational costs and expert involvement.

One highlighted avenue for future work is making the correspondence mapping more robust, particularly for intricate surface topologies or extreme transformations. Reducing the reliance on extensive synthetic datasets, possibly through semi-supervised learning or stronger data augmentation, could also make the methodology more widely applicable.

Overall, "3D-CODED" presents an automated, deep learning-based solution to the challenging task of 3D shape correspondence, laying the groundwork for further research in 3D shape morphing and correspondence learning.
