Procrustean Regression Networks: Learning 3D Structure of Non-Rigid Objects from 2D Annotations (2007.10961v1)

Published 21 Jul 2020 in cs.CV

Abstract: We propose a novel framework for training neural networks which is capable of learning 3D information of non-rigid objects when only 2D annotations are available as ground truths. Recently, there have been some approaches that incorporate the problem setting of non-rigid structure-from-motion (NRSfM) into deep learning to learn 3D structure reconstruction. The most important difficulty of NRSfM is to estimate both the rotation and deformation at the same time, and previous works handle this by regressing both of them. In this paper, we resolve this difficulty by proposing a loss function wherein the suitable rotation is automatically determined. Trained with the cost function consisting of the reprojection error and the low-rank term of aligned shapes, the network learns the 3D structures of such objects as human skeletons and faces during the training, whereas the testing is done on a single-frame basis. The proposed method can handle inputs with missing entries and experimental results validate that the proposed framework shows superior reconstruction performance to the state-of-the-art method on the Human 3.6M, 300-VW, and SURREAL datasets, even though the underlying network structure is very simple.

Citations (17)

Summary

  • The paper introduces a Procrustes-based loss function that automatically infers rotations without explicit estimation.
  • The paper presents a deep learning framework that integrates NRSfM techniques, achieving lower MPJPE on benchmarks like Human 3.6M.
  • The paper demonstrates robust performance in handling non-rigid objects and missing data, enhancing practical 3D reconstruction capabilities.

Overview of Procrustean Regression Networks for Learning 3D Structure from 2D Annotations

The paper "Procrustean Regression Networks: Learning 3D Structure of Non-Rigid Objects from 2D Annotations" introduces an innovative framework for training neural networks to infer 3D structural information from 2D inputs. This approach is particularly applied to scenarios involving non-rigid objects, where the task is inherently underconstrained compared to rigid-body scenarios. The proposed framework leverages a synergy between methods of non-rigid structure-from-motion (NRSfM) and deep learning, aiming to overcome the challenges associated with estimating both rotation and deformation.

Methodological Innovation

The central challenge addressed by the paper is separating rigid motion (rotation) from non-rigid deformation. Traditional methods regress both components, which is a source of ambiguity and error. The contribution here is a loss function based on Procrustean alignment, in which the suitable rotation for each shape is determined automatically rather than estimated explicitly. The key step is a reformulation of the NRSfM problem in terms of Procrustes analysis, where predicted shapes are rigidly aligned to a reference shape so as to minimize their discrepancy.
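
As an illustration of this alignment step (a minimal sketch, not the paper's implementation; the function name, the 17-joint toy skeleton, and the points-as-rows convention are assumptions made here), the orthogonal Procrustes / Kabsch solution gives the rotation that best aligns one zero-centered shape to another:

```python
import numpy as np

def procrustes_rotation(shape, reference):
    """Rotation R (3x3) minimizing ||shape @ R - reference||_F, where both
    inputs are (K, 3) arrays of zero-centered 3D points (one point per row).
    Classic orthogonal Procrustes / Kabsch solution via SVD."""
    u, _, vt = np.linalg.svd(shape.T @ reference)
    rot = u @ vt
    if np.linalg.det(rot) < 0:   # discard reflections; keep a proper rotation
        u[:, -1] *= -1
        rot = u @ vt
    return rot

# Toy usage: recover the rotation relating two copies of a 17-joint skeleton.
rng = np.random.default_rng(0)
reference = rng.standard_normal((17, 3))
reference -= reference.mean(axis=0)            # zero-center
theta = np.pi / 5
true_rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
shape = reference @ true_rot.T                 # rotated copy of the reference
aligned = shape @ procrustes_rotation(shape, reference)
print(np.allclose(aligned, reference))         # True
```

The important point for the paper's setting is that this optimal rotation has a closed form, so the alignment can sit inside the loss function rather than being produced by the network.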

The main cost function consists of two terms: a reprojection error and a regularization term that enforces a low-rank structure on the aligned shapes. Because the Procrustes-based alignment is analytically differentiable, the cost can be backpropagated through an ordinary network, combining the estimation quality of NRSfM with the efficiency and generalization ability of neural networks.
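
A schematic form of such a cost, written here with a nuclear-norm low-rank surrogate for concreteness (the notation and the choice of rank penalty are illustrative, not the paper's exact formulation), is

$$
\mathcal{L} \;=\; \sum_{i=1}^{N} \bigl\lVert \Pi(\mathbf{X}_i) - \mathbf{W}_i \bigr\rVert_F^2 \;+\; \lambda \,\Bigl\lVert \bigl[\operatorname{vec}(\tilde{\mathbf{X}}_1)\;\cdots\;\operatorname{vec}(\tilde{\mathbf{X}}_N)\bigr] \Bigr\rVert_{*},
\qquad
\tilde{\mathbf{X}}_i = \mathbf{X}_i\,\mathbf{R}_i^{*},
$$

where $\mathbf{X}_i$ is the network's predicted 3D shape for frame $i$, $\Pi$ the 2D projection, $\mathbf{W}_i$ the observed 2D annotations, $\lVert\cdot\rVert_{*}$ the nuclear norm, and $\mathbf{R}_i^{*}$ the rotation that best aligns $\mathbf{X}_i$ to the reference shape in the Procrustes sense. Because $\mathbf{R}_i^{*}$ is a closed-form function of $\mathbf{X}_i$, the whole expression is differentiable with respect to the network output.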

Empirical Evaluation

The Procrustean Regression Network (PRN) performs strongly in experiments across several datasets, including Human 3.6M, 300-VW, and SURREAL. PRN processes sequences more efficiently than classical NRSfM methods, since the trained network generalizes to new frames instead of re-solving an optimization per sequence, and it also surpasses recent neural approaches such as C3DPO. On Human 3.6M, the network achieves a lower mean per joint position error (MPJPE) under various input settings, from ground-truth 2D projections to detected keypoints.
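
For reference, MPJPE is simply the average Euclidean distance between predicted and ground-truth joints; a minimal sketch (assuming both are already expressed in a common, e.g. root-relative, coordinate frame; evaluation protocols differ in how predictions are aligned before measuring) is:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per joint position error, in the units of the inputs (e.g. mm).
    pred, gt: arrays of shape (num_frames, num_joints, 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```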

The model also remains accurate when input entries are missing and when applied to dense 3D reconstruction, demonstrating its flexibility across diverse classes of non-rigid objects.

Implications and Future Work

Procrustean Regression Networks link NRSfM algorithms with deep learning, offering improved generalization and robustness without requiring 3D ground truth. The Procrustean formulation improves accuracy while keeping the network architecture simple, since the rotation-deformation ambiguity is resolved in the loss rather than by additional model components.

Looking ahead, the framework could be extended with heatmap-based CNN architectures or perspective projection models to better handle real-world problems involving occlusions and more complex camera geometry. Its applications could extend beyond human poses and faces, potentially aiding fields such as biomechanics, animation, and other challenging machine vision tasks.

Conclusion

The paper demonstrates a conceptually simple yet effective approach to 3D reconstruction from 2D annotations. By combining Procrustes analysis with contemporary deep learning, the authors provide a practical tool for non-rigid object analysis that is likely to influence subsequent research on estimating 3D structure from 2D observations.
