Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on n-Spheres (1904.05404v1)

Published 10 Apr 2019 in cs.CV and cs.LG

Abstract: Many computer vision challenges require continuous outputs, but tend to be solved by discrete classification. The reason is classification's natural containment within a probability $n$-simplex, as defined by the popular softmax activation function. Regular regression lacks such a closed geometry, leading to unstable training and convergence to suboptimal local minima. Starting from this insight we revisit regression in convolutional neural networks. We observe many continuous output problems in computer vision are naturally contained in closed geometrical manifolds, like the Euler angles in viewpoint estimation or the normals in surface normal estimation. A natural framework for posing such continuous output problems are $n$-spheres, which are naturally closed geometric manifolds defined in the $\mathbb{R}^{(n+1)}$ space. By introducing a spherical exponential mapping on $n$-spheres at the regression output, we obtain well-behaved gradients, leading to stable training. We show how our spherical regression can be utilized for several computer vision challenges, specifically viewpoint estimation, surface normal estimation and 3D rotation estimation. For all these problems our experiments demonstrate the benefit of spherical regression. All paper resources are available at https://github.com/leoshine/Spherical_Regression.

Summary

The paper introduces a novel spherical exponential activation function that constrains outputs on n-spheres to enhance regression performance in 3D vision tasks.
It establishes a general framework that transforms continuous regression into a stable process for estimating viewpoints, surface normals, and 3D rotations.
Experiments on Pascal3D+, NYU Depth v2, and ModelNet10-SO3 report better results than conventional regression and classification baselines.

Spherical Regression: Learning Viewpoints, Surface Normals, and 3D Rotations on n-Spheres

The paper "Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on n-Spheres" by Shuai Liao, Efstratios Gavves, and Cees G. M. Snoek introduces spherical regression, an approach to continuous output problems in computer vision. The work addresses the limitations of conventional regression by constraining network outputs to closed geometric manifolds such as n-spheres.

Overview

Continuous output problems are prevalent in computer vision, ranging from viewpoint estimation to surface normal and 3D rotation estimation. Existing solutions rely primarily on classification, despite the intrinsically continuous nature of these tasks. The authors address this paradox: regression is the more natural fit, yet it is underused because an unconstrained output space leads to unstable training and convergence to poor local minima.

Contributions

The paper posits that many of these vision problems can be posed on n-spheres, since their outputs naturally lie on closed geometric manifolds. It introduces a spherical exponential mapping at the regression output that constrains the gradients in much the same way softmax does for classification. Notably, the resulting gradients depend on the normalized outputs rather than directly on the raw network outputs, which avoids the instability seen in conventional regression.
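The gradient argument above can be made concrete with a short derivation. The block below is a sketch that assumes the activation is defined as in the paper's formulation as recalled here, with raw outputs $o_0, \dots, o_n$ and activated outputs $p_0, \dots, p_n$; the symbol names and the comparison with plain $\ell_2$ normalization are illustrative choices, not taken from this summary.

```latex
% Spherical exponential activation (assumed form): exponentiate, then l2-normalize.
p_j = \frac{e^{o_j}}{\sqrt{\sum_{k=0}^{n} e^{2 o_k}}},
\qquad \sum_{j=0}^{n} p_j^2 = 1 .

% Its Jacobian is a function of the outputs p alone:
\frac{\partial p_j}{\partial o_i} = p_j \left( \delta_{ij} - p_i^2 \right) .

% For comparison, plain l2 normalization p_j = o_j / \lVert O \rVert gives
\frac{\partial p_j}{\partial o_i} = \frac{\delta_{ij} - p_i p_j}{\lVert O \rVert},
% which scales with the magnitude of the raw output vector O.
```

Since every $p_j$ lies in $(0, 1)$, the entries of the first Jacobian are bounded regardless of how large the raw outputs become, whereas the second Jacobian grows or shrinks with $1 / \lVert O \rVert$.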

The key contributions of the paper are:

Spherical Exponential Activation Function: A novel activation function that ensures outputs reside on n-spheres, thereby stabilizing training and improving performance across multiple applications.
General Framework for Spherical Regression: The authors show how the spherical exponential function can serve diverse tasks, such as regressing Euler angles in viewpoint estimation or normals in surface normal estimation.
Empirical Validation Across Tasks: The paper rigorously evaluates spherical regression on viewpoint estimation, surface normals, and 3D rotations, showing marked improvements over existing techniques.
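The activation named in the list above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: it assumes the activation is an exponential followed by l2 normalization, and the function names (`spherical_exp`, `spherical_exp_jacobian`, `apply_signs`) are hypothetical. Because the exponential yields only positive coordinates, the sketch also assumes that signs are predicted separately and recombined afterwards.

```python
import math

def spherical_exp(raw):
    """Map raw scores o_0..o_n to positive coordinates on the unit n-sphere.

    Assumed form: p_j = exp(o_j) / sqrt(sum_k exp(2 * o_k)).
    """
    m = max(raw)  # subtracting the max leaves the result unchanged and avoids overflow
    exps = [math.exp(o - m) for o in raw]
    norm = math.sqrt(sum(e * e for e in exps))
    return [e / norm for e in exps]

def spherical_exp_jacobian(p):
    """J[j][i] = d p_j / d o_i = p_j * (delta_ij - p_i**2); depends on p only."""
    n = len(p)
    return [[p[j] * ((1.0 if i == j else 0.0) - p[i] ** 2) for i in range(n)]
            for j in range(n)]

def apply_signs(magnitudes, signs):
    """Hypothetical helper: combine magnitudes with separately predicted signs (+1/-1)."""
    return [s * a for a, s in zip(magnitudes, signs)]

raw = [0.3, -1.2, 2.0]                 # e.g. three raw outputs for a surface normal
p = spherical_exp(raw)                 # positive coordinates with unit l2 norm
normal = apply_signs(p, [1, -1, 1])    # full point on the 2-sphere
print(normal, sum(x * x for x in normal))
```

The same function covers the three settings discussed here by changing the output length: two values for one angle on a circle, three for a surface normal, and four for a unit quaternion.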

Numerical Results and Implications

The authors' experiments show consistent gains for spherical regression across the three tasks. For viewpoint estimation on Pascal3D+, the spherical regression model achieves lower median errors and higher accuracies than state-of-the-art methods that discretize the angles.

In surface normal estimation on the NYU Depth v2 dataset, spherical regression predicts more normals correctly, particularly under the stricter evaluation thresholds. For 3D rotation estimation on the newly introduced ModelNet10-SO3 dataset, spherical regression on quaternions shows significant gains over direct regression, underscoring the value of geometrically structured output manifolds.
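To make the rotation results easier to interpret, the sketch below shows one common way to score a predicted unit quaternion against a ground-truth one and to aggregate the errors into a median and a thresholded accuracy. The function names, the `(w, x, y, z)` component order, and the `pi / 6` threshold are assumptions made for illustration; this summary does not specify the exact evaluation protocol.

```python
import math
import statistics

def normalize(q):
    """Scale a 4-vector to unit length so that it lies on the 3-sphere."""
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]

def rotation_error(q_pred, q_true):
    """Angle in radians between two rotations given as quaternions (w, x, y, z).

    The absolute value treats q and -q, which encode the same rotation, as equal.
    """
    a, b = normalize(q_pred), normalize(q_true)
    dot = abs(sum(x * y for x, y in zip(a, b)))
    return 2.0 * math.acos(min(1.0, dot))  # clamp guards against rounding above 1

def summarize(errors, threshold=math.pi / 6):
    """Return (median error, fraction of errors below the threshold)."""
    accuracy = sum(e < threshold for e in errors) / len(errors)
    return statistics.median(errors), accuracy

identity = [1.0, 0.0, 0.0, 0.0]
quarter_turn_z = [math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4)]
errors = [rotation_error(identity, identity),
          rotation_error(quarter_turn_z, identity)]
print(summarize(errors))
```

Lower median error and higher accuracy are better, which is the sense in which the comparisons above should be read.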

Theoretical and Practical Implications

On the theoretical side, the work suggests that geometric constraints on the output space can make regression competitive for problems traditionally handled by classification. The framework avoids the unconstrained gradient updates of plain regression, promising both more stable training and better accuracy. This could be particularly beneficial in applications that need fast and precise angle estimation, such as autonomous driving and robot navigation.

On the practical side, these results may lead more computer vision models to adopt spherical regression, particularly for tasks whose outputs lie on continuous manifolds. The work may also inspire further manifold-constrained regression approaches and broaden their applicability to other domains.

Future Directions

Promising future directions include extending spherical regression to other applications with n-sphere outputs, such as motion tracking across a manifold of poses. Hybrid models that combine spherical mappings with mappings onto other geometric manifolds might further improve performance. Exploring these methods in unsupervised or semi-supervised settings could also broaden how continuous outputs are handled in vision applications.

In conclusion, the paper improves regression for vision tasks by posing it on n-spheres, and it sets a useful precedent for subsequent research in geometric deep learning.