- The paper introduces SIRFS, a unified statistical framework for inferring shape, reflectance, and illumination from a single 2D image.
- It applies L-BFGS optimization with tailored priors on shape smoothness, reflectance parsimony, and spherical-harmonic illumination to enhance scene reconstruction.
- Experiments on the MIT-Berkeley dataset demonstrate superior performance over classical approaches, promising advances in photorealistic rendering and augmented reality.
Overview of "Shape, Illumination, and Reflectance from Shading"
The paper "Shape, Illumination, and Reflectance from Shading," authored by Jonathan T. Barron and Jitendra Malik, explores the problem of inferring intrinsic 3D scene properties—specifically shape, reflectance, and illumination—from a single 2D image. This task is severely underconstrained, since many combinations of shape, reflectance, and illumination can reproduce the same image, yet solving it is crucial to advancing a wide range of applications in computer vision.
The authors approach the problem through a statistical inference framework, leveraging an optimization problem to identify the most likely explanations for a given single image. Their method, termed "Shape, Illumination, and Reflectance from Shading" (SIRFS), integrates and extends several classic computer vision problems, namely shape-from-shading, intrinsic images, color constancy, and illumination estimation.
The optimization problem can be summarized as:

    minimize over Z, L:   g(I − S(Z, L)) + f(Z) + h(L)
Here, Z represents the depth map, L the spherical-harmonic model for illumination, and I the input log-image. The function S(Z, L) acts as a rendering engine that computes the log-shading image from Z and L. The terms f(Z), g(R), and h(L) are priors over shape, reflectance, and illumination, respectively; because the log-image decomposes as I = R + S(Z, L), the log-reflectance is simply R = I − S(Z, L), which is why g is applied to that residual and only Z and L remain as free variables.
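The structure of the objective can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `render_shading` here uses a toy Lambertian point-light model rather than spherical harmonics, and the priors `f`, `g`, and `h` are passed in as placeholder callables.

```python
import numpy as np

def render_shading(Z, L):
    """Toy Lambertian log-shading: dot each surface normal with a 3-vector
    light L. (The paper instead uses a spherical-harmonic model of L.)"""
    Zy, Zx = np.gradient(Z)
    normals = np.dstack([-Zx, -Zy, np.ones_like(Z)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return np.log(np.clip(normals @ L, 1e-6, None))

def sirfs_objective(Z, L, I, f, g, h):
    """g(I - S(Z, L)) + f(Z) + h(L): reflectance is whatever shading fails
    to explain, so only Z and L are free variables of the optimization."""
    R = I - render_shading(Z, L)   # log-reflectance residual
    return g(R) + f(Z) + h(L)
```

Note how reflectance never appears as an explicit unknown: fixing Z and L determines R, which is what reduces the problem to the two-variable minimization above.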
Priors and Statistical Regularities
Reflectance Priors
Reflectance priors capture the properties of natural images:
- Smoothness: Differences between neighboring log-reflectance pixels are mostly near zero, with occasional large jumps at material boundaries (a heavy-tailed distribution).
- Parsimony: Reflectance images typically have low entropy: a small palette of reflectance values (or colors) usually captures most of the variability.
- Absolute Reflectance: A smooth spline is fit to model preferred colors in log-reflectance space, accounting for dependencies across RGB channels.
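The smoothness prior above can be sketched as a heavy-tailed penalty on neighboring log-reflectance differences. The paper fits a Gaussian scale mixture to these differences; here a simple Lorentzian penalty, log(1 + x²), stands in for that learned model, and the scale `sigma` is an illustrative choice.

```python
import numpy as np

def reflectance_smoothness(log_R, sigma=0.1):
    """Heavy-tailed smoothness cost on a log-reflectance image.
    Small differences are penalized roughly quadratically, while large
    jumps (material edges) saturate and stay comparatively cheap."""
    dx = np.diff(log_R, axis=1)
    dy = np.diff(log_R, axis=0)
    penalty = lambda d: float(np.sum(np.log1p((d / sigma) ** 2)))
    return penalty(dx) + penalty(dy)
```

The saturating tail is the key design choice: a quadratic penalty would smooth across material edges, whereas this cost tolerates a few large jumps while still discouraging texture-like noise.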
Shape Priors
Priors over shape focus on geometric regularities:
- Smoothness: The paper penalizes local variation in mean curvature, a measure that behaves better under rotation and scaling than alternatives such as the Laplacian.
- Surface Isotropy: This prior encodes the assumption that a surface patch is equally likely to face any direction; in image coordinates it acts like a mild "fronto-parallel" preference.
- Occluding Contour: At the object's silhouette, surface normals must point outward, perpendicular to the contour; this constraint is incorporated using a heavy-tailed cost function.
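The quantity at the heart of the shape-smoothness prior, mean curvature, can be computed from a depth map with the standard formula for a surface graph z = Z(x, y). The sketch below uses finite differences via `np.gradient` (boundary values are one-sided approximations); the prior itself would then penalize local variation of this field, which is omitted here.

```python
import numpy as np

def mean_curvature(Z):
    """Mean curvature H of a depth map Z(x, y):
    H = ((1+Zx^2)Zyy - 2 Zx Zy Zxy + (1+Zy^2)Zxx) / (2 (1+Zx^2+Zy^2)^1.5)"""
    Zy, Zx = np.gradient(Z)        # first derivatives (axis 0 = y)
    Zxy, Zxx = np.gradient(Zx)     # second derivatives of Zx
    Zyy, _ = np.gradient(Zy)       # second derivative of Zy along y
    num = (1 + Zx**2) * Zyy - 2 * Zx * Zy * Zxy + (1 + Zy**2) * Zxx
    den = 2 * (1 + Zx**2 + Zy**2) ** 1.5
    return num / den
```

A plane has zero mean curvature everywhere, and a unit paraboloid has H = 1 at its apex, which makes the function easy to sanity-check.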
Illumination Priors
Priors over illumination are built by fitting a multivariate Gaussian to spherical-harmonic lighting coefficients, with separate models for grayscale and color illumination to capture real-world variability.
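A Gaussian prior over spherical-harmonic (SH) coefficients reduces to a Mahalanobis-distance cost, which is straightforward to sketch. The training data here is hypothetical random data standing in for SH coefficients of real lighting environments, and the small ridge term added to the covariance is an illustrative regularizer.

```python
import numpy as np

def fit_illumination_prior(training_L):
    """Fit a multivariate Gaussian to rows of SH coefficient vectors;
    return the mean and (regularized) precision matrix."""
    mu = training_L.mean(axis=0)
    cov = np.cov(training_L, rowvar=False)
    prec = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    return mu, prec

def h(L, mu, prec):
    """Illumination cost: half squared Mahalanobis distance from the mean."""
    d = L - mu
    return 0.5 * d @ prec @ d
```

The cost is zero at the mean illumination and grows quadratically as a candidate L drifts away from lighting conditions seen in training.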
Optimization and Computational Efficiency
Optimization is carried out primarily with L-BFGS inside a multiscale, coarse-to-fine framework that helps avoid poor local minima. Critically, a series of transformations and approximations keeps the cost functions cheap to evaluate; most notably, the quadratic entropy used by the parsimony prior is accelerated with a technique akin to the bilateral grid.
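A generic coarse-to-fine L-BFGS loop in this spirit can be sketched with SciPy. This is an illustration of the general multiscale idea, not the paper's actual pyramid scheme: the depth map is solved at a coarse resolution, upsampled, and used to initialize the next finer level. The `cost` callable (taking a flattened depth map and its shape) is a hypothetical interface.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.ndimage import zoom

def multiscale_lbfgs(cost, shape, n_levels=3):
    """Coarse-to-fine minimization of cost(x, shape) over a depth map."""
    h, w = shape
    # Start from a flat depth map at the coarsest resolution.
    Z = np.zeros((h // 2 ** (n_levels - 1), w // 2 ** (n_levels - 1)))
    for level in reversed(range(n_levels)):
        res = minimize(cost, Z.ravel(), args=(Z.shape,), method="L-BFGS-B")
        Z = res.x.reshape(Z.shape)
        if level > 0:
            # Upsample the coarse solution as the next level's initializer.
            Z = zoom(Z, 2.0, order=1)
    return Z
```

Solving the coarse levels first resolves the low-frequency shape cheaply, so the expensive full-resolution solve starts near a good basin rather than a spurious local minimum.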
Experimental Validation
The authors introduce the MIT-Berkeley Intrinsic Images dataset, an enhanced version of a pre-existing dataset with additional ground-truth for shapes and illumination, relevant for evaluating their algorithm.
Three primary settings were tested:
- Grayscale under Laboratory Illumination
- Color under Laboratory Illumination
- Color under Natural Illumination
SIRFS outperformed several baselines, including classic intrinsic-image methods chained with shape-from-shading, as well as naive baselines and variants using simpler priors.
Implications and Future Directions
The significance of SIRFS is underscored by its unified, statistically grounded approach. By simultaneously considering multiple intrinsic scene properties and their interdependencies, SIRFS mitigates the shortcomings of past piecewise methods.
Practically, this approach could substantially benefit fields such as photorealistic rendering, object detection, and augmented reality, where understanding and manipulating the intrinsic properties of scenes is vital. Theoretically, it provides a scaffold upon which more intricate models—incorporating specularities, mutual illumination, or occlusion—can be built.
For future research, incorporating class-specific priors informed by object recognition could further improve accuracy. Additionally, a more sophisticated handling of spatially-varying illumination and non-Lambertian surfaces would extend the algorithm's applicability to a broader range of real-world scenarios.
In conclusion, the methodology and insights presented in this paper represent a substantial step forward in the pursuit of accurately recovering rich scene attributes from minimal input, promising significant advancements in both theoretical and practical applications within computer vision.