
Seeing the World through Your Eyes (2306.09348v2)

Published 15 Jun 2023 in cs.CV

Abstract: The reflective nature of the human eye is an underappreciated source of information about what the world around us looks like. By imaging the eyes of a moving person, we can collect multiple views of a scene outside the camera's direct line of sight through the reflections in the eyes. In this paper, we reconstruct a 3D scene beyond the camera's line of sight using portrait images containing eye reflections. This task is challenging due to 1) the difficulty of accurately estimating eye poses and 2) the entangled appearance of the eye iris and the scene reflections. Our method jointly refines the cornea poses, the radiance field depicting the scene, and the observer's eye iris texture. We further propose a simple regularization prior on the iris texture pattern to improve reconstruction quality. Through various experiments on synthetic and real-world captures featuring people with varied eye colors, we demonstrate the feasibility of our approach to recover 3D scenes using eye reflections.


Summary

  • The paper presents a novel approach that harnesses human eye reflections with neural radiance fields to reconstruct 3D scenes.
  • It refines cornea poses while employing a radial prior-based texture decomposition to disentangle iris patterns from environmental reflections.
  • Experimental validation on synthetic and real-world data shows improved image similarity and perceptual quality, confirmed by SSIM and LPIPS metrics.

Overview of "Seeing the World through Your Eyes"

The paper "Seeing the World through Your Eyes" by Hadi Alzayer, Kevin Zhang, Brandon Feng, Christopher Metzler, and Jia-Bin Huang presents a novel approach to reconstructing 3D scenes from reflections captured in human eyes. Leveraging the natural reflective properties of the human cornea, this work introduces a method to recover a 3D approximation of the observer's surroundings without requiring the camera to be oriented toward the scene itself. The approach treats the cornea as part of a catadioptric imaging system, effectively using it as a curved mirror that captures reflections of the surrounding environment.

Methodology

The authors use a stationary camera to capture reflections from a person's eyes as the person naturally moves their head. The primary challenges are estimating accurate cornea poses and disentangling the iris texture from the scene reflections superimposed on it. The proposed method jointly refines the cornea poses, reconstructs the 3D scene via neural radiance fields (NeRF), and models the observer's iris texture.

  1. Radiance Field Reconstruction: The method modifies standard NeRFs to use reflections from the cornea, treating eye reflections as indirect views of the scene. It calculates rays that reflect off the cornea surface, thereby allowing the NeRF to synthesize views and reconstruct the 3D scene from these non-traditional angles.
  2. Texture Decomposition: Recognizing the challenging entanglement of detailed iris textures and scene reflections, the approach introduces a 2D texture decomposition field. This field capitalizes on a radial prior to help isolate and mitigate the confounding influence of the iris pattern on the scene rendering.
  3. Cornea Pose Refinement: Accurate pose estimation of the cornea is crucial for viable multi-view reconstruction. An optimization strategy refines the initial pose guesses, adjusting for the intrinsic difficulties of eye localization and head movement variability.
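The core geometric step in item 1 above is tracing a camera ray to the cornea and redirecting it by mirror reflection before querying the radiance field. The sketch below is a minimal illustration of that idea, assuming the simplified spherical cornea model and standard ray-sphere intersection; it is not the authors' implementation, which also handles ellipse-based cornea localization and pose refinement.

```python
import numpy as np

def reflect_off_cornea(ray_o, ray_d, center, radius):
    """Intersect a camera ray with a spherical cornea model and return
    the reflected ray that would be cast into the NeRF.

    ray_o, ray_d : ray origin and unit direction, shape (3,)
    center, radius : cornea sphere parameters (hypothetical values)
    Returns (hit_point, reflected_dir), or None if the ray misses.
    """
    oc = ray_o - center
    b = np.dot(ray_d, oc)
    c = np.dot(oc, oc) - radius ** 2
    disc = b * b - c
    if disc < 0:
        return None                      # ray misses the cornea sphere
    t = -b - np.sqrt(disc)               # nearest intersection distance
    if t < 0:
        return None                      # cornea is behind the camera
    hit = ray_o + t * ray_d
    n = (hit - center) / radius          # outward unit surface normal
    reflected = ray_d - 2.0 * np.dot(ray_d, n) * n  # mirror reflection
    return hit, reflected

# Example: a head-on ray bounces straight back.
hit, refl = reflect_off_cornea(
    np.zeros(3), np.array([0.0, 0.0, 1.0]),
    center=np.array([0.0, 0.0, 5.0]), radius=1.0)
```

Rays that survive this test are then marched through the radiance field in the reflected direction, so the eye acts as an extra, curved "camera" per portrait image.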

Experimental Validation

The paper validates the proposed methodology with synthetic and real-world experiments. Synthetic tests in controlled environments demonstrate the robustness of the approach to pose estimation noise. In real-world setups, the method captures reflections with a static camera while the subject moves their head naturally, successfully reconstructing scenes under varied lighting and cornea visibility conditions.

Key numerical results underscore the effectiveness of texture decomposition and pose optimization in enhancing reconstruction fidelity. Quantitative metrics (SSIM and LPIPS) highlight improvements in image similarity and perceptual quality when employing these enhancements.
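For readers unfamiliar with the SSIM metric referenced above, the following is a simplified, global (single-window) SSIM computed over a whole image with numpy. Practical evaluations, including presumably this paper's, use a sliding Gaussian window (e.g. scikit-image's `structural_similarity`); this sketch only illustrates the formula.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Global SSIM between two images in [0, data_range].

    Uses one window covering the whole image; real SSIM averages
    the same statistic over local sliding windows.
    """
    c1 = (0.01 * data_range) ** 2   # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2   # stabilizer for the contrast term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

# Identical images score 1.0; degraded renderings score lower.
img = np.random.default_rng(0).random((64, 64))
noisy = np.clip(img + 0.1 * np.random.default_rng(1).standard_normal((64, 64)), 0, 1)
```

Higher SSIM indicates greater structural similarity to the ground-truth view, while LPIPS (a learned perceptual metric) is lower-is-better.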

Implications and Future Directions

This work has significant implications for extending non-line-of-sight imaging to biological reflectors. By reducing reliance on specialized equipment and exploiting a ubiquitous human behavior, simply looking at things, the approach has potential applications in surveillance, entertainment, and augmented reality. It also points toward integrating accidentally captured imagery into coherent physical reconstructions, offering new insights into passive imaging in dynamic environments.

Future research could address the limitations of controlled settings by exploring broader applications, such as dynamic movement scenarios and diverse environmental conditions. Enhancements in iris texture modeling and pose estimation could foster even more robust solutions for various practical settings. The integration of machine learning with implicit scene understanding represents a promising avenue for bridging gaps between vision technology and natural human behavior.
