3D Morphable Face Models -- Past, Present and Future (1909.01815v2)
Abstract: In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research and highlighting the broad range of current and future applications.
Explain it Like I'm 14
What is this paper about?
This paper looks back at 20 years of “3D Morphable Face Models” (often called 3DMMs). A 3DMM is a computer model that can create and edit realistic 3D faces. Think of it like a smart face generator: it learns from many real faces and then can make new ones, change expressions, or match a face in a photo to a 3D shape.
The authors review how these models are built, how they are used to analyze images, what the current best methods are, and what challenges still remain. They also point to exciting future directions and real‑world uses.
Big questions the paper explores
- How do we capture 3D face data from real people well enough to build a model?
- How do we organize and “align” face data so the model knows that the same points (like the tip of the nose) match across different faces?
- How do we model the shape of faces, their expressions, and their skin appearance separately?
- How do we render (draw) a realistic 2D image from a 3D face, and how do we reverse that to recover 3D from a single photo?
- How do modern deep learning techniques connect with 3DMMs?
- What makes these models useful, and what ethical risks come with them?
How do researchers build and use 3D Morphable Face Models?
The basic idea
Two key ideas power 3DMMs:
- Dense correspondence: all faces in the dataset are lined up point‑by‑point. That means “vertex 12345” on every face is the same spot (say, a point on the left eyelid). Because every face has the same structure, you can “mix” them meaningfully.
- Separation of factors: the model tries to separate what belongs to the person (face shape and true skin color) from outside things (camera and lighting). This lets you, for example, change the lighting without changing the person’s actual skin tone.
The model often uses statistics (like finding the “average face” and the main ways faces vary) to create a compact “face space.” You can imagine sliders that control the nose width, cheekbone height, or jaw shape—those sliders are the model’s parameters.
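To make the slider picture concrete, here is a minimal sketch of how such a face space could be built with PCA from scans that are already in dense correspondence. The scan count, vertex count, and number of components are illustrative assumptions, and random arrays stand in for real scan data.

```python
import numpy as np

# Toy data: 200 scans, each with 5,000 vertices in dense correspondence
# (vertex i is the same anatomical spot on every face).
n_scans, n_vertices = 200, 5000
scans = np.random.rand(n_scans, n_vertices * 3)  # stand-in for real scans

# 1. The "average face" is just the per-vertex mean.
mean_face = scans.mean(axis=0)

# 2. Principal components are the main directions faces vary in.
centered = scans - mean_face
U, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
components = Vt[:50]                              # keep 50 "sliders"
stddevs = singular_values[:50] / np.sqrt(n_scans - 1)

# 3. A new face is the mean plus a weighted sum of components.
#    Each entry of `alpha` is one slider, in units of standard deviations.
alpha = np.zeros(50)
alpha[0] = 2.0                                    # push slider 0 two std-devs
new_face = mean_face + components.T @ (alpha * stddevs)
new_vertices = new_face.reshape(n_vertices, 3)
```

Fitting a photographed face then becomes a search for the slider values whose reconstructed face best explains the image.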
Building a 3D face dataset (capture)
Researchers capture faces in 3D and record appearance so the model has examples to learn from. Here are the main ways, explained in everyday terms:
- Geometric methods: measure the 3D shape directly.
- Active methods (they shine light or patterns): lasers, structured light (projecting coded patterns), or time‑of‑flight sensors (they time how long light takes to bounce back).
- Passive methods (no special light): multi‑camera photogrammetry, which uses many photos from different angles to reconstruct shape.
- Photometric methods: measure how the surface reflects light to estimate surface orientation (which way the skin “points”) and then integrate to get shape. A “light stage” surrounds a person with many lights to capture fine skin details (a small code sketch of this idea follows this list).
- “Surface normals” are directions the skin points; imagine tiny arrows standing on the skin telling which way each spot faces.
- Hybrid methods: combine both. For example, use multi‑view geometry for the overall shape and photometric data for fine wrinkles and pores. This reduces low‑frequency errors (bias in the overall shape) while adding high‑frequency detail (tiny bumps and lines).
- Appearance capture: record intrinsic skin color (often called “albedo”) separately from shading and highlights.
- “Albedo” is the true color of the skin without lighting effects.
- Polarized light setups help separate matte (diffuse) skin color from shiny (specular) reflections.
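As a concrete illustration of the photometric idea above, here is a minimal classic Lambertian photometric-stereo sketch: several photos taken under known directional lights are solved, per pixel, for a surface normal and an albedo. Real light-stage capture is far more involved (calibration, polarization-based specular separation); all shapes and values here are illustrative assumptions.

```python
import numpy as np

# Toy setup: k photos of the same H x W face patch, each lit by one known
# directional light. Under the Lambertian model,
#   intensity = albedo * max(0, light_direction . normal)
H, W, k = 64, 64, 4
lights = np.array([[0.0, 0.0, 1.0],
                   [0.5, 0.0, 1.0],
                   [0.0, 0.5, 1.0],
                   [-0.5, 0.0, 1.0]])
lights /= np.linalg.norm(lights, axis=1, keepdims=True)  # unit directions
images = np.random.rand(k, H, W)          # stand-in for captured photos

# Solve, per pixel, the least-squares system  lights @ (albedo * n) = intensities.
I = images.reshape(k, -1)                                 # (k, H*W)
g, *_ = np.linalg.lstsq(lights, I, rcond=None)            # (3, H*W)

albedo = np.linalg.norm(g, axis=0)        # length of g is the albedo
normals = g / np.maximum(albedo, 1e-8)    # unit "arrows" standing on the skin
normals = normals.T.reshape(H, W, 3)
albedo = albedo.reshape(H, W)
# Integrating the normal field (e.g., with a Poisson solver) then yields depth.
```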
Some face parts need special tricks (eyes, teeth, hair, tongue, jaw), and dynamic capture records faces moving through expressions over time.
Modeling faces
Once you have aligned 3D face scans (same mesh structure and matching points), you build models that capture different kinds of variation:
- Shape model (identity): how different people’s face shapes differ. Often a statistical model finds the main “directions” faces vary (like principal components), which become sliders.
- Expression model: how a single person’s face changes when they smile, frown, or speak. Models can add expression changes on top of identity shape (additive) or blend in more complex ways (multiplicative or nonlinear); a sketch of the additive version appears below.
- Appearance model: a model of skin color and sometimes shine/specular effects, separate from lighting, so faces can be “relit” under new lights.
A core step is correspondence: making sure every mesh vertex means the same anatomical spot across all faces, so the statistics are meaningful.
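Here is a minimal sketch of the additive identity-plus-expression construction mentioned above, with random matrices standing in for learned model bases; the basis sizes are illustrative assumptions.

```python
import numpy as np

n_vertices = 5000
rng = np.random.default_rng(0)

# Illustrative stand-ins for learned model matrices.
mean_shape = rng.standard_normal(n_vertices * 3)
id_basis   = rng.standard_normal((n_vertices * 3, 80))  # identity components
exp_basis  = rng.standard_normal((n_vertices * 3, 30))  # expression components

def additive_face(alpha, beta):
    """Additive 3DMM: identity shape plus an expression offset on one mesh.

    alpha: (80,) identity coefficients ("who the person is")
    beta:  (30,) expression coefficients ("what the face is doing")
    """
    shape = mean_shape + id_basis @ alpha + exp_basis @ beta
    return shape.reshape(n_vertices, 3)

alpha = rng.standard_normal(80) * 0.1             # pick one identity
neutral = additive_face(alpha, np.zeros(30))      # resting face
smiling = additive_face(alpha, rng.standard_normal(30) * 0.1)  # same person, new expression
```

Because both bases share the same dense correspondence, the expression offset moves the right anatomical points on every identity.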
Turning 3D into 2D pictures (image synthesis)
Computer graphics renders a 3D face into a 2D image by simulating the camera and lights. This includes perspective projection (how things look smaller when farther away), types of lights (ambient, directional), and surface reflectance models (like Phong, which adds shiny highlights). This lets a model generate realistic face images.
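To make the rendering ingredients concrete, here is a hedged sketch of a pinhole perspective projection and Phong-style shading for a single point. A full renderer also rasterizes triangles and handles occlusion; the focal length, light, and material values below are made up for illustration.

```python
import numpy as np

def project(points, focal=800.0, cx=320.0, cy=240.0):
    """Pinhole perspective projection: farther points land closer together."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = focal * x / z + cx
    v = focal * y / z + cy
    return np.stack([u, v], axis=1)

def phong_shade(normal, view_dir, light_dir, albedo,
                ambient=0.1, specular=0.4, shininess=20.0):
    """Phong model: ambient + diffuse + a shiny specular highlight."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    diffuse = max(0.0, n @ l)
    r = 2.0 * (n @ l) * n - l                 # mirror reflection of the light
    spec = max(0.0, r @ v) ** shininess
    return albedo * (ambient + diffuse) + specular * spec

pixel = project(np.array([[0.0, 0.0, 2.0]]))  # a point 2 units from the camera
color = phong_shade(normal=np.array([0.0, 0.0, 1.0]),
                    view_dir=np.array([0.0, 0.0, 1.0]),
                    light_dir=np.array([0.3, 0.3, 1.0]),
                    albedo=0.8)
```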
Going from a photo back to 3D (analysis‑by‑synthesis)
To estimate a 3D face from a single photo, the fitting procedure repeatedly:
- Guesses the 3D face parameters (shape, expression, appearance) and camera/lighting.
- Renders the guess into a 2D image.
- Compares it to the real photo.
- Adjusts the guess to make the render match the photo better.
This loop is powerful but hard: multiple explanations can fit the same picture (e.g., lighting vs. skin color), and optimization can get stuck in “local optima” (a nearby but wrong solution).
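The loop can be written down compactly. The sketch below uses a stand-in renderer and an off-the-shelf optimizer to show the guess‑render‑compare‑adjust structure, with a prior term nudging parameters toward plausible faces; it is a toy illustration, not the paper's actual fitting algorithm.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
target_photo = rng.random((32, 32))          # stand-in for the real photo
basis = rng.standard_normal((10, 32 * 32))   # stand-in for model + renderer

def render(params):
    """Stand-in renderer: a real system would evaluate the 3DMM and the
    graphics pipeline (projection, lighting) here."""
    return (params @ basis).reshape(32, 32)

def energy(params):
    data_term = np.sum((render(params) - target_photo) ** 2)  # match the photo
    prior_term = 0.1 * np.sum(params ** 2)   # stay near the average face
    return data_term + prior_term

# Guess -> render -> compare -> adjust, repeated until convergence.
result = minimize(energy, x0=np.zeros(10), method="L-BFGS-B")
best_params = result.x
# Without the prior and a good initialization, this loop can settle into a
# local optimum, e.g. explaining dark skin as dim lighting.
```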
Deep learning meets 3DMMs
Recently, neural networks have learned to estimate 3DMM parameters directly from 2D images, often using huge photo datasets. Deep learning can also help build better models by learning statistics from more data and handling “in‑the‑wild” images (uncontrolled lighting and pose).
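A typical deep-learning setup replaces the iterative loop with a network that regresses parameters in one forward pass. The PyTorch sketch below is a minimal illustration; the architecture, input size, and parameter counts are placeholder assumptions, not any specific published system.

```python
import torch
import torch.nn as nn

class ParamRegressor(nn.Module):
    """Tiny CNN mapping a face crop to 3DMM + camera/lighting parameters."""
    def __init__(self, n_params=120):  # e.g. 80 identity + 30 expression + pose/light
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_params)

    def forward(self, image):
        return self.head(self.features(image).flatten(1))

model = ParamRegressor()
images = torch.rand(8, 3, 128, 128)   # a batch of face crops
params = model(images)                # (8, 120) predicted parameters

# Training is often self-supervised: render the predicted parameters back to
# an image and penalize the difference to the input photo, so no 3D ground
# truth is needed for "in-the-wild" data.
```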
What did the survey find and why it matters?
- 3DMMs have had lasting impact: they started as groundbreaking research, faded a bit, and came roaring back with deep learning. Today they sit inside many state‑of‑the‑art systems for face analysis, animation, and re‑enactment.
- There’s a big gap between what capture systems can do and what public datasets provide. Many public 3D face datasets are moderate quality and lack detailed appearance, which limits how good models can be.
- Getting dense correspondence and separating intrinsic skin color from lighting are still central technical challenges.
- Fitting 3D models to 2D images is hard because of ambiguities (shape vs. camera, lighting vs. skin) and can be computationally heavy.
- Applications are broad: movies and games (realistic faces and expressions), AR/VR avatars, video conferencing, medical analysis, security, and research into human perception.
- Ethical concerns are serious: highly realistic face models can be used to create convincing fake images and videos. Privacy, consent (especially for minors), dataset bias (skin tone, age, gender, ethnicity), and fair sampling are major issues the community must address.
Implications and future impact
This research lays the foundation for more lifelike digital humans and better tools that understand faces from photos and videos. In the next decade, expect:
- More accurate and general models (covering full heads, hair, teeth, eyes) that work across ages and ethnicities.
- Stronger ties to deep learning for faster, more reliable 3D recovery from everyday photos and even single smartphone shots.
- Better ways to separate identity, expression, and lighting, making relighting, animation, and cross‑domain editing (like face reenactment) more controllable.
- Broader, higher‑quality public datasets—if the community can solve practical, ethical, and legal hurdles.
At the same time, society will need clear rules and tools to prevent misuse (for example, deepfakes) and to protect people’s identities and privacy. The paper argues that computer vision and graphics researchers should work with ethicists, lawyers, and policymakers to guide responsible progress.