3D Morphable Face Models -- Past, Present and Future (1909.01815v2)
Abstract: In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research and highlighting the broad range of current and future applications.
Explain it Like I'm 14
What is this paper about?
This paper looks back at 20 years of “3D Morphable Face Models” (often called 3DMMs). A 3DMM is a computer model that can create and edit realistic 3D faces. Think of it like a smart face generator: it learns from many real faces and then can make new ones, change expressions, or match a face in a photo to a 3D shape.
The authors review how these models are built, how they are used to analyze images, what the current best methods are, and what challenges still remain. They also point to exciting future directions and real‑world uses.
Big questions the paper explores
- How do we capture 3D face data from real people well enough to build a model?
- How do we organize and “align” face data so the model knows that the same points (like the tip of the nose) match across different faces?
- How do we model the shape of faces, their expressions, and their skin appearance separately?
- How do we render (draw) a realistic 2D image from a 3D face, and how do we reverse that to recover 3D from a single photo?
- How do modern deep learning techniques connect with 3DMMs?
- What makes these models useful, and what ethical risks come with them?
How do researchers build and use 3D Morphable Face Models?
The basic idea
Two key ideas power 3DMMs:
- Dense correspondence: all faces in the dataset are lined up point‑by‑point. That means “vertex 12345” on every face is the same spot (say, a point on the left eyelid). Because every face has the same structure, you can “mix” them meaningfully.
- Separation of factors: the model tries to separate what belongs to the person (face shape and true skin color) from outside things (camera and lighting). This lets you, for example, change the lighting without changing the person’s actual skin tone.
The model often uses statistics (like finding the “average face” and the main ways faces vary) to create a compact “face space.” You can imagine sliders that control the nose width, cheekbone height, or jaw shape—those sliders are the model’s parameters.
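To make the slider picture concrete, here is a minimal sketch of how such a face space could be built with PCA from scans that are already in dense correspondence. The scan count, vertex count, and number of components are illustrative assumptions, and random arrays stand in for real scan data.

```python
import numpy as np

# Toy data: 200 scans, each with 5,000 vertices in dense correspondence
# (vertex i is the same anatomical spot on every face).
n_scans, n_vertices = 200, 5000
scans = np.random.rand(n_scans, n_vertices * 3)  # stand-in for real scans

# 1. The "average face" is just the per-vertex mean.
mean_face = scans.mean(axis=0)

# 2. Principal components are the main directions faces vary in.
centered = scans - mean_face
U, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
components = Vt[:50]                              # keep 50 "sliders"
stddevs = singular_values[:50] / np.sqrt(n_scans - 1)

# 3. A new face is the mean plus a weighted sum of components.
#    Each entry of `alpha` is one slider, in units of standard deviations.
alpha = np.zeros(50)
alpha[0] = 2.0                                    # push slider 0 two std-devs
new_face = mean_face + components.T @ (alpha * stddevs)
new_vertices = new_face.reshape(n_vertices, 3)
```

Fitting a photographed face then becomes a search for the slider values whose reconstructed face best explains the image.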
Building a 3D face dataset (capture)
Researchers capture faces in 3D and record appearance so the model has examples to learn from. Here are the main ways, explained in everyday terms:
- Geometric methods: measure the 3D shape directly.
- Active methods (they shine light or patterns): lasers, structured light (projecting coded patterns), or time‑of‑flight sensors (they time how long light takes to bounce back).
- Passive methods (no special light): multi‑camera photogrammetry, which uses many photos from different angles to reconstruct shape.
- Photometric methods: measure how the surface reflects light to estimate surface orientation (which way the skin “points”) and then integrate to get shape. A “light stage” surrounds a person with many lights to capture fine skin details (a small code sketch of this idea follows this list).
- “Surface normals” are directions the skin points; imagine tiny arrows standing on the skin telling which way each spot faces.
- Hybrid methods: combine both. For example, use multi‑view geometry for the overall shape and photometric data for fine wrinkles and pores. This reduces low‑frequency errors (bias in the overall shape) while adding high‑frequency detail (tiny bumps and lines).
- Appearance capture: record intrinsic skin color (often called “albedo”) separately from shading and highlights.
- “Albedo” is the true color of the skin without lighting effects.
- Polarized light setups help separate matte (diffuse) skin color from shiny (specular) reflections.
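As a concrete illustration of the photometric idea above, here is a minimal classic Lambertian photometric-stereo sketch: several photos taken under known directional lights are solved, per pixel, for a surface normal and an albedo. Real light-stage capture is far more involved (calibration, polarization-based specular separation); all shapes and values here are illustrative assumptions.

```python
import numpy as np

# Toy setup: k photos of the same H x W face patch, each lit by one known
# directional light. Under the Lambertian model,
#   intensity = albedo * max(0, light_direction . normal)
H, W, k = 64, 64, 4
lights = np.array([[0.0, 0.0, 1.0],
                   [0.5, 0.0, 1.0],
                   [0.0, 0.5, 1.0],
                   [-0.5, 0.0, 1.0]])
lights /= np.linalg.norm(lights, axis=1, keepdims=True)  # unit directions
images = np.random.rand(k, H, W)          # stand-in for captured photos

# Solve, per pixel, the least-squares system  lights @ (albedo * n) = intensities.
I = images.reshape(k, -1)                                 # (k, H*W)
g, *_ = np.linalg.lstsq(lights, I, rcond=None)            # (3, H*W)

albedo = np.linalg.norm(g, axis=0)        # length of g is the albedo
normals = g / np.maximum(albedo, 1e-8)    # unit "arrows" standing on the skin
normals = normals.T.reshape(H, W, 3)
albedo = albedo.reshape(H, W)
# Integrating the normal field (e.g., with a Poisson solver) then yields depth.
```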
Some face parts need special tricks (eyes, teeth, hair, tongue, jaw), and dynamic capture records faces moving through expressions over time.
Modeling faces
Once you have aligned 3D face scans (same mesh structure and matching points), you build models that capture different kinds of variation:
- Shape model (identity): how different people’s face shapes differ. Often a statistical model finds the main “directions” faces vary (like principal components), which become sliders.
- Expression model: how a single person’s face changes when they smile, frown, or speak. Models can add expression changes on top of identity shape (additive) or blend in more complex ways (multiplicative or nonlinear); a sketch of the additive version appears below.
- Appearance model: a model of skin color and sometimes shine/specular effects, separate from lighting, so faces can be “relit” under new lights.
A core step is correspondence: making sure every mesh vertex means the same anatomical spot across all faces, so the statistics are meaningful.
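Here is a minimal sketch of the additive identity-plus-expression construction mentioned above, with random matrices standing in for learned model bases; the basis sizes are illustrative assumptions.

```python
import numpy as np

n_vertices = 5000
rng = np.random.default_rng(0)

# Illustrative stand-ins for learned model matrices.
mean_shape = rng.standard_normal(n_vertices * 3)
id_basis   = rng.standard_normal((n_vertices * 3, 80))  # identity components
exp_basis  = rng.standard_normal((n_vertices * 3, 30))  # expression components

def additive_face(alpha, beta):
    """Additive 3DMM: identity shape plus an expression offset on one mesh.

    alpha: (80,) identity coefficients ("who the person is")
    beta:  (30,) expression coefficients ("what the face is doing")
    """
    shape = mean_shape + id_basis @ alpha + exp_basis @ beta
    return shape.reshape(n_vertices, 3)

alpha = rng.standard_normal(80) * 0.1             # pick one identity
neutral = additive_face(alpha, np.zeros(30))      # resting face
smiling = additive_face(alpha, rng.standard_normal(30) * 0.1)  # same person, new expression
```

Because both bases share the same dense correspondence, the expression offset moves the right anatomical points on every identity.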
Turning 3D into 2D pictures (image synthesis)
Computer graphics renders a 3D face into a 2D image by simulating the camera and lights. This includes perspective projection (how things look smaller when farther away), types of lights (ambient, directional), and surface reflectance models (like Phong, which adds shiny highlights). This lets a model generate realistic face images.
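To make the rendering ingredients concrete, here is a hedged sketch of a pinhole perspective projection and Phong-style shading for a single point. A full renderer also rasterizes triangles and handles occlusion; the focal length, light, and material values below are made up for illustration.

```python
import numpy as np

def project(points, focal=800.0, cx=320.0, cy=240.0):
    """Pinhole perspective projection: farther points land closer together."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = focal * x / z + cx
    v = focal * y / z + cy
    return np.stack([u, v], axis=1)

def phong_shade(normal, view_dir, light_dir, albedo,
                ambient=0.1, specular=0.4, shininess=20.0):
    """Phong model: ambient + diffuse + a shiny specular highlight."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    diffuse = max(0.0, n @ l)
    r = 2.0 * (n @ l) * n - l                 # mirror reflection of the light
    spec = max(0.0, r @ v) ** shininess
    return albedo * (ambient + diffuse) + specular * spec

pixel = project(np.array([[0.0, 0.0, 2.0]]))  # a point 2 units from the camera
color = phong_shade(normal=np.array([0.0, 0.0, 1.0]),
                    view_dir=np.array([0.0, 0.0, 1.0]),
                    light_dir=np.array([0.3, 0.3, 1.0]),
                    albedo=0.8)
```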
Going from a photo back to 3D (analysis‑by‑synthesis)
To estimate a 3D face from a single photo, the fitting procedure repeatedly:
- Guesses the 3D face parameters (shape, expression, appearance) and camera/lighting.
- Renders the guess into a 2D image.
- Compares it to the real photo.
- Adjusts the guess to make the render match the photo better.
This loop is powerful but hard: multiple explanations can fit the same picture (e.g., lighting vs. skin color), and optimization can get stuck in “local optima” (a nearby but wrong solution).
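The loop can be written down compactly. The sketch below uses a stand-in renderer and an off-the-shelf optimizer to show the guess‑render‑compare‑adjust structure, with a prior term nudging parameters toward plausible faces; it is a toy illustration, not the paper's actual fitting algorithm.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
target_photo = rng.random((32, 32))          # stand-in for the real photo
basis = rng.standard_normal((10, 32 * 32))   # stand-in for model + renderer

def render(params):
    """Stand-in renderer: a real system would evaluate the 3DMM and the
    graphics pipeline (projection, lighting) here."""
    return (params @ basis).reshape(32, 32)

def energy(params):
    data_term = np.sum((render(params) - target_photo) ** 2)  # match the photo
    prior_term = 0.1 * np.sum(params ** 2)   # stay near the average face
    return data_term + prior_term

# Guess -> render -> compare -> adjust, repeated until convergence.
result = minimize(energy, x0=np.zeros(10), method="L-BFGS-B")
best_params = result.x
# Without the prior and a good initialization, this loop can settle into a
# local optimum, e.g. explaining dark skin as dim lighting.
```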
Deep learning meets 3DMMs
Recently, neural networks have learned to estimate 3DMM parameters directly from 2D images, often using huge photo datasets. Deep learning can also help build better models by learning statistics from more data and handling “in‑the‑wild” images (uncontrolled lighting and pose).
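A typical deep-learning setup replaces the iterative loop with a network that regresses parameters in one forward pass. The PyTorch sketch below is a minimal illustration; the architecture, input size, and parameter counts are placeholder assumptions, not any specific published system.

```python
import torch
import torch.nn as nn

class ParamRegressor(nn.Module):
    """Tiny CNN mapping a face crop to 3DMM + camera/lighting parameters."""
    def __init__(self, n_params=120):  # e.g. 80 identity + 30 expression + pose/light
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_params)

    def forward(self, image):
        return self.head(self.features(image).flatten(1))

model = ParamRegressor()
images = torch.rand(8, 3, 128, 128)   # a batch of face crops
params = model(images)                # (8, 120) predicted parameters

# Training is often self-supervised: render the predicted parameters back to
# an image and penalize the difference to the input photo, so no 3D ground
# truth is needed for "in-the-wild" data.
```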
What did the survey find and why it matters?
- 3DMMs have had lasting impact: they started as groundbreaking research, faded a bit, and came roaring back with deep learning. Today they sit inside many state‑of‑the‑art systems for face analysis, animation, and re‑enactment.
- There’s a big gap between what capture systems can do and what public datasets provide. Many public 3D face datasets are moderate quality and lack detailed appearance, which limits how good models can be.
- Getting dense correspondence and separating intrinsic skin color from lighting are still central technical challenges.
- Fitting 3D models to 2D images is hard because of ambiguities (shape vs. camera, lighting vs. skin) and can be computationally heavy.
- Applications are broad: movies and games (realistic faces and expressions), AR/VR avatars, video conferencing, medical analysis, security, and research into human perception.
- Ethical concerns are serious: highly realistic face models can be used to create convincing fake images and videos. Privacy, consent (especially for minors), dataset bias (skin tone, age, gender, ethnicity), and fair sampling are major issues the community must address.
Implications and future impact
This research lays the foundation for more lifelike digital humans and better tools that understand faces from photos and videos. In the next decade, expect:
- More accurate and general models (covering full heads, hair, teeth, eyes) that work across ages and ethnicities.
- Stronger ties to deep learning for faster, more reliable 3D recovery from everyday photos and even single smartphone shots.
- Better ways to separate identity, expression, and lighting, making relighting, animation, and cross‑domain editing (like face reenactment) more controllable.
- Broader, higher‑quality public datasets—if the community can solve practical, ethical, and legal hurdles.
At the same time, society will need clear rules and tools to prevent misuse (for example, deepfakes) and to protect people’s identities and privacy. The paper argues that computer vision and graphics researchers should work with ethicists, lawyers, and policymakers to guide responsible progress.