Virtual Highlight Synthesis Pipeline

Updated 17 December 2025
  • Virtual Highlight Synthesis Pipeline is a physically driven method that creates synthetic specular highlights on RGB images using monocular geometry and 3D back-projection.
  • It employs a Blinn–Phong model enhanced with Schlick’s Fresnel approximation alongside randomized lighting to simulate a wide range of realistic specular effects.
  • The pipeline augments images by compositing synthetic highlights and masking pre-existing high-luminance regions, thereby improving supervision for highlight removal networks.

A Virtual Highlight Synthesis Pipeline is a physically motivated rendering and augmentation procedure designed to generate synthetic, photorealistic specular highlights on real RGB images, supporting the supervised training of highlight removal networks without paired ground truth. It provides physically plausible supervision by leveraging monocular geometry predictions, differentiable specular models, randomized lighting, and selective masking of existing highlight regions. This approach is implemented in the UnReflectAnything framework for RGB-only highlight removal (Rota et al., 10 Dec 2025).

1. Pipeline Stages and Purpose

The Virtual Highlight Synthesis Pipeline procedurally produces, from a single input RGB image $\mathbf{I}$ (with linear-space values in $[0,1]$), a synthetic per-pixel specular highlight map $H$ and an augmented image $\mathbf{I}^s$, in which synthetic highlights have been alpha-composited atop the original content. The key stages are as follows:

  • Monocular Geometry Estimation: Predict scene depth $D(u,v)$, normals $\mathbf{n}(u,v)$, and camera intrinsics $\mathbf{K}$ for each pixel using an off-the-shelf monocular geometry network.
  • 3D Reconstruction: Back-project each pixel to 3D world coordinates $\mathbf{X}$ via $\mathbf{X} = D(p)\,\mathbf{K}^{-1}p$ for $p = (u,v,1)^\top$.
  • Randomized Lighting Sampling: Randomly sample point-light parameters: position $\mathbf{L}$, shininess $S$, and intensity scale $K_H$, ensuring highlight variability.
  • Physically Based Specular Rendering: Synthesize per-pixel highlight intensity $H$ using a Blinn–Phong lobe modulated by Schlick’s Fresnel approximation.
  • Image Compositing: Overlay the synthesized highlights onto the input image to obtain $\mathbf{I}^s$.
  • Dataset Highlight Masking: Detect and mask high-luminance (pre-existing) highlights in the input to prevent unsuitable supervision.

This mechanism generates synthetic highlight-image pairs for use in training, circumventing the lack of ground-truth diffuse/specular separations in real-world imagery.

2. Monocular Geometry and Back-Projection

Monocular geometry estimation is critical for physically plausible rendering. The pipeline employs a pretrained network (e.g., MoGe-2) that outputs dense depth $D(u,v)\in\mathbb{R}_+$, unit-length surface normals $\mathbf{n}(u,v)\in\mathbb{R}^3$, and a $3\times 3$ intrinsic calibration matrix $\mathbf{K}$.

For each image pixel $p=(u,v,1)^\top$, 3D coordinates $\mathbf{X}$ are computed as:

$$\mathbf{X} = D(p)\,\mathbf{K}^{-1} p \in \mathbb{R}^3,$$

with per-pixel view direction normalized as:

$$\mathbf{v} = \frac{\mathbf{X}}{\|\mathbf{X}\|}.$$

This step allows differentiable mapping from image to 3D geometry, supporting per-pixel physically-based rendering.
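As a concrete sketch, the back-projection step can be written in a few lines of NumPy; the function name and the pixel-grid convention (integer pixel centers, `(u, v)` ordering) are illustrative choices, not specified by the paper:

```python
import numpy as np

def back_project(depth, K):
    """Back-project a depth map to per-pixel 3D points X = D(p) K^{-1} p.

    depth: (H, W) array of positive depths.
    K:     (3, 3) camera intrinsics.
    Returns 3D points (H, W, 3) and unit view directions (H, W, 3).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates p = (u, v, 1)^T, flattened to (3, H*W).
    p = np.stack([u.ravel(), v.ravel(), np.ones(H * W)], axis=0)
    rays = np.linalg.inv(K) @ p                       # K^{-1} p
    X = depth.ravel()[None, :] * rays                 # D(p) K^{-1} p
    X = X.T.reshape(H, W, 3)
    view = X / np.linalg.norm(X, axis=-1, keepdims=True)
    return X, view
```

With the identity intrinsics and unit depth, each pixel simply maps to $(u, v, 1)$, which is a quick sanity check on the convention.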

3. Physically Inspired Specular Modeling

The highlight generation employs a Blinn–Phong specular model extended with Fresnel modulation for realism:

  • Half-vector Calculation:

$$\mathbf{l} = \frac{\mathbf{L} - \mathbf{X}}{\|\mathbf{L} - \mathbf{X}\|},\quad \mathbf{h} = \frac{\mathbf{v} + \mathbf{l}}{\|\mathbf{v} + \mathbf{l}\|}$$

  • Schlick's Fresnel Term:

$$R(\theta) = R_0 + (1 - R_0)\,\bigl(1 - (\mathbf{v}\cdot \mathbf{h})\bigr)^5$$

with $R_0$ (reflectance at normal incidence) set to $0.04$ by default.

  • Specular Intensity:

$$H = K_H \; R(\theta) \; \bigl(\max(0, \mathbf{n}\cdot \mathbf{h})\bigr)^S$$

where $K_H$ is the highlight strength and $S$ the shininess exponent.

This model allows pixel-wise synthesis of realistic specular highlights under programmable illumination and surface properties.
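A minimal NumPy sketch of this per-pixel model, assuming geometry arrays of the shapes produced by back-projection (the function name is illustrative):

```python
import numpy as np

def specular_highlight(X, view, normals, L, S, K_H, R0=0.04):
    """Blinn-Phong lobe with Schlick's Fresnel, evaluated per pixel.

    X:       (H, W, 3) 3D points; view: (H, W, 3) unit view directions.
    normals: (H, W, 3) unit surface normals; L: (3,) light position.
    Returns the highlight intensity map H of shape (H, W).
    """
    l = L[None, None, :] - X
    l = l / np.linalg.norm(l, axis=-1, keepdims=True)          # light direction
    h = view + l
    h = h / np.linalg.norm(h, axis=-1, keepdims=True)          # half-vector
    vh = np.clip(np.sum(view * h, axis=-1), 0.0, 1.0)
    fresnel = R0 + (1.0 - R0) * (1.0 - vh) ** 5                # Schlick's term
    nh = np.maximum(0.0, np.sum(normals * h, axis=-1))         # Blinn-Phong lobe
    return K_H * fresnel * nh ** S
```

For a pixel seen head-on with the light along the view axis, $\mathbf{v}\cdot\mathbf{h}=1$, so the Fresnel term collapses to $R_0$ and the output is $K_H R_0 (\mathbf{n}\cdot\mathbf{h})^S$.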

4. Lighting Randomization and Sampling Strategies

Synthetic highlight appearance diversity is achieved by randomizing illumination and material parameters within specified ranges:

  • Light Position $\mathbf{L}$: Sampled uniformly within a camera-space bounding box (e.g., $x,y\in[-1,1]$ m, $z\in[0.5,3]$ m, in front of the camera).
  • Shininess Exponent $S$: Drawn from $\mathcal{U}(50,200)$, spanning broad to concentrated highlights.
  • Highlight Intensity $K_H$: Sampled from $\mathcal{U}(0.1,0.5)$.

This approach causes the synthesized highlights to vary from small glints to broad, soft specular regions, which improves the diversity and robustness of downstream learning.
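A minimal sampler matching the quoted ranges (the helper name is hypothetical):

```python
import numpy as np

def sample_lighting(rng):
    """Draw one randomized light configuration (L, S, K_H)."""
    L = np.array([rng.uniform(-1.0, 1.0),   # x in [-1, 1] m
                  rng.uniform(-1.0, 1.0),   # y in [-1, 1] m
                  rng.uniform(0.5, 3.0)])   # z in [0.5, 3] m, in front of camera
    S = rng.uniform(50.0, 200.0)            # shininess exponent
    K_H = rng.uniform(0.1, 0.5)             # highlight intensity scale
    return L, S, K_H
```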

5. Image Compositing and Highlight Masking

Unlike full rendering, only the specular lobe is synthesized and alpha-composited with the original RGB image; there is no diffuse lobe or environment map:

$$\mathbf{I}^s = (1 - H)\,\mathbf{I} + H \bigl(\mathbf{I} + K_H \mathbf{1}_3\bigr)$$

Here, $H$ serves as both an intensity and an alpha map. The compositing ensures the highlight is physically integrated, preserving both color structure and highlight prominence.

High-luminance regions in the input (luminance $> 0.95$) are detected and masked as “dataset highlights”; such pixels are excluded from supervision, ensuring the network does not learn to treat pre-existing highlights as part of the diffuse content.
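Both steps can be sketched together as follows; the Rec. 709 luminance weights and the clipping back to $[0,1]$ are assumptions, since the text does not specify a luminance definition or range handling:

```python
import numpy as np

def composite_and_mask(I, H, K_H, tau_L=0.95):
    """Alpha-composite the synthetic lobe and mask pre-existing highlights.

    I: (h, w, 3) linear RGB in [0, 1]; H: (h, w) synthetic highlight map.
    Returns the augmented image I_s and a boolean dataset-highlight mask.
    """
    # I^s = (1 - H) I + H (I + K_H 1_3), clipped back to [0, 1] (assumption).
    I_s = (1.0 - H[..., None]) * I + H[..., None] * (I + K_H)
    I_s = np.clip(I_s, 0.0, 1.0)
    # Rec. 709 luminance (assumption); pixels above tau_L are dataset highlights.
    lum = I @ np.array([0.2126, 0.7152, 0.0722])
    mask_dataset = lum > tau_L
    return I_s, mask_dataset
```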

6. Algorithmic Summary and Key Equations

The procedure is formally summarized in the following pseudocode, reflecting each principal stage:

1. D, n ← MonocularGeometryNetwork(I)
2. For each pixel (u,v):
     p ← [u,v,1]^T
     X(u,v) = D(u,v) · K^{-1} p
     v̂(u,v) = normalize(X(u,v))
3. Sample:
     L ← UniformBox([x_min..x_max],[y_min..y_max],[z_min..z_max])
     S ← Uniform(S_min,S_max)
     K_H ← Uniform(KH_min,KH_max)
4. For each pixel:
     l̂ = normalize(L − X)
     h  = normalize(v̂ + l̂)
     R_F = R0 + (1 − R0)·(1 − max(0, v̂·h))^5
     α  = max(0, n·h)
     H(u,v) = K_H · R_F · α^S
5. Composite:
     Iˢ = (1 − H)·I + H·(I + K_H·1₃)
6. Detect dataset highlights:
     mask_dataset = luminance(I) > τ_L
7. Output:
     Highlight map H
     Augmented RGB Iˢ
     Supervision masks (exclude mask_dataset)

All processing is performed in linear RGB space, and highlight compositing does not involve HDR tone-mapping or non-linear color transformation.
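Under these conventions, the pseudocode can be condensed into a single vectorized NumPy pass; the function name, pixel-grid convention, and Rec. 709 luminance weights are illustrative assumptions, and the geometry inputs D, n, K would in practice come from a monocular network such as MoGe-2:

```python
import numpy as np

def synthesize_highlights(I, D, n, K, rng, R0=0.04, tau_L=0.95):
    """Full pipeline sketch: back-project, sample a light, render the
    Blinn-Phong/Schlick lobe, composite, and detect dataset highlights."""
    h, w = D.shape
    # Stage 2: back-projection X = D(p) K^{-1} p and view directions.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    p = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    X = (D.reshape(-1) * (np.linalg.inv(K) @ p)).T.reshape(h, w, 3)
    view = X / np.linalg.norm(X, axis=-1, keepdims=True)
    # Stage 3: randomized lighting within the quoted ranges.
    L = np.array([rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0.5, 3)])
    S, K_H = rng.uniform(50, 200), rng.uniform(0.1, 0.5)
    # Stage 4: Blinn-Phong lobe with Schlick's Fresnel term.
    l = L - X
    l /= np.linalg.norm(l, axis=-1, keepdims=True)
    hv = view + l
    hv /= np.linalg.norm(hv, axis=-1, keepdims=True)
    vh = np.clip(np.sum(view * hv, axis=-1), 0, 1)
    H = K_H * (R0 + (1 - R0) * (1 - vh) ** 5) \
        * np.maximum(0, np.sum(n * hv, axis=-1)) ** S
    # Stages 5-6: composite in linear RGB and mask pre-existing highlights.
    I_s = np.clip((1 - H[..., None]) * I + H[..., None] * (I + K_H), 0, 1)
    mask_dataset = I @ np.array([0.2126, 0.7152, 0.0722]) > tau_L
    return H, I_s, mask_dataset
```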

7. Application to Supervised Highlight Removal

The synthesized triplets $(\mathbf{I}, \mathbf{I}^s, H)$ are used to supervise an RGB-only highlight removal network. The corresponding loss functions and supervision strategies are:

  • Highlight Detector Head: Trained to regress toward the synthetic highlight mask $H$ (excluding masked dataset highlights) using combined soft-Dice, L1, and total-variation losses.
  • Diffuse Reconstruction: The network inpaints feature patches beneath synthetic and dataset highlights, matching encoder features $E(\mathbf{I})$ and enforcing L1 and cosine losses in the inpainted regions.
  • RGB Decoder: Fine-tuned with an L1+SSIM loss for diffuse color reconstruction outside highlight regions; seam smoothness is encouraged and re-emergent specular peaks are explicitly penalized.

Because the input $\mathbf{I}$ may already contain non-negligible real highlights, masking high-luminance pixels (dataset highlights) in both supervision and loss computation is critical to avoid regressing toward spurious diffuse targets.
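As an illustration of the detector-head objective, a minimal NumPy version of the combined soft-Dice + L1 + total-variation loss with an exclusion mask might look as follows; the loss weights and the squared-denominator Dice variant are assumptions, not values reported in the paper:

```python
import numpy as np

def detector_loss(H_pred, H_gt, valid, w_dice=1.0, w_l1=1.0, w_tv=0.1):
    """Soft-Dice + L1 + total-variation loss on the highlight map,
    restricted to valid pixels (i.e., outside dataset highlights)."""
    eps = 1e-6
    p, g = H_pred * valid, H_gt * valid
    # Soft-Dice with squared denominators (one common variant).
    dice = 1.0 - (2.0 * (p * g).sum() + eps) / ((p * p).sum() + (g * g).sum() + eps)
    # Masked L1 term, normalized by the number of valid pixels.
    l1 = (np.abs(H_pred - H_gt) * valid).sum() / (valid.sum() + eps)
    # Total variation encourages spatially smooth highlight predictions.
    tv = np.abs(np.diff(H_pred, axis=0)).mean() + np.abs(np.diff(H_pred, axis=1)).mean()
    return w_dice * dice + w_l1 * l1 + w_tv * tv
```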

Table: Key Components and Parameters

| Component | Details and Ranges | Mathematical Expression |
|---|---|---|
| Geometry Estimation | Off-the-shelf, e.g., MoGe-2 | $D(u,v)$, $\mathbf{n}(u,v)$ |
| Light Position | $\mathbf{L}\sim$ uniform in $[-1,1]^2 \times [0.5,3]$ m | Sampling space |
| Shininess Exponent | $\mathcal{U}(50,200)$ | $S$ |
| Highlight Scale | $\mathcal{U}(0.1,0.5)$ | $K_H$ |
| Highlight Detection Mask | Luminance threshold $>0.95$ | $\text{mask}_{\text{dataset}}$ |

This synthesis procedure enables the use of arbitrary RGB images for the training of models requiring physically motivated and geometrically consistent specular-diffuse separations, providing supervision in the absence of physically captured paired data (Rota et al., 10 Dec 2025).
