
FlowIID: Efficient Intrinsic Image Decomposition

Updated 25 January 2026
  • FlowIID is an intrinsic image decomposition method that separates RGB images into albedo and shading components using a single deterministic pass under the Lambertian model.
  • It employs a unified VAE encoder-decoder with a UNet backbone and latent flow matcher to ensure consistency, stability, and a reduced parameter count compared to multi-step models.
  • It achieves competitive performance on benchmarks—such as a 0.0040 albedo MSE on the MIT Intrinsic dataset—making it ideal for real-time relighting and embedded vision applications.

FlowIID is an intrinsic image decomposition (IID) architecture that factorizes an input RGB image $I$ into its albedo (reflectance, $A$) and shading (illumination, $S$) components under the standard Lambertian image formation model, $I(x) = A(x) \cdot S(x)$. This decomposition is foundational for applications such as relighting and material editing, and is increasingly deployed as a preprocessing step for higher-level computer vision pipelines. FlowIID introduces a paradigm shift in IID by leveraging latent flow matching in conjunction with a compact Variational Autoencoder (VAE), enabling deterministic, stable, and parameter-efficient decomposition in a single inference pass. FlowIID achieves competitive or superior accuracy relative to state-of-the-art methods with a fraction of the parameter budget, facilitating practical deployment in real-time and resource-constrained environments (Singla et al., 18 Jan 2026).

1. Problem Formulation and Single-Step Decomposition

The primary objective of intrinsic image decomposition is to retrieve the albedo and shading fields such that $I = A \cdot S$. Traditional approaches either utilize separate networks for albedo and shading (risking output inconsistency) or predict only shading and estimate albedo by elementwise division $A = I / S$. Modern deep IID methods often rely on multi-step diffusion networks or large, multi-branch CNNs exceeding hundreds of millions of parameters, limiting their applicability in low-latency or embedded scenarios. FlowIID circumvents these inefficiencies by directly predicting a latent representation of shading in a single forward pass through its encoder and UNet backbone. The decoded shading, together with the input image, yields albedo via $A = I / S$, obviating the need for iterative sampling and bulky architectures while ensuring decomposition consistency.
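The division $A = I / S$ is numerically fragile wherever the predicted shading approaches zero. A minimal sketch of this recovery step follows; the epsilon clamp and tensor shapes are illustrative assumptions, not details taken from the paper:

```python
import torch

def recover_albedo(image: torch.Tensor, shading: torch.Tensor,
                   eps: float = 1e-6) -> torch.Tensor:
    """Recover albedo under the Lambertian model I = A * S.

    image:   input RGB image, shape (B, 3, H, W), values in [0, 1]
    shading: predicted shading map, broadcastable to image's shape

    The epsilon clamp avoids division by near-zero shading; its
    value is an illustrative choice, not specified by the paper.
    """
    return image / shading.clamp_min(eps)
```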

2. Model Architecture and Workflow

The FlowIID architecture comprises four principal modules:

  • VAE Encoder–Decoder ($E$, $D$): The VAE operates on ground-truth shading $s_0 \in \mathbb{R}^{H \times W}$, encoding it as $z_0 \in \mathbb{R}^{8 \times H/8 \times W/8}$ and reconstructing shading via $D(z_0)$.
  • Image Encoder ($\text{Enc}$): Six down-sampling blocks utilizing Modified Residual Blocks (MRB), extracting multi-scale features from the input image.
  • UNet Backbone: Two down-pooling and two up-pooling blocks with MRBs and attention in the middle layers, integrating encoder features and latent noise.
  • Latent Flow Matcher ($u_\theta(x_t, t)$): Responsible for learning the vector field that transports Gaussian noise to shading latents.

During inference, $\text{Enc}$ processes $I$ to yield a feature map of dimension $256 \times H/8 \times W/8$, concatenated with latent noise $x_t$. The result, a $264 \times H/8 \times W/8$ tensor, enters the UNet backbone. Skip connections inject intermediate encoder outputs into corresponding UNet layers. The UNet, guided by the latent flow matcher, produces a latent shading code $\hat{z}_1$, which $D$ decodes to image space as $\hat{S}$; albedo is recovered as $A = I / \hat{S}$.
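A hedged sketch of this single-pass workflow in PyTorch follows. The modules `enc`, `unet`, and `decoder` stand in for the components above, and their exact interfaces are assumptions for illustration; the single Euler step of the flow matcher (see Section 3) is folded into the backbone call here:

```python
import torch

@torch.no_grad()
def flowiid_infer(image, enc, unet, decoder, eps=1e-6):
    """Single-pass FlowIID-style inference (module interfaces assumed).

    enc(image) yields a (B, 256, H/8, W/8) feature map; concatenating
    8 channels of Gaussian latent noise gives the 264-channel UNet
    input described above.
    """
    b, _, h, w = image.shape
    feats = enc(image)                              # (B, 256, H/8, W/8)
    x0 = torch.randn(b, 8, h // 8, w // 8,
                     device=image.device)           # latent Gaussian noise
    z1_hat = unet(torch.cat([feats, x0], dim=1))    # predicted shading latent
    shading = decoder(z1_hat)                       # decode to image space
    albedo = image / shading.clamp_min(eps)         # A = I / S_hat
    return albedo, shading
```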

3. Latent Flow Matching: Mathematical and Training Foundations

Flow matching is formulated as learning a time-continuous vector field $v_t$ that advects samples from a simple Gaussian distribution $p_0$ to complex latent targets $p_1$ (shading codes). Specifically, for $t \in [0,1]$:

  • ODE: $\mathrm{d}x_t = v_t(x_t)\,\mathrm{d}t$
  • Training Loss:

$$\mathcal{L}_{\mathrm{flow}} = \mathbb{E}_{t \sim U[0,1],\, x_t}\, \|u_\theta(x_t, t) - v_t\|_2^2$$

where

$$x_t = (1 - (1-\sigma_{\min})t)\,x_0 + t\,x_1, \qquad v_t = x_1 - (1 - \sigma_{\min})\,x_0$$

At inference, $x_0 \sim \mathcal{N}(0, I)$ is numerically integrated using a single Euler step, generating $\hat{z}_1$ for decoding.
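A minimal PyTorch sketch of this objective and the one-step sampler, assuming a network `u_theta(x_t, t)` with that call signature and an illustrative $\sigma_{\min}$ (the paper's value is not quoted here):

```python
import torch

SIGMA_MIN = 1e-4  # illustrative value; the paper's sigma_min is not quoted here

def flow_matching_loss(u_theta, x1):
    """Conditional flow-matching loss for shading latents x1 = E(s_0)."""
    x0 = torch.randn_like(x1)                            # x0 ~ N(0, I)
    t = torch.rand(x1.shape[0], device=x1.device).view(-1, 1, 1, 1)
    xt = (1 - (1 - SIGMA_MIN) * t) * x0 + t * x1         # interpolant x_t
    vt = x1 - (1 - SIGMA_MIN) * x0                       # target velocity v_t
    return ((u_theta(xt, t) - vt) ** 2).mean()

def sample_one_euler_step(u_theta, shape, device):
    """Integrate dx_t = u_theta(x_t, t) dt from t=0 to t=1 in one step."""
    x0 = torch.randn(shape, device=device)
    t0 = torch.zeros(shape[0], device=device).view(-1, 1, 1, 1)
    return x0 + u_theta(x0, t0)                          # z1_hat
```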

4. VAE Latent Encoding, Decoding, and Loss Functions

VAE training on shading $s_0$ involves encoding to $z_0 = E(s_0)$ and decoding via $D(z_0)$. The objective comprises:

  • Reconstruction Loss:

$$\mathcal{L}_{\mathrm{rec}} = \|\hat{s}_0 - s_0\|_2^2$$

  • Perceptual Loss: $\mathcal{L}_{\mathrm{perc}}$ (VGG-based feature loss)
  • KL Divergence: $\mathcal{L}_{\mathrm{KL}}$
  • Adversarial Loss: $\mathcal{L}_{\mathrm{adv}}$ (lightweight discriminator)

Total loss for the first 90 epochs (no adversary):

$$\mathcal{L}_{\mathrm{VAE}} = \mathcal{L}_{\mathrm{rec}} + 0.005\,\mathcal{L}_{\mathrm{KL}} + \mathcal{L}_{\mathrm{perc}}$$

For the subsequent 200 epochs (with adversarial tuning):

$$\mathcal{L}_{\mathrm{VAE}} = \mathcal{L}_{\mathrm{rec}} + 0.005\,\mathcal{L}_{\mathrm{KL}} + \mathcal{L}_{\mathrm{perc}} + 0.1\,\mathcal{L}_{\mathrm{adv}}$$
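The two-phase schedule can be expressed as a small helper. The weights follow the equations above; the function signature and the epoch-based switch are assumptions that mirror the described training schedule:

```python
def vae_total_loss(rec, kl, perc, adv, epoch):
    """Combine VAE loss terms with the weights given above.

    For the first 90 epochs the adversarial term is disabled; the
    0.005 (KL) and 0.1 (adversarial) weights follow the equations.
    """
    total = rec + 0.005 * kl + perc
    if epoch >= 90:                 # adversarial fine-tuning phase
        total = total + 0.1 * adv
    return total
```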

5. Parameter Efficiency and Comparative Analysis

FlowIID achieves substantial gains in parameter efficiency:

Model                          Parameters at inference (millions)
FlowIID                        51.7 (58.4 incl. VAEGAN training)
Niid-Net                       273.1
Careaga & Aksoy (Intrinsic)    252
Careaga & Aksoy (Colorful)     548
RGB⇆X diffusion                1,280

Despite a size reduction of up to an order of magnitude, FlowIID matches or surpasses the performance of far heavier models (Singla et al., 18 Jan 2026).

6. Quantitative and Qualitative Performance

On the MIT Intrinsic dataset, FlowIID sets benchmark records for both albedo and shading:

Component    MSE      LMSE     DSSIM
Albedo       0.0040   0.0043   0.0435
Shading      0.0109   0.0119   0.0823

On the ARAP dataset (no ARAP-specific finetuning):

Component    LMSE     RMSE     SSIM
Albedo       0.021    0.108    0.760
Shading      0.022    0.132    0.744

Qualitative side-by-side comparisons with Lettry et al., Niid-Net, and Careaga & Aksoy indicate albedo outputs with preserved color fidelity and low texture bleeding, and shading maps displaying smooth, spatially consistent illumination. This suggests robust separation of reflectance and illumination cues even under compact architectural constraints.

7. Ablation Studies and Design Tradeoffs

Ablation analysis on ARAP demonstrates:

  • Removing the concatenation of encoder features to the UNet input increases albedo LMSE to 0.0242 and decreases SSIM to 0.744 (from 0.021 and 0.760 for the full model).
  • Increasing UNet depth from four to five MRBs (adding 7.6 million parameters) yields no consistent improvement.

The full model—four MRBs with encoder–UNet concatenation—offers optimal parameter efficiency and best empirical results.

8. Deployment Scenarios and Applications

FlowIID’s single-step, low-parameter inference is well-matched to:

  • Real-time relighting in mobile AR and game engines
  • Material editing on embedded and resource-constrained systems
  • Preprocessing for vision in robotics and autonomous platforms

A plausible implication is increased practical adoption of IID as a standard preprocessing step in low-latency and embedded vision pipelines, given FlowIID’s balance of decomposition fidelity, consistency, and computational footprint.
