SRFlow: Splatting Rasterization Flow

Updated 17 January 2026
  • The paper introduces a high-resolution facial optical flow dataset and novel regularization strategies leveraging 3D Gaussian splatting for improved facial motion estimation.
  • The method employs multi-view capture and FLAME alignment to generate dense ground truth flow fields, achieving up to a 42% reduction in endpoint error over baselines.
  • SRFlowNet integrates mask, gradient, total variation, and flow difference losses to enhance consistency and reliability in challenging micro-expression recognition scenarios.

Splatting Rasterization Flow (SRFlow) comprises a high-resolution facial optical flow dataset and a supporting model regularization framework that leverages 3D Gaussian splatting in differentiable rasterization. The framework targets facial motion analysis tasks, specifically addressing the lack of high-fidelity optical flow ground truth in unconstrained, high-resolution face video, which has hindered progress in both core flow estimation and micro-expression recognition. SRFlow introduces both a carefully constructed data resource and a set of regularization strategies for model training, improving the consistency, denoising, and reliability of flow fields in challenging facial motion scenarios (Zhang et al., 10 Jan 2026).

1. Dataset Creation and Ground Truth Generation

SRFlow's dataset construction employs multi-view dynamic facial video from the NeRSemble apparatus, which comprises 16 calibrated cameras and 4,700 synchronized high-frame-rate sequences from 222 subjects (157 male, 65 female), spanning a heterogeneous distribution of ages and ethnicities. Representative coverage focuses on 27 of these subjects. Captured motion includes a diverse set of facial expressions, speech, and hair and head motion.

3D facial geometry is reconstructed via FLAME parametric mesh alignment, generating input for the GaussianAvatar representation. Each face is parametrized as approximately 20,000 continuous 3D Gaussians, each associated with a mesh triangle. Color is encoded via spherical harmonics alongside explicit density and opacity. Differentiable rendering is achieved through alpha-blending:

C = \sum_{i=1}^{n} c_i \alpha_i' \prod_{j<i} (1 - \alpha_j')
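The front-to-back alpha compositing above can be sketched in a few lines of numpy; this is a minimal illustration of the blending rule, not the paper's renderer, and assumes samples are already depth-sorted:

```python
import numpy as np

def composite_color(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted Gaussian samples.

    colors: (n, 3) per-Gaussian colors c_i, sorted near-to-far.
    alphas: (n,) effective opacities alpha_i' in [0, 1].
    Implements C = sum_i c_i * alpha_i' * prod_{j<i} (1 - alpha_j').
    """
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Accumulated transmittance T_i = prod_{j<i} (1 - alpha_j')
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance          # per-Gaussian blending weights
    return (weights[:, None] * colors).sum(axis=0)
```

The same weights are reused below to composite per-Gaussian flow displacements, which is what makes the ground-truth flow consistent with the rendered color.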

To construct ground truth dense optical flow, per-Gaussian center coordinates $P_w = (X_w, Y_w, Z_w, 1)^\top$ are projected using camera extrinsics $[R \mid T]$ and intrinsics. Displacements $(\Delta u_i, \Delta v_i)$ between consecutive frames are composited via alpha-blending:

O_\text{optical} = [\Delta U, \Delta V]^\top = \sum_{i=1}^{n} [\Delta u_i, \Delta v_i]^\top \alpha_i' \prod_{j<i} (1 - \alpha_j')

This yields high-resolution (up to $3208 \times 2200$) ground truth flow fields. The dataset comprises 11,161 image pairs, split into 6,791 (train), 1,212 (val), and 3,158 (test). Augmentations include random rendering rotations, randomized training crops ($800 \times 512$), horizontal flips, and color jitter.
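The projection step behind the per-Gaussian displacements can be sketched as follows; this is a generic pinhole-camera sketch, not the paper's implementation, and the function names are illustrative:

```python
import numpy as np

def project(P_w, R, T, K):
    """Pinhole projection of homogeneous world points P_w (n, 4) to pixels.
    R: (3, 3) rotation, T: (3,) translation, K: (3, 3) intrinsics."""
    P_c = P_w[:, :3] @ R.T + T              # apply extrinsics [R | T]
    uvw = P_c @ K.T                         # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]         # perspective divide

def per_gaussian_flow(P_t0, P_t1, R, T, K):
    """Image-plane displacements (du_i, dv_i) of Gaussian centers between
    consecutive frames; these are then alpha-composited into dense flow."""
    return project(P_t1, R, T, K) - project(P_t0, R, T, K)
```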

2. Model Architecture: SRFlowNet

SRFlowNet, the regularization framework for flow estimation, is built upon the SKFlow backbone, selected following benchmarks of multiple RAFT-style networks (RAFT, GMA, SKFlow, MemFlow, DPFlow, RPKNet) on SRFlow. The pipeline is:

I_1, I_2 \rightarrow \text{shared encoder} \rightarrow \text{4-level feature pyramids} \rightarrow \text{cost-volume correlation} \rightarrow \text{recurrent update module (4 GRU iterations)} \rightarrow \text{multi-stage flow outputs}~f^0, f^1, \ldots, f^{n-1}

At each stage $i$, the network predicts $f^i$, supervised by endpoint error and additional regularization terms. No splatting-guidance module is used at inference; splatting serves only for ground truth and mask generation during training.
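Multi-stage supervision of the recurrent outputs can be sketched as an exponentially weighted endpoint-error sum; the decay $\gamma = 0.8$ is the usual RAFT-family convention and is an assumption here, not a value stated by the paper:

```python
import numpy as np

def sequence_loss(preds, gt, gamma=0.8):
    """RAFT-style supervision of all recurrent stages: stage i is weighted
    gamma^(n-i-1), so later, more refined predictions dominate the loss.
    preds: list of (H, W, 2) stage outputs f^0..f^{n-1}; gt: (H, W, 2)."""
    n = len(preds)
    return sum(gamma ** (n - i - 1)
               * np.linalg.norm(f - gt, axis=-1).mean()   # mean EPE per stage
               for i, f in enumerate(preds))
```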

3. Splatting Rasterization Guidance

SRFlow leverages the concept of Gaussian splatting. Each 3D Gaussian, when projected, defines a spatially smooth contribution to both image color and flow:

G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(x - \mu_x)^2 + (y - \mu_y)^2}{2\sigma^2} \right)
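As a minimal sketch, the isotropic footprint above evaluates as:

```python
import numpy as np

def splat_footprint(x, y, mu, sigma):
    """Isotropic 2D Gaussian G(x, y) of a projected splat centered at mu:
    peak value 1 / (2 * pi * sigma^2), decaying with squared distance."""
    d2 = (x - mu[0]) ** 2 + (y - mu[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
```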

Projection and compositing occur in depth-sorted order for both color and displacement. This compositing defines a per-pixel binary mask $M_{bg}(x, y)$ (set to 1 if any Gaussian covers $(x, y)$; else 0), marking flow-confidence regions. Extended masks $M_\text{total}$, incorporating spatial gradients of $M_{bg}$, are used to focus regularization at motion boundaries.
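The two masks can be sketched as follows; `boundary_mask` is one plausible construction of the gradient-extended mask, since the paper's exact definition of $M_\text{total}$ is not reproduced here:

```python
import numpy as np

def coverage_mask(weights):
    """M_bg(x, y): 1 where at least one Gaussian contributes, else 0.
    weights: (n, H, W) per-Gaussian alpha-blending weights."""
    return (weights.sum(axis=0) > 0).astype(np.float32)

def boundary_mask(M_bg):
    """Extended mask emphasizing motion boundaries: union of M_bg with the
    support of its spatial gradient (an assumed construction of M_total)."""
    gy, gx = np.gradient(M_bg)
    return np.clip(M_bg + (np.abs(gx) + np.abs(gy) > 0), 0.0, 1.0)
```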

4. Regularization Strategies

SRFlowNet introduces four regularization losses in addition to standard endpoint error (EPE):

  • Mask-based Loss ($L_\text{mask}$):

L_\text{mask} = \sum_{i=0}^{n-1} \| M_{bg} \odot (f^i - f^{i,*}) \|_2^2

Restricts supervision to valid-flow regions.

  • Gradient-based Loss ($L_\text{grad}$):

L_\text{grad} = \sum_{i=0}^{n-1} \| \nabla(M_{bg} \odot f^i) - \nabla(M_{bg} \odot f^{i,*}) \|_1

Gradients computed via normalized Sobel filters:

K_x = \frac{1}{8} \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}, \quad K_y = K_x^\top

  • Total Variation Regularization (TVR):

R(c) = \frac{1}{HW} \sum_{x,y} \left( |\nabla_x c| + |\nabla_y c| \right), \quad L_\text{TVR} = \lambda_N \sum_i \gamma^{n-i-1} \left( R(u^i) + R(v^i) \right)

  • Flow Difference Regularization (FDR):

D_x(f) = f(x, y+1) - f(x, y), \quad D_y(f) = f(x+1, y) - f(x, y)

L_\text{FDR} = \lambda_N \sum_i \gamma^{n-i-1} \left( D_x(f^i) \odot M_{bg}^x + D_y(f^i) \odot M_{bg}^y \right)

  • MIGAR and IGVAR: Losses that modulate regularization according to image gradient magnitude or variance, e.g., via:

G_{I_1}(x, y) = \text{Sobel}(I_1), \quad \text{base} = \exp\left\{ \frac{1}{HW} \sum G_{I_1} \right\}, \quad w(x, y) = \text{base}^{-G_{I_1}(x, y)}

L_\text{MIGAR} = \sum_i \gamma^{n-i-1} \left( R_{wpp}(u^i) + R_{wpp}(v^i) \right)

IGVAR uses gradient variance within masked regions as weighting.
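The loss family above can be sketched compactly in numpy. This is an illustrative reconstruction from the formulas, not the paper's code; in particular `GAMMA` and `LAM` are assumed hyperparameter values, and `migar_weights` only implements the adaptive-weighting rule, not the full weighted penalty:

```python
import numpy as np

GAMMA, LAM = 0.8, 0.1   # decay and weight: assumed values, not from the paper

def mask_loss(preds, gt, M):
    """L_mask: squared-error supervision restricted to splat-covered pixels.
    preds: list of (H, W, 2) stage outputs; gt: (H, W, 2); M: (H, W) binary."""
    return sum((((f - gt) * M[..., None]) ** 2).sum() for f in preds)

def total_variation(c):
    """R(c): mean absolute forward difference of a flow channel in x and y."""
    return (np.abs(np.diff(c, axis=1)).sum()
            + np.abs(np.diff(c, axis=0)).sum()) / c.size

def tvr_loss(preds):
    """L_TVR over stages with exponentially increasing weights gamma^(n-i-1)."""
    n = len(preds)
    return LAM * sum(GAMMA ** (n - i - 1)
                     * (total_variation(f[..., 0]) + total_variation(f[..., 1]))
                     for i, f in enumerate(preds))

def fdr_loss(preds, M):
    """L_FDR: neighboring-pixel flow differences, masked to valid pixel pairs."""
    n = len(preds)
    mx = M[:, 1:] * M[:, :-1]               # both pixels of each x-pair valid
    my = M[1:, :] * M[:-1, :]               # both pixels of each y-pair valid
    loss = 0.0
    for i, f in enumerate(preds):
        dx = np.abs(np.diff(f, axis=1)).sum(-1)   # |D_x(f)| per pixel pair
        dy = np.abs(np.diff(f, axis=0)).sum(-1)   # |D_y(f)| per pixel pair
        loss += GAMMA ** (n - i - 1) * ((dx * mx).mean() + (dy * my).mean())
    return LAM * loss

def migar_weights(G):
    """Image-gradient-adaptive weights w = base^{-G}, base = exp(mean(G)):
    strong edges receive exponentially smaller smoothness penalties."""
    return np.exp(G.mean()) ** (-G)
```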

These losses collectively address noise suppression in texture-less and repetitive-pattern regions while preserving sharp boundaries and subtle facial motion.

5. Evaluation, Metrics, and Results

SRFlow and SRFlowNet are evaluated using standard endpoint error (EPE):

\text{EPE} = \frac{1}{N} \sum_{i=1}^{N} \| f_i - f_i^* \|_2

and macro/micro-averaged F1-score on composite micro-expression benchmarks. Key empirical results:

  • Pretrained SKFlow on generic data: EPE = 0.5081
  • SKFlow + SRFlow retrained: EPE = 0.3998 (21% reduction)
  • Best adaptation (MemFlow+SRFlow): EPE = 0.2953 (42% reduction)
  • Composite micro-expression F1:
    • SKFlow baseline: 0.4733
    • SRFlowNet (TVR): 0.6947 (48% improvement)

Qualitative results indicate smoother, more coherent flow fields, especially around eyes and mouth, compared with RAFT, FlowNet, and SSA architectures.
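The EPE metric used throughout these comparisons amounts to a one-line computation; a minimal numpy sketch:

```python
import numpy as np

def endpoint_error(pred, gt):
    """Mean endpoint error: average Euclidean distance between predicted
    and ground-truth flow vectors. pred, gt: (..., 2) arrays."""
    return np.linalg.norm(pred - gt, axis=-1).mean()
```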

Method             EPE     Macro-F1
SKFlow (baseline)  0.5081  0.4733
SKFlow + SRFlow    0.3998  —
MemFlow + SRFlow   0.2953  —
SRFlowNet (TVR)    —       0.6947
SRFlowNet (IGVAR)  —       0.6912

6. Ablation and Analysis

Ablation studies performed on the SRFlow test set with the SKFlow backbone demonstrate the effect of individual regularizers:

  • TVR increases EPE (0.4056) but boosts F1-score (0.6947).
  • FDR yields the best EPE (0.3946), while IGVAR and MIGAR perform comparably. A plausible implication is that the choice of regularizer can be tuned to the application domain: TVR for downstream micro-expression recognition, FDR or MIGAR for pixelwise flow accuracy.

Visualizations confirm SRFlowNet variants yield denoised and structurally consistent flow fields, particularly in previously ambiguous or low-texture regions.

7. Limitations and Future Directions

Limitations include the granularity mismatch between high-resolution SRFlow ground truth ($2200 \times 3208$) and typical low-resolution micro-expression inputs ($112 \times 112$). Regularizer selection is performed per backbone, without exploring combinations; stacking multiple regularizers is cautioned against due to the risk of oversmoothing.

Future research avenues include:

  • High-resolution micro-expression video dataset construction,
  • Micro-expression recognition models with architectures capable of leveraging SRFlow-level flow detail,
  • Hybrid use of 3D Gaussian motion during inference,
  • Adaptive or combined regularization schemes for optimally balancing smoothness and detail.

SRFlow establishes a new standard for both facial optical flow annotation and model supervision, with demonstrated substantial impact on downstream facial motion understanding (Zhang et al., 10 Jan 2026).
