SRFlow: Splatting Rasterization Flow

Updated 17 January 2026
  • The paper introduces a high-resolution facial optical flow dataset and novel regularization strategies leveraging 3D Gaussian splatting for improved facial motion estimation.
  • The method employs multi-view capture and FLAME alignment to generate dense ground truth flow fields, achieving up to a 42% reduction in endpoint error over baselines.
  • SRFlowNet integrates mask, gradient, total variation, and flow difference losses to enhance consistency and reliability in challenging micro-expression recognition scenarios.

Splatting Rasterization Flow (SRFlow) comprises a high-resolution facial optical flow dataset and a supporting model regularization framework that leverages 3D Gaussian splatting in differentiable rasterization. The framework targets facial motion analysis tasks, specifically addressing the lack of high-fidelity optical flow ground truth in unconstrained, high-resolution face video, which has hindered progress in both core flow estimation and micro-expression recognition. SRFlow introduces both a carefully constructed data resource and a set of regularization strategies for model training, improving the consistency, denoising, and reliability of flow fields in challenging facial motion scenarios (Zhang et al., 10 Jan 2026).

1. Dataset Creation and Ground Truth Generation

SRFlow's dataset construction employs multi-view dynamic facial video from the NeRSemble apparatus, which comprises 16 calibrated cameras and 4,700 synchronized high-frame-rate sequences from 222 subjects (157 male, 65 female), spanning a heterogeneous distribution of ages and ethnicities. Representative coverage focuses on 27 of these subjects. Captured motion includes a diverse set of facial expressions, speech, and hair and head motion.

3D facial geometry is reconstructed via FLAME parametric mesh alignment, generating input for the GaussianAvatar representation. Each face is parametrized as approximately 20,000 continuous 3D Gaussians, each associated with a mesh triangle. Color is encoded via spherical harmonics alongside explicit density and opacity. Differentiable rendering is achieved through alpha-blending:

C = \sum_{i=1}^{n} c_i \alpha_i' \prod_{j<i} (1 - \alpha_j')
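The front-to-back alpha compositing above can be sketched in a few lines of numpy; this is a minimal illustration of the blending rule, not the paper's renderer, and assumes samples are already depth-sorted:

```python
import numpy as np

def composite_color(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted Gaussian samples.

    colors: (n, 3) per-Gaussian colors c_i, sorted near-to-far.
    alphas: (n,) effective opacities alpha_i' in [0, 1].
    Implements C = sum_i c_i * alpha_i' * prod_{j<i} (1 - alpha_j').
    """
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Accumulated transmittance T_i = prod_{j<i} (1 - alpha_j')
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance          # per-Gaussian blending weights
    return (weights[:, None] * colors).sum(axis=0)
```

The same weights are reused below to composite per-Gaussian flow displacements, which is what makes the ground-truth flow consistent with the rendered color.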

To construct ground truth dense optical flow, per-Gaussian center coordinates $P_w = (X_w, Y_w, Z_w, 1)^\top$ are projected using camera extrinsics $[R \mid T]$ and intrinsics. Displacements $(\Delta u_i, \Delta v_i)$ between consecutive frames are composited via alpha-blending:

O_\text{optical} = [\Delta U, \Delta V]^\top = \sum_{i=1}^{n} [\Delta u_i, \Delta v_i]^\top \alpha_i' \prod_{j<i} (1 - \alpha_j')

This yields high-resolution (up to $3208 \times 2200$) ground truth flow fields. The dataset comprises 11,161 image pairs, split into 6,791 (train), 1,212 (val), and 3,158 (test). Augmentations include random rendering rotations, randomized training crops ($800 \times 512$), horizontal flips, and color jitter.
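The projection step behind the per-Gaussian displacements can be sketched as follows; this is a generic pinhole-camera sketch, not the paper's implementation, and the function names are illustrative:

```python
import numpy as np

def project(P_w, R, T, K):
    """Pinhole projection of homogeneous world points P_w (n, 4) to pixels.
    R: (3, 3) rotation, T: (3,) translation, K: (3, 3) intrinsics."""
    P_c = P_w[:, :3] @ R.T + T              # apply extrinsics [R | T]
    uvw = P_c @ K.T                         # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]         # perspective divide

def per_gaussian_flow(P_t0, P_t1, R, T, K):
    """Image-plane displacements (du_i, dv_i) of Gaussian centers between
    consecutive frames; these are then alpha-composited into dense flow."""
    return project(P_t1, R, T, K) - project(P_t0, R, T, K)
```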

2. Model Architecture: SRFlowNet

SRFlowNet, the regularization framework for flow estimation, is built upon the SKFlow backbone, selected following benchmarks of multiple RAFT-style networks (RAFT, GMA, SKFlow, MemFlow, DPFlow, RPKNet) on SRFlow. The pipeline is:

I_1, I_2 \rightarrow \text{shared encoder} \rightarrow \text{4-level feature pyramids} \rightarrow \text{cost-volume correlation} \rightarrow \text{recurrent update module (4 GRU iterations)} \rightarrow \text{multi-stage flow outputs}~f^0, f^1, \ldots, f^{n-1}

At each stage $i$, the network predicts $f^i$, supervised by endpoint error and additional regularization terms. No splatting-guidance module is used at inference; splatting serves only for ground truth and mask generation during training.
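Multi-stage supervision of the recurrent outputs can be sketched as an exponentially weighted endpoint-error sum; the decay $\gamma = 0.8$ is the usual RAFT-family convention and is an assumption here, not a value stated by the paper:

```python
import numpy as np

def sequence_loss(preds, gt, gamma=0.8):
    """RAFT-style supervision of all recurrent stages: stage i is weighted
    gamma^(n-i-1), so later, more refined predictions dominate the loss.
    preds: list of (H, W, 2) stage outputs f^0..f^{n-1}; gt: (H, W, 2)."""
    n = len(preds)
    return sum(gamma ** (n - i - 1)
               * np.linalg.norm(f - gt, axis=-1).mean()   # mean EPE per stage
               for i, f in enumerate(preds))
```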

3. Splatting Rasterization Guidance

SRFlow leverages the concept of Gaussian splatting. Each 3D Gaussian, when projected, defines a spatially smooth contribution to both image color and flow:

G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(x - \mu_x)^2 + (y - \mu_y)^2}{2\sigma^2} \right)
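As a minimal sketch, the isotropic footprint above evaluates as:

```python
import numpy as np

def splat_footprint(x, y, mu, sigma):
    """Isotropic 2D Gaussian G(x, y) of a projected splat centered at mu:
    peak value 1 / (2 * pi * sigma^2), decaying with squared distance."""
    d2 = (x - mu[0]) ** 2 + (y - mu[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
```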

Projection and compositing occur in depth-sorted order for both color and displacement. This compositing defines a per-pixel binary mask $M_{bg}(x, y)$ (set to 1 if any Gaussian covers $(x, y)$; else 0), marking flow-confidence regions. Extended masks $M_\text{total}$, incorporating spatial gradients of $M_{bg}$, are used to focus regularization at motion boundaries.
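The two masks can be sketched as follows; `boundary_mask` is one plausible construction of the gradient-extended mask, since the paper's exact definition of $M_\text{total}$ is not reproduced here:

```python
import numpy as np

def coverage_mask(weights):
    """M_bg(x, y): 1 where at least one Gaussian contributes, else 0.
    weights: (n, H, W) per-Gaussian alpha-blending weights."""
    return (weights.sum(axis=0) > 0).astype(np.float32)

def boundary_mask(M_bg):
    """Extended mask emphasizing motion boundaries: union of M_bg with the
    support of its spatial gradient (an assumed construction of M_total)."""
    gy, gx = np.gradient(M_bg)
    return np.clip(M_bg + (np.abs(gx) + np.abs(gy) > 0), 0.0, 1.0)
```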

4. Regularization Strategies

SRFlowNet introduces four regularization losses in addition to standard endpoint error (EPE):

  • Mask-based Loss ($L_\text{mask}$):

L_\text{mask} = \sum_{i=0}^{n-1} \| M_{bg} \odot (f^i - f^{i,*}) \|_2^2

Restricts supervision to valid-flow regions.

  • Gradient-based Loss ($L_\text{grad}$):

L_\text{grad} = \sum_{i=0}^{n-1} \| \nabla(M_{bg} \odot f^i) - \nabla(M_{bg} \odot f^{i,*}) \|_1

Gradients computed via normalized Sobel filters:

K_x = \frac{1}{8} \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}, \quad K_y = K_x^\top

  • Total Variation Regularization (TVR):

R(c) = \frac{1}{HW} \sum_{x,y} \left( |\nabla_x c| + |\nabla_y c| \right), \quad L_\text{TVR} = \lambda_N \sum_i \gamma^{n-i-1} \left( R(u^i) + R(v^i) \right)

  • Flow Difference Regularization (FDR):

D_x(f) = f(x, y+1) - f(x, y), \quad D_y(f) = f(x+1, y) - f(x, y)

L_\text{FDR} = \lambda_N \sum_i \gamma^{n-i-1} \left( D_x(f^i) \odot M_{bg}^x + D_y(f^i) \odot M_{bg}^y \right)

  • MIGAR and IGVAR: Losses that modulate regularization according to image gradient magnitude or variance, e.g., via:

G_{I_1}(x, y) = \text{Sobel}(I_1), \quad \text{base} = \exp\left\{ \frac{1}{HW} \sum G_{I_1} \right\}, \quad w(x, y) = \text{base}^{-G_{I_1}(x, y)}

L_\text{MIGAR} = \sum_i \gamma^{n-i-1} \left( R_{wpp}(u^i) + R_{wpp}(v^i) \right)

IGVAR uses gradient variance within masked regions as weighting.
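The loss family above can be sketched compactly in numpy. This is an illustrative reconstruction from the formulas, not the paper's code; in particular `GAMMA` and `LAM` are assumed hyperparameter values, and `migar_weights` only implements the adaptive-weighting rule, not the full weighted penalty:

```python
import numpy as np

GAMMA, LAM = 0.8, 0.1   # decay and weight: assumed values, not from the paper

def mask_loss(preds, gt, M):
    """L_mask: squared-error supervision restricted to splat-covered pixels.
    preds: list of (H, W, 2) stage outputs; gt: (H, W, 2); M: (H, W) binary."""
    return sum((((f - gt) * M[..., None]) ** 2).sum() for f in preds)

def total_variation(c):
    """R(c): mean absolute forward difference of a flow channel in x and y."""
    return (np.abs(np.diff(c, axis=1)).sum()
            + np.abs(np.diff(c, axis=0)).sum()) / c.size

def tvr_loss(preds):
    """L_TVR over stages with exponentially increasing weights gamma^(n-i-1)."""
    n = len(preds)
    return LAM * sum(GAMMA ** (n - i - 1)
                     * (total_variation(f[..., 0]) + total_variation(f[..., 1]))
                     for i, f in enumerate(preds))

def fdr_loss(preds, M):
    """L_FDR: neighboring-pixel flow differences, masked to valid pixel pairs."""
    n = len(preds)
    mx = M[:, 1:] * M[:, :-1]               # both pixels of each x-pair valid
    my = M[1:, :] * M[:-1, :]               # both pixels of each y-pair valid
    loss = 0.0
    for i, f in enumerate(preds):
        dx = np.abs(np.diff(f, axis=1)).sum(-1)   # |D_x(f)| per pixel pair
        dy = np.abs(np.diff(f, axis=0)).sum(-1)   # |D_y(f)| per pixel pair
        loss += GAMMA ** (n - i - 1) * ((dx * mx).mean() + (dy * my).mean())
    return LAM * loss

def migar_weights(G):
    """Image-gradient-adaptive weights w = base^{-G}, base = exp(mean(G)):
    strong edges receive exponentially smaller smoothness penalties."""
    return np.exp(G.mean()) ** (-G)
```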

These losses collectively address noise suppression in texture-less and repetitive-pattern regions while preserving sharp boundaries and subtle facial motion.

5. Evaluation, Metrics, and Results

SRFlow and SRFlowNet are evaluated using standard endpoint error (EPE):

\text{EPE} = \frac{1}{N} \sum_{i=1}^{N} \| f_i - f_i^* \|_2

and macro/micro-averaged F1-score on composite micro-expression benchmarks. Key empirical results:

  • Pretrained SKFlow on generic data: EPE = 0.5081
  • SKFlow + SRFlow retrained: EPE = 0.3998 (21% reduction)
  • Best adaptation (MemFlow+SRFlow): EPE = 0.2953 (42% reduction)
  • Composite micro-expression F1:
    • SKFlow baseline: 0.4733
    • SRFlowNet (TVR): 0.6947 (48% improvement)

Qualitative results indicate smoother, more coherent flow fields, especially around eyes and mouth, compared with RAFT, FlowNet, and SSA architectures.
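The EPE metric used throughout these comparisons amounts to a one-line computation; a minimal numpy sketch:

```python
import numpy as np

def endpoint_error(pred, gt):
    """Mean endpoint error: average Euclidean distance between predicted
    and ground-truth flow vectors. pred, gt: (..., 2) arrays."""
    return np.linalg.norm(pred - gt, axis=-1).mean()
```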

Method             EPE     Macro-F1
SKFlow (baseline)  0.5081  0.4733
SKFlow + SRFlow    0.3998  —
MemFlow + SRFlow   0.2953  —
SRFlowNet (TVR)    —       0.6947
SRFlowNet (IGVAR)  —       0.6912

6. Ablation and Analysis

Ablation studies performed on the SRFlow test set with the SKFlow backbone demonstrate the effect of individual regularizers:

  • TVR increases EPE (0.4056) but boosts F1-score (0.6947).
  • FDR yields the best EPE (0.3946), while IGVAR and MIGAR perform comparably. A plausible implication is that the choice of regularizer can be tuned to the application domain: TVR for downstream micro-expression recognition, FDR or MIGAR for pixelwise flow accuracy.

Visualizations confirm SRFlowNet variants yield denoised and structurally consistent flow fields, particularly in previously ambiguous or low-texture regions.

7. Limitations and Future Directions

Limitations include the granularity mismatch between high-resolution SRFlow ground truth ($2200 \times 3208$) and typical low-resolution micro-expression inputs ($112 \times 112$). Regularizer selection is performed per backbone, without exploring combinations; stacking multiple regularizers is cautioned against due to the risk of oversmoothing.

Future research avenues include:

  • High-resolution micro-expression video dataset construction,
  • Micro-expression recognition models with architectures capable of leveraging SRFlow-level flow detail,
  • Hybrid use of 3D Gaussian motion during inference,
  • Adaptive or combined regularization schemes for optimally balancing smoothness and detail.

SRFlow establishes a new standard for both facial optical flow annotation and model supervision, with demonstrated substantial impact on downstream facial motion understanding (Zhang et al., 10 Jan 2026).
