
SRFlowNet: Splatting Rasterization Guided Flow

Updated 17 January 2026
  • The paper introduces a novel training guidance mechanism using Gaussian splatting rasterization to generate pixel-accurate, view-dependent facial optical flow supervision.
  • The paper employs the SKFlow backbone integrated with facial-specific regularization losses, improving flow estimation by suppressing noise and reducing large-scale errors.
  • The paper demonstrates state-of-the-art performance in both optical flow accuracy and micro-expression recognition, validated on a high-resolution, multi-view SRFlow dataset.

Splatting Rasterization Guided FlowNet (SRFlowNet) is a facial optical flow model designed to estimate high-resolution, fine-grained facial motion from video frames by leveraging supervision derived from 3D Gaussian splatting rasterization. Built upon the SKFlow backbone, SRFlowNet introduces a suite of facial-specific regularization losses that effectively suppress high-frequency noise and large-scale errors, particularly in texture-less or repetitive-pattern facial regions. It is trained on the SRFlow dataset, which provides pixel-accurate, high-resolution optical flow ground truth generated by projecting 3D Gaussian splats and compositing their motion contributions via depth-sorted alpha blending. The approach enables SRFlowNet to achieve state-of-the-art accuracy in both optical flow estimation and downstream micro-expression recognition tasks, particularly in scenarios requiring the capture of subtle, high-resolution facial dynamics (Zhang et al., 10 Jan 2026).

1. Gaussian Splatting Rasterization: Generation of Optical Flow Supervision

SRFlowNet is "guided" during training by pixel-level flows derived from a process termed Gaussian splatting rasterization. A "Gaussian splat" is a 3D volumetric primitive parameterized by color $c_i$, opacity $\alpha_i$, and a placement frame tied to a surface mesh triangle. In the reconstruction pipeline (e.g., GaussianAvatar), a human head is modeled as a dense cloud of such Gaussians, which deform with facial expressions.

The novel Flow Rasterizer extends standard splatting rendering by tracking the displacement of each splat's 3D center across two frames. The centers are projected into image space using computed extrinsic and perspective matrices, resulting in a per-splat pixel displacement $(\Delta u_i, \Delta v_i)$. The motion field for each pixel is then composited by weighted alpha blending:

$$O_\mathrm{optical}(u,v) = \sum_{i=1}^{n} [\Delta u_i, \Delta v_i]^T \cdot \alpha_i' \cdot \prod_{j<i}(1 - \alpha_j')$$

This produces dense, high-fidelity, and view-dependent ground-truth facial flow fields, which form the SRFlow dataset’s optical flow supervision.
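The depth-sorted alpha-blending rule above can be sketched in a few lines. The sketch below is a simplified, hypothetical rasterizer (the function name and the single-pixel splat footprint are assumptions; a real rasterizer splats each Gaussian's projected 2D footprint over many pixels):

```python
import numpy as np

def rasterize_flow(centers_uv, deltas_uv, alphas, depths, H, W):
    """Composite per-splat 2D displacements into a dense flow map.

    centers_uv: (n, 2) integer pixel coordinates of projected splat centers
    deltas_uv:  (n, 2) per-splat displacements (du, dv) between two frames
    alphas:     (n,)   per-splat opacities after projection
    depths:     (n,)   camera-space depths, used for front-to-back sorting
    """
    flow = np.zeros((H, W, 2))
    # Remaining transmittance per pixel: prod_{j<i} (1 - alpha_j)
    transmittance = np.ones((H, W))
    # Depth-sort the splats front-to-back so nearer splats blend first.
    for i in np.argsort(depths):
        u, v = centers_uv[i]
        if 0 <= v < H and 0 <= u < W:
            w = alphas[i] * transmittance[v, u]
            flow[v, u] += w * deltas_uv[i]
            transmittance[v, u] *= 1.0 - alphas[i]
    return flow
```

Each splat contributes its displacement weighted by its opacity and the accumulated transmittance of the splats in front of it, exactly as in the compositing sum above.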

2. Network Architecture and Integration of Splatting Guidance

SRFlowNet adopts the SKFlow backbone without architectural modification. The SKFlow design consists of:

  • Encoder: A feature pyramid is constructed from input frames $I_1, I_2 \in \mathbb{R}^{3 \times H \times W}$ using strided convolutions (an initial $7 \times 7$ followed by successive $3 \times 3$ kernels) and skip connections. The result is a 6-level spatial pyramid with fixed channel sizes (typically 256 per level, at half-resolution downscaling).
  • Correlation Module: At each level, global all-pair feature correlations are computed using learned super-kernels, facilitating robust pixelwise matching.
  • Update Operator: An iterative GRU-like update module refines the flow estimate $f^i$ across $n$ recurrent stages (commonly $n = 6$), leveraging the current correlation, context features, and a residual flow head.
  • Output Decoder: Flow predictions $(u,v) \in \mathbb{R}^{2 \times H \times W}$ are produced via bilinear upsampling and convolutional refinement.

Importantly, no special rasterization module is present within SRFlowNet itself. "Guidance" refers to supervision with SRFlow’s Gaussian-splatting-derived ground truth during training rather than architectural integration.
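The recurrent residual-update pattern of the update operator can be illustrated schematically. The toy rule below replaces the actual GRU, correlation lookup, and context features with an assumed stand-in, purely to show the loop structure $f^{i+1} = f^i + \Delta f^i$ and why every stage's estimate is retained for supervision:

```python
import numpy as np

def iterative_refinement(target_flow, n_iters=6):
    """Schematic of SKFlow-style recurrent refinement (not the real network).

    A real update operator feeds correlation and context features through a
    GRU to predict a residual; here the 'network' is a toy rule that moves a
    fixed fraction toward target_flow, only to illustrate the loop.
    """
    flow = np.zeros_like(target_flow)            # f^0 = 0
    estimates = []
    for _ in range(n_iters):
        residual = 0.5 * (target_flow - flow)    # stand-in for the GRU head
        flow = flow + residual                   # f^{i+1} = f^i + Δf^i
        estimates.append(flow.copy())            # every stage is supervised
    return estimates
```

Returning all intermediate estimates matters because the training losses (next section) are applied at every update stage with an exponential decay factor.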

3. Facial-Specific Regularization Losses

Beyond the standard endpoint error loss $L_\mathrm{EPE}$, SRFlowNet introduces four specialized regularization losses to address the unique challenges of facial flow, most notably oversmoothing and edge artifacts in low-texture facial regions. All regularizers rely on per-pixel face masks $M_{bg}$ and are applied at each update stage with a decay factor $\gamma$ and weight $\lambda_N = 0.05$.

  • Total Variation Regularization (TVR): Imposes Sobel-based spatial smoothness on the flow channels $u, v$:

$$R(c) = \frac{1}{HW} \sum_{x,y} \left( |\nabla_x c(x,y)| + |\nabla_y c(x,y)| \right)$$

The total TV loss $L_\mathrm{TVR}$ is aggregated across update stages.

  • Flow Difference Regularization (FDR): Enforces axis-aligned forward-difference smoothing within the facial mask, favoring conservative correction of abrupt flow transitions:

$$L_\mathrm{FDR} = \lambda_N \sum_{i=0}^{n-1} \gamma^{n-i-1} \left[ D_x(u^i) \odot M_{bg}^x + D_y(v^i) \odot M_{bg}^y \right]$$

  • Mean Image Gradient Activation Regularization (MIGAR): Reweights the TV loss by the local image gradient, computed via averaged Sobel norms over $I_1$. This adaptively penalizes flow complexity in low-texture regions, encouraging spatially adaptive regularization.
  • Image Gradient Variance Activation Regularization (IGVAR): Similar to MIGAR, but the base reweighting value is linked to the variance of image gradients within the face mask, enhancing adaptability to different facial appearances.

A comparative summary of the loss terms appears below:

| Regularization | Smoothness Basis | Local Adaptation Mechanism |
| --- | --- | --- |
| TVR | Sobel, isotropic | None |
| FDR | Axis differences | Facial mask only |
| MIGAR | Sobel, isotropic | Image gradient magnitude reweighting |
| IGVAR | Sobel, isotropic | Masked image gradient variance |
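A minimal numpy sketch of the TVR family follows. It is an illustration under stated assumptions, not the paper's implementation: forward differences stand in for the Sobel operator, and the optional per-pixel `weight` is a hypothetical hook for MIGAR/IGVAR-style reweighting:

```python
import numpy as np

def grad_xy(a):
    """Forward-difference spatial gradients of a 2D array (zero-padded)."""
    gx = np.zeros_like(a)
    gx[:, :-1] = a[:, 1:] - a[:, :-1]
    gy = np.zeros_like(a)
    gy[:-1, :] = a[1:, :] - a[:-1, :]
    return gx, gy

def tv_loss(flow_u, flow_v, weight=None):
    """Total-variation penalty over both flow channels.

    weight=None gives plain TVR; a per-pixel weight derived from image
    gradients (large where texture is low) yields an adaptive variant in
    the spirit of MIGAR/IGVAR.
    """
    total = np.zeros_like(flow_u)
    for c in (flow_u, flow_v):
        gx, gy = grad_xy(c)
        total += np.abs(gx) + np.abs(gy)
    if weight is not None:
        total = total * weight   # relax the penalty where texture is rich
    return total.mean()
```

A constant flow field incurs zero penalty, while any spatial variation in $u$ or $v$ is charged proportionally to its first differences; the weight map then decides where that charge is applied.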

4. Dataset Construction and Annotation Protocol

The SRFlow dataset comprises 11,161 high-resolution frame pairs ($2200 \times 3208$ px) captured using multi-view setups (16 synchronized cameras) from 27 subjects sourced from NeRSemble. Subject sequences involve diverse facial actions, speech, and pose changes.

3D mesh alignments use VHAP, followed by mesh-to-Gaussian-cloud reconstruction with GaussianAvatar. Synthetic camera perturbations yield a mixture of front-facing and oblique views. The Flow Rasterizer generates dense, view-dependent, two-channel optical flow, pixel-level binary face-region masks, and all relevant camera parameters.

Dataset splits: 6,791 training pairs, 1,212 validation, 3,158 test.

5. Training Procedures and Implementation Details

All optical flow models (SRFlowNet and baselines) are trained using the following protocol:

  • Hardware: Dual RTX A6000 Ada GPUs (total 96 GB VRAM)
  • Batch size: 8
  • Input: Random $800 \times 512$ crops
  • Data augmentation: Random cropping only
  • Optimizer: Adam-style (SKFlow defaults)
  • Learning rate: $1.25 \times 10^{-4}$, 45 epochs without decay

For downstream micro-expression recognition (Off-TANet), composite datasets (SAMM, CASME II, SMIC) are used. Optical-strain maps computed between onset and apex frames are input at a $112 \times 112$ crop size. Training runs for up to 200 epochs, and the model with the highest cross-validation average is reported.
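Optical strain is derived from the spatial derivatives of the flow field. The sketch below uses one common definition (the norm of the symmetric part of the flow's Jacobian); the paper's exact formulation may differ:

```python
import numpy as np

def optical_strain(u, v):
    """Optical-strain magnitude from a dense flow field (u, v).

    Computes the symmetric strain tensor from the flow Jacobian and
    returns its per-pixel magnitude. np.gradient returns derivatives
    along axis 0 (y) first, then axis 1 (x).
    """
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    e_xx = du_dx                      # normal strain in x
    e_yy = dv_dy                      # normal strain in y
    e_xy = 0.5 * (du_dy + dv_dx)      # shear strain
    return np.sqrt(e_xx**2 + e_yy**2 + 2.0 * e_xy**2)
```

A rigid translation (constant flow) produces zero strain everywhere, which is why strain maps isolate the subtle non-rigid deformations that micro-expressions consist of.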

6. Quantitative Evaluation and Performance

Optical Flow Evaluation: Metrics include end-point error (EPE, px), px1/px3/px5 accuracy (fraction of pixels with flow error $<$ 1/3/5 px), weighted F1 (F1-ALL, lower is better), and WAUC (area under the EPE-threshold curve).

| Model | EPE | F1-ALL | WAUC |
| --- | --- | --- | --- |
| Pretrained MemFlow | 0.5081 | 3.0071 | 80.39% |
| Pretrained SKFlow | 0.5361 | 3.2159 | 80.35% |
| SKFlow+SRFlow | 0.3998 | 0.6722 | 83.83% |
| MemFlow+SRFlow | 0.2953 | 0.3502 | 86.97% |
| DPFlow+SRFlow | 0.3348 | 0.3961 | 88.80% |
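The EPE and pxN numbers in the table above can be computed directly from dense flow maps; a minimal sketch (F1-ALL and WAUC require outlier weighting and threshold integration, omitted here):

```python
import numpy as np

def flow_metrics(pred, gt):
    """EPE and pxN accuracies for flow maps of shape (H, W, 2)."""
    epe = np.linalg.norm(pred - gt, axis=-1)   # per-pixel Euclidean error
    return {
        "EPE": epe.mean(),                     # mean end-point error (px)
        "px1": (epe < 1.0).mean(),             # fraction of pixels < 1 px off
        "px3": (epe < 3.0).mean(),
        "px5": (epe < 5.0).mean(),
    }
```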

Micro-Expression Recognition (Off-TANet, Composite):

| Model | $F_{1\mu}$ | $G_M$ |
| --- | --- | --- |
| Baseline SKFlow | 0.5906 | 0.3181 |
| SKFlow+SRFlow | 0.7736 | 0.5083 |
| SRFlowNet-TVR | 0.7781 | 0.5402 |

Relative gains: Up to 42% reduction in EPE and 48% increase in $F_{1\mu}$ demonstrate the impact of Gaussian splatting supervision and facial regularization.

7. Ablation Analysis and Empirical Observations

Ablation studies compare each regularizer in isolation (on top of SKFlow+SRFlow):

  • FDR achieves lowest EPE and best WAUC but may suppress subtle facial motion, slightly degrading micro-expression recognition.
  • MIGAR exhibits the lowest F1-ALL, indicative of strong detail preservation; however, it can over-smooth in low-texture regions.
  • TVR and IGVAR achieve the best balance of $F_{1\mu}$ and $G_M$.
  • Qualitative: Retrained models, especially SRFlowNet-FDR and SRFlowNet-MIGAR, yield visually faithful flow fields, capturing coherent movement in lips, eyes, and eyebrows with minimal spurious edges.

A plausible implication is that spatially adaptive regularization, conditioned on structural image features, enables more reliable estimation in regions with low or repetitive texture while avoiding the introduction of artificial detail.

SRFlowNet, trained on SRFlow supervisory signals, consistently outperforms standard SKFlow and other pretrained benchmarks in fine-grained facial flow estimation and micro-expression analysis, establishing a new reference point for high-resolution, mask- and gradient-aware optical flow modeling in facial analysis (Zhang et al., 10 Jan 2026).
