SRFlow: Splatting Rasterization Flow
- The paper introduces a high-resolution facial optical flow dataset and novel regularization strategies leveraging 3D Gaussian splatting for improved facial motion estimation.
- The method employs multi-view capture and FLAME alignment to generate dense ground truth flow fields, achieving up to a 42% reduction in endpoint error over baselines.
- SRFlowNet integrates mask, gradient, total variation, and flow difference losses to enhance consistency and reliability in challenging micro-expression recognition scenarios.
Splatting Rasterization Flow (SRFlow) defines a high-resolution facial optical flow dataset and a supporting model regularization framework that leverages 3D Gaussian splatting in differentiable rasterization. The framework targets facial motion analysis tasks, specifically addressing the lack of high-fidelity optical flow ground truth in unconstrained, high-resolution face video, which has hindered progress in both core flow estimation and micro-expression recognition. SRFlow introduces both a carefully constructed data resource and a set of regularization strategies for model training, improving consistency, denoising, and reliability of flow fields in challenging facial motion scenarios (Zhang et al., 10 Jan 2026).
1. Dataset Creation and Ground Truth Generation
SRFlow's dataset construction employs multi-view dynamic facial video from the NeRSemble apparatus, which comprises 16 calibrated cameras and 4,700 synchronized high-frame-rate sequences from 222 subjects (157 male, 65 female), spanning a heterogeneous distribution of ages and ethnicities; SRFlow focuses on a representative subset of 27 subjects. Captured motion includes a diverse set of facial expressions, speech, and hair and head motion.
3D facial geometry is reconstructed via FLAME parametric mesh alignment, generating input for the GaussianAvatar representation. Each face is parametrized as approximately 20,000 continuous 3D Gaussians, each associated with a mesh triangle. Color is encoded via spherical harmonics alongside explicit density and opacity. Differentiable rendering composites depth-sorted Gaussians through alpha-blending:

$$C(p) = \sum_{i} c_i \,\alpha_i \prod_{j<i} (1 - \alpha_j),$$

where $c_i$ and $\alpha_i$ are the color and opacity contributions of the $i$-th Gaussian at pixel $p$.
To construct ground truth dense optical flow, per-Gaussian center coordinates are projected using camera extrinsics $[R\,|\,T]$ and intrinsics. Displacements between consecutive frames are composited with the same alpha-blending weights:

$$F(p) = \sum_{i} f_i \,\alpha_i \prod_{j<i} (1 - \alpha_j), \qquad f_i = \pi\!\left(\mu_i^{t+1}\right) - \pi\!\left(\mu_i^{t}\right),$$

where $\mu_i^t$ is the center of the $i$-th Gaussian at frame $t$ and $\pi(\cdot)$ denotes camera projection.
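The per-pixel compositing of color and displacement described above can be sketched as follows. This is a minimal NumPy sketch; the function name and array layout are illustrative, not the paper's implementation.

```python
import numpy as np

def composite_flow(alphas, flows, colors):
    """Front-to-back alpha compositing of per-Gaussian quantities at one pixel.

    alphas: (N,) opacities of the N depth-sorted Gaussians covering the pixel
    flows:  (N, 2) per-Gaussian 2D displacements (projected center motion)
    colors: (N, 3) per-Gaussian RGB contributions
    """
    transmittance = 1.0
    flow = np.zeros(2)
    color = np.zeros(3)
    for a, f, c in zip(alphas, flows, colors):
        w = a * transmittance          # blending weight: alpha_i * prod_{j<i}(1 - alpha_j)
        flow += w * f
        color += w * c
        transmittance *= (1.0 - a)     # attenuate contributions of Gaussians behind
    return color, flow
```

Because color and flow share the same weights, the rendered flow field inherits the smoothness and occlusion handling of the rendered image.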
This yields high-resolution ground truth flow fields. The dataset comprises 11,161 image pairs, split into 6,791 (train), 1,212 (val), and 3,158 (test). Augmentations include random rendering rotations, randomized training crops, horizontal flips, and color jitter.
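A flow-aware version of these augmentations might look like the sketch below; the crop size, helper name, and flip probability are assumptions. The key subtlety is that a horizontal flip must also negate the x-component of the flow.

```python
import numpy as np

def augment_pair(img1, img2, flow, crop=512, rng=None):
    """Jointly augment an image pair and its ground-truth flow field.

    The crop size (512) is illustrative, not the paper's exact value.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = flow.shape[:2]
    # identical random crop for both images and the flow field
    y = int(rng.integers(0, h - crop + 1))
    x = int(rng.integers(0, w - crop + 1))
    img1, img2 = img1[y:y+crop, x:x+crop], img2[y:y+crop, x:x+crop]
    flow = flow[y:y+crop, x:x+crop]
    # horizontal flip with probability 0.5
    if rng.random() < 0.5:
        img1, img2 = img1[:, ::-1], img2[:, ::-1]
        flow = flow[:, ::-1].copy()
        flow[..., 0] *= -1  # mirroring reverses horizontal displacement
    return img1, img2, flow
```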
2. Model Architecture: SRFlowNet
SRFlowNet, the regularization framework for flow estimation, is built upon the SKFlow backbone, selected after benchmarking multiple RAFT-style networks (RAFT, GMA, SKFlow, MemFlow, DPFlow, RPKNet) on SRFlow. The pipeline follows the standard RAFT-style design of feature extraction, correlation-volume lookup, and recurrent iterative flow refinement.
At each refinement iteration $k$, the network predicts a flow estimate $f^k$, supervised by endpoint error and additional regularization terms. No splatting-guidance module is used at inference; splatting serves only for ground truth and mask generation during training.
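Supervision over the iterative refinements is commonly formulated as an exponentially weighted sum of per-iteration endpoint errors, as in RAFT; the weighting below is that convention, assumed rather than confirmed by the source.

```python
import numpy as np

def sequence_loss(flow_preds, flow_gt, gamma=0.8):
    """RAFT-style supervision over iterative flow refinements.

    Later iterations receive exponentially larger weight gamma**(K-1-k),
    so the final estimate dominates the loss.
    """
    K = len(flow_preds)
    total = 0.0
    for k, pred in enumerate(flow_preds):
        # mean endpoint error of the k-th intermediate prediction
        epe = np.linalg.norm(pred - flow_gt, axis=-1).mean()
        total += gamma ** (K - 1 - k) * epe
    return total
```

Regularization terms (Section 4) would be added to each iteration's contribution in the same loop.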
3. Splatting Rasterization Guidance
SRFlow leverages the mechanics of Gaussian splatting: each 3D Gaussian, when projected, defines a spatially smooth contribution to both image color and flow, weighted by its opacity and projected 2D covariance:

$$\alpha_i(p) = o_i \exp\!\left(-\tfrac{1}{2}\,(p - \mu_i)^\top \Sigma_i^{-1} (p - \mu_i)\right),$$

where $o_i$ is the opacity, $\mu_i$ the projected 2D center, and $\Sigma_i$ the projected 2D covariance of the $i$-th Gaussian.
Projection and compositing occur in depth-sorted order for both color and displacement. This compositing also defines a per-pixel binary mask $M(p)$, set to 1 if any Gaussian covers pixel $p$ and 0 otherwise, marking flow-confidence regions. Extended masks incorporating the spatial gradients of $M$ are used to focus regularization at motion boundaries.
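One way to realize these coverage and boundary masks is sketched below; the function name and the gradient-based edge construction are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def coverage_masks(weight_sum):
    """Binary validity mask from accumulated splatting weights, plus an
    edge mask from its spatial gradients.

    weight_sum: (H, W) accumulated alpha-blending weight per pixel.
    """
    # 1 wherever at least one Gaussian splats onto the pixel
    mask = (weight_sum > 0).astype(np.float32)
    # nonzero gradients of the mask mark boundaries of the valid region
    gy, gx = np.gradient(mask)
    edge = ((np.abs(gx) + np.abs(gy)) > 0).astype(np.float32)
    return mask, edge
```

During training, `mask` would gate the supervised loss and `edge` would concentrate boundary-aware regularization.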
4. Regularization Strategies
SRFlowNet introduces four regularization losses in addition to standard endpoint error (EPE):
- Mask-based Loss: restricts supervision to valid-flow regions by weighting the endpoint error with the binary splatting mask.
- Gradient-based Loss: penalizes discrepancies between the spatial gradients of predicted and ground-truth flow, with gradients computed via normalized Sobel filters.
- Total Variation Regularization (TVR): penalizes local variation of the predicted flow to suppress noise in flat regions.
- Flow Difference Regularization (FDR): penalizes flow differences to stabilize predictions.
- MIGAR and IGVAR: variants that modulate regularization strength according to local image gradient magnitude (MIGAR) or image gradient variance within masked regions (IGVAR).
These losses collectively address noise suppression in texture-less and repetitive-pattern regions while preserving sharp boundaries and subtle facial motion.
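The mask, gradient, and total-variation terms admit compact sketches; the formulations below are standard NumPy renderings under assumed definitions, and the paper's exact weights, normalizations, and the FDR/MIGAR terms are not reproduced.

```python
import numpy as np

# normalized Sobel kernels (sum of absolute weights = 1)
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32) / 8.0
SOBEL_Y = SOBEL_X.T

def _filter2d(img, kernel):
    """Minimal 'valid'-mode 2D correlation (stand-in for a conv library call)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i+kh, j:j+kw] * kernel).sum()
    return out

def masked_epe_loss(pred, gt, mask):
    """Mask-based loss: endpoint error restricted to valid-flow pixels."""
    epe = np.linalg.norm(pred - gt, axis=-1)
    return (epe * mask).sum() / (mask.sum() + 1e-8)

def gradient_loss(pred, gt):
    """Gradient-based loss: match Sobel gradients of each flow channel."""
    loss = 0.0
    for c in range(pred.shape[-1]):
        for k in (SOBEL_X, SOBEL_Y):
            loss += np.abs(_filter2d(pred[..., c], k) - _filter2d(gt[..., c], k)).mean()
    return loss

def tv_loss(pred):
    """Total variation: penalize local flow changes to suppress noise."""
    dx = np.abs(pred[:, 1:] - pred[:, :-1]).mean()
    dy = np.abs(pred[1:, :] - pred[:-1, :]).mean()
    return dx + dy
```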
5. Evaluation, Metrics, and Results
SRFlow and SRFlowNet are evaluated using the standard endpoint error (EPE),

$$\mathrm{EPE} = \frac{1}{N}\sum_{p} \left\lVert f_{\text{pred}}(p) - f_{\text{gt}}(p) \right\rVert_2,$$

and macro/micro-averaged F1-score on composite micro-expression benchmarks. Key empirical results:
- Pretrained SKFlow on generic data: EPE = 0.5081
- SKFlow + SRFlow retrained: EPE = 0.3998 (21% reduction)
- Best adaptation (MemFlow+SRFlow): EPE = 0.2953 (42% reduction)
- Composite micro-expression F1:
- SKFlow baseline: 0.4733
- SRFlowNet (TVR): 0.6947 (48% improvement)
Qualitative results indicate smoother, more coherent flow fields, especially around eyes and mouth, compared with RAFT, FlowNet, and SSA architectures.
| Method | EPE | Macro-F1 |
|---|---|---|
| SKFlow (baseline) | 0.5081 | 0.4733 |
| SKFlow + SRFlow | 0.3998 | – |
| MemFlow + SRFlow | 0.2953 | – |
| SRFlowNet (TVR) | – | 0.6947 |
| SRFlowNet (IGVAR) | – | 0.6912 |
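The two metrics in the table can be computed as in the following minimal sketch; function names are illustrative.

```python
import numpy as np

def endpoint_error(pred, gt):
    """Mean endpoint error: average Euclidean distance between flow vectors."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def macro_f1(y_true, y_pred, num_classes):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight."""
    f1s = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1s))
```

Macro averaging weights all expression classes equally, which matters for micro-expression benchmarks with imbalanced class frequencies.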
6. Ablation and Analysis
Ablation studies performed on the SRFlow test set with the SKFlow backbone demonstrate the effect of individual regularizers:
- TVR slightly increases EPE ($0.4056$, vs. $0.3998$ for the retrained baseline) but boosts downstream F1-score ($0.6947$).
- FDR yields the best EPE ($0.3946$), while IGVAR and MIGAR perform comparably. A plausible implication is that choice of regularizer may be tuned for specific application domains: TVR for downstream micro-expression recognition, FDR or MIGAR for pixelwise flow accuracy.
Visualizations confirm SRFlowNet variants yield denoised and structurally consistent flow fields, particularly in previously ambiguous or low-texture regions.
7. Limitations and Future Directions
Limitations include the granularity mismatch between high-resolution SRFlow ground truth and the typical low-resolution inputs of existing micro-expression datasets. Regularizer selection is performed per backbone, without exploring combinations; stacking multiple regularizers is cautioned against due to the risk of oversmoothing.
Future research avenues include:
- High-resolution micro-expression video dataset construction,
- Micro-expression recognition models with architectures capable of leveraging SRFlow-level flow detail,
- Hybrid use of 3D Gaussian motion during inference,
- Adaptive or combined regularization schemes for optimally balancing smoothness and detail.
SRFlow establishes a new standard for both facial optical flow annotation and model supervision, with demonstrated substantial impact on downstream facial motion understanding (Zhang et al., 10 Jan 2026).