
Hierarchical Warping & Occlusion-Aware Noise Suppression

Updated 18 November 2025
  • The paper introduces a pyramidal coarse-to-fine architecture that efficiently captures large displacements and refines flow iteratively.
  • It employs a novel sampling-based correlation layer that bypasses interpolation artifacts, effectively mitigating ghosting in feature warping.
  • The method integrates explicit occlusion-aware cost reweighting within a shared decoder, yielding significant performance gains on benchmarks like Sintel and KITTI.

Hierarchical Warping and Occlusion-Aware Noise Suppression refers to architectural and algorithmic strategies for optical flow estimation networks, focused on addressing the challenges posed by feature warping artifacts (notably ghosting) and ambiguous matches in occluded regions. These methods are exemplified by the OAS-Net (Occlusion Aware Sampling Network), which replaces traditional warping-based correlation with a sampling-based alternative and integrates explicit occlusion-aware cost reweighting. This combination suppresses noise propagated by occlusions and interpolation, yielding superior flow estimates in challenging scenarios (Kong et al., 2021).

1. Pyramidal Coarse-to-Fine Architecture

Hierarchical (pyramidal) processing is foundational in contemporary optical flow estimation. In OAS-Net, a shared two-layer convolutional subnetwork recursively constructs 6-level feature pyramids for both input images, with each level $k$ representing a spatial downsampling by $2^k$ and increasing channels: [16, 32, 64, 96, 128, 160] for $k = 1 \dots 6$.
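As a quick illustration of the pyramid geometry described above, the following sketch lists the feature-map shape at each level. It assumes that resolution is halved exactly by floor division at every level; the function name `pyramid_shapes` and the 384×512 input size are hypothetical.

```python
# Channel widths for pyramid levels k = 1..6, as listed in the text.
CHANNELS = [16, 32, 64, 96, 128, 160]


def pyramid_shapes(height, width):
    """Return (channels, height, width) for levels k = 1..6.

    Assumption: level k has spatial size (height // 2**k, width // 2**k).
    """
    return [(c, height // 2 ** k, width // 2 ** k)
            for k, c in enumerate(CHANNELS, start=1)]


shapes = pyramid_shapes(384, 512)  # hypothetical input resolution
for k, shape in enumerate(shapes, start=1):
    print(f"level {k}: {shape}")
```

For this input size, the coarsest level holds only a 6×8 grid, which is why large displacements become small at the top of the pyramid.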

Flow is estimated progressively from coarse (level 6) to fine (level 1):

  • At level $k$, the flow $\hat{f}^{k+1}$ and occlusion map $O^{k+1}$ from the coarser level are upsampled by a factor of 2, giving $u_f$ and $u_O$.
  • A matching cost volume is computed using sampling-based correlation (see Section 2).
  • The raw cost volume and $u_O$ feed into an occlusion-aware module, producing $c_{oa}^k$.
  • A shared decoder operates on $c_{oa}^k$, outputting a flow residual $\Delta f^k$ and an updated occlusion map $O^k$.
  • The refined flow is $\hat{f}^k = u_f + \Delta f^k$.

This multiscale design enables the system to efficiently capture large displacements and refine flow iteratively.
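The coarse-to-fine recursion can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: `level_step` is a hypothetical stand-in for the correlation, occlusion module, and decoder; nearest-neighbour upsampling, doubling of flow magnitudes on upsampling, and the zero-flow / all-visible initialisation at the coarsest level are assumptions not stated in the text.

```python
import numpy as np


def upsample2(x, scale=1.0):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array (assumed operator)."""
    return scale * x.repeat(2, axis=1).repeat(2, axis=2)


def coarse_to_fine(feats1, feats2, level_step):
    """Run the pyramid from coarsest to finest level.

    feats1, feats2: lists of (C_k, H_k, W_k) arrays ordered fine -> coarse.
    level_step(f1, f2, u_f, u_O) -> (flow_residual, occlusion_map) is a
    hypothetical callable standing in for the per-level network.
    """
    _, h, w = feats1[-1].shape
    u_f = np.zeros((2, h, w))  # assumed initial flow at the coarsest level
    u_O = np.ones((1, h, w))   # assumed initial map: everything visible
    for f1, f2 in zip(reversed(feats1), reversed(feats2)):
        residual, occ = level_step(f1, f2, u_f, u_O)
        flow = u_f + residual  # refined flow at this level
        # Inputs for the next finer level; flow vectors are doubled because
        # the pixel grid is twice as dense there (assumption).
        u_f, u_O = upsample2(flow, scale=2.0), upsample2(occ)
    return flow, occ


# Toy usage: three levels and a dummy step that always adds a residual of 1.
feats = [np.zeros((4, 8, 8)), np.zeros((4, 4, 4)), np.zeros((4, 2, 2))]
flow, occ = coarse_to_fine(feats, feats, lambda f1, f2, u_f, u_O: (np.ones_like(u_f), u_O))
```

In the toy run, the residual of 1 added at the coarsest level is doubled twice on the way down, which shows how a small coarse-level correction accounts for a large displacement at full resolution.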

2. Sampling-Based Correlation Layer

The pivotal methodological innovation is the sampling-based correlation. Standard networks such as PWC-Net deploy feature warping—interpolating target features spatially according to the predicted flow—prior to local inner product correlation. OAS-Net, in contrast, eschews explicit warping altogether.

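The correlation at each pixel $x$ and displacement offset $d$ (with $d \in [-r, r]^2$ for a fixed search radius $r$) is computed as: $c^k(x, d) = \langle F_1^k(x),\, F_2^k(x + u_f(x) + d) \rangle$, where $F_1^k$ and $F_2^k$ are the level-$k$ features of the first and second image, $u_f$ is the upsampled flow, and $\langle \cdot, \cdot \rangle$ denotes channel-wise inner product.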

This process samples features from the predicted target locations plus a search window, but does not physically shift or interpolate the grid. Therefore, the operation avoids introducing interpolation artifacts and local inconsistencies.
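The idea of sampling around the predicted target location can be sketched in NumPy. This is an illustrative sketch under stated assumptions, not the published layer: target locations are rounded to the nearest pixel, out-of-bounds samples contribute zero cost, the flow is stored as (dx, dy), and the name `sampling_correlation` and the default `radius=1` are hypothetical.

```python
import numpy as np


def sampling_correlation(feat1, feat2, flow, radius=1):
    """Cost volume built by sampling feat2 around x + flow(x).

    feat1, feat2: (C, H, W) feature maps; flow: (2, H, W) as (dx, dy).
    Returns an array of shape ((2 * radius + 1) ** 2, H, W).
    """
    _, H, W = feat1.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Predicted target locations, rounded to the nearest pixel (assumption).
    tx = xs + np.rint(flow[0]).astype(int)
    ty = ys + np.rint(flow[1]).astype(int)
    costs = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sx, sy = tx + dx, ty + dy
            valid = (sx >= 0) & (sx < W) & (sy >= 0) & (sy < H)
            # Gather target features directly; feat2 itself is never modified.
            sampled = feat2[:, np.clip(sy, 0, H - 1), np.clip(sx, 0, W - 1)]
            # Channel-wise inner product, zeroed where the sample is out of bounds.
            costs.append((feat1 * sampled).sum(axis=0) * valid)
    return np.stack(costs)
```

The target feature map is only read through index lookups here, which mirrors the point in the text that the grid is not shifted or resampled into a new warped map.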

3. Ghosting and Noise in Feature Warping

Feature warping has a known pathology: ghosting. When multiple source locations are mapped to the same warped target location (frequent in occlusions or fast motions), bilinear interpolation aggregates disparate pixel values, resulting in ambiguous, duplicated features (“ghosts”). This can corrupt cost volume construction and thus flow estimation.

Sampling-based correlation addresses this by querying target features independently at specified locations; there is no many-to-one mixing. The result is a cost volume intrinsically robust to aliasing and less affected by motion boundary artifacts. The methodology never physically alters the target feature grid, which precludes the formation of local ghosts.
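A toy 1D example may help make the ghosting pathology concrete. It is only an illustration with integer flow (so no interpolation is involved) and hypothetical names: an object "F" moves two positions to the right and covers the background at index 4, and backward warping of the second frame then reproduces "F" at two positions.

```python
def backward_warp_1d(target, flow):
    """warped[x] = target[x + flow[x]] for integer flow; None if out of range."""
    n = len(target)
    return [target[x + flow[x]] if 0 <= x + flow[x] < n else None
            for x in range(n)]


# Frame 1: object "F" at index 2. Frame 2: the object has moved to index 4,
# revealing background "b2" and covering "b4".
frame1 = ["b0", "b1", "F", "b3", "b4"]
frame2 = ["b0", "b1", "b2", "b3", "F"]
flow = [0, 0, 2, 0, 0]  # only the object moves (+2); background is static

warped = backward_warp_1d(frame2, flow)
print(warped)  # ['b0', 'b1', 'F', 'b3', 'F']
```

Index 2 correctly recovers "F", but index 4 (occluded in the second frame) also reads "F": two source positions map to the same target location, which is the duplicated "ghost" that can then contaminate the cost volume.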

4. Occlusion-Aware Cost Volume Reweighting

Occluded regions are prone to unreliable matches, as true correspondences do not exist. OAS-Net introduces an explicit occlusion-awareness mechanism:

  • Each pyramid level maintains an occlusion-awareness map $O^k$, estimating the non-occlusion likelihood, which is upsampled for use at the next finer level ($u_O$).
  • Complementary weights are defined: $u_O$ for visible regions and $1 - u_O$ for occluded regions.
  • The raw cost volume $c^k$ is reweighted to produce $c^k \odot u_O$ and $c^k \odot (1 - u_O)$ via elementwise products.
  • Two dedicated 2D convolutions ($\mathrm{conv}_1$, $\mathrm{conv}_2$) are applied, followed by merging (written here as a sum) and leaky-ReLU activation: $c_{oa}^k = \mathrm{LeakyReLU}\big(\mathrm{conv}_1(c^k \odot u_O) + \mathrm{conv}_2(c^k \odot (1 - u_O))\big)$. This splitting enables the network to learn different matching filters for visible and occluded regions, akin to a learned self-attention mechanism over the cost volume.
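The reweighting steps above can be sketched in NumPy as follows. Several details are assumptions made for illustration only: the two convolutions are modelled as 1×1 convolutions (plain channel-mixing matrices), the two branches are merged by summation, and the leaky-ReLU slope is 0.1. The names `occlusion_aware_cost`, `w_vis`, and `w_occ` are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.1):
    """Leaky ReLU with an assumed negative slope of 0.1."""
    return np.where(x > 0, x, slope * x)


def occlusion_aware_cost(cost, u_O, w_vis, w_occ):
    """Split a cost volume by visibility and filter each part separately.

    cost: (D, H, W) raw cost volume.
    u_O: (1, H, W) non-occlusion likelihood in [0, 1].
    w_vis, w_occ: (D_out, D) weights of two 1x1 convolutions (assumption).
    """
    visible = cost * u_O            # weighted towards non-occluded pixels
    occluded = cost * (1.0 - u_O)   # complementary, occluded pixels
    # A 1x1 convolution is a per-pixel matrix product over channels.
    mixed = (np.einsum("od,dhw->ohw", w_vis, visible)
             + np.einsum("od,dhw->ohw", w_occ, occluded))
    return leaky_relu(mixed)
```

Because each pixel's cost is routed mostly through one of the two filters depending on $u_O$, the filters are free to specialise: one for reliable matches, one for regions where no true correspondence exists.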

5. Shared Decoder for Flow and Occlusion Estimation

For architectural compactness and consistency, the same decoder is shared across all pyramid levels. This module comprises an 8-layer U-shaped sequence of convolutions (channels: [128→128→128→128→128→96→64→32]), splitting into two prediction heads:

  • A flow head, predicting the 2-channel residual flow $\Delta f^k$
  • An occlusion head, outputting $O^k$ as a sigmoid map constrained to $[0, 1]$

Sharing the decoder reinforces hierarchical consistency and decreases network complexity.

6. Optimization and Learning

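OAS-Net is trained using a multi-scale $L_2$ endpoint error loss (identical to PWC-Net): $\mathcal{L} = \sum_{k} \alpha_k \sum_{x} \| \hat{f}^k(x) - f_{gt}^k(x) \|_2$. Here, $\hat{f}^k$ is the predicted flow at level $k$, $f_{gt}^k$ is the downsampled ground-truth flow at level $k$, and $\alpha_k$ is a per-level weight. The occlusion map $O^k$ is learned implicitly: no explicit ground-truth masks, occlusion-specific losses, or regularizers are incorporated.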
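A multi-scale endpoint error loss of this kind can be sketched as below. This is a minimal sketch assuming the per-pixel errors are summed (not averaged) within each level and that each level has a scalar weight; the function name and the weights in the usage example are hypothetical and are not the values used for training OAS-Net.

```python
import numpy as np


def multiscale_epe_loss(pred_flows, gt_flows, weights):
    """Weighted sum over levels of the per-pixel L2 endpoint error.

    pred_flows, gt_flows: lists of (2, H_k, W_k) arrays, one pair per level.
    weights: per-level scalars (hypothetical values in the example below).
    """
    total = 0.0
    for pred, gt, alpha in zip(pred_flows, gt_flows, weights):
        # Euclidean distance between predicted and true flow vectors.
        epe = np.sqrt(((pred - gt) ** 2).sum(axis=0))  # shape (H_k, W_k)
        total += alpha * epe.sum()
    return total


# Toy usage: a 2x2 level where every prediction is off by (3, 4), i.e. EPE 5,
# and a 1x1 level that is predicted perfectly.
gt_fine = np.stack([np.full((2, 2), 3.0), np.full((2, 2), 4.0)])
loss = multiscale_epe_loss(
    [np.zeros((2, 2, 2)), np.zeros((2, 1, 1))],
    [gt_fine, np.zeros((2, 1, 1))],
    weights=[0.5, 1.0],
)
```

Note that the occlusion map does not appear in this loss at all, consistent with it being learned implicitly through its effect on the flow.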

7. Empirical Performance and Impact

Ablation demonstrates the significance of both the sampling-based correlation and the occlusion module. For Sintel Final/KITTI 2012:

  • Warping, no occlusion: 4.05/4.62
  • Warping, occlusion: 3.98/4.37
  • Sampling, no occlusion: 3.86/4.44
  • Sampling, occlusion: 3.79/4.11

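Relative to the warping baseline without occlusion awareness, switching to sampling yields a 4.7% drop in Sintel Final EPE (4.05 → 3.86), and incorporating occlusion awareness improves KITTI 2012 by 5.4% (4.62 → 4.37). Combining both yields the largest improvement: 6.4% on Sintel Final (4.05 → 3.79) and 11.0% on KITTI 2012 (4.62 → 4.11).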
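The quoted percentages follow directly from the ablation numbers, taking the warping variant without occlusion awareness as the baseline. The short check below recomputes them; the dictionary layout and the helper name `relative_drop` are illustrative only.

```python
# Ablation results from the list above: (Sintel Final, KITTI 2012).
EPE = {
    ("warping", False): (4.05, 4.62),
    ("warping", True): (3.98, 4.37),
    ("sampling", False): (3.86, 4.44),
    ("sampling", True): (3.79, 4.11),
}


def relative_drop(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before


base = EPE[("warping", False)]
full = EPE[("sampling", True)]

sampling_sintel = relative_drop(base[0], EPE[("sampling", False)][0])
occlusion_kitti = relative_drop(base[1], EPE[("warping", True)][1])
combined_sintel = relative_drop(base[0], full[0])
combined_kitti = relative_drop(base[1], full[1])

print(f"{sampling_sintel:.1f} {occlusion_kitti:.1f} "
      f"{combined_sintel:.1f} {combined_kitti:.1f}")  # 4.7 5.4 6.4 11.0
```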

On public benchmarks, OAS-Net (6.16M parameters, 0.03 s/frame) achieves:

  • Sintel Clean test EPE: 3.65 (among best for lightweight networks)
  • Sintel Final test EPE: 5.01 (comparable to PWC-Net/IRR-PWC)
  • KITTI 2012 test EPE: 1.4 (ties state-of-the-art)

A plausible implication is that avoiding feature warping within a hierarchical pipeline, combined with explicit occlusion-aware noise suppression, constitutes an effective paradigm for robust and efficient optical flow estimation, particularly for lightweight networks and for scenes with significant occlusions and fast motions (Kong et al., 2021).
