NCUP: Normalized Convolution Upsampler

Updated 1 January 2026
  • NCUP is a novel upsampling technique that frames the task as a sparse interpolation problem using forward mapping and normalized convolution.
  • It integrates a lightweight weight estimation network with a U-Net architecture to selectively inpaint missing flow information while maintaining detail.
  • Experimental results demonstrate that NCUP achieves state-of-the-art performance, reducing AEPE and preserving edge fidelity with significantly fewer parameters.

The Normalized Convolution UPsampler (NCUP) is a parameter-efficient, joint upsampling technique for optical flow estimation networks that formulates upsampling as a sparse interpolation problem and solves it using normalized convolutional neural networks. NCUP is designed to produce full-resolution optical flow predictions during training and inference, allowing for the preservation of fine-scale motion details while avoiding the blurring, edge leakage, and semantic region mixing commonly associated with standard bilinear interpolation. The method integrates with both coarse-to-fine and recurrent optical flow architectures, achieves state-of-the-art performance, and generalizes robustly across datasets (Eldesokey et al., 2021).

1. Mathematical Foundations of Normalized Convolution

The upsampling problem is framed as learning a mapping θ that transforms a low-resolution 2D flow field, I_LR, into a dense high-resolution field, I_HR, with support from ancillary guidance g (e.g., image RGB or CNN features). Instead of traditional backward (e.g., bilinear) interpolation, NCUP applies a forward mapping by projecting each sample I_LR(x', y') to its nearest high-resolution integer coordinates (x, y) = (round(s·x'), round(s·y')) for upsampling factor s.
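A minimal numpy sketch of this forward mapping, assuming an integer factor s and the common convention that flow vectors are rescaled by s at the higher resolution (the paper's exact convention may differ):

```python
import numpy as np

def forward_map(flow_lr, s):
    """Scatter a low-res flow field onto a sparse high-res grid (forward mapping).

    flow_lr : (h, w, 2) low-resolution flow
    s       : integer upsampling factor
    Returns the sparse high-res flow and a confidence map that is
    nonzero only where a low-res sample landed.
    """
    h, w, c = flow_lr.shape
    H, W = h * s, w * s
    flow_hr = np.zeros((H, W, c))
    conf = np.zeros((H, W, c))
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    Y = np.clip(np.round(ys * s).astype(int), 0, H - 1)
    X = np.clip(np.round(xs * s).astype(int), 0, W - 1)
    flow_hr[Y, X] = flow_lr * s  # flow magnitudes scale with resolution (assumed convention)
    conf[Y, X] = 1.0
    return flow_hr, conf
```

All remaining high-resolution positions keep zero confidence, which is what marks them as holes for the subsequent normalized-convolution layers to fill.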

This projection yields a sparse high-resolution grid Ĩ_HR ∈ ℝ^{H×W×C} and an associated confidence (weight) map w⁰ ∈ [0,1]^{H×W×C}, which is nonzero only where samples are available. A cascade of L normalized convolution layers then fills in missing entries. Each layer ℓ computes:

$$I_{HR}^{\ell}(x) = \frac{\sum_m I_{HR}^{\ell-1}(x-m)\, w^{\ell-1}(x-m)\, a^{\ell}(m)}{\sum_m w^{\ell-1}(x-m)\, a^{\ell}(m)}$$

$$w^{\ell}(x) = \frac{\sum_m w^{\ell-1}(x-m)\, a^{\ell}(m)}{\sum_m a^{\ell}(m)}$$

where a^ℓ(m) is a learned r×r kernel (typically r = 3), shared spatially and per-channel. This confidence-weighted interpolation framework maintains data fidelity and prevents estimator drift in regions with low confidence.
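The two update equations can be sketched for a single channel in numpy (zero padding, correlation form; NCUP's per-channel kernels and learned nonnegativity are omitted here):

```python
import numpy as np

def nconv(signal, conf, kernel, eps=1e-8):
    """One normalized convolution layer (single channel, zero padding).

    signal, conf : (H, W) sparse data and its confidence map
    kernel       : (r, r) nonnegative applicability kernel a(m)
    Computes
        out(x)   = sum_m signal(x-m) conf(x-m) a(m) / sum_m conf(x-m) a(m)
        conf'(x) = sum_m conf(x-m) a(m) / sum_m a(m)
    """
    H, W = signal.shape
    r = kernel.shape[0]
    p = r // 2
    sp = np.pad(signal * conf, p)   # confidence-weighted data
    cp = np.pad(conf, p)
    num = np.zeros((H, W))
    den = np.zeros((H, W))
    for i in range(r):
        for j in range(r):
            num += kernel[i, j] * sp[i:i + H, j:j + W]
            den += kernel[i, j] * cp[i:i + H, j:j + W]
    out = num / (den + eps)         # data estimate, normalized by confidence mass
    new_conf = den / kernel.sum()   # propagated confidence
    return out, new_conf
```

Note that where the confidence mass is zero, the output stays zero and the propagated confidence stays zero, so later layers still treat those pixels as holes.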

2. Sparse Interpolation Formulation

NCUP’s distinctive approach interprets upsampling as sparse interpolation. Bilinear interpolation, a backward mapping, guarantees full density at the cost of blurred edges and motion mixing across object boundaries. In contrast, the forward-map and sparse-interpolate scheme ensures that known low-res flow values align precisely with their correct high-res locations, leaving large holes with zero confidence for the network to inpaint. The normalized convolution mechanism uses the explicit confidence map to selectively guide interpolation, restricting information propagation to semantically coherent regions and adaptively respecting localized boundaries.

3. NCUP Network Architecture

NCUP consists of two specialized submodules:

  • (A) Weight Estimation Network Φ: This lightweight convolutional network takes as input a concatenation of the low-resolution flow I_LR and guidance g_LR (either RGB or deep features). It comprises two 3×3 convolutions with batch normalization and ReLU activations, followed by a 1×1 convolution with a sigmoid activation. The output is w_LR ∈ (0,1)^{H/s × W/s × 2}, which is subsequently forward-mapped to w⁰_HR.
    • Typical channel settings are (16, 8) for RGB guidance and (64, 32) for deep guidance.
  • (B) Normalized-Convolution U-Net: This backbone operates on the high-resolution sparse grid, consuming I⁰_HR and w⁰_HR. It consists of a two-scale (one down, one up) U-Net in which all convolutions are normalized 3×3 convolutions, pooling is performed via confidence normalization (not max-pooling), and skip connections link the scales. The interpolation network contains only 224 parameters; combined with Φ, the upsampler totals approximately 2k parameters.
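As a rough sanity check on the stated budget, the layer shapes above can be tallied (an illustrative count that ignores batch-norm parameters and assumes a 2-channel flow plus 3-channel RGB input; the paper's exact bookkeeping may differ):

```python
def conv_params(k, c_in, c_out, bias=True):
    """Parameters in a k×k convolution: k*k*c_in*c_out weights plus biases."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# Phi with RGB guidance: input = 2 flow + 3 RGB channels,
# two 3x3 convs with (16, 8) channels, then a 1x1 conv to 2 weight channels.
phi = (conv_params(3, 5, 16)    # 736
       + conv_params(3, 16, 8)  # 1160
       + conv_params(1, 8, 2))  # 18
total = phi + 224  # plus the 224-parameter interpolation U-Net
print(phi, total)  # 1914 2138, i.e. ~2k as stated
```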

4. Integration into Optical Flow Networks

4.1 Coarse-to-Fine Networks (e.g., PWC-Net, FlowNetS)

In these pipelines, flow is typically predicted at ¼ resolution with multi-scale ℓ₂ loss supervision and upsampled via bilinear interpolation at test time. With NCUP, the upsampling module is attached at ¼ resolution. At each training iteration, NCUP upsamples to full resolution and a new term is added to the pyramid loss:

$$L = \sum_{p \in \{1,3,4,5,6,7\}} \alpha_p \|f^p - f^p_{GT}\|_2^2$$

with α₁ = 0.02 weighting the full-resolution term. This enables end-to-end training and propagates full-resolution gradients throughout the network.
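A minimal numpy sketch of such a weighted multi-scale loss (the level set, shapes, and dict layout are illustrative, not the authors' code):

```python
import numpy as np

def pyramid_loss(preds, gts, alphas):
    """Weighted multi-scale l2 loss: L = sum_p alpha_p * ||f^p - f^p_GT||_2^2.

    preds, gts : dicts mapping pyramid level -> (H_p, W_p, 2) flow arrays
    alphas     : dict mapping pyramid level -> scalar weight
    """
    return sum(a * np.sum((preds[p] - gts[p]) ** 2)
               for p, a in alphas.items())
```

Attaching NCUP simply adds the full-resolution level (with its small α₁) to the dicts, so gradients from the full-resolution term flow back through the upsampler into the feature extractor.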

4.2 Recurrent Networks (e.g., RAFT)

RAFT’s published implementation uses a 3×3 “convex combination” upsampler (∼500k parameters). NCUP replaces this by forward-mapping from ⅛ to ¼ res followed by NCUP from ¼ to full, guided by both the low-res flow and the recurrent state. All other training details match the standard RAFT schedule.

5. Quantitative Performance and Comparison

NCUP demonstrates marked improvements in endpoint error at lower computational cost:

| Method   | AEPE (PWC-Net, FlyingChairs) | Upsampler Parameters |
|----------|------------------------------|----------------------|
| Bilinear | 1.58 (−6.5%)                 | —                    |
| DJIF     | 1.51 (−10.6%)                | 56k                  |
| PAC      | 1.50 (−11.2%)                | 183k                 |
| ConvComb | 1.52 (−10.0%)                | 44k                  |
| NCUP     | 1.46 (−13.6%)                | 2k                   |

On FlowNetS (FlyingChairs), NCUP achieves 2.13 AEPE (−15.8%) versus a baseline of 2.53. For RAFT trained on Chairs+Things, NCUP reduces KITTI AEPE from 5.04 to 4.83, with a 6.3% AEPE reduction on the Sintel "Final" pass. After full finetuning, NCUP improves Sintel Final test AEPE from 2.86 to 2.69, with virtually no impact on KITTI test scores, despite 7.5% fewer upsampler parameters.

Runtime overhead of NCUP is <5 ms/frame (1024×436 inputs, 1080 Ti), matching bilinear interpolation and outpacing DJIF and PAC when backpropagation is enabled. The following table summarizes key metrics:

| Upsampler | Params (PWC) | AEPE (Chairs, PWC) | Params (RAFT) | AEPE (Sintel Final, RAFT) |
|-----------|--------------|--------------------|---------------|---------------------------|
| Bilinear  | —            | 1.58               | —             | —                         |
| ConvComb  | 44k          | 1.52               | 500k          | 2.86                      |
| NCUP      | 2k           | 1.46               | 100k          | 2.69                      |

6. Ablation Studies and Learned Behaviors

Ablation reveals the following:

  • Weight Estimation: Sigmoid activation yields the lowest AEPE (1.46); replacing it with Softplus increases AEPE to 1.48. Using full-res guidance for weight prediction degrades AEPE (1.75) and causes memory issues. Flow input is essential in Φ; omitting it worsens AEPE to 1.52.
  • Interpolation Network: Adding an extra scale slightly worsens AEPE (1.49). Using max-pooling instead of confidence pooling gives only marginally higher AEPE (1.48).
  • Loss Weighting: Best results use α₁ = 0.02 for the full-res loss; significant deviations yield worse AEPE (∼1.48).
  • Learned Weights: Predicted w⁰ maps concentrate near image edges and fine structures, segmenting object regions and enabling edge-aware interpolation. In flat regions, weights are uniform and interpolation tends toward averaging.

7. Mechanisms for Preserving Flow Detail

NCUP’s explicit use of confidence maps allows each normalized convolution layer to enforce data fidelity where confidence is high and rely on interpolation elsewhere. Its multi-scale U-Net structure introduces both local and moderately expansive receptive fields for correcting localized and extended flow artifacts. End-to-end integration ensures feature extraction remains sensitive to full-resolution supervision, channeling loss gradients via NCUP to optimize the entire network stack. This synergy yields sharper, artifact-minimal, and semantically faithful flow fields with consistent 4–14% reductions in endpoint error, while using orders of magnitude fewer parameters compared to prior upsamplers (Eldesokey et al., 2021).
