Adaptive Polyphase Upsampling
- Adaptive Polyphase Upsampling (APS-U) is a non-linear upsampling method that restores subgrid alignment to achieve perfect shift equivariance in CNN encoder–decoder models.
- It reintroduces stored polyphase offsets from the downsampling phase, overcoming misalignment issues common in traditional upsampling techniques.
- Integration of APS-U in architectures like U-Net enhances image reconstruction fidelity, particularly in medical imaging, by maintaining precise spatial shifts.
Adaptive Polyphase Upsampling (APS-U) is a non-linear upsampling technique for convolutional neural networks (CNNs) that guarantees perfect shift equivariance, particularly in symmetric encoder–decoder architectures. It complements Adaptive Polyphase Downsampling (APS-D), which provides perfect shift invariance in classification. Used together, they yield architectures in which a discrete input shift produces an equivalent shift of the output, so that image-to-image tasks such as medical image reconstruction are robust to input translations. Unlike conventional upsampling methods, APS-U aligns its output with the subgrid (“polyphase”) selection made during downsampling, which resolves the misalignment introduced by standard striding.
1. Motivation and Problem Context
Convolutional layers are inherently shift-equivariant, but strided downsampling (e.g., pooling, strided convolutions) introduces spatial aliasing and breaks this property. In classification, Adaptive Polyphase Downsampling (APS-D) achieves shift invariance by selecting, at each downsampling layer, the polyphase component with maximal norm. For dense prediction or image reconstruction, however, invariance alone is insufficient: a shift in the input must produce a precisely shifted output. Without a mechanism to track and restore the polyphase offsets discarded by the encoder, upsampling cannot place features at the correct output positions. APS-U fills this gap by reintroducing features at exactly the positions determined during APS-D, which achieves perfect shift equivariance in the encoder–decoder pipeline (Chaman et al., 2021).
2. Mathematical Formulation
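For stride-$2$ down/up sampling, denote a 1D signal by $x$ and its two polyphase components by $y_0[n] = x[2n]$ and $y_1[n] = x[2n+1]$. APS-D selects an index
$$k^* = \arg\max_{k \in \{0,1\}} \|y_k\|_p$$
and outputs $y_{k^*}$, so that $\mathrm{APS\text{-}D}(S^s x) = S^{s'}\,\mathrm{APS\text{-}D}(x)$ for some integer $s'$, where $S^s$ is a discrete (circular) shift by $s$ samples.
APS-U restores the original grid offset during upsampling:
$$\mathrm{APS\text{-}U}(y, k^*) = S^{k^*}\,U(y),$$
with $U$ the standard zero-insertion upsampler. For general strides and higher dimensions, selection and reinsertion are performed per-axis.
The following equivariance proposition is central:
$$\mathrm{APS\text{-}U}\big(\mathrm{APS\text{-}D}(S^s x)\big) = S^s\,\mathrm{APS\text{-}U}\big(\mathrm{APS\text{-}D}(x)\big),$$
guaranteeing that APS-D followed by APS-U is a perfectly shift-equivariant block.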
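The proposition can be checked numerically in the 1D stride-2 case. The following is a minimal NumPy sketch, not the authors' implementation: the function names `aps_down` and `aps_up` are hypothetical, the signal length is assumed even, shifts are circular, and the norm order `p=2` is an assumed default.

```python
import numpy as np

def aps_down(x, p=2):
    """Stride-2 APS-D on an even-length 1D signal: keep the polyphase
    component with the larger l_p norm and return it with its index."""
    comps = [x[0::2], x[1::2]]            # y_0[n] = x[2n], y_1[n] = x[2n+1]
    k = int(np.argmax([np.linalg.norm(c, ord=p) for c in comps]))
    return comps[k], k

def aps_up(y, k):
    """APS-U: zero-insertion upsampling, then a circular shift by the stored index k."""
    z = np.zeros(2 * len(y), dtype=y.dtype)
    z[0::2] = y                           # standard zero-insertion U(y)
    return np.roll(z, k)                  # S^k U(y): y[n] lands at index 2n + k

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
for s in range(16):
    lhs = aps_up(*aps_down(np.roll(x, s)))   # APS-U(APS-D(S^s x))
    rhs = np.roll(aps_up(*aps_down(x)), s)   # S^s APS-U(APS-D(x))
    assert np.allclose(lhs, rhs)
```

Without the stored index (that is, always calling `aps_up(y, 0)`), the check fails for odd shifts whenever the selected phase changes, which is the misalignment APS-U is meant to remove.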
3. Implementation and Integration into Architectures
APS-D and APS-U are paired in each down–up block of U-Net and similar models. Upon downsampling, the encoder computes all $s^2$ polyphase components of the feature map (four for stride $s = 2$), selects the one with maximal $\ell_p$ norm, and stores the chosen phase index. The decoder then applies classical zero-insertion upsampling followed by a spatial shift by the stored index, thus matching the encoder's polyphase grid. Skip connections require phase alignment via APS-U prior to merging.
No additional learnable parameters are introduced; APS-U consists of zero-insertion followed by an integer shift, and the computational cost is nearly identical to traditional methods. Boundary handling may use circular, zero, or reflection padding, but circular padding is optimal for pure equivariance tests.
| Operation | Baseline | APS-D/APS-U |
|---|---|---|
| Downsampling | Stride-$2$, no phase record | Max-norm phase selection + index record |
| Upsampling | Nearest/bilinear | Zero-insert + shift |
| Added Parameters | None | None |
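The encoder/decoder bookkeeping described above can be sketched for 2D feature maps as follows. This is an illustrative NumPy sketch under stated assumptions, not the reference implementation: the names `aps_down2d`, `aps_up2d`, and `toy_encoder_decoder` are hypothetical, arrays are laid out as (C, H, W) with H and W divisible by 2 at every level, shifts are circular, and the learned convolutions of a real U-Net are replaced by a pointwise `tanh`.

```python
import numpy as np

def aps_down2d(x, p=2):
    """Stride-2 APS-D on a (C, H, W) array: return the max-norm polyphase
    component (norm taken over all channels) and its (row, col) phase."""
    phases = [(i, j) for i in (0, 1) for j in (0, 1)]
    norms = [np.linalg.norm(x[:, i::2, j::2].ravel(), ord=p) for i, j in phases]
    i, j = phases[int(np.argmax(norms))]
    return x[:, i::2, j::2], (i, j)

def aps_up2d(y, phase):
    """APS-U: write y onto the stored polyphase grid of a zero array, which
    equals zero-insertion followed by a circular shift by `phase`."""
    i, j = phase
    c, h, w = y.shape
    z = np.zeros((c, 2 * h, 2 * w), dtype=y.dtype)
    z[:, i::2, j::2] = y
    return z

def toy_encoder_decoder(x, levels=2):
    """Hypothetical down/up stack with no learned layers."""
    stored = []
    for _ in range(levels):
        x, phase = aps_down2d(x)
        stored.append(phase)            # encoder records one phase per level
    x = np.tanh(x)                      # pointwise op, trivially shift-equivariant
    for phase in reversed(stored):      # decoder consumes phases in reverse order
        x = aps_up2d(x, phase)
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 16, 16))
out = toy_encoder_decoder(x)
shifted = toy_encoder_decoder(np.roll(x, (3, 5), axis=(1, 2)))
assert np.allclose(shifted, np.roll(out, (3, 5), axis=(1, 2)))
```

Storing the phases on a stack mirrors the symmetric structure of the architecture: the phase recorded at the deepest encoder level is the first one the decoder uses.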
4. Theoretical Guarantees of Shift Equivariance
Let $f_\theta$ denote a U-Net parameterized with APS-D and APS-U at each level. For any spatial shift $S^s$,
$$f_\theta(S^s x) = S^s f_\theta(x).$$
This is established by induction: each convolution remains shift-equivariant, and each APS-D/U pair commutes with $S^s$ in the manner prescribed above (Chaman et al., 2021). If the polyphase selection ever yields ties (identical $\ell_p$ norms), the index is not unique, but this is a probability-zero event in practice, and an arbitrary or secondary tie-breaking rule suffices.
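The induction step for a single APS-D/U pair can be made explicit in the 1D stride-2 case. The derivation below is a sketch under assumed notation, which is not necessarily the paper's: $(S^s x)[n] = x[n-s]$ with indices taken circularly on an even-length signal, $y_k[n] = x[2n+k]$, $k^*$ the unique max-norm index for $x$, and $z$ the APS-U output, so that $z[2m+k^*] = y_{k^*}[m]$ and $z$ is zero on the other phase. Primes denote the same quantities for the shifted input $x' = S^s x$.

$$
\begin{aligned}
y'_k[n] &= x'[2n+k] = x[2n+k-s], \\
k' &\equiv k^* + s \pmod 2, \qquad s' = \tfrac{1}{2}\,(s + k^* - k') \in \mathbb{Z}, \\
y'_{k'}[n] &= x[2(n-s') + k^*] = y_{k^*}[n-s'], \\
z'[2n+k'] &= y_{k^*}[n-s'] = z[2(n-s') + k^*] = z[2n+k'-s] = (S^s z)[2n+k'].
\end{aligned}
$$

The second line uses the fact that circular shifts preserve norms, so the component that wins for $x'$ is a shifted copy of the one that won for $x$. Both $z'$ and $S^s z$ vanish on the remaining phase, hence $z' = S^s z$.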
5. Empirical Evaluation and Results
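Experiments on MRI (fastMRI) and CT (LoDoPaB-CT) datasets used a U-Net (encoder widths [64, 128, 256, 512, 1024]) with four APS-D/U layers. Equivariance was measured by NMSE and SSIM between the output for a shifted input, $f_\theta(S^s x)$, and the shifted output, $S^s f_\theta(x)$, where $f_\theta$ denotes the network and $S^s$ a shift by $s$. Reconstruction quality was measured by PSNR, SSIM, and NMSE.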
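The equivariance NMSE can be illustrated on toy operators. The sketch below is not the paper's evaluation code: `equivariance_nmse`, `circular_blur`, and `strided_nearest` are hypothetical names, the NMSE is assumed to follow the common definition $\|a-b\|^2 / \|b\|^2$, shifts are circular, and SSIM is omitted to keep the example dependency-free.

```python
import numpy as np

def shift2d(x, shift):
    """Circular shift of a 2D image by (rows, cols)."""
    return np.roll(x, shift, axis=(0, 1))

def equivariance_nmse(f, x, shift):
    """NMSE between f(S x) and S f(x); zero means f commutes with the shift.
    Assumes the common definition ||a - b||^2 / ||b||^2."""
    a = f(shift2d(x, shift))
    b = shift2d(f(x), shift)
    return float(np.sum((a - b) ** 2) / np.sum(b ** 2))

def circular_blur(x):
    """Shift-equivariant reference: 4-neighbor average with wrap-around."""
    return 0.25 * (np.roll(x, 1, 0) + np.roll(x, -1, 0)
                   + np.roll(x, 1, 1) + np.roll(x, -1, 1))

def strided_nearest(x):
    """Non-equivariant reference: stride-2 subsampling, then
    nearest-neighbor upsampling back to the input size."""
    return np.repeat(np.repeat(x[0::2, 0::2], 2, axis=0), 2, axis=1)

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
print(equivariance_nmse(circular_blur, img, (1, 1)))    # zero up to floating-point error
print(equivariance_nmse(strided_nearest, img, (1, 1)))  # clearly nonzero
```

The strided operator fails the test for odd shifts because the fixed sampling grid picks up a different set of pixels after the shift.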
Key findings based on (Chaman et al., 2021):
- Shift Equivariance: APS-D/U yields an equivariance NMSE that is zero up to machine precision, with correspondingly perfect SSIM, outperforming anti-aliasing and data augmentation both in-distribution and on out-of-distribution (ImageNet) samples.
- Reconstruction Fidelity: PSNR, SSIM, and NMSE for unshifted inputs are on par with all baselines (difference at most ±0.2 dB).
- Worst-Case PSNR Drop: Under input shifts, a baseline U-Net may degrade by up to 4 dB; APS-D/U blocks largely remove this degradation.
- Out-of-Distribution Robustness: Equivariance is preserved even when applying models trained on medical images to natural images.
6. Comparison with Related Approaches
Traditional anti-aliasing (low-pass filtering), data augmentation (random shifts), and heuristic methods such as fixed polyphase pooling partially ameliorate shift inconsistency but do not guarantee exact equivariance or generalise to out-of-distribution inputs. APS-U’s nonlinearity and explicit phase restoration yield stronger guarantees. Learnable approaches such as Learnable Polyphase Upsampling (LPU) (Rojas-Gomez et al., 2022) generalize the deterministic index selection by learning phase offsets via end-to-end training, but the core principle—synchronizing down- and upsampling indices for shift equivariance—remains fundamentally shared.
| Method | Equivariance Guarantee | Parameters | Inference Cost |
|---|---|---|---|
| Bilinear/nearest-neighbor upsampling | None | None | Baseline |
| LPF (low-pass) | Approximate | None | Slight increase |
| APS-D/APS-U | Exact | None | Baseline |
| LPU (learned) | Exact (circular shift) | Logits Net | +~1% (ResNet) |
7. Limitations and Future Directions
APS-U, as described, is formulated for stride-2 down/up-sampling and requires storing and retrieving polyphase indices. Its extension to higher strides ($s > 2$) and arbitrary lattice structures is conceptually straightforward but incurs combinatorial growth in index bookkeeping. The method presumes integer-stride sampling with non-overlapping patches; extensions to dilated convolutions and other forms require further investigation.
Potential directions include incorporating APS-U into group-equivariant networks for applications requiring invariance or equivariance to a wider set of symmetries beyond translation, and deploying it in other dense prediction settings such as deblurring, super-resolution, or generative decoder models (Chaman et al., 2021).
References
- Truly shift-equivariant convolutional neural networks with adaptive polyphase upsampling (Chaman et al., 2021)
- Learnable Polyphase Sampling for Shift Invariant and Equivariant Convolutional Networks (Rojas-Gomez et al., 2022)