PixelDCL: Artifact-Free Upsampling
- PixelDCL is a neural module that replaces standard deconvolutions by sequentially conditioning pixel groups to enforce local dependencies and eliminate checkerboard artifacts.
- It minimizes artifacts by generating adjacent pixels with shared computational paths, reducing parameters while maintaining efficient up-sampling.
- Integrating PixelDCL into models like U-Net, VAE, and GAN improves segmentation accuracy and image generation quality with a marginal computational overhead.
A Pixel Deconvolutional Layer (PixelDCL) is a learnable neural module that replaces standard deconvolutional (also known as transposed convolution) layers for up-sampling in deep architectures. PixelDCL establishes explicit computational dependencies among adjacent pixels in the output feature map, thereby eliminating checkerboard artifacts that arise in classical deconvolutions. It achieves this by sequentially generating output pixel groups (branches), conditioning each on previously generated branches and, optionally, the original input. PixelDCL can be integrated into existing models—such as U-Net, DeepLab, VAE, and GANs—as a drop-in replacement, and supports efficient parallelized implementation with a minor computational overhead (Gao et al., 2017).
1. Problem Definition: Checkerboard Artifacts in Classical Deconvolution
Standard deconvolutions (transposed convolutions) that up-sample by a stride of 2 can be viewed as a single kernel reshaped into four smaller kernels $k_1, \dots, k_4$, each applied to the input feature map $F_{\text{in}}$. The output activations are constructed by interleaving the results:

$$F_i = F_{\text{in}} \circledast k_i \quad (i = 1, \dots, 4), \qquad F_{\text{out}} = F_1 \oplus F_2 \oplus F_3 \oplus F_4,$$

where $\circledast$ indicates convolution and $\oplus$ denotes “periodic shuffle + sum.” Each $F_i$ is computed independently. As a result, in $F_{\text{out}}$, spatially adjacent output pixels may originate from independent computational branches, lacking cross-pixel dependency. This causes checkerboard patterns: regular artifacts visible across up-sampled feature maps and, by extension, in the outputs of models for segmentation and generation (Gao et al., 2017).
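The independence of the four branches can be made concrete with a small, dependency-free sketch. It is an exaggerated toy illustration, not the authors' code: the helper names (`conv_same`, `periodic_shuffle`, `deconv_2x`), the single-channel setting, the zero padding, and the placement of each branch within a 2×2 output cell are all assumptions made here, and the shuffle is written as plain interleaving. Because nothing ties the four kernels together, even a perfectly flat input can map to a 2×2-periodic output.

```python
def conv_same(x, k):
    """Zero-padded 'same' 2-D cross-correlation on nested lists."""
    H, W, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for a in range(kh):
                for b in range(kw):
                    y, z = i + a - (kh - 1) // 2, j + b - (kw - 1) // 2
                    if 0 <= y < H and 0 <= z < W:
                        out[i][j] += x[y][z] * k[a][b]
    return out


def periodic_shuffle(f1, f2, f3, f4):
    """Interleave four HxW maps into one 2Hx2W map (assumed layout)."""
    H, W = len(f1), len(f1[0])
    out = [[0.0] * (2 * W) for _ in range(2 * H)]
    for i in range(H):
        for j in range(W):
            out[2 * i][2 * j] = f1[i][j]          # even row, even column
            out[2 * i + 1][2 * j + 1] = f2[i][j]  # odd row, odd column
            out[2 * i][2 * j + 1] = f3[i][j]      # even row, odd column
            out[2 * i + 1][2 * j] = f4[i][j]      # odd row, even column
    return out


def deconv_2x(x, kernels):
    """Four independent branches, then interleave: no branch sees another."""
    f1, f2, f3, f4 = (conv_same(x, k) for k in kernels)
    return periodic_shuffle(f1, f2, f3, f4)


# A flat 4x4 input and four unrelated 3x3 kernels (constant values 1, 2, 3, 4).
flat = [[1.0] * 4 for _ in range(4)]
kernels = [[[float(c)] * 3 for _ in range(3)] for c in (1, 2, 3, 4)]
out = deconv_2x(flat, kernels)

# One interior 2x2 cell of the 8x8 output: four different values.
block = [row[2:4] for row in out[2:4]]
print(block)  # [[9.0, 27.0], [36.0, 18.0]]
```

The same 2×2 cell repeats across the interior of the output, which is the kind of regular pattern the paragraph above describes.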
2. PixelDCL Formulation: Sequential Conditioning and Branch Design
PixelDCL remedies the lack of dependency by generating the output through a sequence of interdependent branches:
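- iPixelDCL (input PixelDCL): Each branch is conditioned on all earlier branches as well as the original input:

$$\begin{aligned}
F_1 &= F_{\text{in}} \circledast k_1, \\
F_2 &= [F_{\text{in}}, F_1] \circledast k_2, \\
F_3 &= [F_{\text{in}}, F_1, F_2] \circledast k_3, \\
F_4 &= [F_{\text{in}}, F_1, F_2, F_3] \circledast k_4, \\
F_{\text{out}} &= F_1 \oplus F_2 \oplus F_3 \oplus F_4,
\end{aligned}$$

where $[\cdot]$ denotes channel concatenation, $\circledast$ convolution, and $\oplus$ the periodic shuffle.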
This enforces a full sequential dependency structure.
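- PixelDCL (simplified): Empirical results demonstrate that conditioning only on preceding branches (omitting the input from all but the first) suffices:

$$\begin{aligned}
F_1 &= F_{\text{in}} \circledast k_1, \\
F_2 &= F_1 \circledast k_2, \\
F_3 &= [F_1, F_2] \circledast k_3, \\
F_4 &= [F_1, F_2, F_3] \circledast k_4, \\
F_{\text{out}} &= F_1 \oplus F_2 \oplus F_3 \oplus F_4.
\end{aligned}$$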
Branches are grouped such that adjacent output pixels share a computational path, ensuring all locality dependencies are directly encoded.
3. Architectural and Parameterization Details
The following architectural components are salient:
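- Upsample factor $2$ leads to four sub-maps $F_1, F_2, F_3, F_4$.
- Parameter Count: To match a $6 \times 6$ deconvolution,
  - DCL: one $6 \times 6$ kernel ($36$ parameters)
  - iPixelDCL: four $3 \times 3$ kernels ($4 \times 9 = 36$ parameters)
  - PixelDCL: two $3 \times 3$ and two $2 \times 2$ kernels (total $2 \times 9 + 2 \times 4 = 26$ parameters, fewer than standard)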
- Stride: The method generalizes to higher upsampling factors with added branch stages.
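The parameter comparison above reduces to simple arithmetic, sketched below. The kernel sizes are assumptions inferred from the 36-parameter figure for DCL and from the `Conv3x3`/`Conv2x2` calls in the pseudocode; `kernel_params` is a hypothetical helper that counts spatial weights per kernel only and ignores channel counts, as the list does.

```python
def kernel_params(*sizes):
    """Total spatial weights for square kernels with the given side lengths."""
    return sum(s * s for s in sizes)


dcl = kernel_params(6)                 # one 6x6 kernel
ipixeldcl = kernel_params(3, 3, 3, 3)  # four 3x3 kernels
pixeldcl = kernel_params(3, 3, 2, 2)   # two 3x3 and two 2x2 kernels

print(dcl, ipixeldcl, pixeldcl)  # 36 36 26
```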
A practical pseudocode for PixelDCL (stride=2) is:
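```python
def PixelDCL2x(X):
    F1 = Conv3x3(X, k1)             # branch 1: from the input only
    F2 = Conv3x3(F1, k2)            # branch 2: conditioned on F1
    F3 = Conv2x2([F1, F2], k3)      # branch 3: conditioned on F1 and F2
    F4 = Conv2x2([F1, F2, F3], k4)  # branch 4: conditioned on F1, F2, and F3
    Y = PS([F1, F2, F3, F4])        # periodic shuffle into the 2x output
    return Y
```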
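A runnable counterpart to the pseudocode can be sketched in plain Python for a single-channel input. This is only an illustration under stated assumptions: the helpers `conv_same`, `conv_multi`, and `periodic_shuffle`, the zero padding (2×2 kernels are anchored at their top-left entry), the treatment of `[F1, F2]` as a list of channels with one kernel each, and the position of each branch in the 2×2 output cell are choices made here, not details taken from the source.

```python
def conv_same(x, k):
    """Zero-padded 'same' 2-D cross-correlation on nested lists."""
    H, W, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for a in range(kh):
                for b in range(kw):
                    y, z = i + a - (kh - 1) // 2, j + b - (kw - 1) // 2
                    if 0 <= y < H and 0 <= z < W:
                        out[i][j] += x[y][z] * k[a][b]
    return out


def conv_multi(maps, kernels):
    """Convolve each channel with its own kernel and sum the results."""
    H, W = len(maps[0]), len(maps[0][0])
    total = [[0.0] * W for _ in range(H)]
    for m, k in zip(maps, kernels):
        r = conv_same(m, k)
        for i in range(H):
            for j in range(W):
                total[i][j] += r[i][j]
    return total


def periodic_shuffle(f1, f2, f3, f4):
    """Interleave four HxW maps into one 2Hx2W map (assumed layout)."""
    H, W = len(f1), len(f1[0])
    out = [[0.0] * (2 * W) for _ in range(2 * H)]
    for i in range(H):
        for j in range(W):
            out[2 * i][2 * j] = f1[i][j]
            out[2 * i + 1][2 * j + 1] = f2[i][j]
            out[2 * i][2 * j + 1] = f3[i][j]
            out[2 * i + 1][2 * j] = f4[i][j]
    return out


def pixel_dcl_2x(x, k1, k2, k3, k4):
    """Sequential branches: each one reads the branches generated before it."""
    f1 = conv_same(x, k1)                # from the input only
    f2 = conv_same(f1, k2)               # conditioned on f1
    f3 = conv_multi([f1, f2], k3)        # conditioned on f1 and f2
    f4 = conv_multi([f1, f2, f3], k4)    # conditioned on f1, f2, and f3
    return periodic_shuffle(f1, f2, f3, f4)


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
centre = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # 3x3 kernel that copies its input
corner = [[1, 0], [0, 0]]                   # 2x2 kernel that copies its input
y = pixel_dcl_2x(x, centre, centre, [corner] * 2, [corner] * 3)
print(y[0][:2], y[1][:2])  # [1.0, 2.0] [4.0, 1.0]
```

With these copy kernels, the third branch equals `f1 + f2` and the fourth equals `f1 + f2 + f3`, so the printed values make the chain of dependencies directly visible.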
4. Efficient Implementation via Masked Convolutions
A naïve implementation of PixelDCL as a strictly sequential process incurs prohibitive computational cost. To mitigate this, the authors introduce an efficiency technique:
- Branch Merging: Branches 3 and 4 depend only on $F_1$ and $F_2$ and are mutually independent. These can be merged into a single masked convolution (inspired by PixelCNN masking), yielding both branches simultaneously.
- Dilation: Once $F_1$ and $F_2$ are available, they can be dilated and summed to form a large map matching the output size.
- Parallelization: The remaining steps (dilating $F_1$ and $F_2$, computing the masked convolution for $F_3$ and $F_4$, and the final summation) can be executed in parallel on GPU.
- Wall-clock Time: The resulting implementation up-samples in close to the time of a classical deconvolution, with roughly 20–30% overhead.
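The dilation and branch-merging steps can be sketched as follows. This is one possible arrangement, not the authors' implementation: placing the first branch on even/even positions and the second on odd/odd positions, the cross-shaped mask, and the names `dilate_and_sum`, `MASK`, and `masked_fill` are all assumptions for illustration. With that layout, every empty output position has only first- and second-branch values as its four direct neighbours, so a single masked 3×3 convolution fills both remaining pixel groups in one pass without either group reading the other.

```python
def dilate_and_sum(f1, f2):
    """Place f1 on even/even and f2 on odd/odd positions of a 2Hx2W map."""
    H, W = len(f1), len(f1[0])
    m = [[0.0] * (2 * W) for _ in range(2 * H)]
    for i in range(H):
        for j in range(W):
            m[2 * i][2 * j] = f1[i][j]
            m[2 * i + 1][2 * j + 1] = f2[i][j]
    return m


# Cross-shaped mask: the centre and the corners of the 3x3 kernel are ignored.
MASK = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]


def masked_fill(m, k):
    """Fill the empty positions of m with one masked 3x3 convolution."""
    H2, W2 = len(m), len(m[0])
    out = [row[:] for row in m]
    for i in range(H2):
        for j in range(W2):
            if (i + j) % 2 == 0:
                continue  # position already holds an f1 or f2 value
            acc = 0.0
            for a in range(3):
                for b in range(3):
                    y, z = i + a - 1, j + b - 1
                    if MASK[a][b] and 0 <= y < H2 and 0 <= z < W2:
                        acc += m[y][z] * k[a][b]  # reads m, never out
            out[i][j] = acc
    return out


f1 = [[1, 2], [3, 4]]
f2 = [[10, 20], [30, 40]]
k = [[1.0] * 3 for _ in range(3)]  # placeholder weights
filled = masked_fill(dilate_and_sum(f1, f2), k)
print(filled[0][:2], filled[1][:2])  # [1, 13.0] [14.0, 10]
```

Because `masked_fill` reads only from the dilated map `m`, the two remaining pixel groups are independent of each other, which is what allows them to be computed simultaneously.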
5. Empirical Performance
Semantic Segmentation
PixelDCL and iPixelDCL were evaluated on PASCAL VOC 2012 and MSCOCO 2015 in U-Net and DeepLab-ResNet base architectures using pixel accuracy and mean IoU metrics. Results showed:
| Model | Up-sample | PASCAL12 Acc. | PASCAL12 IoU | COCO15 Acc. | COCO15 IoU |
|---|---|---|---|---|---|
| U-Net (from scratch) | DCL | 0.8162 | 0.4152 | 0.8093 | 0.3498 |
| U-Net (from scratch) | iPixelDCL | 0.8171 | 0.4488 | 0.8092 | 0.3602 |
| U-Net (from scratch) | PixelDCL | 0.8226 | 0.4560 | 0.8116 | 0.3718 |
| DeepLab-ResNet (finetune) | DCL | 0.9296 | 0.7270 | n/a | n/a |
| DeepLab-ResNet (finetune) | iPixelDCL | 0.9345 | 0.7386 | n/a | n/a |
| DeepLab-ResNet (finetune) | PixelDCL | 0.9313 | 0.7356 | n/a | n/a |
All PixelDCL variants surpass regular deconvolutional layers in mean IoU.
Image Generation
For generative tasks (e.g., VAE decoders on CelebA), standard deconvolutions produce strong checkerboard artifacts. Architectures with PixelDCL substituted in yield artifact-free outputs with no degradation in negative log-likelihood, along with improved visual quality and FID-style metrics.
Computational Overhead
Measured on PASCAL VOC 2012 with a Tesla K40 GPU:
| Model | Training (10 epochs) | Inference (2109 images) |
|---|---|---|
| U-Net + DCL | 365 min 26 s | 2 min 42 s |
| U-Net + iPixelDCL | 511 min 19 s | 4 min 13 s |
| U-Net + PixelDCL | 464 min 31 s | 3 min 27 s |
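Relative to DCL, PixelDCL adds about 27% to training time and about 28% to inference time, while iPixelDCL adds about 40% and 56%, respectively. This increase can be reduced by the masked convolution optimization (Gao et al., 2017).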
6. Deployment and Practical Considerations
- Replacement: PixelDCL is a drop-in substitute for any deconvolutional layer of stride 2 or higher (additional branches are added for higher upsample factors).
- Parameter Matching: Select sub-kernel sizes such that total parameters match or undercut the baseline deconvolution.
- Optimization: Apply the masked-convolution implementation for maximal efficiency.
- Performance Expectation: Slightly increased runtime in exchange for artifact-free upsampling and improved qualitative and quantitative performance.
- Integration: Direct replacement in U-Net, VAE, GAN, and DeepLab configurations eradicates checkerboard patterns without post-processing.
7. Summary and Significance
PixelDCL addresses the longstanding checkerboard artifact problem in transposed convolution layers via explicit modeling of local pixel dependencies. Through both full and simplified sequential branch conditioning, PixelDCL enforces spatially coherent up-sampling at minimal added computational and parameter cost. Empirical results demonstrate improved segmentation and generative accuracy across standard benchmarks. The design requires no architectural overhaul and leverages efficient masked convolution techniques for practical deployment (Gao et al., 2017).