PixelDCL: Artifact-Free Upsampling

Updated 30 March 2026
  • PixelDCL is a neural module that replaces standard deconvolutions by sequentially conditioning pixel groups to enforce local dependencies and eliminate checkerboard artifacts.
  • It minimizes artifacts by generating adjacent pixels with shared computational paths, reducing parameters while maintaining efficient up-sampling.
  • Integrating PixelDCL into models like U-Net, VAE, and GAN improves segmentation accuracy and image generation quality with a marginal computational overhead.

A Pixel Deconvolutional Layer (PixelDCL) is a learnable neural module that replaces standard deconvolutional (also known as transposed convolution) layers for up-sampling in deep architectures. PixelDCL establishes explicit computational dependencies among adjacent pixels in the output feature map, thereby eliminating checkerboard artifacts that arise in classical deconvolutions. It achieves this by sequentially generating output pixel groups (branches), conditioning each on previously generated branches and, optionally, the original input. PixelDCL can be integrated into existing models—such as U-Net, DeepLab, VAE, and GANs—as a drop-in replacement, and supports efficient parallelized implementation with a minor computational overhead (Gao et al., 2017).

1. Problem Definition: Checkerboard Artifacts in Classical Deconvolution

Standard deconvolutions (transposed convolutions) for up-sampling employ a single $(2s)\times(2s)$ kernel $W$ reshaped into several smaller $s\times s$ kernels $\{k_1, \ldots, k_4\}$ when upsampling by a stride of 2. The output activations $Y\in\mathbb{R}^{2h\times2w\times c_{\text{out}}}$ are constructed by interleaving the results:

$$Y = \mathcal{PS}\left\{X \circledast k_1,\; X \circledast k_2,\; X \circledast k_3,\; X \circledast k_4\right\}$$

where $\circledast$ indicates convolution and $\mathcal{PS}$ denotes “periodic shuffle + sum.” Each $F_i = X\circledast k_i$ is computed independently. As a result, spatially adjacent pixels in $Y$ may originate from independent computational branches, lacking cross-pixel dependency. This causes checkerboard patterns: regular artifacts visible across up-sampled feature maps and, by extension, in the outputs of models for segmentation and generation (Gao et al., 2017).
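
To make the interleaving concrete, the following is a minimal pure-Python sketch of the shuffle step for a single channel. The function name `periodic_shuffle` and the position each sub-map takes inside every 2×2 output block are illustrative assumptions, not details taken from the source; the "sum" part of the operator is omitted.

```python
def periodic_shuffle(f1, f2, f3, f4):
    """Interleave four h x w sub-maps into one 2h x 2w map.

    Assumed layout of each 2x2 output block (illustrative only):
        f1 f3
        f4 f2
    """
    h, w = len(f1), len(f1[0])
    y = [[0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            y[2 * i][2 * j] = f1[i][j]
            y[2 * i][2 * j + 1] = f3[i][j]
            y[2 * i + 1][2 * j] = f4[i][j]
            y[2 * i + 1][2 * j + 1] = f2[i][j]
    return y


# Constant sub-maps (all 1s, all 2s, ...) make the origin of each pixel visible.
subs = [[[b] * 2 for _ in range(2)] for b in (1, 2, 3, 4)]
shuffled = periodic_shuffle(*subs)
for row in shuffled:
    print(row)
# [1, 3, 1, 3]
# [4, 2, 4, 2]
# [1, 3, 1, 3]
# [4, 2, 4, 2]
```

Every pixel's horizontal and vertical neighbours carry a different branch label, so when the branches are computed independently, nothing ties neighbouring output values together.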

2. PixelDCL Formulation: Sequential Conditioning and Branch Design

PixelDCL remedies the lack of dependency by generating the output through a sequence of interdependent branches:

  • iPixelDCL (input PixelDCL): Each branch is conditioned on all earlier branches as well as the original input:

$$\begin{aligned} F_1 &= X \circledast k_1 \\ F_2 &= [X, F_1] \circledast k_2 \\ F_3 &= [X, F_1, F_2] \circledast k_3 \\ F_4 &= [X, F_1, F_2, F_3] \circledast k_4 \\ Y &= \mathcal{PS}\{F_1, F_2, F_3, F_4\} \end{aligned}$$

This enforces a full sequential dependency structure.

  • PixelDCL (simplified): Empirical results demonstrate that conditioning only on preceding branches (omitting the input from all but the first) suffices:

$$\begin{aligned} F_1 &= X \circledast k_1 \\ F_2 &= F_1 \circledast k_2 \\ F_3 &= [F_1, F_2] \circledast k_3 \\ F_4 &= [F_1, F_2, F_3] \circledast k_4 \\ Y &= \mathcal{PS}\{F_1, F_2, F_3, F_4\} \end{aligned}$$

Branches are grouped such that adjacent output pixels share a computational path, ensuring all locality dependencies are directly encoded.
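
The two conditioning schemes can be compared by listing what each branch reads. The sketch below is only an illustration: the helper names `branch_inputs` and `input_channels` are hypothetical, and it assumes that $[\cdot]$ concatenates along the channel axis and that every $F_i$ has $c_{\text{out}}$ channels.

```python
def branch_inputs(variant):
    """Return, for each of the four branches, the tensors it is conditioned on."""
    if variant == "iPixelDCL":
        # Every branch sees the input X plus all earlier branches.
        return [["X"] + [f"F{j}" for j in range(1, i)] for i in range(1, 5)]
    if variant == "PixelDCL":
        # Only the first branch sees X; later branches see earlier branches only.
        return [["X"]] + [[f"F{j}" for j in range(1, i)] for i in range(2, 5)]
    raise ValueError(f"unknown variant: {variant}")


def input_channels(variant, c_in, c_out):
    """Input channel count per branch under channel-wise concatenation."""
    width = {"X": c_in}  # any F_i is assumed to have c_out channels
    return [sum(width.get(name, c_out) for name in names)
            for names in branch_inputs(variant)]


for variant in ("iPixelDCL", "PixelDCL"):
    print(variant, branch_inputs(variant), input_channels(variant, 64, 32))
```

With `c_in=64` and `c_out=32`, the simplified variant feeds fewer channels into branches 2 to 4 because the input is no longer concatenated.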

3. Architectural and Parameterization Details

The following architectural components are salient:

  • Upsample factor $r=2$ leads to four sub-maps $F_1, F_2, F_3, F_4$.
  • Parameter Count: To match a $6 \times 6$ deconvolution,
    • DCL: one $6 \times 6$ kernel ($36$ parameters)
    • iPixelDCL: four $3 \times 3$ kernels ($4 \times 9 = 36$ parameters)
    • PixelDCL: two $3 \times 3$ and two $2 \times 2$ kernels (total $2 \times 9 + 2 \times 4 = 26$ parameters, fewer than standard)
  • Stride: The method generalizes to higher upsampling factors with added branch stages.
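
The parameter comparison above is plain arithmetic and can be checked directly. This sketch counts spatial weights per kernel only, as the list does; channel dimensions and biases are ignored, and the helper name `spatial_params` is hypothetical.

```python
def spatial_params(kernel_sizes):
    """Sum k*k over square kernels; ignores channels and biases."""
    return sum(k * k for k in kernel_sizes)


counts = {
    "DCL": spatial_params([6]),                 # one 6x6 kernel
    "iPixelDCL": spatial_params([3, 3, 3, 3]),  # four 3x3 kernels
    "PixelDCL": spatial_params([3, 3, 2, 2]),   # two 3x3 and two 2x2 kernels
}
for name, n in counts.items():
    print(f"{name}: {n} spatial weights")

saving = 1 - counts["PixelDCL"] / counts["DCL"]
print(f"PixelDCL uses {saving:.1%} fewer spatial weights than DCL")
```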

A practical pseudocode for PixelDCL (stride=2) is:

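def PixelDCL2x(X):
    # [.] denotes channel-wise concatenation; k1..k4 are the learned kernels
    F1 = Conv3x3(X, k1)             # branch 1: computed from the input
    F2 = Conv3x3(F1, k2)            # branch 2: conditioned on F1
    F3 = Conv2x2([F1, F2], k3)      # branch 3: conditioned on F1 and F2
    F4 = Conv2x2([F1, F2, F3], k4)  # branch 4: conditioned on F1, F2 and F3
    Y  = PS([F1, F2, F3, F4])       # periodic shuffle into the 2x up-sampled map
    return Y

The design ensures that the adjacency structure in the upsampled output is respected by the convolutional computation (Gao et al., 2017).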
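
For readers who want something executable, here is a minimal NumPy sketch of the same four-branch computation. It is not the authors' implementation: the helper names, the zero "same" padding, the channels-last layout, the absence of biases and non-linearities, and the position each branch takes in the 2×2 output block are all assumptions made for illustration.

```python
import numpy as np


def conv_same(x, w):
    """Stride-1 cross-correlation with zero padding; keeps x's spatial size.

    x: (H, W, C_in), w: (kh, kw, C_in, C_out).
    """
    kh, kw = w.shape[:2]
    H, W = x.shape[:2]
    top, left = (kh - 1) // 2, (kw - 1) // 2
    # For even kernels the extra padding row/column goes to the bottom/right.
    xp = np.pad(x, ((top, kh - 1 - top), (left, kw - 1 - left), (0, 0)))
    out = np.zeros((H, W, w.shape[3]))
    for i in range(kh):
        for j in range(kw):
            out += xp[i:i + H, j:j + W] @ w[i, j]
    return out


def periodic_shuffle(f1, f2, f3, f4):
    """Interleave four (H, W, C) maps into (2H, 2W, C); layout is an assumption."""
    H, W, C = f1.shape
    y = np.empty((2 * H, 2 * W, C))
    y[0::2, 0::2], y[1::2, 1::2] = f1, f2  # F1 and F2 on the diagonal
    y[0::2, 1::2], y[1::2, 0::2] = f3, f4  # F3 and F4 fill the remaining slots
    return y


def pixel_dcl_2x(x, k1, k2, k3, k4):
    f1 = conv_same(x, k1)                                      # from the input
    f2 = conv_same(f1, k2)                                     # conditioned on F1
    f3 = conv_same(np.concatenate([f1, f2], axis=-1), k3)      # on F1, F2
    f4 = conv_same(np.concatenate([f1, f2, f3], axis=-1), k4)  # on F1, F2, F3
    return periodic_shuffle(f1, f2, f3, f4)


rng = np.random.default_rng(0)
c_in, c_out = 3, 2
k1 = rng.normal(size=(3, 3, c_in, c_out))
k2 = rng.normal(size=(3, 3, c_out, c_out))
k3 = rng.normal(size=(2, 2, 2 * c_out, c_out))
k4 = rng.normal(size=(2, 2, 3 * c_out, c_out))
x = rng.normal(size=(4, 5, c_in))
y = pixel_dcl_2x(x, k1, k2, k3, k4)
print(y.shape)  # (8, 10, 2)
```

The kernel shapes show how concatenation widens the input of the later branches: `k3` reads `2 * c_out` channels and `k4` reads `3 * c_out`.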

4. Efficient Implementation via Masked Convolutions

Naïve implementation of PixelDCL as a sequential process incurs prohibitive computational cost. To mitigate this, the authors introduce an efficiency technique:

  • Branch Merging: In the efficient implementation, branches 3 and 4 are computed from $F_1, F_2$ only and are mutually independent (unlike the general formulation in Section 2, where $F_4$ is also conditioned on $F_3$). They can therefore be merged into a single masked $3\times3$ convolution (inspired by PixelCNN masking), yielding both branches simultaneously.
  • Dilation: Once $F_1$ and $F_2$ are available, they can be dilated and summed to form a large map matching the output size.
  • Parallelization: The steps (dilating $F_1, F_2$, computing the masked convolution for $F_3, F_4$, and the final summation) can be executed in parallel on GPU.
  • Wall-clock Time: The resulting implementation performs up-sampling with roughly 20–30% overhead relative to a classical deconvolution.
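
The dilation step can be illustrated in isolation. The sketch below is a single-channel, pure-Python illustration with hypothetical helper names; placing $F_1$ and $F_2$ on the diagonal of each 2×2 block is an assumption, and the masked convolution itself is not reproduced because its mask is not specified in this section.

```python
def dilate(f, row_off, col_off):
    """Place an h x w map on a zero-filled 2h x 2w grid at one slot of each 2x2 block."""
    h, w = len(f), len(f[0])
    big = [[0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            big[2 * i + row_off][2 * j + col_off] = f[i][j]
    return big


def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


f1 = [[1, 1], [1, 1]]
f2 = [[2, 2], [2, 2]]
# Assumed offsets: F1 in the top-left slot, F2 in the bottom-right slot.
merged = add_maps(dilate(f1, 0, 0), dilate(f2, 1, 1))
for row in merged:
    print(row)
# [1, 0, 1, 0]
# [0, 2, 0, 2]
# [1, 0, 1, 0]
# [0, 2, 0, 2]
```

Under this assumed layout, the zero slots are where $F_3$ and $F_4$ will go, and each of them has already-known pixels directly beside it, which is the neighbourhood a masked $3\times3$ convolution could read from.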

5. Empirical Performance

Semantic Segmentation

PixelDCL and iPixelDCL were evaluated on PASCAL VOC 2012 and MSCOCO 2015 in U-Net and DeepLab-ResNet base architectures using pixel accuracy and mean IoU metrics. Results showed:

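| Model | Up-sample | PASCAL12 Acc. | PASCAL12 IoU | COCO15 Acc. | COCO15 IoU |
|---|---|---|---|---|---|
| U-Net (from scratch) | DCL | 0.8162 | 0.4152 | 0.8093 | 0.3498 |
| U-Net (from scratch) | iPixelDCL | 0.8171 | 0.4488 | 0.8092 | 0.3602 |
| U-Net (from scratch) | PixelDCL | 0.8226 | 0.4560 | 0.8116 | 0.3718 |
| DeepLab-ResNet (finetune) | DCL | 0.9296 | 0.7270 | - | - |
| DeepLab-ResNet (finetune) | iPixelDCL | 0.9345 | 0.7386 | - | - |
| DeepLab-ResNet (finetune) | PixelDCL | 0.9313 | 0.7356 | - | - |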

All PixelDCL variants surpass regular deconvolutional layers in mean IoU.

Image Generation

For generative tasks (e.g., VAE decoders on CelebA), standard deconvolutions produce strong checkerboard artifacts; architectures with PixelDCL substituted in yield artifact-free outputs with no degradation in negative log-likelihood and visibly improved sample quality.

Computational Overhead

Measured on Pascal VOC 2012 with a Tesla K40:

| Model | Training (10 epochs) | Inference (2109 images) |
|---|---|---|
| U-Net + DCL | 365 min 26 s | 2 min 42 s |
| U-Net + iPixelDCL | 511 min 19 s | 4 min 13 s |
| U-Net + PixelDCL | 464 min 31 s | 3 min 27 s |

Relative to DCL, PixelDCL adds roughly 27% to training time and 28% to inference time, while iPixelDCL adds roughly 40% and 56%. The masked-convolution implementation helps keep this overhead down (Gao et al., 2017).
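
The relative overhead implied by the timing table can be recomputed from the reported figures. The snippet below uses only the numbers in the table; the helper names are hypothetical.

```python
def seconds(minutes, secs):
    """Convert a 'min s' timing into seconds."""
    return 60 * minutes + secs


train = {"DCL": seconds(365, 26), "iPixelDCL": seconds(511, 19), "PixelDCL": seconds(464, 31)}
infer = {"DCL": seconds(2, 42), "iPixelDCL": seconds(4, 13), "PixelDCL": seconds(3, 27)}


def overhead_pct(times, model):
    """Extra time of `model` relative to the DCL baseline, in percent."""
    return 100.0 * (times[model] / times["DCL"] - 1.0)


for model in ("iPixelDCL", "PixelDCL"):
    print(f"{model}: training +{overhead_pct(train, model):.1f}%, "
          f"inference +{overhead_pct(infer, model):.1f}%")
# iPixelDCL: training +39.9%, inference +56.2%
# PixelDCL: training +27.1%, inference +27.8%
```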

6. Deployment and Practical Considerations

  • Replacement: PixelDCL is a drop-in substitute for any deconvolutional layer of stride 2 or higher (additional branches are added for higher upsample factors).
  • Parameter Matching: Select sub-kernel sizes such that total parameters match or undercut the baseline deconvolution.
  • Optimization: Apply “masked conv” implementation for maximal efficiency.
  • Performance Expectation: Slightly increased runtime in exchange for artifact-free upsampling and improved qualitative and quantitative performance.
  • Integration: Direct replacement in U-Net, VAE, GAN, and DeepLab configurations eradicates checkerboard patterns without post-processing.

7. Summary and Significance

PixelDCL addresses the longstanding checkerboard artifact problem in transposed convolution layers via explicit modeling of local pixel dependencies. Through both full and simplified sequential branch conditioning, PixelDCL enforces spatially coherent up-sampling at a modest runtime cost and, in the simplified variant, with fewer parameters than the matched deconvolution. Empirical results show improved segmentation mean IoU on standard benchmarks and artifact-free generated images. The design requires no architectural overhaul and leverages efficient masked convolution techniques for practical deployment (Gao et al., 2017).
