PixelDCL: Artifact-Free Upsampling

Updated 30 March 2026

PixelDCL is a neural module that replaces standard deconvolutions by sequentially conditioning pixel groups to enforce local dependencies and eliminate checkerboard artifacts.
It minimizes artifacts by generating adjacent pixels with shared computational paths, reducing parameters while maintaining efficient up-sampling.
Integrating PixelDCL into models like U-Net, VAE, and GAN improves segmentation accuracy and image generation quality with a marginal computational overhead.

A Pixel Deconvolutional Layer (PixelDCL) is a learnable neural module that replaces standard deconvolutional (also known as transposed convolution) layers for up-sampling in deep architectures. PixelDCL establishes explicit computational dependencies among adjacent pixels in the output feature map, thereby eliminating checkerboard artifacts that arise in classical deconvolutions. It achieves this by sequentially generating output pixel groups (branches), conditioning each on previously generated branches and, optionally, the original input. PixelDCL can be integrated into existing models—such as U-Net, DeepLab, VAE, and GANs—as a drop-in replacement, and supports efficient parallelized implementation with a minor computational overhead (Gao et al., 2017).

1. Problem Definition: Checkerboard Artifacts in Classical Deconvolution

Standard deconvolutions (transposed convolutions) for up-sampling employ a single $(2s)\times(2s)$ kernel $W$, reshaped into four smaller $s\times s$ kernels $\{k_1, \ldots, k_4\}$ when up-sampling by a stride of 2. The output activations $Y\in\mathbb R^{2h\times2w\times c_{\text{out}}}$ are constructed by interleaving the results: $Y = \mathcal{PS}\left\{X \circledast k_1, X \circledast k_2, X \circledast k_3, X \circledast k_4\right\}$, where $\circledast$ denotes convolution and $\mathcal{PS}$ denotes “periodic shuffle + sum.” Each $F_i = X\circledast k_i$ is computed independently. As a result, spatially adjacent pixels in $Y$ may originate from independent computational branches and lack any cross-pixel dependency. This causes checkerboard patterns: regular artifacts visible across up-sampled feature maps and, by extension, in the outputs of segmentation and generation models (Gao et al., 2017).
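The interleaving step can be illustrated with a minimal sketch in plain Python. The function name `periodic_shuffle` and the pixel layout (which branch lands on which position of each 2x2 output cell) are assumptions made for illustration; the text above does not fix them. Four constant maps stand in for independently computed branches, so any difference in scale between them shows up directly as a period-2 pattern.

```python
def periodic_shuffle(f1, f2, f3, f4):
    """Interleave four h x w maps into one 2h x 2w map.

    Assumed layout (not fixed by the text): within every 2x2 output cell,
    f1 is top-left, f2 top-right, f3 bottom-left, f4 bottom-right.
    """
    h, w = len(f1), len(f1[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            out[2 * i][2 * j] = f1[i][j]
            out[2 * i][2 * j + 1] = f2[i][j]
            out[2 * i + 1][2 * j] = f3[i][j]
            out[2 * i + 1][2 * j + 1] = f4[i][j]
    return out


# Four constant 2x2 maps play the role of independent branches F1..F4.
branches = [[[float(v)] * 2 for _ in range(2)] for v in (1, 2, 3, 4)]
Y = periodic_shuffle(*branches)
for row in Y:
    print(row)
```

Because nothing ties the four branches together, neighbouring output pixels alternate between unrelated values, which is the mechanism behind the checkerboard pattern.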

2. PixelDCL Formulation: Sequential Conditioning and Branch Design

PixelDCL remedies the lack of dependency by generating the output through a sequence of interdependent branches:

iPixelDCL (input PixelDCL): Each branch is conditioned on all earlier branches as well as the original input:

$\begin{aligned} F_1 &= X \circledast k_1 \\ F_2 &= [X, F_1]\circledast k_2 \\ F_3 &= [X, F_1, F_2]\circledast k_3 \\ F_4 &= [X, F_1, F_2, F_3]\circledast k_4 \\ Y &= \mathcal{PS}\{F_1, F_2, F_3, F_4\} \end{aligned}$

This enforces a full sequential dependency structure.

PixelDCL (simplified): Empirical results demonstrate that conditioning only on preceding branches (omitting the input from all but the first) suffices:

$\begin{aligned} F_1 &= X \circledast k_1 \\ F_2 &= F_1 \circledast k_2 \\ F_3 &= [F_1, F_2] \circledast k_3 \\ F_4 &= [F_1, F_2, F_3] \circledast k_4 \\ Y &= \mathcal{PS}\{F_1, F_2, F_3, F_4\} \end{aligned}$

Branches are grouped such that adjacent output pixels share a computational path, ensuring all locality dependencies are directly encoded.
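To compare the three formulations side by side, the direct inputs of each branch can be written down as small dependency tables. This is only a bookkeeping sketch: the dictionary names and the helper `branch_links` are hypothetical, and the tables are read off the equations in Sections 1 and 2.

```python
# Direct inputs of each branch, as stated by the equations above.
DCL = {"F1": ["X"], "F2": ["X"], "F3": ["X"], "F4": ["X"]}
IPIXELDCL = {
    "F1": ["X"],
    "F2": ["X", "F1"],
    "F3": ["X", "F1", "F2"],
    "F4": ["X", "F1", "F2", "F3"],
}
PIXELDCL = {
    "F1": ["X"],
    "F2": ["F1"],
    "F3": ["F1", "F2"],
    "F4": ["F1", "F2", "F3"],
}


def branch_links(graph):
    """Return the (source, target) pairs that connect one branch to another."""
    return {(src, dst) for dst, srcs in graph.items() for src in srcs if src != "X"}


for name, graph in (("DCL", DCL), ("iPixelDCL", IPIXELDCL), ("PixelDCL", PIXELDCL)):
    print(name, "branch-to-branch links:", len(branch_links(graph)))
```

The standard layer has no branch-to-branch links, while iPixelDCL and the simplified PixelDCL share the same six; the simplified variant only drops the direct connections from $X$ to branches 2 through 4.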

3. Architectural and Parameterization Details

The following architectural components are salient:

Upsample factor $r=2$ leads to four sub-maps $F_1, F_2, F_3, F_4$.
Parameter Count: To match a $6 \times 6$ deconvolution (counting spatial weights per kernel, ignoring channel dimensions),
- DCL: one $6 \times 6$ kernel ($36$ parameters)
- iPixelDCL: four $3 \times 3$ kernels ($4 \times 9 = 36$ parameters)
- PixelDCL: two $3 \times 3$ and two $2 \times 2$ kernels (total $2 \times 9 + 2 \times 4 = 26$ parameters, fewer than the standard deconvolution)
Stride: The method generalizes to higher upsampling factors with added branch stages.
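The parameter counts above can be checked with a few lines of arithmetic. The helper name `spatial_params` is hypothetical, and the count covers spatial kernel weights only, as in the list; channel dimensions are ignored.

```python
def spatial_params(kernel_sizes):
    """Total spatial weights for a list of square kernel sizes."""
    return sum(k * k for k in kernel_sizes)


counts = {
    "DCL": spatial_params([6]),                 # one 6x6 kernel
    "iPixelDCL": spatial_params([3, 3, 3, 3]),  # four 3x3 kernels
    "PixelDCL": spatial_params([3, 3, 2, 2]),   # two 3x3 and two 2x2 kernels
}
saving = 1 - counts["PixelDCL"] / counts["DCL"]

for name, n in counts.items():
    print(f"{name:10s} {n}")
print(f"PixelDCL saving vs. DCL: {saving:.1%}")
```

Under this counting, the simplified variant uses 10 fewer spatial weights than the 36 of the baseline.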

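Pseudocode for PixelDCL (stride=2):

def PixelDCL2x(X):
    # X is an h x w feature map; every branch keeps that spatial size.
    F1 = Conv3x3(X, k1)               # branch 1: computed from the input only
    F2 = Conv3x3(F1, k2)              # branch 2: conditioned on F1
    F3 = Conv2x2([F1, F2], k3)        # branch 3: conditioned on F1 and F2 (channel concat)
    F4 = Conv2x2([F1, F2, F3], k4)    # branch 4: conditioned on F1, F2 and F3
    Y  = PS([F1, F2, F3, F4])         # periodic shuffle: interleave into 2h x 2w
    return Y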

The design ensures that the adjacency structure in the upsampled output is respected by the convolutional computation (Gao et al., 2017).
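A runnable version of the pseudocode can be sketched in plain Python for a single-channel input. Everything here is an illustrative assumption rather than the authors' implementation: the helper names (`conv_same`, `interleave`, `pixel_dcl_2x`), the zero padding, the way a 2x2 window is anchored, and the pixel layout used by the shuffle. Concatenation is modelled as a list of maps with one kernel per map.

```python
def conv_same(maps, kernels):
    """Sum over input maps of a zero-padded 2D cross-correlation.

    maps:    list of h x w feature maps (one per input channel)
    kernels: list of small 2D kernels, one per input map
    The output keeps the h x w size. For even kernels (2x2) the window
    starts at the current pixel and extends right/down (arbitrary choice).
    """
    h, w = len(maps[0]), len(maps[0][0])
    out = [[0.0] * w for _ in range(h)]
    for fmap, ker in zip(maps, kernels):
        kh, kw = len(ker), len(ker[0])
        top, left = (kh - 1) // 2, (kw - 1) // 2
        for i in range(h):
            for j in range(w):
                for a in range(kh):
                    for b in range(kw):
                        y, x = i + a - top, j + b - left
                        if 0 <= y < h and 0 <= x < w:
                            out[i][j] += fmap[y][x] * ker[a][b]
    return out


def interleave(f1, f2, f3, f4):
    """Periodic shuffle with an assumed layout: f1 top-left, f2 top-right,
    f3 bottom-left, f4 bottom-right of every 2x2 output cell."""
    h, w = len(f1), len(f1[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            out[2 * i][2 * j], out[2 * i][2 * j + 1] = f1[i][j], f2[i][j]
            out[2 * i + 1][2 * j], out[2 * i + 1][2 * j + 1] = f3[i][j], f4[i][j]
    return out


def pixel_dcl_2x(x, k1, k2, k3, k4):
    """Simplified PixelDCL for a single-channel input x (h x w list of lists)."""
    f1 = conv_same([x], [k1])          # from the input only
    f2 = conv_same([f1], [k2])         # conditioned on f1
    f3 = conv_same([f1, f2], k3)       # k3: two 2x2 kernels, one per input map
    f4 = conv_same([f1, f2, f3], k4)   # k4: three 2x2 kernels, one per input map
    return interleave(f1, f2, f3, f4)


# Toy kernels that pass their input through, so the data flow is easy to follow.
ident3 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
pick2 = [[1, 0], [0, 0]]
x = [[1.0, 2.0], [3.0, 4.0]]
y = pixel_dcl_2x(x, ident3, ident3, [pick2, pick2], [pick2, pick2, pick2])
for row in y:
    print(row)
```

With these pass-through kernels, branch 3 equals the sum of branches 1 and 2, and branch 4 the sum of branches 1 to 3, so each 2x2 output cell is visibly built from values that depend on one another.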

4. Efficient Implementation via Masked Convolutions

Naïve implementation of PixelDCL as a sequential process incurs prohibitive computational cost. To mitigate this, the authors introduce an efficiency technique:

Branch Merging: In the efficient implementation, branches 3 and 4 are both computed from $F_1$ and $F_2$ alone, which makes them mutually independent (the $F_3 \to F_4$ dependency of the sequential formulation in Section 2 is dropped here). They can therefore be merged into a single masked $3\times3$ convolution (inspired by PixelCNN masking) that yields both branches simultaneously.
Dilation: Once $F_1$ and $F_2$ are available, they can be dilated and summed to form a large map matching the output size.
Parallelization: The steps (dilating $F_1, F_2$, computing the masked convolution for $F_3, F_4$, and the final summation) each map to parallel GPU operations.
Wall-clock Time: The resulting implementation up-samples with roughly 20–30% overhead relative to a classical deconvolution.
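One possible reading of the dilate, mask, and sum steps is sketched below in plain Python. The details are assumptions, not taken from the source: $F_1$ is placed on (even row, even column) sites and $F_2$ on (odd, odd) sites, the mask is plus-shaped, a single shared $3\times3$ kernel fills both remaining site types, and the function names are hypothetical.

```python
def dilate_and_sum(f1, f2):
    """Place f1 on (even, even) and f2 on (odd, odd) sites of a 2h x 2w map.
    The remaining sites stay at zero until the masked convolution fills them."""
    h, w = len(f1), len(f1[0])
    d = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            d[2 * i][2 * j] = f1[i][j]
            d[2 * i + 1][2 * j + 1] = f2[i][j]
    return d


# Plus-shaped mask: keep only the up/down/left/right taps. Around an empty
# site, the centre and the corners are themselves empty in d, so the mask
# makes explicit that only already-generated pixels are read.
MASK = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]


def masked_conv_fill(d, kernel):
    """Fill every empty site of d with a masked 3x3 convolution over d.
    Each empty site reads only filled sites, so all of them could be
    computed in parallel; this loop is a sequential stand-in."""
    rows, cols = len(d), len(d[0])
    out = [row[:] for row in d]          # final sum: known sites are kept
    for i in range(rows):
        for j in range(cols):
            if (i + j) % 2 == 0:         # site already holds F1 or F2
                continue
            acc = 0.0
            for a in range(3):
                for b in range(3):
                    y, x = i + a - 1, j + b - 1
                    if MASK[a][b] and 0 <= y < rows and 0 <= x < cols:
                        acc += d[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


f1 = [[1.0, 1.0], [1.0, 1.0]]
f2 = [[2.0, 2.0], [2.0, 2.0]]
kernel = [[0.25] * 3 for _ in range(3)]
Y = masked_conv_fill(dilate_and_sum(f1, f2), kernel)
for row in Y:
    print(row)
```

In this layout the two remaining site types are generated in a single pass over the dilated map, which is the property that lets branches 3 and 4 be merged.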

5. Empirical Performance

Semantic Segmentation

PixelDCL and iPixelDCL were evaluated on PASCAL VOC 2012 and MSCOCO 2015 in U-Net and DeepLab-ResNet base architectures using pixel accuracy and mean IoU metrics. Results showed:

Model	Up-sample	PASCAL12 Acc.	PASCAL12 IoU	COCO15 Acc.	COCO15 IoU
U-Net (from scratch)	DCL	0.8162	0.4152	0.8093	0.3498
	iPixelDCL	0.8171	0.4488	0.8092	0.3602
	PixelDCL	0.8226	0.4560	0.8116	0.3718
DeepLab-ResNet (finetune)	DCL	0.9296	0.7270	—	—
	iPixelDCL	0.9345	0.7386	—	—
	PixelDCL	0.9313	0.7356	—	—

All PixelDCL variants surpass regular deconvolutional layers in mean IoU.

Image Generation

For generative tasks (e.g., VAE decoders on CelebA), standard deconvolutions produce strong checkerboard artifacts; architectures with PixelDCL substituted in yield artifact-free outputs with no degradation in negative log-likelihood and with improved FID-style metrics and visual quality.

Computational Overhead

Measured on PASCAL VOC 2012 with a Tesla K40 GPU:

Model	Training (10 epochs)	Inference (2109 images)
U-Net + DCL	365 min 26 s	2 min 42 s
U-Net + iPixelDCL	511 min 19 s	4 min 13 s
U-Net + PixelDCL	464 min 31 s	3 min 27 s

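Relative to DCL, PixelDCL adds roughly 27% to training time and 28% to inference time, while iPixelDCL adds roughly 40% and 56%. The masked-convolution implementation of Section 4 keeps this overhead down (Gao et al., 2017).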
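The relative overhead can be recomputed directly from the timing table. The helper names below are hypothetical; the timings are copied from the table above.

```python
def to_seconds(minutes, seconds):
    return 60 * minutes + seconds


TRAIN = {
    "DCL": to_seconds(365, 26),
    "iPixelDCL": to_seconds(511, 19),
    "PixelDCL": to_seconds(464, 31),
}
INFER = {
    "DCL": to_seconds(2, 42),
    "iPixelDCL": to_seconds(4, 13),
    "PixelDCL": to_seconds(3, 27),
}


def overhead_percent(times, baseline="DCL"):
    """Extra time of each variant relative to the baseline, in percent."""
    base = times[baseline]
    return {name: 100.0 * (t - base) / base
            for name, t in times.items() if name != baseline}


for label, times in (("training", TRAIN), ("inference", INFER)):
    for name, pct in overhead_percent(times).items():
        print(f"{label:9s} {name:10s} +{pct:.1f}%")
```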

6. Deployment and Practical Considerations

Replacement: PixelDCL is a drop-in substitute for any deconvolutional layer of stride 2 or higher (additional branches are added for higher upsample factors).
Parameter Matching: Select sub-kernel sizes such that total parameters match or undercut the baseline deconvolution.
Optimization: Apply “masked conv” implementation for maximal efficiency.
Performance Expectation: Slightly increased runtime in exchange for artifact-free upsampling and improved qualitative and quantitative performance.
Integration: Direct replacement in U-Net, VAE, GAN, and DeepLab configurations eradicates checkerboard patterns without post-processing.

7. Summary and Significance

PixelDCL addresses the checkerboard artifact problem in transposed convolution layers by explicitly modeling local pixel dependencies. Through full (iPixelDCL) or simplified (PixelDCL) sequential branch conditioning, it enforces spatially coherent up-sampling at a moderate runtime cost and without increasing the parameter count. Empirical results show improved segmentation accuracy and image generation quality on standard benchmarks. The design requires no architectural overhaul and relies on masked convolutions for an efficient implementation (Gao et al., 2017).

References

Gao et al. (2017). Pixel Deconvolutional Networks.
