- The paper introduces a novel initialization strategy called ICNR to mitigate checkerboard artifacts in sub-pixel convolution layers without compromising modeling capabilities.
- Checkerboard artifacts in sub-pixel convolution layers stem from two causes, deconvolution overlap and random initialization; resize convolution avoids them, albeit with trade-offs of its own.
- Experimental results show that ICNR initialization leads to faster convergence, lower error minima, and significantly better artifact mitigation in tasks like super-resolution, compared to standard orthogonal initialization.
A Note on Sub-Pixel Convolution and Strategies to Mitigate Checkerboard Artifacts
This paper addresses the prevalent issue of checkerboard artifacts in the outputs of convolutional neural networks, specifically those arising from sub-pixel convolution layers, which are closely related to deconvolution (transposed convolution) layers. These layers are integral to tasks requiring upscaling of input resolution in CNNs, such as super-resolution and generative modeling. Conventional remedies have included post-processing and architectural design changes; however, these are often neither fully effective nor efficient.
Checkerboard artifacts arise primarily from two causes: deconvolution overlap and random initialization. Deconvolution overlap occurs when the kernel size is not divisible by the stride, so different pixels in the high-resolution (HR) output receive contributions from different numbers of kernel weights. Random initialization causes artifacts even when the kernel size is divisible by the stride, because the sub-kernels that generate neighboring HR pixels are initialized independently of one another.
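To make the overlap problem concrete, here is a minimal sketch, assuming PyTorch (the paper does not prescribe a framework); it counts how many kernel taps contribute to each output pixel of a transposed convolution whose kernel size (3) is not divisible by its stride (2):

```python
import torch
import torch.nn as nn

# Transposed convolution where kernel size (3) is not divisible by stride (2).
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, bias=False)
with torch.no_grad():
    deconv.weight.fill_(1.0)  # all-ones weights turn the output into a tap count

x = torch.ones(1, 1, 4, 4)  # all-ones input
print(deconv(x).squeeze())
# Each output value equals the number of (input pixel, kernel tap) pairs
# contributing to it; the values alternate between 1, 2, and 4, revealing the
# uneven, periodic overlap that produces checkerboard artifacts.
```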
The authors revisit the differences and connections between sub-pixel convolution and resize convolution layers. Sub-pixel convolution applies a standard convolution in low-resolution space and then periodically reshuffles its outputs into the HR grid, while resize convolution first upsamples with nearest-neighbor interpolation and then convolves at full resolution. Each method has its trade-offs: resize convolution tends to avoid checkerboard artifacts but restricts the space of functions the layer can learn, whereas sub-pixel convolution offers greater modeling power but is prone to artifacts.
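For illustration, a minimal PyTorch sketch of both layer types follows; the channel counts (64 in, 3 out) and the ×2 scale are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

r = 2  # upscale factor (illustrative)

# Sub-pixel convolution: convolve in low-resolution space, producing r^2
# output channels per HR channel, then periodically reshuffle into HR space.
subpixel = nn.Sequential(
    nn.Conv2d(64, 3 * r**2, kernel_size=3, padding=1),
    nn.PixelShuffle(r),
)

# Resize convolution: nearest-neighbor upsample first, then convolve at
# full resolution.
class ResizeConv(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(64, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(F.interpolate(x, scale_factor=r, mode="nearest"))

x = torch.randn(1, 64, 16, 16)
print(subpixel(x).shape)      # torch.Size([1, 3, 32, 32])
print(ResizeConv()(x).shape)  # torch.Size([1, 3, 32, 32])
```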
A novel initialization strategy is proposed, termed ICNR ('initialization to convolution NN resize'). This method initializes a sub-pixel convolution layer so that it initially behaves like a nearest-neighbor resize followed by convolution, aiming to circumvent checkerboard artifacts without reducing the network's modeling capabilities. By copying each sub-kernel across all positions of the periodic shuffle at initialization, the method ensures that neighboring pixels in the HR output are initially generated identically from the low-resolution input.
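A minimal sketch of ICNR in PyTorch follows; the helper name `icnr_` and the default choice of Kaiming initialization for the sub-kernel are assumptions for illustration (the paper can pair ICNR with whichever base initializer the network would otherwise use, e.g. orthogonal):

```python
import torch
import torch.nn as nn

def icnr_(weight: torch.Tensor, upscale_factor: int = 2,
          init=nn.init.kaiming_normal_) -> None:
    """ICNR (sketch): initialize a conv feeding a PixelShuffle so that, before
    training, the pair is equivalent to a nearest-neighbor resize + conv."""
    out_channels, in_channels, h, w = weight.shape
    assert out_channels % upscale_factor**2 == 0
    # Initialize one sub-kernel per HR output channel with the base initializer...
    sub = torch.empty(out_channels // upscale_factor**2, in_channels, h, w)
    init(sub)
    # ...then copy it across the r^2 shuffle positions so that neighboring HR
    # pixels are generated identically from the LR input at initialization.
    with torch.no_grad():
        weight.copy_(sub.repeat_interleave(upscale_factor**2, dim=0))

# Usage: apply ICNR to the conv that precedes the periodic shuffle.
conv = nn.Conv2d(64, 3 * 2**2, kernel_size=3, padding=1)
icnr_(conv.weight, upscale_factor=2)
shuffle = nn.PixelShuffle(2)
# shuffle(conv(x)) now starts out free of checkerboard patterns.
```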
Experimental validation confirms the efficacy of the ICNR approach: it yields competitive super-resolution performance with markedly fewer artifacts than standard orthogonal initialization. Training-loss and mean-squared-error curves show that networks initialized with ICNR not only converge faster but also reach lower error minima. Importantly, the smoother sub-pixel convolution kernels obtained with ICNR suggest better modeling behavior at early stages of training.
The analysis suggests future avenues, such as replacing nearest-neighbor interpolation with more advanced upscaling functions in the initialization scheme, which could further improve upscaling performance.
In conclusion, the ICNR initialization scheme presents a significant advance over existing methods by eliminating checkerboard artifacts at initialization while retaining the enhanced modeling power of sub-pixel convolution layers. This strategy could prove beneficial in a variety of applications reliant on image generation, providing a critical improvement to the visual quality of CNN-upscaled images.