- The paper’s main contribution is the introduction of gated convolutions that adaptively select features per spatial location for superior inpainting.
- It presents SN-PatchGAN, a spectral-normalized patch-based discriminator that stabilizes adversarial training for localized image reconstruction.
- Extensive experiments on Places2 and CelebA-HQ show lower reconstruction errors and seamless editing for free-form masks.
Free-Form Image Inpainting with Gated Convolution
The paper "Free-Form Image Inpainting with Gated Convolution" by Jiahui Yu et al. presents a novel approach to image inpainting, focusing on the challenges introduced by free-form masks. Traditional inpainting methods like PatchMatch and vanilla convolution-based deep learning frameworks face significant limitations when handling irregular, non-rectangular masks. The authors propose two primary contributions to address these challenges: gated convolutions and SN-PatchGAN.
Gated Convolutions
The core innovation of the paper lies in the introduction of gated convolutions, designed to dynamically select features for each channel independently at each spatial location. This mechanism is a significant improvement over vanilla convolutions, which apply the same filters to all input pixels and thus treat valid pixels and masked holes identically, producing artifacts such as color discrepancy and blurriness in inpainting results. The gated convolution approach replaces the rule-based mask-update step used in partial convolutions with a learnable mechanism, allowing the network to adaptively determine the importance of features and mask values at every layer, improving inpainting quality especially for free-form masks.
The gated convolution is formulated as two parallel convolutions that compute a feature map and a gating map, respectively. The gating map is used to modulate the feature map, enabling a soft selection of features based on the input. This approach is advantageous over the partial convolution method, which uses a hard binary mask and does not account for nuanced feature representations across different spatial locations and channels.
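The two-branch computation can be written compactly as Output = φ(Feature) ⊙ σ(Gating), where both Feature and Gating come from separate convolutions over the same input. Below is a minimal NumPy sketch of this idea, not the authors' implementation: `conv2d` is a naive single-channel helper written here for illustration, and the filter weights are made up.

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' 2D cross-correlation for one channel (illustrative only)."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def gated_conv(x, w_feat, w_gate):
    """Gated convolution sketch: Output = phi(Feature) * sigmoid(Gating).

    Feature and Gating are computed by two parallel convolutions; the
    sigmoid squashes the gating map into (0, 1), acting as a learned
    soft mask that modulates the feature map at every spatial location.
    """
    feature = conv2d(x, w_feat)                 # Feature = W_f * I
    gating = conv2d(x, w_gate)                  # Gating  = W_g * I
    soft_mask = 1.0 / (1.0 + np.exp(-gating))   # sigma(Gating), a soft mask in (0, 1)
    return np.tanh(feature) * soft_mask         # phi chosen as tanh here for illustration

# Toy input with a square "hole" of zeros, mimicking an inpainting mask
x = np.ones((5, 5))
x[2:4, 2:4] = 0.0
out = gated_conv(x, w_feat=np.full((3, 3), 0.1), w_gate=np.full((3, 3), 0.1))
print(out.shape)  # (3, 3)
```

Because the soft mask is learned per spatial location rather than updated by a fixed rule, masked regions contribute a continuously weighted response instead of being zeroed out or treated as fully valid.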
SN-PatchGAN
In addition to gated convolutions, the paper introduces SN-PatchGAN, a variant of generative adversarial networks (GANs) that incorporates spectral normalization to stabilize training. The spectral-normalized discriminator operates on dense image patches, making it well-suited for free-form inpainting tasks where masks can appear anywhere in the image with arbitrary shapes. This patch-based GAN directly computes hinge loss over each point of the output map, providing a more localized and stable adversarial training process.
Experimental Results
The authors validate their approach using extensive experiments on the Places2 and CelebA-HQ datasets. Quantitative metrics such as mean ℓ1 and ℓ2 errors demonstrate superior performance over several baseline methods including PatchMatch, Global&Local, ContextAttention, and PartialConv. Specifically, the proposed method achieves lower reconstruction errors, highlighting its ability to generate more visually coherent and semantically plausible inpainting results.
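For reference, the reported reconstruction metrics reduce to simple per-pixel averages. The arrays below are made up for illustration and are not the paper's data.

```python
import numpy as np

def mean_l1(pred, target):
    """Mean absolute (l1) error per pixel."""
    return np.mean(np.abs(pred - target))

def mean_l2(pred, target):
    """Mean squared (l2) error per pixel."""
    return np.mean((pred - target) ** 2)

# Illustrative 8x8 grayscale "images" with intensities in [0, 1]
target = np.linspace(0.0, 1.0, 64).reshape(8, 8)
pred = np.clip(target + 0.1, 0.0, 1.0)   # a prediction biased by +0.1
l1 = mean_l1(pred, target)
l2 = mean_l2(pred, target)
```

Lower values on both metrics indicate reconstructions that are closer to the ground-truth image pixel-wise, which is how the comparison against the baselines above is scored.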
The qualitative results further confirm the advantages of the proposed approach. The free-form inpainting system effectively handles complex scenes and faces, producing seamless inpainting that aligns well with the surrounding context. The user-guided inpainting results exhibit high adaptability, faithfully following sparse sketches provided as additional input.
Practical Implications
The implications of this research are significant for practical image editing applications. The ability to handle free-form masks makes the inpainting system highly versatile, supporting tasks such as object removal, layout modification, watermark removal, and facial editing. Moreover, the seamless integration of user guidance through sketches enhances the practical usability of the system, allowing for creative interactive editing.
Future Directions
Future research could explore several directions extending from this work. Enhancing the efficiency of gated convolution operations in real-time applications would be beneficial. Moreover, investigating the integration of other forms of user guidance, such as high-level semantic maps or reference images, could further expand the applicability of the inpainting system. Finally, exploring broader applications in video inpainting and other visual domains could unveil new opportunities for this approach.
In conclusion, the paper makes a substantial contribution to the field of image inpainting by addressing the limitations of existing methods with innovative solutions in the form of gated convolutions and SN-PatchGAN. The demonstrated improvements in both quantitative metrics and qualitative visual results underline the effectiveness of the proposed approach in handling free-form masks, thus opening new avenues for advanced image editing tools.