Resolution-robust Large Mask Inpainting with Fourier Convolutions (2109.07161v2)

Published 15 Sep 2021 in cs.CV and eess.IV

Abstract: Modern image inpainting systems, despite significant progress, often struggle with large missing areas, complex geometric structures, and high-resolution images. We find that one of the main reasons for this is the lack of an effective receptive field in both the inpainting network and the loss function. To alleviate this issue, we propose a new method called large mask inpainting (LaMa). LaMa is based on i) a new inpainting network architecture that uses fast Fourier convolutions (FFCs), which have an image-wide receptive field; ii) a high receptive field perceptual loss; and iii) large training masks, which unlock the potential of the first two components. Our inpainting network improves the state of the art across a range of datasets and achieves excellent performance even in challenging scenarios, e.g. completion of periodic structures. Our model generalizes surprisingly well to resolutions that are higher than those seen at train time, and achieves this at lower parameter and time costs than the competitive baselines. The code is available at https://github.com/saic-mdal/lama.

Summary

The paper's main contribution is an inpainting model that uses fast Fourier convolutions to obtain an image-wide receptive field from the earliest layers of the network.
Combined with a high receptive field perceptual loss and large training masks, it restores large, complex masked regions better than prior methods.
Experiments show strong generalization to resolutions above the training resolution, with better perceptual quality at a lower parameter count than competing baselines.

Resolution-robust Large Mask Inpainting with Fourier Convolutions

This paper presents a new approach to image inpainting, called LaMa, aimed specifically at filling large masks. It identifies three weaknesses of existing inpainting methods: they struggle with large missing areas, with complex geometric structures, and with high-resolution images. The primary contribution is a network architecture built on fast Fourier convolutions (FFCs), which give an image-wide receptive field early in the network. This is combined with a high receptive field perceptual loss and a training procedure that samples large masks.
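To make "large training masks" concrete, here is a minimal sketch of a thick-stroke mask generator in NumPy. It is not LaMa's actual mask sampler: the function name `wide_stroke_mask`, its parameters, and the width and vertex ranges are all illustrative assumptions. It draws a few random polylines with a wide circular brush, so the masked area (value 1) covers a large fraction of the image.

```python
import numpy as np


def wide_stroke_mask(height, width, num_strokes=3, max_vertices=5,
                     min_width=10, max_width=30, seed=None):
    """Hypothetical wide-mask generator: 1 marks pixels to be inpainted.

    All parameters and ranges are illustrative assumptions, not LaMa's settings.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((height, width), dtype=np.uint8)
    yy, xx = np.mgrid[0:height, 0:width]  # pixel coordinate grids
    for _ in range(num_strokes):
        radius = rng.integers(min_width, max_width + 1) / 2.0  # brush radius
        y, x = int(rng.integers(0, height)), int(rng.integers(0, width))
        for _ in range(int(rng.integers(1, max_vertices + 1))):
            # Pick the next polyline vertex, clipped to the image.
            ny = int(np.clip(y + rng.integers(-(height // 3), height // 3 + 1), 0, height - 1))
            nx = int(np.clip(x + rng.integers(-(width // 3), width // 3 + 1), 0, width - 1))
            # Stamp discs along the segment to draw a thick line.
            steps = max(abs(ny - y), abs(nx - x), 1)
            for t in np.linspace(0.0, 1.0, steps + 1):
                cy, cx = y + t * (ny - y), x + t * (nx - x)
                mask[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] = 1
            y, x = ny, nx
    return mask


mask = wide_stroke_mask(64, 64, seed=0)
print(mask.shape, mask.mean())  # fraction of the image that is masked
```

A training pair would then be formed as `image * (1 - mask)` plus the mask itself as an extra input channel; that input layout is also an assumption here.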

Key Components and Architecture

LaMa's generator is built from FFCs, which combine a local branch of ordinary convolutions with a global branch that applies a channel-wise fast Fourier transform over the spatial dimensions. Because a pointwise operation in the frequency domain mixes information from every spatial location, even the first layers have an image-wide receptive field, which the authors find improves both parameter efficiency and perceptual quality. This global context matters most for high-resolution inpainting, where the missing region can span a large part of the image. FFCs are also particularly good at completing periodic structures, which earlier convolution-based models often fail to reproduce.
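The global branch can be illustrated with a simplified NumPy sketch in the spirit of a Fourier unit: take a real 2D FFT of each channel, stack real and imaginary parts as channels, mix them with a 1×1 "convolution" (a matrix over channels), apply ReLU, and transform back. Normalization layers, the local branch, and the exact layer layout of LaMa are omitted; the function name `spectral_transform` and the weight shape are assumptions for illustration.

```python
import numpy as np


def spectral_transform(x, weight):
    """Simplified global (spectral) branch of an FFC-style layer.

    x:      real array of shape (C, H, W)
    weight: real array of shape (2C, 2C), a 1x1 conv over real/imag channels
    """
    c, h, w = x.shape
    spec = np.fft.rfft2(x, axes=(-2, -1))                     # (C, H, W//2+1), complex
    stacked = np.concatenate([spec.real, spec.imag], axis=0)  # (2C, H, W//2+1)
    mixed = np.einsum("oc,chw->ohw", weight, stacked)         # pointwise channel mixing
    mixed = np.maximum(mixed, 0.0)                            # ReLU in the frequency domain
    new_spec = mixed[:c] + 1j * mixed[c:]
    return np.fft.irfft2(new_spec, s=(h, w), axes=(-2, -1))  # back to (C, H, W)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 32, 32))
weight = rng.standard_normal((8, 8)) / 8
y = spectral_transform(x, weight)

# Perturb a single corner pixel: the output changes even at the opposite side
# of the image, which a single 3x3 convolution could never do.
x_poked = x.copy()
x_poked[:, 0, 0] += 1.0
print(np.abs(spectral_transform(x_poked, weight)[:, 16, 16] - y[:, 16, 16]).max())
```

The single-pixel perturbation is a quick way to see the image-wide receptive field: every frequency coefficient depends on every pixel, so after the nonlinearity every output pixel can change.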

Loss Functions

The paper introduces a high receptive field perceptual loss (HRF PL), which compares features from a pretrained segmentation network with a large receptive field, so that it penalizes global inconsistencies and structural errors rather than only local texture differences. It is combined with an adversarial loss and a discriminator-based perceptual loss (feature matching on discriminator features), which push the generator toward realistic local detail. Ablation studies show that using a high receptive field network for the perceptual loss is critical for inpainting large masked areas.
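A minimal sketch of how such terms might be combined, assuming the perceptual loss averages the mean squared feature difference within each backbone layer and then averages across layers. The feature lists stand in for outputs of a hypothetical segmentation backbone, and the loss weights are placeholders, not the values used in the paper.

```python
import numpy as np


def hrf_perceptual_loss(feats_real, feats_fake):
    """Mean over layers of the per-layer mean squared feature difference.

    feats_*: lists of arrays from a (hypothetical) high receptive field
    backbone, one array per layer; shapes may differ between layers.
    """
    if len(feats_real) != len(feats_fake):
        raise ValueError("feature lists must have the same number of layers")
    per_layer = [np.mean((fr - ff) ** 2) for fr, ff in zip(feats_real, feats_fake)]
    return float(np.mean(per_layer))


def total_generator_loss(adv, hrf_pl, disc_pl, w_adv=1.0, w_hrf=1.0, w_disc=1.0):
    """Weighted sum of the three terms; the weights are placeholder assumptions."""
    return w_adv * adv + w_hrf * hrf_pl + w_disc * disc_pl


real = [np.zeros((2, 2)), np.zeros(3)]
fake = [np.ones((2, 2)), np.full(3, 2.0)]
pl = hrf_perceptual_loss(real, fake)  # layer means 1.0 and 4.0 -> 2.5
print(pl, total_generator_loss(adv=1.0, hrf_pl=pl, disc_pl=0.5, w_hrf=2.0))  # 2.5 6.5
```

Averaging within each layer first keeps layers with many spatial positions from dominating the loss; whether LaMa weights layers exactly this way should be checked against the paper or code.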

Experimental Results

LaMa was compared against several strong baselines on datasets including Places and CelebA-HQ. It performs best particularly on wide masks and high-resolution images, while using fewer parameters than most competitors. The advantage shows up both in quantitative metrics such as FID and LPIPS and in a user study of perceptual quality.
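For reference, these are the standard definitions of the two metrics (lower is better for both); they are general-purpose metrics, not specific to LaMa. FID compares Gaussians fitted to Inception features of real images (mean \mu_r, covariance \Sigma_r) and generated images (\mu_g, \Sigma_g); LPIPS compares unit-normalized deep activations \hat{y}^l at each layer l, weighted per channel by learned vectors w_l.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)

d_{\mathrm{LPIPS}}(x, x_0) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w}
  \left\lVert w_l \odot \left(\hat{y}^{l}_{hw} - \hat{y}^{l}_{0,hw}\right) \right\rVert_2^2
```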

Generalization and Practical Implications

A notable finding is that LaMa generalizes to resolutions higher than those seen during training. This suggests that the architecture, in particular the use of FFCs, provides a degree of robustness to scale, so a model trained at a modest resolution can be applied to larger images without the data and compute cost of high-resolution training. This is useful in practice when training resources are limited.
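One precondition for running the same network at a higher resolution is that its weights do not depend on the image size. The sketch below compares parameter counts for two hypothetical layers that both see the whole image: a dense fully connected layer over a C×H×W feature map, and a Fourier-unit-style 1×1 convolution over stacked real and imaginary channels (2C to 2C, no bias, an assumption). Parameter independence alone does not explain the quality of LaMa's high-resolution results; it only shows why the spectral layer is applicable at any size.

```python
def dense_global_params(c, h, w):
    """Weights of a hypothetical fully connected layer mapping C x H x W to itself."""
    n = c * h * w
    return n * n


def fourier_unit_params(c):
    """Weights of a 1x1 conv over stacked real/imag channels (2C -> 2C), no bias."""
    return (2 * c) * (2 * c)


c = 64
print(fourier_unit_params(c))            # 16384, for any H and W
print(dense_global_params(c, 256, 256))  # 17592186044416
print(dense_global_params(c, 512, 512))  # 16x larger at 2x the side length
```

A local 3×3 convolution is also resolution-independent, but its receptive field covers only a small neighborhood; the spectral layer combines a fixed parameter count with an image-wide receptive field at every resolution.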

Future Directions

As the authors note, future work could explore Transformers as another way to enlarge the receptive field. Other architectures and loss functions might further extend inpainting models to more diverse scenes and more complex structures.

Overall, LaMa is a significant step toward efficient, resolution-robust image inpainting and suggests directions for efficient high-resolution computer vision models more broadly.

GitHub

GitHub - advimman/lama: 🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022 (7,406 stars)
