
Image Inpainting via Conditional Texture and Structure Dual Generation (2108.09760v2)

Published 22 Aug 2021 in cs.CV

Abstract: Deep generative approaches have recently made considerable progress in image inpainting by introducing structure priors. Due to the lack of proper interaction with image texture during structure reconstruction, however, current solutions are incompetent in handling the cases with large corruptions, and they generally suffer from distorted results. In this paper, we propose a novel two-stream network for image inpainting, which models the structure-constrained texture synthesis and texture-guided structure reconstruction in a coupled manner so that they better leverage each other for more plausible generation. Furthermore, to enhance the global consistency, a Bi-directional Gated Feature Fusion (Bi-GFF) module is designed to exchange and combine the structure and texture information and a Contextual Feature Aggregation (CFA) module is developed to refine the generated contents by region affinity learning and multi-scale feature aggregation. Qualitative and quantitative experiments on the CelebA, Paris StreetView and Places2 datasets demonstrate the superiority of the proposed method. Our code is available at https://github.com/Xiefan-Guo/CTSDG.

References (40)
  1. Filling-in by joint interpolation of vector fields and gray levels. IEEE TIP, 10(8):1200–1211, 2001.
  2. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM TOG, 28(3):24, 2009.
  3. Image inpainting. In SIGGRAPH, 2000.
  4. What makes Paris look like Paris? ACM TOG, 31(4):101:1–101:9, 2012.
  5. Image quilting for texture synthesis and transfer. In SIGGRAPH, 2001.
  6. Deep residual learning for image recognition. In CVPR, 2016.
  7. Globally and locally consistent image completion. ACM TOG, 36(4):107:1–107:14, 2017.
  8. Prior guided GAN based semantic inpainting. In CVPR, 2020.
  9. Precomputed real-time texture synthesis with Markovian generative adversarial networks. In ECCV, 2016.
  10. Progressive reconstruction of visual structure for image inpainting. In ICCV, 2019.
  11. Recurrent feature reasoning for image inpainting. In CVPR, 2020.
  12. Guidance and evaluation: Semantic-aware image inpainting for mixed scenes. In ECCV, 2020.
  13. Image inpainting for irregular holes using partial convolutions. In ECCV, 2018.
  14. Rethinking image inpainting via a mutual encoder-decoder with feature equalizations. In ECCV, 2020.
  15. Coherent semantic attention for image inpainting. In ICCV, 2019.
  16. Deep learning face attributes in the wild. In ICCV, 2015.
  17. Spectral normalization for generative adversarial networks. In ICLR, 2018.
  18. EdgeConnect: Structure guided image inpainting using edge prediction. In ICCVW, 2019.
  19. Context encoders: Feature learning by inpainting. In CVPR, 2016.
  20. StructureFlow: Image inpainting via structure-aware appearance flow. In ICCV, 2019.
  21. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
  22. ImageNet large scale visual recognition challenge. IJCV, 115(3):211–252, 2015.
  23. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  24. Contextual-based image inpainting: Infer, match, and translate. In ECCV, 2018.
  25. High-resolution image synthesis and semantic manipulation with conditional GANs. In CVPR, 2018.
  26. VCNet: A robust approach to blind image inpainting. In ECCV, 2020.
  27. Image inpainting with learnable bidirectional attention maps. In ICCV, 2019.
  28. Foreground-aware image inpainting. In CVPR, 2019.
  29. Image inpainting by patch propagation using patch sparsity. IEEE TIP, 19(5):1153–1165, 2010.
  30. Shift-Net: Image inpainting via deep feature rearrangement. In ECCV, 2018.
  31. High-resolution image inpainting using multi-scale neural patch synthesis. In CVPR, 2017.
  32. Learning to incorporate structure knowledge for image inpainting. In AAAI, 2020.
  33. Semantic image inpainting with deep generative models. In CVPR, 2017.
  34. Contextual residual aggregation for ultra high-resolution image inpainting. In CVPR, 2020.
  35. Generative image inpainting with contextual attention. In CVPR, 2018.
  36. Free-form image inpainting with gated convolution. In ICCV, 2019.
  37. High-resolution image inpainting with iterative confidence feedback and guided upsampling. In ECCV, 2020.
  38. UCTGAN: Diverse image inpainting based on unsupervised cross-space translation. In CVPR, 2020.
  39. Places: A 10 million image database for scene recognition. IEEE TPAMI, 40(6):1452–1464, 2018.
  40. Learning oracle attention for high-fidelity face completion. In CVPR, 2020.

Summary

  • The paper introduces a novel two-stream GAN architecture that jointly generates image structure and texture, improving restoration under large corruptions.
  • The Bi-GFF and CFA modules enable bi-directional feature fusion and multi-scale contextual aggregation to improve the coherence of restored images.
  • Experimental results on CelebA, Paris StreetView, and Places2 demonstrate superior performance over state-of-the-art methods using metrics like LPIPS, PSNR, and SSIM.

Image Inpainting via Conditional Texture and Structure Dual Generation

The paper "Image Inpainting via Conditional Texture and Structure Dual Generation" by Xiefan Guo, Hongyu Yang, and Di Huang introduces an approach to image inpainting that couples structure and texture information to better handle complex, large corruptions. The proposed method extends prior inpainting techniques with a two-stream network architecture that generates image structure and texture jointly, aiming to produce more visually plausible and semantically consistent inpainting results.

Methodology Overview

The core of the proposed approach is a two-stream generative adversarial network (GAN) built around structure-constrained texture synthesis and texture-guided structure reconstruction. These two generation tasks are designed to complement each other, exploiting the synergy between texture and structure cues within the network. The generator consists of two parallel branches, each responsible for one of the subtasks, and their outputs are assessed by a two-branch discriminator that judges both realism and structure-texture consistency.
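To make the two-stream idea concrete, the following is a minimal PyTorch-style sketch. It collapses the paper's two coupled encoder-decoders into a single shared encoder for brevity; the layer choices, channel counts, and mask convention are illustrative assumptions, not the authors' implementation (which is available at the linked repository).

```python
import torch
import torch.nn as nn

class TwoStreamGenerator(nn.Module):
    """Illustrative two-stream inpainting generator: one branch synthesizes
    texture conditioned on structure, the other reconstructs structure
    conditioned on texture. Shapes and layers are assumptions for clarity."""
    def __init__(self, ch=64):
        super().__init__()
        # Shared encoder over the corrupted RGB image plus its mask (4 channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, ch, 7, stride=1, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Texture branch: predicts the completed RGB image.
        self.texture_dec = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh(),
        )
        # Structure branch: predicts an edge/structure map.
        self.structure_dec = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, image, mask):
        # Assumed convention: mask is 1 on valid pixels, 0 inside holes.
        x = torch.cat([image * mask, mask], dim=1)
        feat = self.encoder(x)
        texture = self.texture_dec(feat)      # structure-constrained texture
        structure = self.structure_dec(feat)  # texture-guided structure
        return texture, structure
```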

Key innovations within this framework include the Bi-directional Gated Feature Fusion (Bi-GFF) module and the Contextual Feature Aggregation (CFA) module. The Bi-GFF module refines the consistency between structure and texture features by integrating them bi-directionally through soft gating mechanisms, thus enhancing the generation's coherence. The CFA module captures long-range dependencies and aggregates features at multiple scales to ensure detailed and context-aware inpainting, which is crucial for managing large missing image regions with complex patterns.
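The gating idea in Bi-GFF can be sketched as follows, under the same assumptions as above: each stream's features are softly modulated by a gate computed from both streams, then exchanged and concatenated. The exact layer recipe is not reproduced here, and the CFA module (which computes patch-level region affinities and aggregates features at several scales) is omitted for brevity.

```python
import torch
import torch.nn as nn

class BiGFF(nn.Module):
    """Sketch of bi-directional gated feature fusion following the paper's
    description: soft gates control how much of the other stream flows in."""
    def __init__(self, channels):
        super().__init__()
        self.gate_t = nn.Sequential(nn.Conv2d(channels * 2, channels, 3, padding=1), nn.Sigmoid())
        self.gate_s = nn.Sequential(nn.Conv2d(channels * 2, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, f_texture, f_structure):
        both = torch.cat([f_texture, f_structure], dim=1)
        # Gates in [0, 1] weight the cross-stream contribution per location.
        f_t = f_texture + self.gate_t(both) * f_structure
        f_s = f_structure + self.gate_s(both) * f_texture
        return torch.cat([f_t, f_s], dim=1)  # fused features for decoding
```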

Experimental Evaluation

The effectiveness of the proposed model is rigorously evaluated on standard datasets, including CelebA, Paris StreetView, and Places2, where it demonstrates superior performance both qualitatively and quantitatively. Metrics such as LPIPS (lower is better), PSNR, and SSIM (higher is better) indicate that the method outperforms contemporary state-of-the-art approaches, such as EdgeConnect, MED, and PatchMatch, particularly in handling large corruptions. Visual comparisons highlight the method's ability to restore both the overall structure and fine textures more accurately than its predecessors.
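For reference, the three reported metrics can be computed per image pair as in the sketch below, using the scikit-image and lpips packages. The paper's exact evaluation protocol (resolutions, mask ratios, dataset splits) is not reproduced here.

```python
import numpy as np
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Instantiate the learned perceptual metric once; 'alex' is the common backbone.
lpips_fn = lpips.LPIPS(net='alex')

def evaluate_pair(pred: np.ndarray, target: np.ndarray) -> dict:
    """Score one inpainted image against its ground truth.
    Both inputs are HxWx3 float arrays with values in [0, 1]."""
    psnr = peak_signal_noise_ratio(target, pred, data_range=1.0)
    ssim = structural_similarity(target, pred, channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW tensors in [-1, 1]; lower values mean more similar.
    to_tensor = lambda a: torch.from_numpy(a).permute(2, 0, 1).unsqueeze(0).float() * 2 - 1
    with torch.no_grad():
        lp = lpips_fn(to_tensor(pred), to_tensor(target)).item()
    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lp}
```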

Implications and Future Directions

The dual generation strategy outlined in this work opens new avenues for more intelligent design of inpainting networks that effectively intertwine structure and texture synthesis tasks. This approach not only improves the visual authenticity of the inpainted regions but also addresses limitations observed in single-stream or structurally driven models. The paper’s findings suggest potential improvements in applications such as photo editing, object removal, and image restoration, where maintaining both local texture fidelity and global structural integrity is essential.

Future research could explore extending this dual generation framework to other domains beyond image inpainting, such as video restoration or 3D model reconstruction, where the integration of multi-source information is increasingly critical. Additionally, further refinements in the network architecture, perhaps through attention mechanisms or transformer models, could enhance the capability of dual methods to efficiently handle even more complex scenes.

In conclusion, the dual generation approach presented in this paper highlights a significant step forward in the field of image inpainting, encouraging further exploration into sophisticated network designs that balance and integrate diverse visual elements for comprehensive scene restoration.
