
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models (2312.14091v3)

Published 21 Dec 2023 in cs.CV

Abstract: Recent progress in text-guided image inpainting, based on the unprecedented success of text-to-image diffusion models, has led to exceptionally realistic and visually plausible results. However, there is still significant potential for improvement in current text-to-image inpainting models, particularly in better aligning the inpainted area with user prompts and performing high-resolution inpainting. Therefore, we introduce HD-Painter, a training-free approach that accurately follows prompts and coherently scales to high-resolution image inpainting. To this end, we design the Prompt-Aware Introverted Attention (PAIntA) layer, enhancing self-attention scores with prompt information and resulting in better text-aligned generations. To further improve prompt coherence, we introduce the Reweighting Attention Score Guidance (RASG) mechanism, seamlessly integrating a post-hoc sampling strategy into the general form of DDIM to prevent out-of-distribution latent shifts. Moreover, HD-Painter allows extension to larger scales by introducing a specialized super-resolution technique customized for inpainting, enabling the completion of missing regions in images of up to 2K resolution. Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches quantitatively and qualitatively across multiple metrics and a user study. Code is publicly available at: https://github.com/Picsart-AI-Research/HD-Painter


Summary

  • The paper introduces HD-Painter, a training-free, high-resolution inpainting approach whose novel PAIntA and RASG mechanisms enhance text-image alignment, raising prompt-alignment accuracy from 51.9% to 61.4%.
  • The PAIntA layer dynamically adjusts attention scores based on the relevance of user prompts, ensuring that inpainted regions coherently reflect the provided textual guidance.
  • The model scales to 2K resolution using a specialized super-resolution module, seamlessly integrating inpainted content with surrounding image details.

Introducing HD-Painter

High-Resolution Image Inpainting

The process of image inpainting involves filling missing regions of an image in a consistent and visually plausible way. A key challenge for current inpainting models is ensuring that the filled content aligns well with user prompts, particularly in high-resolution images. The paper introduces HD-Painter, a training-free approach that improves the quality of text-guided image inpainting at resolutions as high as 2K while following the user's text prompt more faithfully.
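For context, standard text-guided inpainting is typically invoked as below. This is a minimal illustration using the Hugging Face diffusers library with a stock Stable Diffusion inpainting model, not HD-Painter itself; the file paths and prompt are hypothetical.

```python
# Baseline text-guided inpainting (illustrative context, not HD-Painter).
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("scene.png").convert("RGB")  # hypothetical input image
mask = Image.open("mask.png").convert("L")      # white pixels = region to fill

result = pipe(prompt="a red vintage car", image=image, mask_image=mask).images[0]
result.save("inpainted.png")
```

HD-Painter's PAIntA and RASG components are designed to plug into this kind of diffusion pipeline without any retraining.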

Enhanced Attention with PAIntA

The proposed HD-Painter utilizes a Prompt-Aware Introverted Attention (PAIntA) layer. This innovation augments standard self-attention mechanisms by considering the given textual prompt. It increases or reduces the impact of attention scores based on their relevance to the prompt. This attention modulation improves the coherence between the inpainted area and the textual instructions provided by the user. By focusing on prompt-related aspects of the image, PAIntA reduces the undue impact of the background or adjacent objects that may otherwise overshadow the user’s input.
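The sketch below conveys the idea, not the paper's exact formulation: self-attention logits toward known-region keys are down-weighted when those locations are irrelevant to the prompt. The `prompt_relevance` input (e.g., normalized cross-attention between each known-region token and the prompt tokens) and the masking convention are assumptions here.

```python
# A sketch of prompt-aware self-attention rescaling in the spirit of PAIntA.
# Assumptions: known_mask marks known (unmasked) pixels, and prompt_relevance
# holds scores in [0, 1] from cross-attention between image and prompt tokens.
import torch

def painta_self_attention(q, k, v, known_mask, prompt_relevance, scale):
    # q, k, v: (batch, tokens, dim); known_mask, prompt_relevance: (tokens,)
    attn = (q @ k.transpose(-1, -2)) * scale  # raw attention logits

    # Down-weight keys in the known region that are irrelevant to the prompt,
    # so background content cannot dominate generation inside the hole.
    key_weight = torch.where(known_mask.bool(),
                             prompt_relevance,
                             torch.ones_like(prompt_relevance))
    attn = attn + torch.log(key_weight.clamp_min(1e-6))  # multiplicative after softmax

    return attn.softmax(dim=-1) @ v
```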

More Focused Guidance with RASG

To enhance text alignment even further, the paper introduces a Reweighting Attention Score Guidance (RASG) mechanism. This post-hoc method integrates into the diffusion sampling process, aligning the generation more tightly with the text prompt while preserving image quality. RASG uses gradient-based guidance, reweighted so that it respects the original latent-space distribution. By preventing out-of-distribution shifts during sampling, it ensures that the inpainted regions not only match the text prompt but also remain within the distribution the underlying model was trained to reproduce.
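A minimal sketch of one such guided DDIM step follows, assuming a scalar `guidance_loss` (the paper uses an attention-based objective) and scalar tensors for the noise-schedule values; it is illustrative rather than the repository's implementation. The key step is standardizing the guidance gradient so it can take the place of the unit-variance noise term in the general DDIM update.

```python
# RASG-style DDIM step (illustrative sketch, not the official implementation).
import torch

def rasg_ddim_step(x_t, eps_pred, alpha_t, alpha_prev, sigma_t, guidance_loss):
    # x_t: current latent with requires_grad=True; eps_pred: predicted noise;
    # alpha_t, alpha_prev, sigma_t: scalar tensors from the noise schedule.
    grad = torch.autograd.grad(guidance_loss(x_t), x_t)[0]

    # Reweight the gradient to zero mean and unit variance so it mimics the
    # stochastic noise DDIM expects, avoiding out-of-distribution latents.
    grad = (grad - grad.mean()) / grad.std().clamp_min(1e-8)

    # Standard DDIM decomposition: predicted clean latent, then the update.
    x0_pred = (x_t - (1 - alpha_t).sqrt() * eps_pred) / alpha_t.sqrt()
    return (alpha_prev.sqrt() * x0_pred
            + (1 - alpha_prev - sigma_t**2).clamp_min(0.0).sqrt() * eps_pred
            + sigma_t * grad.detach())
```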

Scaling to Higher Resolutions

HD-Painter also features a specialized super-resolution technique crafted specifically for inpainting. This component is crucial for high-resolution image completion because it leverages the detailed information already present in the known regions. The lower-resolution inpainted image serves as conditional input to a diffusion-based upscaling process, allowing smooth transitions between generated and original content at resolutions up to 2048 × 2048.
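One way to realize this, sketched below with placeholder helpers (`denoise_step`, `add_noise` are assumptions, not the paper's API), is to re-inject the appropriately noised high-resolution known region at every step of the upscaling diffusion, so only the hole is actually synthesized at high resolution.

```python
# Sketch of blended upscaling for inpainting (placeholder helper functions).
def upscale_inpaint_step(latent, t, known_latent_hr, mask_hr, denoise_step, add_noise):
    # latent: current high-res latent; known_latent_hr: encoded high-res
    # original; mask_hr: 1 inside the hole, 0 over known pixels.
    latent = denoise_step(latent, t)  # one reverse step, conditioned on the
                                      # low-resolution inpainted result
    noised_known = add_noise(known_latent_hr, t)  # forward process q(x_t | x_0)

    # Keep generated content in the hole, (noised) original content elsewhere,
    # so the boundary between the two stays consistent at every timestep.
    return mask_hr * latent + (1 - mask_hr) * noised_known
```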

Performance and Contributions

The experiments show HD-Painter outperforming existing methods in both qualitative and quantitative evaluations, improving prompt-alignment accuracy to 61.4% from the 51.9% of the best competing method. This capability stems from combining PAIntA and RASG, both of which are plug-and-play and can enhance any diffusion-based inpainting model. The code is publicly available, enabling further research and development in this space.
