High-Resolution Image Harmonization via Collaborative Dual Transformations (2109.06671v2)

Published 14 Sep 2021 in cs.CV

Abstract: Given a composite image, image harmonization aims to adjust the foreground to make it compatible with the background. High-resolution image harmonization is in high demand, but still remains unexplored. Conventional image harmonization methods learn global RGB-to-RGB transformation which could effortlessly scale to high resolution, but ignore diverse local context. Recent deep learning methods learn the dense pixel-to-pixel transformation which could generate harmonious outputs, but are highly constrained in low resolution. In this work, we propose a high-resolution image harmonization network with Collaborative Dual Transformation (CDTNet) to combine pixel-to-pixel transformation and RGB-to-RGB transformation coherently in an end-to-end network. Our CDTNet consists of a low-resolution generator for pixel-to-pixel transformation, a color mapping module for RGB-to-RGB transformation, and a refinement module to take advantage of both. Extensive experiments on a high-resolution benchmark dataset and our created high-resolution real composite images demonstrate that our CDTNet strikes a good balance between efficiency and effectiveness. The datasets we used can be found at https://github.com/bcmi/CDTNet-High-Resolution-Image-Harmonization.

Citations (77)

Summary

  • The paper presents CDTNet, a dual transformation network that combines U-Net-based pixel-to-pixel adjustments with global RGB-to-RGB mapping to harmonize high-resolution composite images.
  • It integrates a low-resolution generator, a color mapping module built on 3D LUTs, and a refinement module, improving harmonization quality as measured by MSE, PSNR, and SSIM.
  • The approach achieves significant computational efficiency by reducing FLOPs and memory usage, enabling scalable, high-quality composite image editing for AR and design.

High-Resolution Image Harmonization via Collaborative Dual Transformations

The paper "High-Resolution Image Harmonization via Collaborative Dual Transformations" presents a novel method for the task of image harmonization. At its core, the paper addresses the challenge of adjusting the foreground of composite images to ensure compatibility with existing backgrounds. This task is non-trivial, especially when dealing with high-resolution images. Previous approaches either employed global RGB-to-RGB transformations, which lack the granularity required to handle diverse local contexts, or applied dense pixel-to-pixel transformations that are generally constrained to low-resolution images.

Proposed Method: Collaborative Dual Transformations

The authors introduce a new framework called CDTNet (Collaborative Dual Transformation Network) that merges the strengths of pixel-to-pixel transformations with RGB-to-RGB transformations. The architecture of CDTNet consists of three primary components; a minimal sketch of how they fit together follows the list:

  1. Low-Resolution Generator: This module performs pixel-to-pixel transformations using a U-Net style architecture on downscaled versions of the composite image. It captures local context and adapts the pixels of the foreground to blend seamlessly into the background.
  2. Color Mapping Module: Responsible for RGB-to-RGB transformation, this component employs a set of basis 3D lookup tables (LUTs) that enable global color mapping transformations. The selection of LUTs is influenced by the encoder features of the low-resolution generator, ensuring image-specific adaptations.
  3. Refinement Module: This lightweight module fuses the outputs of the low-resolution generator and the color mapping module, preserving high-resolution detail while ensuring local and global consistency.
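
To make the collaboration between the three components concrete, the sketch below reconstructs a CDTNet-style forward pass in PyTorch from the description above. It is illustrative rather than the authors' implementation: module sizes, the LUT resolution, and names such as `CDTNetSketch` and `apply_3d_lut` are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def apply_3d_lut(lut, img):
    """Trilinearly sample per-image 3D LUTs.

    lut: (B, 3, D, D, D) laid out as lut[:, c, r, g, b];
    img: (B, 3, H, W) RGB in [0, 1].
    """
    # grid_sample's (x, y, z) coords index the (W, H, D) = (b, g, r) LUT axes,
    # so reorder each pixel's (r, g, b) value to (b, g, r) and scale to [-1, 1].
    grid = img.permute(0, 2, 3, 1)[:, None].flip(-1) * 2 - 1   # (B, 1, H, W, 3)
    out = F.grid_sample(lut, grid, mode="bilinear",
                        padding_mode="border", align_corners=True)
    return out[:, :, 0]                                        # (B, 3, H, W)


class TinyUNet(nn.Module):
    """Stand-in for the low-resolution pixel-to-pixel generator (one skip)."""

    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(4, ch, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, 2 * ch, 3, 2, 1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(2 * ch, ch, 4, 2, 1), nn.ReLU())
        self.head = nn.Conv2d(2 * ch, 3, 3, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)          # bottleneck features, reused for LUT weights
        d = self.dec(e2)
        return self.head(torch.cat([d, e1], 1)), e2


class CDTNetSketch(nn.Module):
    def __init__(self, n_luts=4, lut_dim=17, ch=32):
        super().__init__()
        self.lowres = TinyUNet(ch)
        # Learnable basis LUTs; real implementations typically initialize
        # the first one to the identity mapping.
        self.luts = nn.Parameter(torch.rand(n_luts, 3, lut_dim, lut_dim, lut_dim))
        self.weight_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(2 * ch, n_luts))
        # Lightweight refinement over both intermediate results plus the input.
        self.refine = nn.Sequential(
            nn.Conv2d(10, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, comp, mask, low_size=256):
        x = torch.cat([comp, mask], 1)                        # (B, 4, H, W)
        x_low = F.interpolate(x, size=low_size, mode="bilinear",
                              align_corners=False)
        out_low, feat = self.lowres(x_low)                    # pixel-to-pixel branch
        w = torch.softmax(self.weight_head(feat), 1)          # per-image LUT weights
        lut = torch.einsum("bn,ncijk->bcijk", w, self.luts)   # image-specific LUT
        out_rgb = apply_3d_lut(lut, comp)                     # RGB-to-RGB, full res
        up = F.interpolate(out_low, size=comp.shape[-2:],
                           mode="bilinear", align_corners=False)
        residual = self.refine(torch.cat([up, out_rgb, x], 1))
        # Only the foreground is adjusted; the background passes through.
        return comp * (1 - mask) + (out_rgb + residual) * mask


comp = torch.rand(2, 3, 1024, 1024)                  # composite image in [0, 1]
mask = (torch.rand(2, 1, 1024, 1024) > 0.5).float()  # foreground mask
print(CDTNetSketch()(comp, mask).shape)              # torch.Size([2, 3, 1024, 1024])
```

Sampling the LUT with `grid_sample` in trilinear mode is a common way to apply learned 3D LUTs on the GPU; the predicted weights blend the basis LUTs into a single image-specific color transform before it is applied at full resolution.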

The proposed method demonstrates strong efficacy and efficiency on high-resolution images, balancing computational cost and memory usage. CDTNet produces high-fidelity outputs that preserve sharpness and detail, avoiding the edge blurring that typically results from naively upsampling a low-resolution harmonized output.

Numerical Performance and Attributes

The authors conduct experiments on both synthetic and real-world composite image datasets. Performance is evaluated with MSE, fMSE (MSE restricted to the foreground region), PSNR, and SSIM, and the results indicate that CDTNet outperforms existing methods across these metrics. Notably, even a simplified variant that relies on the RGB-to-RGB transformation alone yields competitive results, underscoring the effectiveness of the color mapping module.
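
For reference, the following sketch shows how these metrics are commonly computed for harmonization, with fMSE as MSE restricted to the foreground mask; the exact evaluation protocol (value range, mask handling) is an assumption here, not taken from the paper.

```python
import numpy as np
from skimage.metrics import structural_similarity


def harmonization_metrics(pred, target, mask):
    """pred, target: (H, W, 3) float arrays in [0, 255]; mask: (H, W) in {0, 1}."""
    sq_err = (pred - target) ** 2
    mse = sq_err.mean()
    fmse = sq_err[mask.astype(bool)].mean()        # MSE over foreground pixels only
    psnr = 10 * np.log10(255.0 ** 2 / max(mse, 1e-10))
    # channel_axis requires scikit-image >= 0.19 (older versions: multichannel=True)
    ssim = structural_similarity(pred, target, data_range=255.0, channel_axis=-1)
    return {"MSE": mse, "fMSE": fmse, "PSNR": psnr, "SSIM": ssim}
```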

The network is also computationally efficient: it reduces FLOPs and memory usage substantially compared to current state-of-the-art methods, particularly at very high resolutions (e.g., 2048×2048). Such savings are crucial for scalable real-world applications.
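
As a rough illustration of how such costs can be measured, the snippet below times one forward pass and records peak GPU memory at 2048×2048, reusing the hypothetical `CDTNetSketch` from above (any model with the same interface works; a CUDA device is required). FLOP counts can be obtained separately with tools such as fvcore's `FlopCountAnalysis`.

```python
import time
import torch

model = CDTNetSketch().eval().cuda()
comp = torch.rand(1, 3, 2048, 2048, device="cuda")   # full-resolution composite
mask = torch.ones(1, 1, 2048, 2048, device="cuda")   # foreground mask

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    model(comp, mask)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - t0

print(f"latency: {elapsed * 1e3:.1f} ms, "
      f"peak memory: {torch.cuda.max_memory_allocated() / 2 ** 20:.0f} MiB")
```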

Implications and Future Directions

The integration of pixel-to-pixel with RGB-to-RGB transformations provides a balanced approach that accommodates local changes without compromising global coherence. This method sets a precedent for future harmonization frameworks that might explore more sophisticated interactions between local and global transformations.

Practically, CDTNet's architecture promises enhancements in graphic design, augmented reality, and digital content creation, where rapid, high-quality image composites are required. The framework can also serve as a foundation for subsequent research on more advanced harmonization techniques, potentially leveraging recent advances in neural network architectures and computational paradigms.

In conclusion, while CDTNet does not outright eliminate the challenges associated with high-resolution image harmonization, it mitigates many existing limitations and broadens the potential for new advancements in the field.