Harmonizer: Learning to Perform White-Box Image and Video Harmonization (2207.01322v2)

Published 4 Jul 2022 in cs.CV

Abstract: Recent works on image harmonization solve the problem as a pixel-wise image translation task via large autoencoders. They have unsatisfactory performances and slow inference speeds when dealing with high-resolution images. In this work, we observe that adjusting the input arguments of basic image filters, e.g., brightness and contrast, is sufficient for humans to produce realistic images from the composite ones. Hence, we frame image harmonization as an image-level regression problem to learn the arguments of the filters that humans use for the task. We present a Harmonizer framework for image harmonization. Unlike prior methods that are based on black-box autoencoders, Harmonizer contains a neural network for filter argument prediction and several white-box filters (based on the predicted arguments) for image harmonization. We also introduce a cascade regressor and a dynamic loss strategy for Harmonizer to learn filter arguments more stably and precisely. Since our network only outputs image-level arguments and the filters we used are efficient, Harmonizer is much lighter and faster than existing methods. Comprehensive experiments demonstrate that Harmonizer surpasses existing methods notably, especially with high-resolution inputs. Finally, we apply Harmonizer to video harmonization, which achieves consistent results across frames and 56 fps at 1080P resolution. Code and models are available at: https://github.com/ZHKKKe/Harmonizer.



Summary

The paper introduces a cascaded regression framework that predicts filter arguments for precise and efficient white-box image and video harmonization.
It replaces heavy autoencoder methods with basic image filters to reduce computational overhead while maintaining high-resolution quality.
Experiments on the iHarmony4 dataset show gains in inference speed and model compactness, and the video extension reduces flicker across frames.


The paper presents Harmonizer, a framework for image and video harmonization that departs from black-box autoencoder methods. The authors target two limitations of those methods: unsatisfactory results on high-resolution images and high computational cost. Harmonizer instead treats harmonization as an image-level regression problem: a network predicts filter arguments, and efficient white-box filters apply those arguments to the input image.

Key Contributions and Methodology

The research highlights several deficiencies in existing image harmonization methods. Primarily, these methods struggle with high-resolution inputs due to their reliance on pixel-wise image translation, leading to oversized models and slow inference speeds. Harmonizer circumvents these issues by leveraging basic image filters that adjust attributes like brightness, contrast, and saturation. The framework employs a cascade regressor and dynamic loss strategy to predict filter arguments stably and precisely. This paradigm allows Harmonizer to operate with reduced computational demands, resulting in a model that is both faster and lighter than its predecessors.
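To make the idea of white-box filters concrete, here is a minimal sketch in plain Python. The three filter formulas, the [-1, 1] argument range, the application order, and the names `brightness`, `contrast`, `saturation`, and `harmonize_pixel` are illustrative assumptions, not the paper's definitions. The sketch assumes RGB values in [0, 1] and a foreground mask that is 1 inside the composited region and 0 elsewhere.

```python
from typing import Sequence, Tuple

Pixel = Tuple[float, float, float]


def _clamp(value: float) -> float:
    """Keep a channel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, value))


def brightness(px: Pixel, arg: float) -> Pixel:
    """Scale every channel; arg = 0 leaves the pixel unchanged."""
    return tuple(_clamp(c * (1.0 + arg)) for c in px)


def contrast(px: Pixel, arg: float, pivot: float = 0.5) -> Pixel:
    """Push channels away from (arg > 0) or toward (arg < 0) a fixed pivot."""
    return tuple(_clamp(pivot + (c - pivot) * (1.0 + arg)) for c in px)


def saturation(px: Pixel, arg: float) -> Pixel:
    """Push channels away from or toward the pixel's own gray level."""
    gray = 0.299 * px[0] + 0.587 * px[1] + 0.114 * px[2]
    return tuple(_clamp(gray + (c - gray) * (1.0 + arg)) for c in px)


FILTERS = (brightness, contrast, saturation)  # assumed application order


def harmonize_pixel(px: Pixel, mask: float, args: Sequence[float]) -> Pixel:
    """Apply the filters in sequence, then keep the background untouched."""
    filtered = px
    for apply_filter, arg in zip(FILTERS, args):
        filtered = apply_filter(filtered, arg)
    # mask = 1 keeps the filtered foreground, mask = 0 keeps the original pixel.
    return tuple(mask * f + (1.0 - mask) * p for f, p in zip(filtered, px))


if __name__ == "__main__":
    # Brighten a foreground pixel by 50% and leave contrast and saturation alone.
    print(harmonize_pixel((0.2, 0.4, 0.6), 1.0, (0.5, 0.0, 0.0)))
```

Because each filter is a closed-form function of a single scalar, the effect of every predicted argument can be inspected directly, which is what distinguishes this style from a pixel-wise translation network.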

The framework consists of a neural network, made up of a backbone encoder for feature extraction and a regressor for filter argument prediction, plus the white-box filters themselves. The filters are applied in sequence. The cascade regressor conditions each prediction on the features of the preceding filter arguments, so that each argument accounts for the filters applied before it rather than being predicted in isolation.
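The cascade idea can be sketched with toy linear heads, as below. This is a simplified illustration under stated assumptions, not the authors' architecture: the class name `CascadeRegressor`, the linear heads, the `tanh` squashing, and the hand-picked weights are all hypothetical. For brevity, each head is conditioned directly on the previously predicted arguments, whereas the summary describes conditioning on features of those arguments.

```python
import math
from typing import List, Sequence, Tuple

Head = Tuple[Sequence[float], float]  # (weights, bias) of one toy linear head


class CascadeRegressor:
    """Predict one argument per filter, each conditioned on earlier ones."""

    def __init__(self, heads: Sequence[Head]) -> None:
        self.heads = list(heads)

    def predict(self, features: Sequence[float]) -> List[float]:
        args: List[float] = []
        for index, (weights, bias) in enumerate(self.heads):
            # Head k sees the image features plus the k arguments predicted so far.
            inputs = list(features) + args
            if len(weights) != len(inputs):
                raise ValueError(f"head {index} expects {len(inputs)} weights")
            score = sum(w * x for w, x in zip(weights, inputs)) + bias
            args.append(math.tanh(score))  # squash into (-1, 1)
        return args


if __name__ == "__main__":
    regressor = CascadeRegressor([
        ([0.5, -0.2], 0.0),            # first filter: image features only
        ([0.1, 0.3, 0.8], 0.0),        # second filter: features + 1 earlier arg
        ([0.0, 0.0, 0.5, -0.5], 0.1),  # third filter: features + 2 earlier args
    ])
    print(regressor.predict([1.0, 2.0]))
```

The growing input size of each head is the point of the sketch: a later argument cannot be computed without the earlier ones, which mirrors the sequential order in which the filters are applied.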

Experimental Evaluation

Harmonizer is evaluated against established approaches on the iHarmony4 dataset. It surpasses prior state-of-the-art methods, particularly at high resolution, where it processes inputs directly without sacrificing accuracy or detail. It also achieves substantial gains in model size, inference speed, and memory consumption, which makes it suitable for mobile and real-time applications.
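One way to see why image-level arguments help at high resolution is the sketch below. It assumes, as an illustration rather than a confirmed detail of the paper, that the arguments are predicted from a downsampled copy and then applied to every full-resolution pixel. The functions `downsample` and `harmonize_full_res`, the stride-based downsampling, and the two callables passed in are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale for brevity; values in [0, 1]


def downsample(image: Image, stride: int) -> Image:
    """Keep every `stride`-th pixel in both directions (nearest neighbor)."""
    return [row[::stride] for row in image[::stride]]


def harmonize_full_res(
    image: Image,
    mask: Image,
    predict_args: Callable[[Image, Image], Sequence[float]],
    apply_filters: Callable[[float, float, Sequence[float]], float],
    stride: int = 4,
) -> Image:
    """Predict arguments on a small copy, then filter the original pixels."""
    # The predictor's cost depends only on the low-resolution size.
    args = predict_args(downsample(image, stride), downsample(mask, stride))
    # The filters are cheap per-pixel functions applied at full resolution.
    return [
        [apply_filters(px, m, args) for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


if __name__ == "__main__":
    image = [[0.5] * 8 for _ in range(8)]
    mask = [[1.0] * 4 + [0.0] * 4 for _ in range(8)]  # left half is foreground
    result = harmonize_full_res(
        image,
        mask,
        predict_args=lambda img, msk: [0.2],           # stand-in for the network
        apply_filters=lambda px, m, a: px * (1.0 + a[0] * m),
    )
    print(result[0])
```

Under this assumption the output keeps the full input resolution, while the learned part of the pipeline never sees more than the small copy.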

The experiments extend to video harmonization. Smoothing the filter arguments across frames mitigates the flickering that frame-by-frame methods commonly produce, and the abstract reports consistent results across frames at 56 fps at 1080P resolution.
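The summary does not say how the arguments are smoothed, so the sketch below uses an exponential moving average as one plausible choice. The function name `smooth_arguments` and the `momentum` parameter are assumptions made for illustration, not the paper's method.

```python
from typing import List, Optional, Sequence


def smooth_arguments(
    per_frame_args: Sequence[Sequence[float]], momentum: float = 0.8
) -> List[List[float]]:
    """Exponentially average each filter argument over consecutive frames."""
    if not 0.0 <= momentum < 1.0:
        raise ValueError("momentum must be in [0, 1)")
    smoothed: List[List[float]] = []
    state: Optional[List[float]] = None
    for args in per_frame_args:
        if state is None:
            state = list(args)  # the first frame has no history to blend with
        else:
            state = [momentum * s + (1.0 - momentum) * a
                     for s, a in zip(state, args)]
        smoothed.append(list(state))
    return smoothed


if __name__ == "__main__":
    # A brightness argument that jitters between frames.
    raw = [[0.0], [1.0], [0.0], [1.0]]
    print(smooth_arguments(raw, momentum=0.5))
```

Smoothing a handful of scalars per frame is far cheaper than enforcing temporal consistency on pixels, which is one reason an argument-based formulation suits video.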

Implications and Future Prospects

The research matters most for image processing applications that need high-fidelity output under limited computational resources. By replacing opaque, resource-intensive models with a more transparent and efficient framework, Harmonizer sets a precedent for white-box methods in other AI tasks. Its extension to video also broadens the scope for applications in multimedia editing and augmentation.

Conclusion

In short, Harmonizer shifts image harmonization toward a pragmatic design that combines efficient prediction with operational transparency. The approach addresses current limitations of harmonization models and offers a starting point for further research on image and video processing frameworks. Future work could expand the white-box filter set to handle complex color-specific inconsistencies and contextual gaps, which could extend Harmonizer's applicability to more diverse visual scenarios.
