
Bridging Composite and Real: Towards End-to-end Deep Image Matting

Published 30 Oct 2020 in cs.CV, cs.LG, and eess.IV | (2010.16188v3)

Abstract: Extracting accurate foregrounds from natural images benefits many downstream applications such as film production and augmented reality. However, the furry characteristics and various appearance of the foregrounds, e.g., animal and portrait, challenge existing matting methods, which usually require extra user inputs such as trimap or scribbles. To resolve these problems, we study the distinct roles of semantics and details for image matting and decompose the task into two parallel sub-tasks: high-level semantic segmentation and low-level details matting. Specifically, we propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders to learn both tasks in a collaborative manner for end-to-end natural image matting. Besides, due to the limitation of available natural images in the matting task, previous methods typically adopt composite images for training and evaluation, which result in limited generalization ability on real-world images. In this paper, we investigate the domain gap issue between composite images and real-world images systematically by conducting comprehensive analyses of various discrepancies between the foreground and background images. We find that a carefully designed composition route RSSN that aims to reduce the discrepancies can lead to a better model with remarkable generalization ability. Furthermore, we provide a benchmark containing 2,000 high-resolution real-world animal images and 10,000 portrait images along with their manually labeled alpha mattes to serve as a test bed for evaluating matting model's generalization ability on real-world images. Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods and effectively reduces the generalization error. The code and the datasets will be released at https://github.com/JizhiziLi/GFM.

Citations (96)

Summary

  • The paper introduces an end-to-end framework that eliminates trimap requirements by decomposing image matting into semantic segmentation and detail refinement tasks.
  • It employs a dual-decoder architecture with a shared encoder to bridge the domain gap between composite and real images using a novel RSSN approach and BG-20k dataset.
  • Empirical results on AM-2k and PM-10k benchmarks demonstrate reduced SAD and MSE errors, outperforming state-of-the-art methods in image matting.

The paper "Bridging Composite and Real: Towards End-to-end Deep Image Matting" addresses the challenge of extracting accurate foregrounds from natural images without requiring trimaps or other user inputs, broadening applicability to areas such as film production and augmented reality. The authors propose a Glance and Focus Matting network (GFM) that decomposes image matting into two parallel sub-tasks: high-level semantic segmentation and low-level detail matting.
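The decomposition can be sketched concretely: the glance branch predicts, per pixel, whether it is definite background, definite foreground, or a transition region, while the focus branch predicts a detailed alpha value that is only trusted inside the transition region; a final merge step combines the two into one matte. Below is a minimal illustrative sketch in NumPy; the array shapes and the exact merge rule are assumptions based on the paper's description, not the authors' implementation.

```python
import numpy as np

def collaborative_matting(glance_trimap, focus_alpha):
    """Merge high-level semantics with low-level detail (illustrative sketch).

    glance_trimap: (H, W, 3) class scores over {background, transition, foreground}
    focus_alpha:   (H, W) detailed alpha, meaningful only in the transition region
    """
    labels = glance_trimap.argmax(axis=-1)          # 0=bg, 1=transition, 2=fg
    alpha = np.where(labels == 2, 1.0, 0.0)         # definite regions from glance
    alpha = np.where(labels == 1, focus_alpha, alpha)  # fine detail from focus
    return alpha

# Toy 2x2 example: one bg pixel, one fg pixel, two transition pixels.
glance = np.array([[[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]],
                   [[0.1, 0.1, 0.8],   [0.2, 0.7, 0.1]]])
focus = np.array([[0.0, 0.4],
                  [0.0, 0.6]])
print(collaborative_matting(glance, focus))
```

The key design point is that neither decoder alone solves the task: the glance branch cannot capture fine hair-level detail, and the focus branch has no global notion of which object is the foreground, so the merged matte draws on both.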

Methodology and Contributions

  1. Novel Model Architecture: The GFM comprises a shared encoder and two distinct decoders—one for semantic segmentation and another for detail matting. This dual-decoder architecture allows the model to learn collaborative representations, capitalizing on both high-level semantic features and low-level detail features.
  2. Domain Gap Analysis: A significant contribution is the investigation of the domain gap between composite and real-world images. The model addresses discrepancies in resolution, sharpness, noise, and semantics, which are prevalent when using composite datasets for training.
  3. Composition Route and Dataset Creation: To mitigate the domain gap, a new composition route called RSSN is introduced, alongside BG-20k, a large-scale high-resolution background dataset. The RSSN approach considers factors like noise discrepancy and semantic consistency, leading to more robust generalization capabilities.
  4. Benchmark Datasets: The newly introduced AM-2k and PM-10k datasets, comprising 2,000 high-resolution animal images and 10,000 portrait images with manually labeled alpha mattes, serve as a test bed for evaluating the generalization ability of matting models on real-world images.
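The composition step underlying these training pipelines is standard alpha blending, I = αF + (1 − α)B; the RSSN route's contribution is to reduce the statistical mismatch between the pasted layers, for example by applying a shared noise field to the whole composite rather than leaving foreground and background with inconsistent sensor noise. The sketch below illustrates this idea only; the specific noise model and parameters are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def composite(fg, bg, alpha, noise_std=0.01, rng=None):
    """Alpha-blend a foreground onto a background, then add one shared noise
    field so both layers share consistent statistics (simplified RSSN idea).

    fg, bg: (H, W, 3) float arrays in [0, 1]; alpha: (H, W) float in [0, 1].
    """
    if rng is None:
        rng = np.random.default_rng(0)
    a = alpha[..., None]
    img = a * fg + (1.0 - a) * bg                   # I = alpha*F + (1-alpha)*B
    img = img + rng.normal(0.0, noise_std, img.shape)  # noise applied to the composite
    return np.clip(img, 0.0, 1.0)

fg = np.ones((4, 4, 3)) * 0.8
bg = np.zeros((4, 4, 3))
alpha = np.full((4, 4), 0.5)
img = composite(fg, bg, alpha)
```

Composites built naively (e.g., pasting a denoised foreground onto a noisy background) leak a trivial cue the network can exploit, which is one source of the poor generalization to real photographs that the paper documents.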

Results and Implications

The empirical studies demonstrate that GFM outperforms state-of-the-art (SOTA) methods on the newly proposed ORI-Track (original natural images) and COMP-Track (composite images) benchmarks. Notably, the model achieves lower SAD and MSE errors, indicating improved accuracy and detail preservation. The RSSN composition strategy drastically reduces the generalization error compared to traditional composition routes, strengthening the model's applicability in real-world scenarios.
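SAD and MSE here are the standard matting error measures computed on the predicted alpha matte against the ground truth; SAD is conventionally reported divided by 1000. A small worked example (the toy matte is an assumption for illustration):

```python
import numpy as np

def sad(pred, gt):
    """Sum of Absolute Differences over all pixels, reported /1000 by convention."""
    return np.abs(pred - gt).sum() / 1000.0

def mse(pred, gt):
    """Mean Squared Error over all pixels."""
    return ((pred - gt) ** 2).mean()

# Toy ground truth: a 50x50 solid foreground inside a 100x100 frame.
gt = np.zeros((100, 100))
gt[25:75, 25:75] = 1.0
# A prediction that overshoots by 0.1 everywhere before clipping:
pred = np.clip(gt + 0.1, 0.0, 1.0)

print(sad(pred, gt))   # 7500 background pixels off by 0.1 -> 750 / 1000 = 0.75
print(mse(pred, gt))   # 7500 * 0.01 / 10000 pixels = 0.0075
```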

Future Directions

The findings highlight several avenues for further research:

  • Improved Detail Matting: There remains potential to enhance matting quality within transition areas. Future work could involve integrating structure-aware or perceptual losses, or refining the focus decoder's architecture.
  • Domain Adaptation: Exploring domain adaptation techniques can further narrow the performance gap between models trained on composite and real datasets.
  • Expansion to Diverse Transition Areas: Extending current methodologies to effectively handle semi-transparent or intricate transition areas, such as those seen in complex textures or materials, is a promising challenge.
  • Collaborative Learning Enhancement: Strengthening the collaborative aspect between segmentation and detail matting decoders to improve overall model robustness.

In conclusion, this research sets a foundational step towards more accurate and autonomous image matting. By bridging the composite-real domain gap and providing robust benchmark datasets, this work paves the way for practical and theoretical advancements in computer vision and image processing.
