MaGGIe: Masked Guided Gradual Human Instance Matting (2404.16035v1)
Abstract: Human matting is a foundational task in image and video processing, in which human foreground pixels are extracted from the input. Prior works either improve accuracy through additional guidance or improve the temporal consistency of a single instance across frames. We propose MaGGIe (Masked Guided Gradual Human Instance Matting), a new framework that predicts alpha mattes progressively for each human instance while maintaining computational cost, precision, and consistency. Our method leverages modern architectures, including transformer attention and sparse convolution, to output all instance mattes simultaneously without exploding memory and latency. While keeping inference costs constant in the multiple-instance scenario, our framework achieves robust and versatile performance on our proposed synthesized benchmarks. Alongside these higher-quality image and video matting benchmarks, we introduce a novel approach for synthesizing multi-instance data from publicly available sources to improve the generalization of models in real-world scenarios.
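The abstract describes predicting one alpha matte per human instance, refining the mattes gradually, and using sparse convolution to keep cost under control. The NumPy sketch below illustrates two ideas behind that description under stated assumptions. It is not MaGGIe's implementation. The first function applies the common multi-instance compositing model I = Σ_k α_k F_k + (1 − Σ_k α_k) B, in which the per-pixel instance alphas sum to at most 1. The second function performs a refinement step that re-predicts only "uncertain" pixels, meaning those with a coarse alpha strictly between two thresholds. The function names, the 0.01/0.99 thresholds, and the `refine_fn` callback are hypothetical. `refine_fn` stands in for a network head, such as a sparse-convolution branch, that is evaluated only at the returned coordinates.

```python
import numpy as np


def composite_instances(alphas, foregrounds, background):
    """Compose an image from K instance mattes (assumed model):
    I = sum_k alpha_k * F_k + (1 - sum_k alpha_k) * B.

    alphas: (K, H, W) in [0, 1], assumed to sum to at most 1 per pixel.
    foregrounds: (K, H, W, 3); background: (H, W, 3).
    """
    alphas = np.asarray(alphas, dtype=np.float64)
    foregrounds = np.asarray(foregrounds, dtype=np.float64)
    background = np.asarray(background, dtype=np.float64)
    total = alphas.sum(axis=0)  # (H, W) combined foreground coverage
    if np.any(total > 1.0 + 1e-6):
        raise ValueError("instance alphas must sum to at most 1 per pixel")
    # Weighted sum of instance foregrounds: (K,H,W) x (K,H,W,3) -> (H,W,3)
    fg = np.einsum("khw,khwc->hwc", alphas, foregrounds)
    return fg + (1.0 - total)[..., None] * background


def refine_uncertain(coarse, refine_fn, low=0.01, high=0.99):
    """Hypothetical progressive-refinement step: keep confident pixels of
    the coarse mattes and re-predict only the uncertain ones.

    coarse: (K, H, W) coarse instance mattes.
    refine_fn: callable taking an (N, 3) array of (k, y, x) indices and
        returning N refined alpha values (stand-in for a sparse network).
    """
    uncertain = (coarse > low) & (coarse < high)
    coords = np.argwhere(uncertain)  # row-major order, same as mask indexing
    refined = coarse.copy()
    if len(coords):
        refined[uncertain] = refine_fn(coords)  # evaluated at N pixels only
    return refined, coords
```

Because `refine_fn` receives only the N uncertain coordinates, the refinement cost in this sketch scales with the size of the boundary band around each person rather than with K×H×W. That scaling is the general motivation for applying sparse convolution to matting refinement.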