End-to-End Human Instance Matting (2403.01510v1)

Published 3 Mar 2024 in cs.CV and cs.AI

Abstract: Human instance matting aims to estimate an alpha matte for each human instance in an image, which is extremely challenging and has rarely been studied so far. Despite some efforts to use instance segmentation to generate a trimap for each instance and then apply trimap-based matting methods, the resulting alpha mattes are often inaccurate due to inaccurate segmentation. In addition, this approach is computationally inefficient because the matting method must be run once per instance. To address these problems, this paper proposes a novel End-to-End Human Instance Matting (E2E-HIM) framework for simultaneous matting of multiple instances in a more efficient manner. Specifically, a general perception network first extracts image features and decodes instance contexts into latent codes. Then, a united guidance network exploits spatial attention and semantics embedding to generate united semantics guidance, which encodes the locations and semantic correspondences of all instances. Finally, an instance matting network decodes the image features and united semantics guidance to predict all instance-level alpha mattes. In addition, we construct a large-scale human instance matting dataset (HIM-100K) comprising over 100,000 human images with instance alpha matte labels. Experiments on HIM-100K demonstrate that the proposed E2E-HIM outperforms existing methods on human instance matting with 50% lower errors and 5× faster speed (6 instances in a 640×640 image). Experiments on the PPM-100, RWP-636, and P3M datasets demonstrate that E2E-HIM also achieves competitive performance on traditional human matting.
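
The abstract describes a three-stage architecture: a perception network that turns the image into features and per-instance latent codes, a united guidance network that fuses those codes with spatial attention into a single guidance tensor, and a matting network that decodes features plus guidance into one alpha matte per instance in a single pass. The PyTorch sketch below only illustrates that data flow; the module choices, query-based decoding, dimensions, and all names (E2EHIMSketch, guidance_proj, matting_head, and so on) are simplifying assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class E2EHIMSketch(nn.Module):
    """Minimal structural sketch of the E2E-HIM pipeline from the abstract."""

    def __init__(self, feat_dim=256, num_queries=10, code_dim=256):
        super().__init__()
        # General perception network: extracts image features and decodes
        # instance contexts into latent codes (assumed: conv backbone stub +
        # learned queries with cross-attention).
        self.backbone = nn.Conv2d(3, feat_dim, kernel_size=3, padding=1)
        self.instance_queries = nn.Parameter(torch.randn(num_queries, code_dim))
        self.query_decoder = nn.MultiheadAttention(code_dim, num_heads=8, batch_first=True)

        # United guidance network: projects features so each latent code can be
        # correlated with every spatial location (stand-in for spatial attention
        # plus semantics embedding).
        self.guidance_proj = nn.Conv2d(feat_dim, code_dim, kernel_size=1)

        # Instance matting network: decodes features + united guidance into one
        # alpha matte per instance (assumed: lightweight conv head).
        self.matting_head = nn.Conv2d(feat_dim + num_queries, num_queries, kernel_size=3, padding=1)

    def forward(self, image):                              # image: (B, 3, H, W)
        feats = self.backbone(image)                       # (B, C, H, W)
        B, C, H, W = feats.shape

        # Decode instance contexts into latent codes via cross-attention
        # between learned queries and flattened image features.
        tokens = feats.flatten(2).transpose(1, 2)          # (B, H*W, C)
        queries = self.instance_queries.unsqueeze(0).expand(B, -1, -1)
        latent_codes, _ = self.query_decoder(queries, tokens, tokens)  # (B, N, C)

        # United semantics guidance: one map per instance encoding where that
        # instance is and how it corresponds to the image semantics.
        proj = self.guidance_proj(feats).flatten(2)        # (B, C, H*W)
        guidance = torch.bmm(latent_codes, proj).view(B, -1, H, W)     # (B, N, H, W)

        # Predict all instance-level alpha mattes in a single forward pass.
        alphas = torch.sigmoid(self.matting_head(torch.cat([feats, guidance], dim=1)))
        return alphas                                      # (B, N, H, W)
```

As a usage check, `E2EHIMSketch()(torch.randn(1, 3, 64, 64))` returns a `(1, 10, 64, 64)` tensor, i.e. one alpha matte per query slot; the key point mirrored from the paper is that all instances are predicted jointly rather than by re-running a trimap-based matting network per instance.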
