
Semantic RGB-D Image Synthesis (2308.11356v2)

Published 22 Aug 2023 in cs.CV and cs.AI

Abstract: Collecting diverse sets of training images for RGB-D semantic image segmentation is not always possible. In particular, when robots need to operate in privacy-sensitive areas like homes, the collection is often limited to a small set of locations. As a consequence, the annotated images lack diversity in appearance and approaches for RGB-D semantic image segmentation tend to overfit the training data. In this paper, we thus introduce semantic RGB-D image synthesis to address this problem. It requires synthesising a realistic-looking RGB-D image for a given semantic label map. Current approaches, however, are uni-modal and cannot cope with multi-modal data. Indeed, we show that extending uni-modal approaches to multi-modal data does not perform well. In this paper, we therefore propose a generator for multi-modal data that separates modal-independent information of the semantic layout from the modal-dependent information that is needed to generate an RGB and a depth image, respectively. Furthermore, we propose a discriminator that ensures semantic consistency between the label maps and the generated images and perceptual similarity between the real and generated images. Our comprehensive experiments demonstrate that the proposed method outperforms previous uni-modal methods by a large margin and that the accuracy of an approach for RGB-D semantic segmentation can be significantly improved by mixing real and generated images during training.
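The abstract reports that mixing real and generated images during training improves RGB-D segmentation accuracy, but it does not say how the two sets are combined. The following is a minimal sketch of one plausible way to do it, assuming a fixed target share of generated samples. The function name `mix_real_and_generated`, the `generated_fraction` parameter, and the tuple-based sample representation are hypothetical and not taken from the paper.

```python
import random


def mix_real_and_generated(real, generated, generated_fraction=0.5, seed=0):
    """Build a shuffled training list from real and generated samples.

    Assumption (not from the paper): generated samples should make up
    roughly `generated_fraction` of the final training set, and all real
    samples are always kept.
    """
    if not 0.0 <= generated_fraction < 1.0:
        raise ValueError("generated_fraction must be in [0, 1)")

    rng = random.Random(seed)

    # Number of generated samples needed so that they form the requested
    # share of the mixed set, capped by how many are available.
    n_generated = round(len(real) * generated_fraction / (1.0 - generated_fraction))
    n_generated = min(n_generated, len(generated))

    # Draw the generated samples without replacement, then shuffle everything.
    mixed = list(real) + rng.sample(list(generated), n_generated)
    rng.shuffle(mixed)
    return mixed


# Placeholder samples: in practice each entry would hold an RGB image,
# a depth image, and a semantic label map.
real_samples = [("real", i) for i in range(8)]
generated_samples = [("generated", i) for i in range(20)]
training_set = mix_real_and_generated(real_samples, generated_samples, 0.5)
```

Keeping every real sample and only varying the number of generated ones makes the ratio easy to sweep when checking whether the synthetic data actually helps.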

