
3D Scene Geometry Estimation from 360° Imagery: A Survey (2401.09252v1)

Published 17 Jan 2024 in cs.CV, cs.AI, cs.GR, and cs.LG

Abstract: This paper provides a comprehensive survey on pioneer and state-of-the-art 3D scene geometry estimation methodologies based on single, two, or multiple images captured under the omnidirectional optics. We first revisit the basic concepts of the spherical camera model, and review the most common acquisition technologies and representation formats suitable for omnidirectional (also called 360°, spherical or panoramic) images and videos. We then survey monocular layout and depth inference approaches, highlighting the recent advances in learning-based solutions suited for spherical data. The classical stereo matching is then revised on the spherical domain, where methodologies for detecting and describing sparse and dense features become crucial. The stereo matching concepts are then extrapolated for multiple view camera setups, categorizing them among light fields, multi-view stereo, and structure from motion (or visual simultaneous localization and mapping). We also compile and discuss commonly adopted datasets and figures of merit indicated for each purpose and list recent results for completeness. We conclude this paper by pointing out current and future trends.


Summary

  • The paper presents methodologies for reconstructing 3D scenes from spherical imagery using monocular, stereo, and multi-view approaches integrated with deep learning.
  • It identifies challenges like distortion in equirectangular projections and high computational demands in multi-view systems.
  • The survey highlights a shift towards learning-based methods that enhance scene reconstruction for immersive VR/AR experiences and reliable autonomous navigation.

Overview of 3D Scene Geometry Estimation from 360° Imagery

360° cameras have revolutionized the way we capture and experience the world. They enable immersive experiences in virtual reality (VR) and augmented reality (AR) applications and have become valuable tools in industries ranging from real estate to autonomous driving. Understanding the precise 3D structure of captured scenes is crucial for these technologies to advance. This overview sheds light on methodologies for estimating the 3D geometry of scenes from spherical, or 360°, imagery.

Fundamental Concepts and Challenges

Before diving into methods for 3D geometry estimation, it's essential to understand the basics of spherical imaging. Spherical cameras capture light from all directions and encode it into a 2D format such as the equirectangular projection. Mapping the sphere onto a flat image, however, introduces distortions that grow toward the poles, where a single row of pixels covers a vanishing area of the sphere. These distortions pose challenges for traditional image processing and computer vision algorithms, which are generally designed for planar (perspective) images.
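To make the distortion concrete, the sketch below maps equirectangular pixel coordinates to unit rays on the sphere and computes the per-row area weight. This is a minimal illustration, not code from the survey; the axis convention and function names are assumptions chosen for clarity.

```python
import numpy as np

def equirect_to_rays(width, height):
    """Map each pixel of a width x height equirectangular image to a
    unit direction on the sphere (illustrative convention: +Z up,
    longitude increasing with column index)."""
    # Pixel centers -> longitude in [-pi, pi), latitude in (-pi/2, pi/2)
    u = (np.arange(width) + 0.5) / width           # [0, 1)
    v = (np.arange(height) + 0.5) / height         # [0, 1)
    lon = (u - 0.5) * 2.0 * np.pi                  # horizontal angle
    lat = (0.5 - v) * np.pi                        # vertical angle
    lon, lat = np.meshgrid(lon, lat)               # shape (H, W)

    # Spherical-to-Cartesian conversion: each pixel becomes a unit ray.
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.stack([x, y, z], axis=-1)            # shape (H, W, 3)

# The solid angle covered by a pixel shrinks with cos(latitude):
# rows near the poles are heavily oversampled, which is the distortion
# that planar convolutions and feature detectors are not designed for.
H, W = 512, 1024
lat = (0.5 - (np.arange(H) + 0.5) / H) * np.pi
pixel_weight = np.cos(lat)   # per-row area weight, approaching 0 at the poles
```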

Techniques for 3D Scene Reconstruction

Methods for reconstructing 3D scenes from spherical images fall into monocular (single image), stereoscopic (image pair), and multi-view (multiple images) approaches. Monocular techniques now rely heavily on deep learning and have progressed substantially despite the scarcity of training data for spherical imagery. When two images are available, depth can be triangulated from the disparity between views, which also resolves occlusions more effectively than a single image. Multi-view setups combine the benefits of monocular and stereo methods and promise accurate reconstruction of entire scenes, but they are more computationally demanding and may require complex setups such as camera arrays or sequential captures from a moving camera.
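As a concrete illustration of the two-view case, here is a minimal triangulation sketch under assumed notation (camera centers c1, c2 and unit bearing vectors d1, d2; not code from the survey): given the angle each bearing makes with the baseline, the law of sines yields the range to the observed point.

```python
import numpy as np

def triangulate_spherical(c1, c2, d1, d2):
    """Triangulate a point seen from two spherical viewpoints.

    c1, c2 : camera centers, arrays of shape (3,)
    d1, d2 : unit bearing vectors toward the same point, one per camera
    Returns the range from c1 and the 3D point, via the law of sines
    on the triangle (c1, c2, point)."""
    b = c2 - c1
    B = np.linalg.norm(b)                # baseline length
    b_hat = b / B
    # Angle each bearing makes with the baseline (clip guards arccos domain).
    alpha = np.arccos(np.clip(np.dot(d1, b_hat), -1.0, 1.0))   # angle at c1
    beta = np.arccos(np.clip(np.dot(d2, -b_hat), -1.0, 1.0))   # angle at c2
    # The angle subtended at the point is pi - alpha - beta, so by the
    # law of sines: r1 / sin(beta) = B / sin(alpha + beta).
    r1 = B * np.sin(beta) / np.sin(alpha + beta)
    return r1, c1 + r1 * d1
```

Note that accuracy degrades as the observed point approaches the epipoles, i.e. the directions along the baseline itself, where the triangle becomes degenerate.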

Current Trends and Benchmarks

The field is witnessing a trend towards learning-based methodologies, fueled by the need to extract depth from complex and occlusion-heavy scenes. These methods must be trained on large, varied datasets that contain annotated depth information. There's also a push for standardizing evaluation metrics to fairly compare different algorithms and gauge their performance reliably.
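As a reference point for what such standardized evaluation typically involves, the sketch below computes figures of merit commonly reported in the depth-estimation literature (absolute relative error, RMSE, and threshold accuracies). The function name and the exact metric set are illustrative assumptions, not prescriptions from the survey.

```python
import numpy as np

def depth_metrics(pred, gt, mask=None):
    """Common depth-estimation figures of merit: absolute relative
    error, RMSE, and threshold accuracies delta < 1.25**k."""
    if mask is None:
        mask = gt > 0                    # ignore pixels without ground truth
    pred, gt = pred[mask], gt[mask]

    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    deltas = [np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)]
    return {"AbsRel": abs_rel, "RMSE": rmse,
            "d<1.25": deltas[0], "d<1.25^2": deltas[1], "d<1.25^3": deltas[2]}
```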

State-of-the-Art Performance

Evaluating the latest techniques reveals that state-of-the-art algorithms are becoming adept at modeling room layouts from single images, though complex scenes with wide depth ranges remain challenging. Current methods excel in structured, mostly indoor environments but require further refinement to handle outdoor scenes and large-scale, diverse datasets.

Concluding Thoughts

360° image-based 3D scene reconstruction is a rapidly growing field with a strong shift towards deep learning solutions. While current methods show impressive potential, challenges remain: handling projection distortions, meeting heavy computational demands, and acquiring extensive training data. As VR and AR technologies continue to evolve, so will the algorithms that underpin our understanding of captured scenes, driving innovations across multiple domains.