
PanoViT: Vision Transformer for Room Layout Estimation from a Single Panoramic Image (2212.12156v1)

Published 23 Dec 2022 in cs.CV

Abstract: In this paper, we propose PanoViT, a panorama vision transformer to estimate the room layout from a single panoramic image. Compared to CNN models, our PanoViT is more proficient in learning global information from the panoramic image for the estimation of complex room layouts. Considering the difference between a perspective image and an equirectangular image, we design a novel recurrent position embedding and a patch sampling method for the processing of panoramic images. In addition to extracting global information, PanoViT also includes a frequency-domain edge enhancement module and a 3D loss to extract local geometric features in a panoramic image. Experimental results on several datasets demonstrate that our method outperforms state-of-the-art solutions in room layout prediction accuracy.

Authors (6)
  1. Weichao Shen (6 papers)
  2. Yuan Dong (30 papers)
  3. Zonghao Chen (10 papers)
  4. Zhengyi Zhao (12 papers)
  5. Yang Gao (762 papers)
  6. Zhu Liu (86 papers)
