
DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation (2405.15619v1)

Published 24 May 2024 in cs.CV

Abstract: Monocular camera calibration is a key precondition for numerous 3D vision applications. Despite considerable advances, existing methods often hinge on specific assumptions and struggle to generalize across varied real-world scenarios, and their performance is limited by insufficient training data. Recently, diffusion models trained on expansive datasets have been shown to generate diverse, high-quality images. This success suggests that such models have strong potential to understand varied visual information. In this work, we leverage the comprehensive visual knowledge embedded in pre-trained diffusion models to enable more robust and accurate monocular camera intrinsic estimation. Specifically, we reformulate the problem of estimating the four degrees of freedom (4-DoF) of the camera intrinsic parameters as a dense incident map generation task. The map gives the angle of incidence for each pixel in the RGB image, and its format aligns well with the paradigm of diffusion models. The camera intrinsics can then be derived from the incident map with a simple non-learning RANSAC algorithm during inference. Moreover, to further enhance performance, we jointly estimate a depth map to provide extra geometric information for the incident map estimation. Extensive experiments on multiple testing datasets demonstrate that our model achieves state-of-the-art performance, with up to a 40% reduction in prediction errors. The experiments also show that the precise camera intrinsics and depth maps estimated by our pipeline can greatly benefit practical applications such as 3D reconstruction from a single in-the-wild image.
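
As a rough illustration of the pipeline the abstract describes, the sketch below builds a dense incident map for a pinhole camera and recovers the four intrinsics (fx, fy, cx, cy) from it with a simple RANSAC line fit. It is not the authors' implementation. The function names, the representation of the incident map as normalized ray directions K^-1 [u, v, 1]^T, the two-point sampling scheme, and the 1-pixel inlier tolerance are all assumptions made for illustration, and the diffusion model that would predict the map is replaced here by a synthetic map.

```python
import numpy as np


def incident_map(fx, fy, cx, cy, height, width):
    """Per-pixel ray directions K^-1 [u, v, 1]^T, shape (H, W, 3).

    Assumed representation of the "incident map"; the paper may use a
    different parameterization.
    """
    v, u = np.mgrid[0:height, 0:width].astype(np.float64)
    x = (u - cx) / fx
    y = (v - cy) / fy
    return np.stack([x, y, np.ones_like(x)], axis=-1)


def ransac_line(coords, rays, iters=200, tol=1.0, rng=None):
    """Fit coords = f * rays + c with RANSAC and return (f, c).

    coords: pixel coordinates along one image axis, shape (N,)
    rays:   normalized ray component along the same axis, shape (N,)
    tol:    inlier threshold in pixels (hypothetical default)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    best, best_inliers = (np.nan, np.nan), -1
    for _ in range(iters):
        # Two samples are enough to determine the two unknowns f and c.
        i, j = rng.choice(len(coords), size=2, replace=False)
        denom = rays[i] - rays[j]
        if abs(denom) < 1e-9:
            continue  # degenerate pair, e.g. two pixels in the same column
        f = (coords[i] - coords[j]) / denom
        c = coords[i] - f * rays[i]
        inliers = int(np.sum(np.abs(f * rays + c - coords) < tol))
        if inliers > best_inliers:
            best, best_inliers = (f, c), inliers
    return best


def intrinsics_from_incident_map(inc):
    """Recover (fx, fy, cx, cy) from an (H, W, 3) incident map."""
    h, w, _ = inc.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    x = (inc[..., 0] / inc[..., 2]).ravel()
    y = (inc[..., 1] / inc[..., 2]).ravel()
    fx, cx = ransac_line(u.ravel(), x)  # horizontal axis: u = fx * x + cx
    fy, cy = ransac_line(v.ravel(), y)  # vertical axis:   v = fy * y + cy
    return fx, fy, cx, cy


if __name__ == "__main__":
    inc = incident_map(500.0, 480.0, 31.5, 23.5, height=48, width=64)
    print(intrinsics_from_incident_map(inc))
```

Under the pinhole model each image axis depends on only two of the four parameters, so the sketch fits the horizontal and vertical axes independently. A learned incident map would contain noise and outliers, which is where the consensus step matters.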
