From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration (2212.09298v3)

Published 19 Dec 2022 in cs.CV

Abstract: We tackle a new problem of multi-view camera and subject registration in the bird's eye view (BEV) without pre-given camera calibration. This is a very challenging problem since the only input is several RGB images of a multi-person scene taken from different first-person views (FPVs), without the BEV image or the calibration of the FPVs, while the output is a unified plane giving the localization and orientation of both the subjects and the cameras in a BEV. We propose an end-to-end framework to solve this problem, whose main idea can be divided into the following parts: i) a view-transform subject detection module that transforms each FPV into a virtual BEV containing the localization and orientation of each pedestrian; ii) a geometric-transformation-based method that estimates camera localization and view direction, i.e., the camera registration in a unified BEV; iii) the use of spatial and appearance information to aggregate the subjects into the unified BEV. We collect a new large-scale synthetic dataset with rich annotations for evaluation. The experimental results show the remarkable effectiveness of the proposed method.
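
As a rough illustration of step ii), here is a minimal sketch of registering a camera in a unified BEV by rigidly aligning its virtual-BEV subject detections to a reference view's detections with a standard 2D Kabsch/Procrustes fit. The abstract does not give the method's exact formulation, so the function name, the NumPy implementation, and the assumption of known subject correspondences are illustrative choices, not details from the paper.

```python
import numpy as np

def rigid_align_2d(src, dst):
    """Estimate a 2D rotation R and translation t with R @ src_i + t ~= dst_i.

    src, dst: (N, 2) arrays of matched BEV subject positions
    (src in one camera's virtual BEV, dst in the reference BEV).
    Closed-form Kabsch/Procrustes solution.
    """
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Toy example: each camera sits at the origin of its own virtual BEV and faces +y.
# Aligning its subject detections to the reference BEV registers the camera:
# its position in the unified BEV is t, and its view direction is R @ [0, 1].
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref_pts = rng.uniform(-5, 5, size=(6, 2))     # subjects in the reference BEV
    theta = np.deg2rad(40.0)
    R_true = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])
    t_true = np.array([2.0, -1.0])
    # What the second camera would observe in its own virtual BEV (inverse transform).
    cam_pts = (ref_pts - t_true) @ R_true
    R, t = rigid_align_2d(cam_pts, ref_pts)
    print("camera position in unified BEV:", t)                  # ~ [2.0, -1.0]
    print("camera view direction:", R @ np.array([0.0, 1.0]))    # ~ 40 deg rotated +y
```

In the noise-free case the fit recovers the true pose exactly; with detection noise or unknown correspondences, a robust matching step (step iii's spatial and appearance cues) would be needed before the alignment.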
