3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection (1708.09839v1)

Published 31 Aug 2017 in cs.CV

Abstract: Cameras are a crucial exteroceptive sensor for self-driving cars as they are low-cost and small, provide appearance information about the environment, and work in various weather conditions. They can be used for multiple purposes such as visual navigation and obstacle detection. We can use a surround multi-camera system to cover the full 360-degree field-of-view around the car. In this way, we avoid blind spots which can otherwise lead to accidents. To minimize the number of cameras needed for surround perception, we utilize fisheye cameras. Consequently, standard vision pipelines for 3D mapping, visual localization, obstacle detection, etc. need to be adapted to take full advantage of the availability of multiple cameras rather than treat each camera individually. In addition, processing of fisheye images has to be supported. In this paper, we describe the camera calibration and subsequent processing pipeline for multi-fisheye-camera systems developed as part of the V-Charge project. This project seeks to enable automated valet parking for self-driving cars. Our pipeline is able to precisely calibrate multi-camera systems, build sparse 3D maps for visual navigation, visually localize the car with respect to these maps, generate accurate dense maps, as well as detect obstacles based on real-time depth map extraction.

3D Visual Perception for Self-Driving Cars using a Multi-Camera System

In autonomous driving, the ability to perceive and understand the environment accurately is crucial for safe and effective operation. The paper "3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection" presents a comprehensive framework that uses a multi-camera system of fisheye cameras to achieve robust 3D visual perception, targeting applications such as automated valet parking.

Overview of the System

The system uses a surround-view multi-camera configuration that covers the full 360-degree field of view around the car, eliminating the blind spots that could otherwise lead to accidents. Fisheye cameras are a deliberate choice: their wide field of view minimizes the number of cameras required for full coverage. Around this hardware, the paper builds a pipeline of essential components: camera calibration, sparse and dense mapping, localization, and obstacle detection.

Key Components

  1. Camera Calibration: Fundamental to the system is precise calibration of the multi-camera rig. The authors propose a SLAM-based extrinsic calibration method that eliminates the need for fiducial targets: the inter-camera transformations are recovered from naturally occurring scene features combined with wheel odometry, yielding accurate calibration without dedicated infrastructure. (A toy version of the underlying hand-eye constraint is sketched after this list.)
  2. Sparse Mapping and Motion Estimation: Ego-motion is estimated within a generalized camera framework that treats the whole multi-camera rig as a single imaging device. By exploiting the Ackermann motion model typical of road vehicles, the authors derive a 2-point minimal solver; since the number of RANSAC iterations needed grows exponentially with the minimal sample size, sampling two correspondences instead of the usual five or more makes robust motion estimation markedly cheaper and enables scalable sparse mapping of large environments. (A simplified RANSAC sketch of this motion model appears below.)
  3. Loop Closure and Pose Graph Optimization: To counter the drift that accumulates in pose estimates, the framework detects loop closures via vocabulary-tree-based image retrieval followed by geometric verification, then corrects the trajectory through pose graph optimization, markedly improving global map consistency. (A minimal pose graph example appears below.)
  4. Dense Mapping and Height Map Fusion: The paper extends plane-sweeping stereo to work directly on fisheye images, producing dense depth maps in real time. These depth maps are fused into accurate height maps refined in a two-pass process, making the system well suited to high-fidelity reconstruction of environments such as parking garages. (A pinhole plane-sweep sketch appears below.)
  5. Localization Using Sparse Maps: By establishing 2D-3D matches between image features and map features, the system estimates the car's position and orientation with a novel pose solver tailored to generalized cameras, allowing reliable localization whenever the car revisits previously mapped areas. (A PnP-based stand-in is sketched below.)

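The calibration pipeline itself runs per-camera SLAM and registers the resulting maps against odometry, which is well beyond a short snippet. Its geometric core, however, is the classic hand-eye constraint A·X = X·B between odometry motions A and camera motions B. Below is a minimal sketch, on synthetic data, of the standard quaternion-based linear solution for the rotation part; the function names are illustrative and not taken from the paper's code.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def quat_mult_matrices(q):
    """Left/right quaternion multiplication matrices for q = (w, x, y, z)."""
    w, x, y, z = q
    L = np.array([[w, -x, -y, -z],
                  [x,  w, -z,  y],
                  [y,  z,  w, -x],
                  [z, -y,  x,  w]])
    R = np.array([[w, -x, -y, -z],
                  [x,  w,  z, -y],
                  [y, -z,  w,  x],
                  [z,  y, -x,  w]])
    return L, R

def to_wxyz(rot):
    q = rot.as_quat()            # scipy returns (x, y, z, w)
    return np.r_[q[3], q[:3]]

def hand_eye_rotation(odom_rots, cam_rots):
    """Solve A_i X = X B_i for the rotation X relating camera and odometry.
    odom_rots, cam_rots: lists of 3x3 relative rotation matrices."""
    M = []
    for A, B in zip(odom_rots, cam_rots):
        La, _ = quat_mult_matrices(to_wxyz(Rotation.from_matrix(A)))
        _, Rb = quat_mult_matrices(to_wxyz(Rotation.from_matrix(B)))
        M.append(La - Rb)        # constraint (L(q_A) - R(q_B)) q_X = 0
    _, _, Vt = np.linalg.svd(np.vstack(M))
    qx = Vt[-1]                  # unit null-space vector = quaternion of X
    return Rotation.from_quat(np.r_[qx[1:], qx[0]]).as_matrix()

# Synthetic check: camera motions are odometry motions seen through X_true.
X_true = Rotation.random(random_state=0).as_matrix()
A = [Rotation.random(random_state=i).as_matrix() for i in range(1, 9)]
B = [X_true.T @ Ai @ X_true for Ai in A]    # B_i = X^-1 A_i X
print(np.allclose(hand_eye_rotation(A, B), X_true, atol=1e-6))  # True
```

The paper goes well beyond this: translations are estimated too, and all extrinsics are refined jointly in a bundle adjustment over the natural scene features.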
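The paper's actual 2-point solver operates on ray correspondences across the generalized camera. As a simplified stand-in under the same motion assumption, the sketch below fits the 2-DoF circular-arc model (yaw theta plus arc length rho, with the translation direction fixed at theta/2) to 3D point correspondences inside a 2-point RANSAC loop. The grid-search solver and all names are illustrative.

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def arc_dir(theta):
    # For circular-arc (Ackermann) motion the translation bisects the yaw.
    return np.array([np.cos(theta / 2.0), np.sin(theta / 2.0), 0.0])

def fit_arc_motion(P, Q):
    """Fit Q ~ R(theta) P + rho * arc_dir(theta): only 2 DoF (theta, rho)."""
    best = (np.inf, 0.0, 0.0)
    for theta in np.linspace(-np.pi / 2, np.pi / 2, 1801):  # coarse 1-D search
        diff = Q - P @ rot_z(theta).T
        rho = float(np.mean(diff @ arc_dir(theta)))   # closed-form given theta
        err = float(np.sum((diff - rho * arc_dir(theta)) ** 2))
        if err < best[0]:
            best = (err, theta, rho)
    return best[1], best[2]

def ransac_ackermann(P, Q, iters=300, thresh=0.05, seed=0):
    """2-point RANSAC: minimal samples of two correspondences per hypothesis."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=2, replace=False)
        theta, rho = fit_arc_motion(P[idx], Q[idx])
        resid = np.linalg.norm(
            Q - (P @ rot_z(theta).T + rho * arc_dir(theta)), axis=1)
        mask = resid < thresh
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return fit_arc_motion(P[best_mask], Q[best_mask])   # refit on all inliers

# Synthetic check: a gentle left turn, with ten gross outlier matches.
rng = np.random.default_rng(1)
P = rng.uniform(-10.0, 10.0, size=(60, 3))
Q = P @ rot_z(0.2).T + 1.5 * arc_dir(0.2)
Q[:10] += rng.normal(0.0, 5.0, size=(10, 3))
print(ransac_ackermann(P, Q))   # approximately (0.2, 1.5)
```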
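Loop-closure detection via a vocabulary tree requires an image database, but the correction step, pose graph optimization, is easy to illustrate. The sketch below is a minimal SE(2) pose graph (the paper optimizes full 6-DoF poses) solved with SciPy's least_squares; the drift values and edge layout are invented for the demo.

```python
import numpy as np
from scipy.optimize import least_squares

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi

def rel_pose(xi, xj):
    """SE(2) pose of xj expressed in the frame of xi; poses are (x, y, yaw)."""
    c, s = np.cos(xi[2]), np.sin(xi[2])
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    return np.array([c * dx + s * dy, -s * dx + c * dy, wrap(xj[2] - xi[2])])

def residuals(flat, edges, n):
    X = flat.reshape(n, 3)
    r = [10.0 * X[0]]                  # anchor pose 0 at origin (gauge fix)
    for i, j, z in edges:
        e = rel_pose(X[i], X[j]) - z
        e[2] = wrap(e[2])
        r.append(e)
    return np.concatenate(r)

# Drive a unit square: 4 odometry edges plus 1 loop-closure edge.
odom = [(i, i + 1, np.array([1.0, 0.0, np.pi / 2])) for i in range(4)]
loop = [(4, 0, np.array([0.0, 0.0, 0.0]))]   # pose 4 should coincide with 0
edges = odom + loop

# Initial guess: integrate deliberately drifted odometry only.
X0 = np.zeros((5, 3))
for i, j, z in odom:
    c, s = np.cos(X0[i, 2]), np.sin(X0[i, 2])
    d = z + np.array([0.05, 0.02, 0.03])     # simulated per-step drift
    X0[j] = X0[i] + np.array([c * d[0] - s * d[1], s * d[0] + c * d[1], d[2]])

sol = least_squares(residuals, X0.ravel(), args=(edges, 5))
print(sol.x.reshape(5, 3).round(3))   # pose 4 is pulled back onto pose 0
```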
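The dense-mapping stage is a GPU plane sweep formulated directly on fisheye images. The pinhole sketch below shows the underlying idea on a two-view pair: sweep fronto-parallel depth planes, warp the second image onto each plane, and keep the depth with the lowest aggregated photometric cost per pixel. The SAD cost and parameter names are illustrative simplifications.

```python
import numpy as np
from scipy.ndimage import map_coordinates, uniform_filter

def plane_sweep_depth(ref, src, K, R, t, depths, patch=5):
    """Fronto-parallel plane sweep (pinhole sketch; the paper handles fisheye).
    ref, src: (H, W) float grayscale images.
    R, t: pose mapping ref-camera points into the src camera,
          x_src = R @ x_ref + t.  depths: candidate plane depths."""
    H, W = ref.shape
    u, v = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    rays = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = rays @ np.linalg.inv(K).T               # back-projected pixel rays
    cost = np.empty((len(depths), H, W))
    for k, d in enumerate(depths):
        Xs = (rays * d) @ R.T + t                  # plane points in src frame
        p = Xs @ K.T
        pu = (p[:, 0] / p[:, 2]).reshape(H, W)     # projected src pixel coords
        pv = (p[:, 1] / p[:, 2]).reshape(H, W)
        warped = map_coordinates(src, [pv, pu], order=1, cval=np.nan)
        sad = np.abs(warped - ref)                 # photometric cost per pixel
        sad[np.isnan(sad)] = 1e3                   # penalize out-of-view pixels
        cost[k] = uniform_filter(sad, size=patch)  # aggregate over a patch
    return np.asarray(depths)[np.argmin(cost, axis=0)]
```

Called with two grayscale views, shared intrinsics K, their relative pose, and a range of candidate depths, this returns a winner-take-all depth map; the paper additionally filters the raw depth maps and fuses them over time into height maps.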
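Finally, the paper's localization relies on a novel solver for generalized cameras, which standard libraries do not ship. As a stand-in that illustrates the step, the sketch below localizes a single camera against the sparse map using OpenCV's RANSAC-wrapped PnP; in the paper, the 2D-3D matches from all cameras are used jointly to estimate one pose for the whole rig.

```python
import numpy as np
import cv2

def localize_against_map(pts3d, pts2d, K):
    """Estimate a single-camera pose from 2D-3D matches via RANSAC + PnP.
    pts3d: (N, 3) map points; pts2d: (N, 2) matched keypoints; K: intrinsics.
    Returns (R, t) mapping world coordinates into the camera frame."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None,
        reprojectionError=2.0, iterationsCount=500)
    if not ok:
        raise RuntimeError("localization failed: too few consistent matches")
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> rotation matrix
    return R, tvec.ravel(), inliers.ravel()
```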
Practical Implications and Future Directions

The results demonstrate near real-time operation, which is critical for applications such as valet parking and low-speed urban driving. The pipeline provides robust obstacle detection from real-time depth maps, though further work on dynamic object tracking and semantic segmentation could extend its applicability to more complex driving scenarios.

Looking ahead, the system could benefit from integration with additional sensor modalities such as LiDAR or radar, which would likely improve robustness in adverse weather. A systematic study of the trade-offs between camera count, computational complexity, and cost could also yield valuable guidance for optimizing autonomous vehicle systems for broader commercial deployment.

Overall, the paper provides a coherent and well-validated approach to autonomous navigation with a camera-only sensor setup, and it contributes substantially to multi-camera 3D perception for self-driving cars.

Authors (7)
  1. Christian Häne (14 papers)
  2. Lionel Heng (5 papers)
  3. Gim Hee Lee (135 papers)
  4. Friedrich Fraundorfer (41 papers)
  5. Paul Furgale (1 paper)
  6. Torsten Sattler (72 papers)
  7. Marc Pollefeys (230 papers)
Citations (196)