Cooperative Perception for 3D Object Detection in Driving Scenarios using Infrastructure Sensors (1912.12147v2)

Published 18 Dec 2019 in cs.CV, cs.LG, cs.MA, cs.RO, and stat.ML

Abstract: 3D object detection is a common function within the perception system of an autonomous vehicle and outputs a list of 3D bounding boxes around objects of interest. Various 3D object detection methods have relied on fusion of different sensor modalities to overcome limitations of individual sensors. However, occlusion, limited field-of-view and low-point density of the sensor data cannot be reliably and cost-effectively addressed by multi-modal sensing from a single point of view. Alternatively, cooperative perception incorporates information from spatially diverse sensors distributed around the environment as a way to mitigate these limitations. This article proposes two schemes for cooperative 3D object detection using single modality sensors. The early fusion scheme combines point clouds from multiple spatially diverse sensing points of view before detection. In contrast, the late fusion scheme fuses the independently detected bounding boxes from multiple spatially diverse sensors. We evaluate the performance of both schemes, and their hybrid combination, using a synthetic cooperative dataset created in two complex driving scenarios, a T-junction and a roundabout. The evaluation shows that the early fusion approach outperforms late fusion by a significant margin at the cost of higher communication bandwidth. The results demonstrate that cooperative perception can recall more than 95% of the objects as opposed to 30% for single-point sensing in the most challenging scenario. To provide practical insights into the deployment of such a system, we report how the number of sensors and their configuration impact the detection performance of the system.

Authors (4)
  1. Eduardo Arnold (5 papers)
  2. Mehrdad Dianati (36 papers)
  3. Robert de Temple (1 paper)
  4. Saber Fallah (25 papers)
Citations (196)

Summary

  • The paper demonstrates that early fusion of multi-view point cloud data boosts detection recall to over 95%, outperforming single-sensor methods.
  • Researchers evaluated early and late fusion schemes on synthetic urban datasets, highlighting trade-offs between bandwidth needs and accuracy.
  • The study underscores the potential of cooperative perception with infrastructure sensors to enable safer autonomous navigation at complex intersections.

Cooperative Perception for 3D Object Detection in Driving Scenarios using Infrastructure Sensors

In this paper, the authors introduce cooperative 3D object detection schemes built on infrastructure sensors for driving scenarios. The objective is to improve 3D object detection, a critical function of autonomous vehicle perception systems, by overcoming limitations such as occlusion, limited field-of-view, and low point density that commonly affect single-viewpoint sensing.

Their research explores the viability of cooperative perception, which aggregates information from an array of spatially distributed sensors, as a means of mitigating these challenges. Two schemes are proposed: early fusion, which combines point clouds from multiple vantage points before object detection, and late fusion, which fuses the independently detected bounding boxes produced by each sensor.
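As a rough illustration (not the authors' implementation), the two schemes can be sketched as follows. The sensor poses, the `detect` callable, and the box-fusion step via a non-maximum-suppression-style `nms` are assumptions introduced here for illustration only.

```python
import numpy as np

def transform(points, pose):
    """Map an (N, 3) point cloud into a common frame using a 4x4 pose matrix."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    return (homo @ pose.T)[:, :3]

def early_fusion(clouds, poses, detect):
    """Early fusion: merge all point clouds in a shared frame, then detect once."""
    merged = np.vstack([transform(c, T) for c, T in zip(clouds, poses)])
    return detect(merged)

def late_fusion(clouds, poses, detect, nms):
    """Late fusion: detect per sensor, then fuse the overlapping box sets."""
    boxes = []
    for cloud, pose in zip(clouds, poses):
        boxes.extend(detect(transform(cloud, pose)))
    return nms(boxes)  # e.g., keep the highest-score box among overlaps
```

The bandwidth trade-off noted in the abstract falls out of this structure: early fusion transmits raw point clouds to a central node, while late fusion transmits only compact box lists.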

The authors evaluate these schemes on a custom synthetic dataset covering two complex urban scenarios: a T-junction and a roundabout. The dataset comprises depth maps from six to eight roadside sensors, simulating a realistic field deployment. The experiments show that early fusion substantially outperforms late fusion, albeit with a higher communication bandwidth requirement. Early fusion achieves superior recall, identifying more than 95% of objects compared to roughly 30% for single-point sensing, underscoring the efficacy of cooperative perception in complex environments where single-viewpoint systems falter.
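For concreteness, recall at a fixed IoU threshold could be computed along the following lines; the greedy one-to-one matching and the externally supplied `iou_fn` (e.g., a bird's-eye-view IoU) are simplifying assumptions, not the paper's exact protocol.

```python
def recall_at_iou(gt_boxes, pred_boxes, iou_fn, thresh=0.5):
    """Fraction of ground-truth boxes matched by a prediction with IoU >= thresh."""
    matched, used = 0, set()
    for gt in gt_boxes:
        best, best_iou = None, thresh
        for j, pred in enumerate(pred_boxes):
            if j in used:
                continue
            iou = iou_fn(gt, pred)
            if iou >= best_iou:
                best, best_iou = j, iou
        if best is not None:
            matched += 1
            used.add(best)
    return matched / len(gt_boxes) if gt_boxes else 1.0
```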

Probing the implications of these findings, the researchers examine how the number and placement of sensors influence detection performance. They find that detection improves as sensors are added: in the T-junction scenario, employing all available sensors raises detection recall from under 30% to over 95%.
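A simple way to explore this kind of sensor-count ablation, reusing the hypothetical `early_fusion` and `recall_at_iou` sketches above, might look like the following; averaging recall over all subsets of each size is one possible protocol, not necessarily the paper's.

```python
from itertools import combinations

def coverage_ablation(clouds, poses, detect, gt_boxes, iou_fn):
    """Average early-fusion recall as a function of how many sensors contribute."""
    results = {}
    for k in range(1, len(clouds) + 1):
        recalls = [
            recall_at_iou(
                gt_boxes,
                early_fusion([clouds[i] for i in subset],
                             [poses[i] for i in subset], detect),
                iou_fn,
            )
            for subset in combinations(range(len(clouds)), k)
        ]
        results[k] = sum(recalls) / len(recalls)
    return results
```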

These findings carry significant theoretical and practical weight for autonomous vehicle systems. Theoretically, they favor an early fusion approach that exploits dense point cloud data with maximal field-of-view coverage, mitigating occlusion and sparsity and thereby improving object detection accuracy. Practically, the proposal aligns with the contemporary push toward smart, infrastructure-assisted autonomous systems, presenting a cost-effective route to safer navigation, especially at complex intersections.

Moving forward, broader adoption of infrastructure-based detection systems will likely prompt deeper evaluation of real-world challenges such as network-induced latency and realistic sensor noise. Future work may also focus on integrating vehicle-to-infrastructure (V2I) communications to support such cooperative systems, potentially paving the way for more precise autonomous navigation technologies.

In sum, this research clearly articulates the improvements to autonomous perception systems achievable through collaborative sensor integration, inviting further exploration within cooperative autonomous vehicle frameworks.