LATR: 3D Lane Detection from Monocular Images with Transformer (2308.04583v2)

Published 8 Aug 2023 in cs.CV

Abstract: 3D lane detection from monocular images is a fundamental yet challenging task in autonomous driving. Recent advances primarily rely on structural 3D surrogates (e.g., bird's eye view) built from front-view image features and camera parameters. However, the depth ambiguity in monocular images inevitably causes misalignment between the constructed surrogate feature map and the original image, posing a great challenge for accurate lane detection. To address the above issue, we present a novel LATR model, an end-to-end 3D lane detector that uses 3D-aware front-view features without transformed view representation. Specifically, LATR detects 3D lanes via cross-attention based on query and key-value pairs, constructed using our lane-aware query generator and dynamic 3D ground positional embedding. On the one hand, each query is generated based on 2D lane-aware features and adopts a hybrid embedding to enhance lane information. On the other hand, 3D space information is injected as positional embedding from an iteratively-updated 3D ground plane. LATR outperforms previous state-of-the-art methods on both synthetic Apollo, realistic OpenLane and ONCE-3DLanes by large margins (e.g., 11.4 gain in terms of F1 score on OpenLane). Code will be released at https://github.com/JMoonr/LATR .

Authors (7)
  1. Yueru Luo (7 papers)
  2. Chaoda Zheng (13 papers)
  3. Xu Yan (130 papers)
  4. Tang Kun (2 papers)
  5. Chao Zheng (95 papers)
  6. Shuguang Cui (275 papers)
  7. Zhen Li (334 papers)
Citations (23)

Summary

  • The paper introduces a transformer-based architecture that processes lane detection in the front view using lane-aware queries and dynamic 3D ground positional embeddings.
  • It bypasses traditional BEV methods to overcome depth ambiguities in monocular images, ensuring robust detection across varying terrains.
  • Experimental results demonstrate significant F1 score improvements and reduced errors, highlighting LATR's potential to advance autonomous driving systems.

An Analysis of LATR: 3D Lane Detection from Monocular Images with Transformer

The paper presents LATR, a method for 3D lane detection from monocular images and a potentially significant advancement for autonomous driving. It introduces a novel approach that bypasses the traditional reliance on structural 3D surrogates, such as the bird's eye view (BEV), for 3D lane detection. By operating directly on 3D-aware front-view features, LATR avoids the feature-map misalignment caused by depth ambiguity in monocular images.

Key Contributions and Methodology

The core innovation of LATR is its end-to-end transformer-based architecture, which directly processes lane detection in the front view, leveraging lane-aware queries and dynamic 3D ground positional embeddings. The methodology can be divided into several critical components:

  1. Lane-aware Query Generator: Unlike traditional methods that rely on learnable tokens with no prior knowledge of image features, LATR generates queries from 2D lane-aware features. Each query combines lane-level and point-level embeddings, capturing 3D lane information more robustly by preserving the geometric relationships among lanes.
  2. Dynamic 3D Ground Positional Embedding: Rather than relying on intermediate view transformations, which introduce inaccuracies in real-world driving scenarios with varying terrain such as uphill or downhill roads, LATR constructs a dynamic ground plane with 3D positional embeddings. The plane is iteratively updated so that it stays aligned with the observed road surface.
  3. End-to-End Transformer Decoder: A transformer decoder combines the lane-aware queries with the dynamic 3D ground positional embedding through cross-attention, allowing the model to predict lane positions directly without strong prior assumptions about geometry or depth (see the sketch after this list).
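
To make the interaction of these components concrete, below is a minimal PyTorch-style sketch of a single decoder step in which lane queries cross-attend to front-view image features whose keys carry a 3D ground positional embedding. This is not the official LATR implementation: the module names, the MLP used to encode 3D ground points, and all tensor shapes are illustrative assumptions.

```python
# Minimal sketch (not the official LATR code): one decoder step combining
# lane-aware queries with a 3D ground positional embedding.
# Module names, the point-encoding MLP, and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class GroundPositionalEmbedding(nn.Module):
    """Encodes 3D points sampled on the current ground-plane hypothesis."""
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, ground_points):            # (N, 3) points in camera space
        return self.mlp(ground_points)           # (N, dim) positional embedding

class LaneDecoderLayer(nn.Module):
    """Cross-attention between lane queries and image features plus ground PE."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(), nn.Linear(dim * 4, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, queries, img_feats, ground_pe):
        # Keys carry 3D-aware position; values remain pure image features.
        keys = img_feats + ground_pe
        out, _ = self.cross_attn(query=queries, key=keys, value=img_feats)
        queries = self.norm1(queries + out)
        return self.norm2(queries + self.ffn(queries))

# Toy forward pass: 20 lane queries attending over 1000 flattened image tokens.
dim = 256
queries = torch.randn(1, 20, dim)        # lane-aware queries (from 2D lane features)
img_feats = torch.randn(1, 1000, dim)    # flattened front-view feature map
ground_pts = torch.randn(1000, 3)        # 3D points on the hypothesized ground plane
pe = GroundPositionalEmbedding(dim)(ground_pts).unsqueeze(0)
updated = LaneDecoderLayer(dim)(queries, img_feats, pe)
print(updated.shape)                     # torch.Size([1, 20, 256])
```

In LATR itself, the ground-plane parameters are also refined across decoder layers so that the positional embedding tracks the observed road surface; the sketch omits that iterative update for brevity.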

Strong Numerical Results

The experimental results underscore LATR's robust performance on the synthetic Apollo benchmark and the real-world OpenLane and ONCE-3DLanes datasets. LATR surpasses prior state-of-the-art methods by substantial margins, including an 11.4-point F1 improvement on OpenLane, and reduces X and Z errors across both near and far distance ranges, indicating efficacy in controlled as well as diverse real-world scenarios.
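
For reference, the lane-level F1 score cited above is derived from precision and recall over matched lane predictions. The snippet below only illustrates that computation with made-up counts; it does not reproduce the benchmark's matching criterion, which decides when a predicted lane counts as a match to a ground-truth lane.

```python
# Illustrative only: lane-level F1 from matched/predicted/ground-truth counts.
# The counts here are invented; the lane-matching rule itself is not shown.
def f1_score(num_matched: int, num_pred: int, num_gt: int) -> float:
    precision = num_matched / max(num_pred, 1)
    recall = num_matched / max(num_gt, 1)
    return 2 * precision * recall / max(precision + recall, 1e-6)

print(f1_score(num_matched=85, num_pred=95, num_gt=100))  # ~0.872
```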

Implications and Future Research Directions

The advancement introduced by LATR has broad implications for autonomous driving. By addressing depth ambiguity in monocular imaging with a transformer-based architecture, LATR improves the reliability and precision of lane detection, which is critical for trajectory planning and lane keeping in autonomous vehicles.

Moving forward, this research could inspire further development within the AI and automotive industries. The method's focus on fully exploiting monocular imagery can reduce hardware costs without compromising accuracy, offering a competitive edge over LiDAR-centric approaches. Future studies could extend this approach to multi-modal systems, integrating other sensor data to enhance robustness in diverse environmental conditions.

The research lays a promising groundwork for advancements in AI technologies applied to real-world domains. It points towards a future where robust, precise 3D lane detection using cost-effective sensing can catalyze the widespread adoption of autonomous driving technologies.
