Lane2Seq: Towards Unified Lane Detection via Sequence Generation (2402.17172v1)

Published 27 Feb 2024 in cs.CV

Abstract: In this paper, we present a novel sequence generation-based framework for lane detection, called Lane2Seq. It unifies various lane detection formats by casting lane detection as a sequence generation task. This differs from previous lane detection methods, which depend on well-designed task-specific head networks and corresponding loss functions. Lane2Seq adopts only a plain transformer-based encoder-decoder architecture with a simple cross-entropy loss. Additionally, we propose a new multi-format model tuning method based on reinforcement learning to incorporate task-specific knowledge into Lane2Seq. Experimental results demonstrate that such a simple sequence generation paradigm not only unifies lane detection but also achieves competitive performance on benchmarks. For example, Lane2Seq achieves F1 scores of 97.95% and 97.42% on the TuSimple and LLAMAS datasets, respectively, establishing new state-of-the-art results on both benchmarks.
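The abstract does not say how lanes are turned into tokens, so the following is only a minimal sketch of what "casting lane detection as a sequence generation task" could look like, assuming a keypoint format and simple coordinate quantization. The bin count, the special tokens (`BOS`, `EOS`, `SEP`), and all function names are hypothetical illustrations, not the paper's actual vocabulary or code.

```python
from typing import List, Tuple

Lane = List[Tuple[float, float]]  # one lane as a list of (x, y) pixel keypoints

NUM_BINS = 1000  # assumed number of coordinate bins, not taken from the paper
BOS, EOS, SEP = NUM_BINS, NUM_BINS + 1, NUM_BINS + 2  # hypothetical special tokens


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    ratio = min(max(value / size, 0.0), 1.0)
    return min(int(ratio * num_bins), num_bins - 1)


def dequantize(token: int, size: float, num_bins: int = NUM_BINS) -> float:
    """Map a bin index back to the pixel coordinate at the center of that bin."""
    return (token + 0.5) / num_bins * size


def lanes_to_sequence(lanes: List[Lane], width: float, height: float) -> List[int]:
    """Flatten lanes into one token sequence: BOS x y x y SEP x y ... EOS."""
    seq = [BOS]
    for i, lane in enumerate(lanes):
        if i > 0:
            seq.append(SEP)  # separates consecutive lanes
        for x, y in lane:
            seq.append(quantize(x, width))
            seq.append(quantize(y, height))
    seq.append(EOS)
    return seq


def sequence_to_lanes(seq: List[int], width: float, height: float) -> List[Lane]:
    """Invert lanes_to_sequence, recovering keypoints up to quantization error."""
    lanes: List[Lane] = []
    coords: List[int] = []
    for tok in seq:
        if tok == BOS:
            continue
        if tok in (SEP, EOS):
            # Pair up the collected coordinate tokens as (x, y) keypoints.
            lane = [
                (dequantize(coords[i], width), dequantize(coords[i + 1], height))
                for i in range(0, len(coords) - 1, 2)
            ]
            if lane:
                lanes.append(lane)
            coords = []
            if tok == EOS:
                break
        else:
            coords.append(tok)
    return lanes
```

Under these assumptions, a decoder trained with cross-entropy would predict such token sequences one token at a time, and other lane formats (for example, segmentation polygons or curve parameters) would need their own serialization into the same vocabulary.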
