Lane2Seq: Towards Unified Lane Detection via Sequence Generation (2402.17172v1)
Abstract: In this paper, we present a novel sequence generation-based framework for lane detection, called Lane2Seq. It unifies various lane detection formats by casting lane detection as a sequence generation task. This is different from previous lane detection methods, which depend on well-designed task-specific head networks and corresponding loss functions. Lane2Seq only adopts a plain transformer-based encoder-decoder architecture with a simple cross-entropy loss. Additionally, we propose a new multi-format model tuning based on reinforcement learning to incorporate the task-specific knowledge into Lane2Seq. Experimental results demonstrate that such a simple sequence generation paradigm not only unifies lane detection but also achieves competitive performance on benchmarks. For example, Lane2Seq gets 97.95\% and 97.42\% F1 score on Tusimple and LLAMAS datasets, establishing a new state-of-the-art result for two benchmarks.
- Tusimple dataset. https://github.com/TuSimple/tusimple-benchmark. Accessed on 11th August 2023.
- Flamingo: a visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35:23716–23736, 2022.
- Unsupervised labeled lane markers using maps. In Proceedings of the IEEE/CVF international conference on computer vision workshops, pages 0–0, 2019.
- Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
- Annotating object instances with a polygon-rnn. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5230–5238, 2017.
- Bsnet: Lane detection via draw b-spline curves nearby. arXiv preprint arXiv:2301.06910, 2023.
- MMDetection: Open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155, 2019.
- Pix2seq: A language modeling framework for object detection. arXiv preprint arXiv:2109.10852, 2021.
- A unified sequence interface for vision tasks. Advances in Neural Information Processing Systems, 35:31333–31346, 2022.
- Seqtrack: Sequence to sequence learning for visual object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14572–14581, 2023.
- An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
- Guiding pretraining in reinforcement learning with large language models. arXiv preprint arXiv:2302.06692, 2023.
- Spinnet: Spinning convolutional network for lane boundary detection. Computational Visual Media, 5:417–428, 2019.
- Rethinking efficient lane detection via curve modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17062–17070, 2022.
- Laneformer: Object-aware row-column transformers for lane detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 799–807, 2022.
- Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009, 2022.
- Clrernet: Improving confidence of lane detection with laneiou. arXiv preprint arXiv:2305.08366, 2023.
- Eigenlanes: Data-driven lane descriptors for structurally diverse lanes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17163–17171, 2022.
- Towards unified scene text spotting based on sequence generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15223–15232, 2023.
- Line-cnn: End-to-end traffic line detection with line proposal unit. IEEE Transactions on Intelligent Transportation Systems, 21(1):248–258, 2019.
- Iteratively-refined interactive 3d medical image segmentation with multi-agent reinforcement learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9394–9402, 2020.
- Condlanenet: a top-to-down lane detection framework based on conditional convolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3773–3782, 2021.
- End-to-end lane shape prediction with transformers. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 3694–3702, 2021.
- Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
- Unified-io: A unified model for vision, language, and multi-modal tasks. arXiv preprint arXiv:2206.08916, 2022.
- End-to-end active object tracking via reinforcement learning. In International conference on machine learning, pages 3286–3295. PMLR, 2018.
- Reinforcement learning for visual object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2894–2902, 2016.
- Spatial as deep: Spatial cnn for traffic scene understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
- Spts: single-point text spotting. In Proceedings of the 30th ACM International Conference on Multimedia, pages 4272–4281, 2022.
- Tuning computer vision models with task rewards. arXiv preprint arXiv:2302.08242, 2023.
- Ultra fast structure-aware deep lane detection. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIV 16, pages 276–291. Springer, 2020.
- Improving language understanding by generative pre-training. 2018.
- Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.
- Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021.
- Policy gradient methods for reinforcement learning with function approximation. Advances in neural information processing systems, 12, 1999.
- Keep your eyes on the lane: Real-time attention-guided lane detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 294–302, 2021.
- Polylanenet: Lane estimation via deep polynomial regression. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 6150–6156. IEEE, 2021.
- Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pages 6105–6114. PMLR, 2019.
- A keypoint-based global association network for lane detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1392–1401, 2022.
- Lanenet: Real-time lane detection networks for autonomous driving. arXiv preprint arXiv:1807.01726, 2018.
- Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8:229–256, 1992.
- Yolop: You only look once for panoptic driving perception. Machine Intelligence Research, 19(6):550–562, 2022.
- Adnet: Lane shape prediction via anchor decomposition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6404–6413, 2023.
- Lane detection with versatile atrousformer and local semantic guidance. Pattern Recognition, 133:109053, 2023.
- Unitab: Unifying text and box outputs for grounded vision-language modeling. In European Conference on Computer Vision, pages 521–539. Springer, 2022.
- Deep layer aggregation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2403–2412, 2018.
- Houghlanenet: Lane detection with deep hough transform and dynamic convolution. arXiv preprint arXiv:2307.03494, 2023.
- Deep reinforcement learning based lane detection and localization. Neurocomputing, 413:328–338, 2020.
- Resa: Recurrent feature-shift aggregator for lane detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 3547–3554, 2021.
- Clrnet: Cross layer refinement network for lane detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 898–907, 2022.
- End to end lane detection with one-to-several transformer. arXiv preprint arXiv:2305.00675, 2023.
Sponsored by Paperpile, the PDF & BibTeX manager trusted by top AI labs.
Get 30 days freePaper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.