Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition (2409.01534v1)

Published 3 Sep 2024 in cs.CV, cs.AI, and cs.MM

Abstract: We propose a new strategy called think twice before recognizing to improve fine-grained traffic sign recognition (TSR). Fine-grained TSR in the wild is difficult due to complex road conditions, and existing approaches particularly struggle with cross-country TSR when data is lacking. Our strategy achieves effective fine-grained TSR by stimulating the multiple-thinking capability of large multimodal models (LMMs). We introduce context, characteristic, and differential descriptions to design multiple thinking processes for the LMM. The context descriptions, with center coordinate prompt optimization, help the LMM locate the target traffic sign in original road images containing multiple traffic signs and filter irrelevant answers through the proposed prior traffic sign hypothesis. The characteristic descriptions are based on few-shot in-context learning with template traffic signs, which reduces the cross-domain gap and enhances the fine-grained recognition capability of the LMM. The differential descriptions of similar traffic signs further refine the multimodal thinking capability of the LMM. The proposed method is independent of training data and requires only simple and uniform instructions. We conducted extensive experiments on three benchmark datasets and two real-world datasets from different countries, and the proposed method achieves state-of-the-art TSR results on all five datasets.
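The abstract describes a three-stage prompting flow (context, characteristic, and differential thinking). Below is a minimal Python sketch of how such a pipeline could be wired together, based only on the abstract; the `query_lmm` wrapper, the prompt wording, and the function signatures are hypothetical assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a "think twice before recognizing" prompting flow.
# `query_lmm` stands in for whatever large multimodal model API is available.

from typing import List, Tuple


def query_lmm(images: List[str], prompt: str) -> str:
    """Hypothetical wrapper around an LMM API; plug in a real client here."""
    raise NotImplementedError


def recognize_traffic_sign(road_image: str,
                           center_xy: Tuple[int, int],
                           template_images: List[str],
                           candidate_classes: List[str]) -> str:
    # 1) Context thinking: point the LMM at the target sign via its center
    #    coordinates and apply a prior traffic-sign hypothesis to filter
    #    answers that do not correspond to a sign at all.
    context = query_lmm(
        [road_image],
        f"A traffic sign is centered at pixel {center_xy}. "
        "Describe the sign and its surrounding road context. "
        "If no traffic sign is present there, answer 'none'.",
    )
    if "none" in context.lower():
        return "not a traffic sign"

    # 2) Characteristic thinking: few-shot in-context learning with template
    #    signs to bridge the cross-domain gap between templates and the wild.
    characteristics = query_lmm(
        template_images + [road_image],
        "The first images are canonical traffic-sign templates. "
        f"Describe the shape, color, and symbol of the sign centered at {center_xy} "
        "in the last image, using the templates as reference.",
    )

    # 3) Differential thinking: contrast visually similar candidate classes
    #    before committing to a fine-grained label.
    answer = query_lmm(
        template_images + [road_image],
        "Candidate classes: " + ", ".join(candidate_classes) + ". "
        "Explain how the most similar candidates differ, then name the single "
        "class that best matches the target sign.\n"
        f"Context: {context}\nCharacteristics: {characteristics}",
    )
    return answer
```

As described in the abstract, such a pipeline needs no training data: each stage is an instruction to the same LMM, and the only per-dataset inputs are the template sign images and the candidate class list.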
