CUEING: a lightweight model to Capture hUman attEntion In driviNG (2305.15710v2)
Abstract: Discrepancies in decision-making between Autonomous Driving Systems (ADS) and human drivers underscore the need for intuitive human gaze predictors to bridge this gap, thereby improving user trust and experience. Existing gaze datasets, despite their value, suffer from noise that hampers effective training. Furthermore, current gaze prediction models exhibit inconsistency across diverse scenarios and demand substantial computational resources, restricting their on-board deployment in autonomous vehicles. We propose a novel adaptive cleansing technique for purging noise from existing gaze datasets, coupled with a robust, lightweight convolutional self-attention gaze prediction model. Our approach not only improves model generalizability and performance by up to 12.13% but also reduces model complexity by up to 98.2% compared to the state of the art, making in-vehicle deployment feasible for augmenting ADS decision visualization and performance.
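As a rough illustration of what a lightweight convolutional self-attention gaze predictor can look like, the PyTorch sketch below pairs a small convolutional stem with a single self-attention layer over grid tokens and outputs a coarse gaze probability map. All layer sizes, the 16x16 output grid, and the softmax head are illustrative assumptions, not the CUEING architecture reported in the paper.

```python
# Minimal sketch of a convolutional self-attention gaze predictor.
# Layer sizes, grid resolution, and the output head are assumptions for
# illustration only; they do not reproduce the paper's model.
import torch
import torch.nn as nn


class ConvSelfAttentionGaze(nn.Module):
    """Predicts a coarse gaze heat map from a driving frame."""

    def __init__(self, in_channels: int = 3, embed_dim: int = 64, grid: int = 16):
        super().__init__()
        # Convolutional stem downsamples the frame to a grid x grid feature map.
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(grid),
        )
        # Self-attention over the grid cells, treated as tokens.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        # Per-token head producing one gaze logit per grid cell.
        self.head = nn.Linear(embed_dim, 1)
        self.grid = grid

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        feats = self.stem(frames)                  # (B, C, G, G)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, G*G, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)      # residual connection + norm
        logits = self.head(tokens).squeeze(-1)     # (B, G*G)
        # Softmax over cells gives a probability map that could be compared
        # against a (cleansed) ground-truth gaze map, e.g. with KL divergence.
        return logits.softmax(dim=-1).view(-1, self.grid, self.grid)


if __name__ == "__main__":
    model = ConvSelfAttentionGaze()
    frame_batch = torch.randn(2, 3, 256, 256)  # dummy driving frames
    gaze_map = model(frame_batch)
    print(gaze_map.shape)                      # torch.Size([2, 16, 16])
```

A grid-level output like this keeps the parameter count small relative to dense per-pixel decoders, which is the kind of trade-off a lightweight, deployable gaze model targets.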