
Understanding and Modeling the Effects of Task and Context on Drivers' Gaze Allocation (2310.09275v3)

Published 13 Oct 2023 in cs.CV

Abstract: To further advance driver monitoring and assistance systems, it is important to understand how drivers allocate their attention, in other words, where they tend to look and why. Traditionally, factors affecting human visual attention have been divided into bottom-up (involuntary attraction to salient regions) and top-down (driven by the demands of the task being performed). Although both play a role in directing drivers' gaze, most of the existing models for drivers' gaze prediction apply techniques developed for bottom-up saliency and do not consider influences of the drivers' actions explicitly. Likewise, common driving attention benchmarks lack relevant annotations for drivers' actions and the context in which they are performed. Therefore, to enable analysis and modeling of these factors for drivers' gaze prediction, we propose the following: 1) we correct the data processing pipeline used in DR(eye)VE to reduce noise in the recorded gaze data; 2) we then add per-frame labels for driving task and context; 3) we benchmark a number of baseline and SOTA models for saliency and driver gaze prediction and use new annotations to analyze how their performance changes in scenarios involving different tasks; and, lastly, 4) we develop a novel model that modulates drivers' gaze prediction with explicit action and context information. While reducing noise in the DR(eye)VE gaze data improves results of all models, we show that using task information in our proposed model boosts performance even further compared to bottom-up models on the cleaned-up data, both overall (by 24% KLD and 89% NSS) and on scenarios that involve performing safety-critical maneuvers and crossing intersections (by up to 10--30% KLD). Extended annotations and code are available at https://github.com/ykotseruba/SCOUT.
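The gains at the end of the abstract are reported in KL divergence (KLD, lower is better) and Normalized Scanpath Saliency (NSS, higher is better), two standard saliency evaluation metrics. The snippet below is a minimal NumPy sketch of their commonly used definitions, not the authors' evaluation code; the map shapes and variable names (`pred`, `gt`, `fix`) are hypothetical.

```python
import numpy as np

EPS = 1e-7  # small constant for numerical stability

def kld(pred, gt_density):
    """KL divergence between ground-truth gaze density and prediction (lower is better)."""
    p = pred / (pred.sum() + EPS)              # normalize prediction to a distribution
    q = gt_density / (gt_density.sum() + EPS)  # normalize ground-truth density
    return float(np.sum(q * np.log(EPS + q / (p + EPS))))

def nss(pred, fixations):
    """Mean z-scored prediction value at recorded fixation locations (higher is better)."""
    z = (pred - pred.mean()) / (pred.std() + EPS)
    return float(z[fixations > 0].mean())

# Usage with hypothetical random maps, only to show expected shapes and call pattern.
pred = np.random.rand(72, 128)                             # predicted gaze density map
gt = np.random.rand(72, 128)                               # ground-truth gaze density map
fix = (np.random.rand(72, 128) > 0.999).astype(np.uint8)   # binary fixation map
print(f"KLD: {kld(pred, gt):.3f}, NSS: {nss(pred, fix):.3f}")
```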

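The fourth contribution is a model that modulates gaze prediction with explicit action and context information. As a rough illustration of that general idea only (not the SCOUT architecture itself), the hypothetical PyTorch module below fuses a learned embedding of a per-frame task/context label with visual features before decoding a gaze map; all layer sizes, names, and the fusion scheme are assumptions.

```python
import torch
import torch.nn as nn

class TaskModulatedGazeHead(nn.Module):
    """Illustrative head: concatenate a task embedding with visual features, then decode a gaze map."""
    def __init__(self, num_tasks: int, feat_channels: int = 256, emb_dim: int = 64):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, emb_dim)                       # per-frame task/context label
        self.fuse = nn.Conv2d(feat_channels + emb_dim, feat_channels, kernel_size=1)
        self.decode = nn.Conv2d(feat_channels, 1, kernel_size=1)               # 1-channel gaze logits

    def forward(self, feats: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) visual features from any backbone; task_id: (B,) integer labels
        b, _, h, w = feats.shape
        emb = self.task_emb(task_id)[:, :, None, None].expand(-1, -1, h, w)    # broadcast over space
        fused = self.fuse(torch.cat([feats, emb], dim=1))
        return torch.sigmoid(self.decode(fused))                               # gaze map in [0, 1]

# Hypothetical usage: 2 frames, 256-channel backbone features, 10 task classes.
head = TaskModulatedGazeHead(num_tasks=10)
feats = torch.randn(2, 256, 18, 32)
task = torch.tensor([3, 7])
gaze_map = head(feats, task)   # shape (2, 1, 18, 32)
```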
Authors (2)
  1. Iuliia Kotseruba (23 papers)
  2. John K. Tsotsos (52 papers)
Citations (2)
