
Trends, Applications, and Challenges in Human Attention Modelling (2402.18673v2)

Published 28 Feb 2024 in cs.CV and cs.AI

Abstract: Human attention modelling has proven, in recent years, to be particularly useful not only for understanding the cognitive processes underlying visual exploration, but also for providing support to artificial intelligence models that aim to solve problems in various domains, including image and video processing, vision-and-language applications, and language modelling. This survey offers a reasoned overview of recent efforts to integrate human attention mechanisms into contemporary deep learning models and discusses future research directions and challenges. For a comprehensive overview of the ongoing research, refer to our dedicated repository available at https://github.com/aimagelab/awesome-human-visual-attention.

Authors (7)
  1. Giuseppe Cartella
  2. Marcella Cornia
  3. Vittorio Cuculo
  4. Alessandro D'Amelio
  5. Dario Zanca
  6. Giuseppe Boccignone
  7. Rita Cucchiara