Learning User Embeddings from Human Gaze for Personalised Saliency Prediction (2403.13653v2)
Abstract: Reusable embeddings of user behaviour have been shown to significantly improve performance on the personalised saliency prediction task. However, prior works require explicit user characteristics and preferences as input, which are often difficult to obtain. We present a novel method to extract user embeddings from pairs of natural images and corresponding saliency maps generated from a small amount of user-specific eye tracking data. At the core of our method is a Siamese convolutional neural encoder that learns the user embeddings by contrasting the image and personal saliency map pairs of different users. Evaluations on two public saliency datasets show that the generated embeddings have high discriminative power, are effective at refining universal saliency maps to the individual users, and generalise well across users and images. Finally, based on our model's ability to encode individual user characteristics, our work points towards other applications that can benefit from reusable embeddings of gaze behaviour.
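The abstract names the architecture but not its internals. The following is a minimal NumPy sketch of the general idea of a Siamese encoder over (image, personal saliency map) pairs. It assumes a single 3x3 convolution, global average pooling, a linear projection to a unit-length embedding, and a triplet margin loss. None of these choices are specified in the abstract, and `init_params`, `encode`, and `triplet_loss` are hypothetical names, not the authors' code. Both branches of the "Siamese" network are simply the same `encode` call with shared weights.

```python
import numpy as np


def init_params(rng, in_ch=4, n_filters=8, emb_dim=16):
    """Hypothetical weights for a tiny shared encoder (one 3x3 conv + linear head)."""
    return {
        "conv": rng.normal(0.0, 0.1, size=(3, 3, in_ch, n_filters)),
        "proj": rng.normal(0.0, 0.1, size=(n_filters, emb_dim)),
    }


def conv3x3_valid(x, w):
    """'Valid' 3x3 cross-correlation of an H x W x C array with 3 x 3 x C x F kernels."""
    h, wd = x.shape[0] - 2, x.shape[1] - 2
    out = np.zeros((h, wd, w.shape[3]))
    for di in range(3):
        for dj in range(3):
            # (h, wd, C) @ (C, F) -> (h, wd, F), accumulated over kernel offsets
            out += x[di:di + h, dj:dj + wd, :] @ w[di, dj]
    return out


def encode(params, image, saliency):
    """Embed one (image, personal saliency map) pair; every branch shares `params`."""
    x = np.concatenate([image, saliency[..., None]], axis=-1)  # H x W x 4 input
    feat = np.maximum(conv3x3_valid(x, params["conv"]), 0.0)   # ReLU
    pooled = feat.mean(axis=(0, 1))                            # global average pooling
    z = pooled @ params["proj"]
    return z / (np.linalg.norm(z) + 1e-8)                      # unit-length embedding


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Assumed loss: pull same-user pairs together, push other users' pairs apart."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(d_pos - d_neg + margin, 0.0))


# Usage with random stand-in data (not real images or gaze recordings).
rng = np.random.default_rng(0)
params = init_params(rng)
img_a, img_b = rng.random((32, 32, 3)), rng.random((32, 32, 3))
sal_u1_a, sal_u1_b, sal_u2_a = (rng.random((32, 32)) for _ in range(3))

za = encode(params, img_a, sal_u1_a)  # anchor: user 1, image A
zp = encode(params, img_b, sal_u1_b)  # positive: user 1, image B
zn = encode(params, img_a, sal_u2_a)  # negative: user 2, same image A
loss = triplet_loss(za, zp, zn)
```

Choosing the negative on the same image as the anchor (again an assumption) means the loss can only be reduced by encoding how the user looks, not what the image shows. A real training loop would add gradient updates, which this sketch omits.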