A Modular Multimodal Architecture for Gaze Target Prediction: Application to Privacy-Sensitive Settings (2307.05158v1)

Published 11 Jul 2023 in cs.CV

Abstract: Predicting where a person is looking is a complex task: it requires understanding not only the person's gaze and the scene content, but also the 3D scene structure and the person's situation (are they manipulating? interacting with or observing others? attentive?) in order to detect obstructions in the line of sight or to apply the attention priors that humans typically use when observing others. In this paper, we hypothesize that identifying and leveraging such priors can be better achieved by exploiting explicitly derived multimodal cues such as depth and pose. We therefore propose a modular multimodal architecture that combines these cues using an attention mechanism. The architecture can naturally be exploited in privacy-sensitive settings such as surveillance and health, where personally identifiable information cannot be released. We perform extensive experiments on the GazeFollow and VideoAttentionTarget public datasets, obtaining state-of-the-art performance and demonstrating very competitive results in the privacy-sensitive setting.
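The abstract's central idea is to fuse explicitly derived modality cues (e.g. image, depth, pose) with an attention mechanism rather than a fixed concatenation. The sketch below is a minimal, illustrative PyTorch example of such modality-level attention fusion; it is not the authors' implementation, and the module name, feature dimensions, and the three-branch (image, depth, pose) setup are assumptions made purely for illustration.

# Illustrative sketch (not the paper's code) of attention-based fusion
# of per-modality gaze-cue embeddings. Names and dimensions are assumed.
import torch
import torch.nn as nn


class ModalityAttentionFusion(nn.Module):
    """Combine per-modality embeddings with learned soft-attention weights."""

    def __init__(self, num_modalities: int = 3, feat_dim: int = 256):
        super().__init__()
        # Small MLP that scores each modality from the concatenated features.
        self.scorer = nn.Sequential(
            nn.Linear(num_modalities * feat_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_modalities),
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: list of [B, feat_dim] embeddings, e.g. image, depth, pose branches.
        stacked = torch.stack(feats, dim=1)               # [B, M, D]
        weights = self.scorer(stacked.flatten(1))         # [B, M]
        weights = torch.softmax(weights, dim=1)           # attention over modalities
        fused = (weights.unsqueeze(-1) * stacked).sum(1)  # weighted sum -> [B, D]
        return fused


# Usage: fuse image, depth, and pose embeddings for one observed person.
if __name__ == "__main__":
    fusion = ModalityAttentionFusion(num_modalities=3, feat_dim=256)
    img, depth, pose = (torch.randn(4, 256) for _ in range(3))
    print(fusion([img, depth, pose]).shape)  # torch.Size([4, 256])

Because the attention weights are computed per sample, a privacy-sensitive deployment could, under this reading, simply drop the raw-image branch and rely on the depth and pose embeddings, which is consistent with the modular design the abstract describes.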

Authors (3)
  1. Anshul Gupta (20 papers)
  2. Samy Tafasca (5 papers)
  3. Jean-Marc Odobez (24 papers)
Citations (20)
