Modeling social interaction dynamics using temporal graph networks (2404.06611v1)

Published 5 Apr 2024 in cs.HC and cs.SI

Abstract: Integrating intelligent systems, such as robots, into dynamic group settings poses challenges due to the mutual influence of human behaviors and internal states. A robust representation of social interaction dynamics is essential for effective human-robot collaboration. Existing approaches often narrow their focus to facial expressions or speech, overlooking the broader context. We propose employing an adapted Temporal Graph Network to comprehensively represent social interaction dynamics while enabling practical implementation. Our method incorporates temporal multi-modal behavioral data, including gaze interaction, voice activity, and environmental context. This representation of social interaction dynamics is trained as a link prediction problem using annotated gaze interaction data, and its F1-score outperforms the baseline model by 37.0%. The improvement carries over to a secondary task, next-speaker prediction, which improves by 29.0%. Our contributions are two-fold: a model for representing social interaction dynamics that can be used for many downstream human-robot interaction tasks, such as human state inference and next-speaker prediction; and, more importantly, a more concise yet efficient message passing method that reduces the message size from 768 to 14 elements while still outperforming the baseline model.
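
The abstract frames the learned representation as a dynamic link prediction problem over timestamped interaction events, with a compact 14-element message per event. The sketch below illustrates what such a training setup can look like using the TGNMemory module from PyTorch Geometric; the participant count, feature composition, hidden sizes, and the link-scoring MLP are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of TGN-based link prediction over social-interaction events.
# All names, dimensions, and the feature layout are illustrative assumptions.
import torch
from torch_geometric.nn import TGNMemory
from torch_geometric.nn.models.tgn import IdentityMessage, LastAggregator

NUM_PEOPLE = 6    # hypothetical number of participants (one graph node each)
MSG_DIM = 14      # compact per-event message (e.g. gaze, voice activity, context)
MEMORY_DIM = 32   # per-node memory size
TIME_DIM = 32     # time-encoding size

memory = TGNMemory(
    num_nodes=NUM_PEOPLE,
    raw_msg_dim=MSG_DIM,
    memory_dim=MEMORY_DIM,
    time_dim=TIME_DIM,
    message_module=IdentityMessage(MSG_DIM, MEMORY_DIM, TIME_DIM),
    aggregator_module=LastAggregator(),
)

# A small MLP scores whether a directed gaze link src -> dst exists.
link_predictor = torch.nn.Sequential(
    torch.nn.Linear(2 * MEMORY_DIM, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

optimizer = torch.optim.Adam(
    list(memory.parameters()) + list(link_predictor.parameters()), lr=1e-3
)
criterion = torch.nn.BCEWithLogitsLoss()
all_nodes = torch.arange(NUM_PEOPLE)


def train_batch(src, dst, neg_dst, t, raw_msg):
    """One update on a batch of timestamped interaction events.

    src, dst, neg_dst: LongTensors of node ids (observed and negative targets);
    t: LongTensor of event timestamps; raw_msg: FloatTensor [batch, MSG_DIM].
    """
    memory.train()
    optimizer.zero_grad()

    # Read the current memory of every participant (the group is small).
    z, _ = memory(all_nodes)

    pos_logit = link_predictor(torch.cat([z[src], z[dst]], dim=-1))
    neg_logit = link_predictor(torch.cat([z[src], z[neg_dst]], dim=-1))
    loss = criterion(pos_logit, torch.ones_like(pos_logit)) + \
           criterion(neg_logit, torch.zeros_like(neg_logit))

    # Fold the observed events into the memory before stepping the optimizer,
    # mirroring the standard TGN training loop.
    memory.update_state(src, dst, t, raw_msg)

    loss.backward()
    optimizer.step()
    memory.detach()  # stop gradients from flowing across batches
    return float(loss)
```

In this setup, the per-event message stays at 14 elements, which is the kind of compact message passing the abstract contrasts with a 768-element baseline; how the gaze, voice-activity, and context features are actually packed into that vector is specified by the paper, not by this sketch.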

