Enhancing Apparent Personality Trait Analysis with Cross-Modal Embeddings (2405.03846v1)

Published 6 May 2024 in cs.CV and cs.HC

Abstract: Automatic personality trait assessment is essential for high-quality human-machine interaction. Systems capable of analyzing human behavior could be applied to self-driving cars, medical research, and surveillance, among many other domains. We present a multimodal deep neural network with a Siamese extension for apparent personality trait prediction, trained on short video recordings and exploiting modality-invariant embeddings. Acoustic, visual, and textual information are combined to achieve high performance on this task. Because the target distribution of the analyzed dataset is highly concentrated, differences in the third decimal place are meaningful. Our proposed method addresses the challenge of under-represented extreme values, achieves an average MAE improvement of 0.0033, and shows a clear advantage over a baseline multimodal DNN without the introduced module.
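The abstract's Siamese extension pulls embeddings of the same clip from different modalities (audio, video, text) toward one another, yielding modality-invariant representations. As a minimal sketch of that idea — not the paper's actual implementation, and with all names, dimensions, and values hypothetical — a contrastive loss over a cross-modal embedding pair might look like this:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def siamese_contrastive_loss(emb_a, emb_b, same_clip, margin=1.0):
    """Contrastive loss for one pair of embeddings from two modalities.

    If both embeddings describe the same clip, the loss pulls them
    together; otherwise it pushes them at least `margin` apart.
    """
    d = euclidean(emb_a, emb_b)
    if same_clip:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Toy embeddings: audio and text branches projecting the same clip
# into a shared space, plus an embedding of an unrelated clip.
audio_emb = [0.9, 0.1, 0.0]
text_emb  = [1.0, 0.0, 0.1]
other_emb = [0.0, 1.0, 0.9]

pos_loss = siamese_contrastive_loss(audio_emb, text_emb, same_clip=True)
neg_loss = siamese_contrastive_loss(audio_emb, other_emb, same_clip=False)
```

In a full training setup this per-pair term would be summed over cross-modal pairs in a batch and added to the trait-regression loss, so the shared space stays predictive while becoming modality-invariant.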
