COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition (2206.05833v2)

Published 12 Jun 2022 in cs.CV, cs.HC, and cs.MM

Abstract: Automatically recognising apparent emotions from face and voice is hard, in part because of various sources of uncertainty, including in the input data and the labels used in a machine learning framework. This paper introduces an uncertainty-aware audiovisual fusion approach that quantifies modality-wise uncertainty towards emotion prediction. To this end, we propose a novel fusion framework in which we first learn latent distributions over audiovisual temporal context vectors separately, and then constrain the variance vectors of unimodal latent distributions so that they represent the amount of information each modality provides w.r.t. emotion recognition. In particular, we impose Calibration and Ordinal Ranking constraints on the variance vectors of audiovisual latent distributions. When well-calibrated, modality-wise uncertainty scores indicate how much their corresponding predictions may differ from the ground truth labels. Well-ranked uncertainty scores allow the ordinal ranking of different frames across the modalities. To jointly impose both these constraints, we propose a softmax distributional matching loss. In both classification and regression settings, we compare our uncertainty-aware fusion model with standard model-agnostic fusion baselines. Our evaluation on two emotion recognition corpora, AVEC 2019 CES and IEMOCAP, shows that audiovisual emotion recognition can considerably benefit from well-calibrated and well-ranked latent uncertainty measures.
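
To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of the two ingredients the abstract describes: precision-weighted (inverse-variance) fusion of unimodal predictions, and a softmax distributional-matching loss that pushes modality-wise uncertainty scores to be calibrated against, and ranked consistently with, the observed prediction errors. The function names, tensor shapes, and the exact loss form are illustrative assumptions, not the authors' reference implementation.

# Hypothetical sketch (not the authors' code) of the two ideas the abstract
# describes: inverse-variance fusion of unimodal predictions and a softmax
# distributional-matching loss that ties predicted uncertainties to observed
# prediction errors. Shapes, names, and the exact loss form are assumptions.
import torch
import torch.nn.functional as F


def softmax_distribution_matching_loss(pred_uncertainty, pred_error):
    # Turn (negated) uncertainties and errors into softmax distributions over
    # the modality axis and align them with a KL term, so low uncertainty is
    # rewarded exactly where the observed error is low (calibration) and the
    # relative ordering of scores follows the ordering of errors (ranking).
    target = F.softmax(-pred_error.detach(), dim=-1)  # errors are not back-propagated here
    log_pred = F.log_softmax(-pred_uncertainty, dim=-1)
    return F.kl_div(log_pred, target, reduction="batchmean")


def uncertainty_weighted_fusion(mu_audio, var_audio, mu_video, var_video):
    # Precision (inverse-variance) weighting: the modality reporting lower
    # variance contributes more to the fused prediction.
    precision = torch.stack([1.0 / var_audio, 1.0 / var_video], dim=0)
    weights = precision / precision.sum(dim=0, keepdim=True)
    return weights[0] * mu_audio + weights[1] * mu_video


# Toy usage: 8 frames, one continuous emotion value (e.g. valence) per modality.
mu_a = torch.randn(8, requires_grad=True)
mu_v = torch.randn(8, requires_grad=True)
var_a = torch.rand(8).add(1e-3).requires_grad_()
var_v = torch.rand(8).add(1e-3).requires_grad_()
labels = torch.randn(8)

fused = uncertainty_weighted_fusion(mu_a, var_a, mu_v, var_v)
errors = torch.stack([(mu_a - labels).abs(), (mu_v - labels).abs()], dim=-1)
uncertainties = torch.stack([var_a, var_v], dim=-1)
loss = F.mse_loss(fused, labels) + softmax_distribution_matching_loss(uncertainties, errors)
loss.backward()

Because calibration and ranking are both induced by aligning the two softmax distributions, a single matching term covers them in this toy version; the paper's actual constraint formulation, temperature choices, and training details may differ.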

Authors (6)
  1. Mani Kumar Tellamekala (5 papers)
  2. Shahin Amiriparian (32 papers)
  3. Björn W. Schuller (153 papers)
  4. Elisabeth André (65 papers)
  5. Timo Giesbrecht (2 papers)
  6. Michel Valstar (26 papers)
Citations (19)