Context-Aware Siamese Networks for Efficient Emotion Recognition in Conversation (2404.11141v1)
Abstract: Deep learning models have contributed considerably to progress in Emotion Recognition in Conversation (ERC). However, the task remains challenging because human emotions are plural and subjective. Previous work on ERC provides predictive models that mostly rely on graph-based conversation representations. In this work, we propose a way to model the conversational context and incorporate it into a metric learning training strategy through a two-step process. This allows us to perform ERC in a flexible classification scenario and to obtain a lightweight yet efficient model. Using metric learning through a Siamese Network architecture, we achieve a macro F1 score of 57.71 for emotion classification in conversation on the DailyDialog dataset, which outperforms related work. This state-of-the-art result is promising for the use of metric learning in emotion recognition, although the micro F1 score obtained leaves room for improvement.
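The abstract contrasts macro F1 with micro F1. As a minimal sketch of why the two can diverge on imbalanced emotion labels, the snippet below computes both from scratch. The helper `f1_scores` and the toy labels are hypothetical illustrations; they are not the authors' evaluation code and are not drawn from DailyDialog.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro F1: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro F1: pool the counts over all classes, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical, heavily imbalanced toy labels (assumption, for illustration only).
y_true = ["neutral"] * 8 + ["joy", "anger"]
y_pred = ["neutral"] * 8 + ["neutral", "anger"]

macro, micro = f1_scores(y_true, y_pred)
print(f"macro F1 = {macro:.4f}, micro F1 = {micro:.4f}")
```

In this toy case the single missed minority-class utterance pulls macro F1 well below micro F1, which is why the two metrics can rank models differently when one label dominates.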
Authors: Barbara Gendron, Gaël Guibon