
EmoMix-3L: A Code-Mixed Dataset for Bangla-English-Hindi Emotion Detection (2405.06922v1)

Published 11 May 2024 in cs.CL

Abstract: Code-mixing is a well-studied linguistic phenomenon that occurs when two or more languages are mixed in text or speech. Several studies have been conducted on building datasets and performing downstream NLP tasks on code-mixed data. Although it is not uncommon to observe code-mixing of three or more languages, most available datasets in this domain contain code-mixed data from only two languages. In this paper, we introduce EmoMix-3L, a novel multi-label emotion detection dataset containing code-mixed data from three different languages. We experiment with several models on EmoMix-3L and we report that MuRIL outperforms other models on this dataset.

Authors
  1. Nishat Raihan
  2. Dhiman Goswami
  3. Antara Mahmud
  4. Antonios Anastasopoulos
  5. Marcos Zampieri
