
MaCmS: Magahi Code-mixed Dataset for Sentiment Analysis (2403.04639v2)

Published 7 Mar 2024 in cs.CL

Abstract: The present paper introduces a new sentiment dataset, MaCMS, for Magahi-Hindi-English (MHE) code-mixed language, where Magahi is a less-resourced minority language. This is the first Magahi-Hindi-English code-mixed dataset for sentiment analysis tasks. We also provide a linguistic analysis of the dataset to understand the structure of code-mixing, and a statistical study to understand the language preferences of speakers expressing different sentiment polarities. Alongside these analyses, we train baseline models to evaluate the dataset's quality.

Authors (4)
  1. Priya Rani (8 papers)
  2. Gaurav Negi (5 papers)
  3. Theodorus Fransen (4 papers)
  4. John P. McCrae (18 papers)
Citations (2)
