Estimating the Level of Dialectness Predicts Interannotator Agreement in Multi-dialect Arabic Datasets (2405.11282v3)
Abstract: When annotating multi-dialect Arabic datasets, it is common to assign the samples randomly across a pool of native Arabic speakers. Recent analyses recommended routing dialectal samples to native speakers of their respective dialects to build higher-quality datasets. However, automatically identifying the dialect of a sample is hard. Moreover, annotators who are native speakers of specific Arabic dialects might be scarce. Arabic Level of Dialectness (ALDi) was recently introduced as a quantitative variable that measures how much a sentence diverges from Standard Arabic. We hypothesize that, when samples are randomly assigned to annotators, samples with higher ALDi scores are harder to label, especially if they are written in dialects that the annotators do not speak. We test this by analyzing the relation between ALDi scores and the annotators' agreement on 15 public datasets that release raw individual sample annotations for various sentence-classification tasks. We find strong evidence supporting our hypothesis for 11 of them. Consequently, we recommend prioritizing the routing of samples with high ALDi scores to native speakers of each sample's dialect, since the dialect of such samples can be automatically identified with higher accuracy.
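The kind of analysis the abstract describes can be illustrated with a minimal sketch. It assumes each sample comes with an ALDi score in the range [0, 1] and the raw labels of its individual annotators, groups samples into equal-width ALDi bins, and reports the share of samples in each bin on which all annotators agree. The function names, the number of bins, and the toy data are hypothetical and chosen only for illustration; this is not necessarily the exact procedure used in the paper.

```python
def full_agreement(labels):
    """Return True if every annotator assigned the same label."""
    return len(set(labels)) == 1


def agreement_by_aldi_bin(samples, n_bins=4):
    """Share of fully agreed samples per equal-width ALDi bin.

    `samples` is an iterable of (aldi_score, labels) pairs, where
    aldi_score is assumed to lie in [0, 1] and labels is the list of
    raw annotations for that sample. Empty bins are omitted.
    """
    bins = {}
    for aldi, labels in samples:
        # Clamp so that a score of exactly 1.0 lands in the last bin.
        idx = min(int(aldi * n_bins), n_bins - 1)
        bins.setdefault(idx, []).append(full_agreement(labels))
    return {idx: sum(flags) / len(flags) for idx, flags in sorted(bins.items())}


# Toy data (fabricated for illustration only, not taken from any dataset).
samples = [
    (0.05, ["pos", "pos", "pos"]),
    (0.10, ["neg", "neg", "neg"]),
    (0.40, ["pos", "pos", "neg"]),
    (0.45, ["neu", "neu", "neu"]),
    (0.80, ["pos", "neg", "neg"]),
    (0.95, ["pos", "neg", "neu"]),
]

if __name__ == "__main__":
    for idx, share in agreement_by_aldi_bin(samples).items():
        print(f"ALDi bin {idx}: {share:.0%} full agreement")
```

On this toy input, the share of fully agreed samples falls as the ALDi bin index rises, which is the direction of the trend the hypothesis predicts. Real datasets would need many samples per bin before such a trend could be trusted.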
Authors: Amr Keleg, Walid Magdy, Sharon Goldwater