Advancing the Arabic WordNet: Elevating Content Quality (2403.20215v1)
Abstract: High-quality wordnets are crucial for achieving high-quality results in NLP applications that rely on such resources. However, the wordnets of most languages suffer from serious issues of correctness and completeness with respect to the words and word meanings they define, such as incorrect lemmas, missing glosses and example sentences, or an inadequate, Western-centric representation of the morphology and the semantics of the language. Previous efforts have largely focused on increasing lexical coverage while ignoring other qualitative aspects. In this paper, we focus on the Arabic language and introduce a major revision of the Arabic WordNet that addresses multiple dimensions of lexico-semantic resource quality. As a result, we updated more than 58% of the synsets of the existing Arabic WordNet by adding missing information and correcting errors. To address issues of language diversity and untranslatability, we also extended the wordnet structure with two new elements: phrasets and lexical gaps.