Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word–Definition Alignment (2310.15823v3)
Abstract: A reverse dictionary is a tool that enables users to discover a word based on a provided definition, meaning, or description. Such a tool is valuable in several scenarios: it aids language learners who can describe a word but cannot name it, and it benefits writers seeking precise terminology. These scenarios often involve what is referred to as the "Tip-of-the-Tongue" (TOT) phenomenon. In this work, we present our winning solution for the Arabic Reverse Dictionary shared task, which focuses on deriving a vector representation of an Arabic word from its accompanying description. The shared task encompasses two distinct subtasks: the first takes an Arabic definition as input, while the second uses an English definition. For the first subtask, our approach relies on an ensemble of finetuned Arabic BERT-based models that predict the word embedding for a given definition; the final representation is the average of the output embeddings of the ensemble members. For the second subtask, the most effective solution is to translate the English test definitions into Arabic and feed them to the finetuned models originally trained for the first subtask. This straightforward method achieves the highest score on both subtasks.
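As a rough illustration of the ensemble-averaging step described in the abstract, the sketch below encodes a definition with several fine-tuned Arabic BERT-style encoders and averages their output vectors. It is a minimal sketch, not the authors' exact pipeline: the checkpoint paths are hypothetical placeholders, and the use of the [CLS] hidden state as the predicted embedding is an assumption about the prediction head.

```python
# Minimal sketch of ensemble averaging for reverse-dictionary embedding
# prediction. Checkpoint paths and [CLS] pooling are illustrative
# assumptions, not the paper's exact configuration.
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical fine-tuned checkpoints; the paper ensembles several
# Arabic BERT-based models fine-tuned for the shared task.
MODEL_NAMES = [
    "path/to/finetuned-arabert",
    "path/to/finetuned-marbert",
    "path/to/finetuned-camelbert",
]

@torch.no_grad()
def predict_embedding(definition: str) -> torch.Tensor:
    """Predict a word embedding for a definition by averaging the
    outputs of each fine-tuned model in the ensemble."""
    outputs = []
    for name in MODEL_NAMES:
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModel.from_pretrained(name).eval()
        batch = tokenizer(definition, return_tensors="pt", truncation=True)
        hidden = model(**batch).last_hidden_state   # (1, seq_len, dim)
        outputs.append(hidden[:, 0])                 # [CLS] vector as the prediction
    # Ensemble average over the models' predicted embeddings.
    return torch.stack(outputs).mean(dim=0).squeeze(0)

# For the English subtask, the paper first translates the English
# definition into Arabic and then reuses the same ensemble
# (translation step omitted here).
vec = predict_embedding("تعريف عربي لكلمة ما")  # "an Arabic definition of some word"
```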
- Muhammad Abdul-Mageed, AbdelRahim Elmadany, and El Moatez Billah Nagoudi. 2021. ARBERT & MARBERT: Deep bidirectional transformers for Arabic. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7088–7105, Online. Association for Computational Linguistics.
- Wissam Antoun, Fady Baly, and Hazem Hajj. 2020. AraBERT: Transformer-based model for Arabic language understanding. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT4), at LREC 2020, page 9, Marseille, France. European Language Resources Association.
- Mikel Artetxe, Vedanuj Goswami, Shruti Bhosale, Angela Fan, and Luke Zettlemoyer. 2023. Revisiting machine translation for cross-lingual classification. CoRR, abs/2305.14240.
- Slaven Bilac, Wataru Watanabe, Taiichi Hashimoto, Takenobu Tokunaga, and Hozumi Tanaka. 2004. Dictionary search based on the target word description. In Proceedings of the Tenth Annual Meeting of the Association for Natural Language Processing (NLP 2004).
- Roger Brown and David McNeill. 1966. The “tip of the tongue” phenomenon. Journal of Verbal Learning and Verbal Behavior, 5(4):325–337.
- Muhao Chen, Yingtao Tian, Haochen Chen, Kai-Wei Chang, Steven Skiena, and Carlo Zaniolo. 2018. Learning to represent bilingual dictionaries. CoRR, abs/1808.03726.
- Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. CoRR, abs/2003.10555.
- Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451, Online. Association for Computational Linguistics.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- Felix Hill, Kyunghyun Cho, Anna Korhonen, and Yoshua Bengio. 2015. Learning to understand phrases by embedding the dictionary. CoRR, abs/1504.00548.
- Go Inoue, Bashar Alhafni, Nurpeiis Baimukan, Houda Bouamor, and Nizar Habash. 2021. The interplay of variant, size, and task type in Arabic pre-trained language models. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine (Online). Association for Computational Linguistics.
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.
- Tomáš Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings.
- El Moatez Billah Nagoudi, AbdelRahim Elmadany, and Muhammad Abdul-Mageed. 2022. AraT5: Text-to-text transformers for Arabic language generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 628–647, Dublin, Ireland. Association for Computational Linguistics.
- Ryan Shaw, Anindya Datta, Debra VanderMeer, and Kaushik Dutta. 2013. Building a scalable database-driven reverse dictionary. IEEE Transactions on Knowledge and Data Engineering, 25(3):528–540.
- Leslie N. Smith and Nicholay Topin. 2017. Super-convergence: Very fast training of residual networks using large learning rates. CoRR, abs/1708.07120.
- Hang Yan, Xiaonan Li, Xipeng Qiu, and Bocao Deng. 2020. BERT for monolingual and cross-lingual reverse dictionary. CoRR, abs/2009.14790.
- Lee Zhang, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Qun Liu, and Maosong Sun. 2019. Multi-channel reverse dictionary model. CoRR, abs/1912.08441.