
Egyptian Arabic to English Statistical Machine Translation System for NIST OpenMT'2015 (1606.05759v1)

Published 18 Jun 2016 in cs.CL

Abstract: The paper describes the Egyptian Arabic-to-English statistical machine translation (SMT) system that the QCRI-Columbia-NYUAD (QCN) group submitted to the NIST OpenMT'2015 competition. The competition focused on informal dialectal Arabic, as used in SMS, chat, and speech. Thus, our efforts focused on processing and standardizing Arabic, e.g., using tools such as 3arrib and MADAMIRA. We further trained a phrase-based SMT system using state-of-the-art features and components such as operation sequence model, class-based language model, sparse features, neural network joint model, genre-based hierarchically-interpolated language model, unsupervised transliteration mining, phrase-table merging, and hypothesis combination. Our system ranked second on all three genres.

