
Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia (2311.00998v1)

Published 2 Nov 2023 in cs.CL and cs.AI

Abstract: Neural machine translation (NMT) for low-resource local languages in Indonesia faces significant challenges, including the need for a representative benchmark and limited data availability. This work addresses these challenges by comprehensively analyzing the training of NMT systems for four low-resource local languages in Indonesia: Javanese, Sundanese, Minangkabau, and Balinese. Our study covers various training approaches, paradigms, and data sizes, as well as a preliminary study into using LLMs to generate synthetic parallel data for low-resource languages. We reveal specific trends and insights into practical strategies for low-resource language translation. Our research demonstrates that despite limited computational resources and textual data, several of our NMT systems achieve competitive performance, rivaling the translation quality of zero-shot gpt-3.5-turbo. These findings significantly advance NMT for low-resource languages, offering valuable guidance for researchers in similar contexts.

Authors (5)
  1. Lucky Susanto (10 papers)
  2. Ryandito Diandaru (5 papers)
  3. Adila Krisnadhi (4 papers)
  4. Ayu Purwarianti (39 papers)
  5. Derry Wijaya (31 papers)
Citations (1)