Enhancing Machine Translation with LLMs: Domain Terminology Integration
Overview of the WMT 2023 Terminology Shared Task
In the pursuit of advancing machine translation (MT), the WMT 2023 Terminology Shared Task challenges researchers to design systems capable of translating technical terminology with high accuracy. This year's challenge emphasizes adherence to specified technical terms, which is crucial in domain-specific communication.
Approach to Terminology Integration
Our systems for the task, covering the German-to-English (DE-EN), English-to-Czech (EN-CS), and Chinese-to-English (ZH-EN) pairs, harnessed LLMs for two key operations. First, we used ChatGPT to generate synthetic bilingual data informed by the required terminology. We then fine-tuned a pre-existing generic OPUS MT model on a mix of this synthetic data and a random sample of the generic OPUS dataset. After fine-tuning, we used the adapted MT model to translate the datasets provided by the task organizers.
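The synthetic-data step can be sketched as prompt construction for ChatGPT. The function below is a minimal illustration, not the exact prompt used in our systems; the prompt wording, the `SOURCE ||| TARGET` output format, and the function name are assumptions for this sketch.

```python
def build_synthetic_data_prompt(src_lang, tgt_lang, term_pairs, n_sentences=5):
    """Build a hypothetical ChatGPT prompt that requests synthetic bilingual
    sentence pairs using the required terminology (illustrative only)."""
    term_list = "\n".join(f"- {src} -> {tgt}" for src, tgt in term_pairs)
    return (
        f"Generate {n_sentences} parallel sentence pairs in "
        f"{src_lang} and {tgt_lang}. Each pair must use these domain terms "
        f"with the given translations:\n{term_list}\n"
        "Return each pair on one line as: SOURCE ||| TARGET"
    )
```

The returned pairs would then be mixed with a random sample of generic OPUS data before fine-tuning.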
Terminology-Constrained Automatic Post-Editing
For translations that omitted any required terms, we applied a terminology-constrained automatic post-editing step using ChatGPT. This step revised the MT output to insert the overlooked terminology, aiming to satisfy the task's terminology constraints while leaving the rest of the translation unchanged.
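The post-editing step above can be sketched in two parts: detecting which required target terms are absent from the MT output, and building a constrained post-editing prompt for ChatGPT. Both function names and the prompt wording are hypothetical; a robust system would also handle morphological variants of the terms.

```python
def missing_terms(translation, required_terms):
    """Return the required target terms absent from the MT output
    (simple case-insensitive substring check)."""
    low = translation.lower()
    return [t for t in required_terms if t.lower() not in low]

def build_postedit_prompt(source, translation, missing):
    """Build a hypothetical prompt asking the LLM to insert the missing
    terms while changing as little of the translation as possible."""
    terms = ", ".join(missing)
    return (
        f"Post-edit the translation so it uses these required terms: {terms}\n"
        "Change only what is necessary to include them.\n"
        f"Source: {source}\n"
        f"Translation: {translation}\n"
        "Revised translation:"
    )
```

Only sentences with a non-empty `missing_terms` result need an LLM call, which keeps post-editing cost proportional to the number of violations.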
Results and Findings
Our process proved effective, significantly increasing the prevalence of the desired terms in the final translations. On the blind dataset, usage of the specified terms rose from 36.67% with the original model to 72.88% after the LLM-based editing, effectively doubling the adherence rate across the three language pairs.
Our evaluation was two-pronged: a term-level evaluation measured fidelity to the required terms, while a sentence-level evaluation assessed whether term integration harmed overall translation quality. The automatic evaluations confirm that our system improves translations in terms of both terminology adherence and overall quality.
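The term-level evaluation can be sketched as a simple success rate: the fraction of required terms that appear in the corresponding system outputs. This is a minimal sketch assuming exact case-insensitive matching, which is stricter than evaluations that credit inflected forms.

```python
def term_success_rate(outputs, term_sets):
    """Fraction of required target terms found in the corresponding
    outputs, over all sentences (case-insensitive substring match)."""
    hits = 0
    total = 0
    for output, terms in zip(outputs, term_sets):
        low = output.lower()
        for term in terms:
            total += 1
            hits += term.lower() in low
    return hits / total if total else 0.0
```

Applied before and after post-editing, this metric yields the kind of adherence comparison reported above (e.g. 36.67% vs. 72.88%).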
In scenarios where an LLM's translation quality is considerably weaker than that of a traditional MT model, starting from the stronger MT baseline and improving its output proved beneficial. Crucially, the success of this process depends on the level of language support the respective LLM provides. Furthermore, real-time adaptive MT is not a substitute for domain-specific fine-tuning; using fine-tuning where possible improves efficiency by reducing the need for post-editing at inference time.
Future Directions
Future work will extend these methods to additional languages and domains, especially low-resource pairs. We may also omit the fine-tuning phase to test whether LLM-based post-editing alone suffices for quality translation, which would streamline the translation process and reduce latency at inference.
Overall, combining domain-specific fine-tuning with LLM-powered terminology editing provides a robust solution for enhancing domain-specific machine translation workflows, contributing to more accurate and reliable translation systems.