The paper presents a technical framework for integrating long chain-of-thought (CoT) reasoning into neural machine translation (MT) for literature, focusing on the challenging task of translating figurative language such as similes and metaphors. The work is structured around both data synthesis and model fine-tuning, with key contributions that can be summarized as follows:
Data Collection and Pre-processing
- Literature Mining:
- The approach begins by extracting approximately 577.6K sentences from over 400 public-domain books sourced from Project Gutenberg.
- A filtering process first selects sentences of appropriate length, then employs an LLM (specifically Qwen2.5-72B-Instruct) to identify sentences that contain similes or metaphors.
- The selected sentences then undergo a literal-translation check: if the LLM-generated literal translation is judged inadequate for native-speaker comprehension, the sentence is marked as requiring deeper reasoning and retained. This filtering yields approximately 63K pre-selected sentences.
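The two-stage filter can be sketched as follows. Note this is a minimal illustration: the keyword heuristic and the always-true adequacy check below are trivial stand-ins for the Qwen2.5-72B-Instruct prompts the paper actually uses, and the length bounds are assumed values.

```python
def contains_figurative_language(sentence: str) -> bool:
    # Hypothetical stand-in: the paper prompts an LLM for this judgment.
    return " like " in sentence or " as if " in sentence

def literal_translation_is_inadequate(sentence: str) -> bool:
    # Hypothetical stand-in for the literal-translation adequacy check;
    # here we assume every figurative sentence needs deeper reasoning.
    return True

def select_sentences(sentences, min_len=5, max_len=80):
    """Length filter -> figurative-language filter -> adequacy filter."""
    kept = []
    for s in sentences:
        n = len(s.split())
        if not (min_len <= n <= max_len):
            continue
        if not contains_figurative_language(s):
            continue
        if literal_translation_is_inadequate(s):
            kept.append(s)
    return kept

sentences = [
    "Her smile was like sunshine after a week of rain.",
    "He went home.",
    "The committee approved the budget without further discussion yesterday.",
]
kept = select_sentences(sentences)
```

Only the first sentence survives: the second fails the length filter and the third contains no figurative marker.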
Multi-Agent Framework for Long Thought Synthesis
- The core of the methodology is a multi-agent system that simulates an iterative translation process. This system comprises three distinct agents:
- Translator:
- Initially performs a word-level translation by identifying key tokens within the source sentence and aligning them with corresponding target language counterparts.
- Generates a preliminary complete translation (denoted t_0) based on both the source sentence and the bilingual keyword pairs.
- Advisor:
- Reviews each translation iteration by providing detailed feedback (f_k) aimed at refining the semantic and cultural fidelity of the translation.
- Evaluator:
- Assigns an overall quality score (s_k) to the translation at each iteration based on pre-defined evaluation criteria.
- Iterative Refinement:
- Starting from the preliminary output, the system enters a refinement loop where the translator uses the previous translation, advisor feedback, and evaluator score to generate an improved translation.
- The process terminates when the translation score exceeds a preset threshold or when the maximum number of iterations is reached.
- Post-processing:
- To enhance the fluency and readability of the long thought process, the synthesized chain-of-thought is reformulated using GPT-4o, resulting in self-reflective descriptions that encapsulate the iterative reasoning path.
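The iterative refinement loop above can be sketched as follows. The three toy agents are deterministic stand-ins for the LLM-backed Translator, Advisor, and Evaluator; the threshold and iteration cap are assumed values, not the paper's.

```python
def refine(source, translator, advisor, evaluator, threshold=90, max_iters=5):
    """Iterate: translate -> advisor feedback f_k -> evaluator score s_k,
    stopping once the score reaches the threshold or max_iters is hit.
    Returns the final translation and the (t_k, f_k, s_k) trace that
    would later be reformulated into a long chain of thought."""
    trace = []
    translation = translator(source)
    for _ in range(max_iters):
        feedback = advisor(source, translation)
        score = evaluator(source, translation)
        trace.append((translation, feedback, score))
        if score >= threshold:
            break
        translation = translator(source, prev=translation, feedback=feedback)
    return translation, trace

# Toy agents: each revision appends "+", and scores rise 60 -> 75 -> 92.
def toy_translator(source, prev=None, feedback=None):
    return "draft" if prev is None else prev + "+"

def toy_advisor(source, translation):
    return f"improve cultural fidelity of '{translation}'"

def make_toy_evaluator(scores):
    it = iter(scores)
    return lambda source, translation: next(it)

final, trace = refine("He has a heart of stone.",
                      toy_translator, toy_advisor,
                      make_toy_evaluator([60, 75, 92]))
```

With these stubs the loop runs three iterations and stops when the score 92 crosses the threshold of 90, leaving a three-step trace.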
Data Statistics and Training Setup
- The refined process yields 22,264 long thought MT samples, with each sample containing an average of over 500 tokens in the "thought" segment, analogous to previous chain-of-thought datasets developed for math and coding tasks.
- The data is split into training, validation, and testing sets, facilitating the supervised fine-tuning (SFT) of two model variants: DRT-o1-7B and DRT-o1-14B.
- The two models are built on the Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct backbones, respectively, and fine-tuned using DeepSpeed with ZeRO-3 optimization. Training runs on 8×NVIDIA A100 GPUs with a learning rate of 1e-5 for 3 epochs, totaling 70 GPU hours for the 7B model and 124 GPU hours for the 14B model.
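For reference, the reported training setup can be gathered into a single configuration sketch; the key names are illustrative and do not reflect the authors' actual launch scripts.

```python
# Hyperparameters as reported in the paper, collected in one place.
sft_config = {
    "backbones": {
        "DRT-o1-7B": "Qwen2.5-7B-Instruct",
        "DRT-o1-14B": "Qwen2.5-14B-Instruct",
    },
    "framework": "DeepSpeed (ZeRO-3)",
    "hardware": "8x NVIDIA A100",
    "learning_rate": 1e-5,
    "epochs": 3,
    "gpu_hours": {"DRT-o1-7B": 70, "DRT-o1-14B": 124},
}
```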
Experimental Evaluation
- Metrics:
- The evaluation uses BLEU for n-gram overlap, together with reference-free CometKiwi and reference-based CometScore for semantic adequacy.
- Results:
- DRT-o1-7B demonstrates significant performance improvements, achieving an increase of approximately 8.26 BLEU, 1.31 CometKiwi, and 3.36 CometScore over its Qwen2.5-7B-Instruct baseline.
- Similarly, DRT-o1-14B shows an enhancement of roughly 7.33 BLEU, 0.15 CometKiwi, and 1.66 CometScore compared to its Qwen2.5-14B-Instruct counterpart.
- Notably, DRT-o1-7B also outperforms a larger baseline (QwQ-32B-Preview) by 7.82 BLEU and 1.46 CometScore, underscoring the benefits of incorporating long thought in translation tasks.
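As a concrete reference for the n-gram metric, here is a minimal single-sentence BLEU sketch. The paper would use a standard implementation (e.g., sacrebleu); the zero-precision smoothing below is a simplification of the usual smoothing methods.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Single-reference, sentence-level BLEU: geometric mean of clipped
    n-gram precisions times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ng, ref_ng = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, ref_ng[g]) for g, c in hyp_ng.items())
        precisions.append(overlap / max(sum(hyp_ng.values()), 1))
    # Crude smoothing: avoid log(0) when an n-gram order has no matches.
    precisions = [p if p > 0 else 1e-9 for p in precisions]
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_avg)
```

A perfect match scores 1.0, and any divergence from the reference lowers the score, which is why the +7-8 BLEU gains above indicate substantially closer surface overlap with the references.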
Relation to Previous Work
- The paper positions its contributions alongside established O1-like models, which have traditionally focused on mathematical and coding tasks requiring complex reasoning. By adapting the chain-of-thought (CoT) paradigm for MT, the proposed framework bridges a gap in applying long reasoning sequences to tasks where cultural and semantic nuances (such as those in literary translation) challenge direct translation techniques.
- The multi-agent iterative refinement strategy deviates from methods like Monte Carlo Tree Search (MCTS) and data distillation used in previous studies, instead explicitly modeling feedback loops that simulate human-like revision processes.
Conclusion
The paper presents a methodologically rigorous approach to enhancing MT for literature by leveraging long chain-of-thought reasoning. The formulation of a multi-agent system comprising translation, critical feedback, and evaluation enables the synthesis of detailed reasoning paths that guide translation refinements, yielding significant improvements on standard translation quality metrics. This work meaningfully extends chain-of-thought methodologies to the translation domain, particularly under challenging conditions where figurative language demands an additional layer of reasoning.