- The paper introduces TWA, a novel finetuning method that uses weighted span-level unlikelihood loss to address errors more precisely.
- The approach distinguishes error from non-error spans, applying a weighted unlikelihood loss to error spans and standard cross-entropy to non-error tokens that precede errors, yielding more targeted model adjustments in translation tasks.
- Empirical results on English-German and Chinese-English tasks show TWA outperforming sequence-level baselines, with ignoring off-trajectory tokens after an error proving especially beneficial for English-German.
Finetuning Machine Translation Models with Span-Level Error Annotations
The paper introduces a novel approach called "Training with Annotations" (TWA) designed to enhance machine translation models via span-level error annotations. This method addresses the limitations of traditional sequence-level annotations by leveraging more granular error data.
Key Concepts and Methodology
The conventional practice in refining machine translation systems often involves sequence-level annotations, typically using scalar scores for entire outputs. In contrast, TWA utilizes span-level annotations, which provide more detailed error information, categorized by type (e.g., fluency, accuracy) and severity (e.g., major, minor).
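To make the annotation format concrete, the following is a minimal sketch of what a span-level record might look like. The class and field names (ErrorSpan, AnnotatedTranslation, start, end, category, severity) are illustrative assumptions, not the exact schema of the MQM data used in the paper.

```python
# Illustrative sketch only: names are assumptions, not the MQM schema.
from dataclasses import dataclass, field

@dataclass
class ErrorSpan:
    start: int       # index where the error span begins in the hypothesis
    end: int         # exclusive end index of the error span
    category: str    # error type, e.g. "accuracy/mistranslation", "fluency/grammar"
    severity: str    # e.g. "major" or "minor"

@dataclass
class AnnotatedTranslation:
    source: str
    hypothesis: str
    spans: list[ErrorSpan] = field(default_factory=list)  # empty list = no marked errors

# Example: "walks" (characters 8-13) is marked as a major accuracy error.
example = AnnotatedTranslation(
    source="Der Hund läuft schnell über die Straße.",
    hypothesis="The dog walks quickly across the street.",
    spans=[ErrorSpan(start=8, end=13, category="accuracy/mistranslation", severity="major")],
)
```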
Training with Annotations (TWA): TWA is a finetuning strategy that considers both error and non-error spans identified in span-level annotations. The core innovation lies in applying a weighted span-level unlikelihood loss to error spans, encouraging the model to learn which specific tokens within an error span should have decreased probabilities. Conversely, non-error tokens preceding errors are optimized using a typical cross-entropy loss, while off-trajectory tokens following an error are ignored to reduce noise.
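To illustrate how these three token treatments combine, here is a minimal per-sequence sketch of a TWA-style loss in PyTorch. It is a simplification under stated assumptions: the within-span weighting is taken as uniform here, whereas the paper's weighted span-level scheme lets the model apportion the penalty across tokens in the span; tensor shapes and the function name `twa_loss` are illustrative.

```python
import torch
import torch.nn.functional as F

def twa_loss(logits: torch.Tensor, targets: torch.Tensor, error_mask: torch.Tensor) -> torch.Tensor:
    """Sketch of a TWA-style loss for one target sequence.

    logits:     (T, V) model logits over the vocabulary
    targets:    (T,)   token ids of the annotated hypothesis
    error_mask: (T,)   bool, True for tokens inside an annotated error span
    """
    log_probs = F.log_softmax(logits, dim=-1)                             # (T, V)
    token_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # (T,)

    T = targets.size(0)
    positions = torch.arange(T, device=targets.device)
    error_positions = torch.nonzero(error_mask, as_tuple=False)
    first_error = error_positions[0].item() if error_positions.numel() > 0 else T

    # Non-error tokens before the first error: standard cross-entropy.
    ce_mask = (~error_mask) & (positions < first_error)
    # Non-error tokens after the first error are off-trajectory: no loss at all.

    # Error-span tokens: unlikelihood term that pushes p(token) down.
    # NOTE: uniform weighting within the span is a simplification of the
    # paper's weighted span-level unlikelihood loss.
    p = token_logp.exp().clamp(max=1.0 - 1e-6)
    unlikelihood = -torch.log1p(-p)                                       # -log(1 - p(token))

    ce_term = (-token_logp * ce_mask).sum()
    ul_term = (unlikelihood * error_mask).sum()
    n_active = (ce_mask.sum() + error_mask.sum()).clamp(min=1)
    return (ce_term + ul_term) / n_active
```

In practice this would be computed per sequence within a batch, and severity information (major vs. minor) could plausibly scale the unlikelihood term, though those details are not specified here.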
Empirical Evaluation
TWA was evaluated on English-German and Chinese-English translation tasks, using MQM-annotated data from the WMT Shared Tasks. Compared to baselines such as Supervised Finetuning (SFT) and Direct Preference Optimization (DPO), TWA demonstrated superior performance. Key findings include:
- Improved Quality: TWA consistently outperformed SFT and DPO, highlighting the potential advantages of span-level over sequence-level annotations.
- Effectiveness of Error Handling: The span-level unlikelihood loss enabled more precise, token-level adjustments to the model, improving error handling without requiring hand-crafted heuristics to convert annotations into training signals.
- Impact of Ignoring Off-Trajectory Tokens: In the case of English-German translation, significant gains were noted when ignoring tokens immediately following an error span, suggesting that such tokens may introduce irrelevant or misleading signals.
Implications and Future Directions
The paper emphasizes the importance of moving beyond high-quality human-written examples, especially as models increasingly match or surpass human reference translations. By integrating span-level data, TWA unlocks the potential for more nuanced model improvement strategies.
Broader Applications: While demonstrated in machine translation, TWA's applicability could extend to other domains where fine-grained error data can be collected. This could be particularly impactful in fields requiring high precision, such as medical text translation or legal document processing.
Further Developments: Future research may explore the integration of TWA with other advanced metrics and its application to live data scenarios. Additionally, refining the understanding of when and how off-trajectory tokens contribute to noise versus useful signal could lead to more tailored finetuning strategies.
Overall, TWA presents a significant step forward in leveraging detailed annotations for machine learning model enhancement, offering a promising direction for future AI developments.