MAATS: A Multi-Agent Automated Translation System Based on MQM Evaluation (2505.14848v1)

Published 20 May 2025 in cs.CL, cs.LG, and cs.MA

Abstract: We present MAATS, a Multi Agent Automated Translation System that leverages the Multidimensional Quality Metrics (MQM) framework as a fine-grained signal for error detection and refinement. MAATS employs multiple specialized AI agents, each focused on a distinct MQM category (e.g., Accuracy, Fluency, Style, Terminology), followed by a synthesis agent that integrates the annotations to iteratively refine translations. This design contrasts with conventional single-agent methods that rely on self-correction. Evaluated across diverse language pairs and LLMs, MAATS outperforms zero-shot and single-agent baselines with statistically significant gains in both automatic metrics and human assessments. It excels particularly in semantic accuracy, locale adaptation, and linguistically distant language pairs. Qualitative analysis highlights its strengths in multi-layered error diagnosis, omission detection across perspectives, and context-aware refinement. By aligning modular agent roles with interpretable MQM dimensions, MAATS narrows the gap between black-box LLMs and human translation workflows, shifting focus from surface fluency to deeper semantic and contextual fidelity.

Summary

Overview of MAATS: A Multi-Agent Automated Translation System Based on MQM Evaluation

The paper introduces MAATS, a Multi-Agent Automated Translation System that uses the Multidimensional Quality Metrics (MQM) framework as a fine-grained signal for detecting and correcting translation errors. Unlike conventional single-agent methods, MAATS assigns each MQM category to its own specialized agent, mirroring the division of labor in a professional translation workflow. A central synthesis agent then integrates the agents' annotations and uses them to refine the translation iteratively.

The architecture of MAATS is modular: individual agents cover core MQM dimensions such as Accuracy, Fluency, Style, and Terminology. The synthesis agent consolidates their annotations and prioritizes which errors to correct, with the aim of improving the translation's semantic and contextual fidelity. This contrasts with traditional single-agent approaches that rely on self-correction, which often fall short because a single model carries its own biases into the review and lacks diverse expertise.
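The annotate-then-synthesize loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Annotation`, `make_stub_agent`, `synthesize`, and `refine` are hypothetical, the agents are rule-based stubs standing in for LLM calls, and the severity ordering is an assumed prioritization scheme that the summary does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed priority order (lower = fix first); not taken from the paper.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


@dataclass
class Annotation:
    category: str    # MQM dimension, e.g. "Accuracy"
    severity: str    # key of SEVERITY_RANK
    span: str        # problematic text in the draft translation
    suggestion: str  # proposed replacement for the span


# An agent inspects (source, draft) and returns annotations for one category.
Agent = Callable[[str, str], List[Annotation]]


def make_stub_agent(category: str, rules: List[Tuple[str, str, str]]) -> Agent:
    """Build a rule-based stand-in for an LLM agent (hypothetical helper).

    Each rule is (span, severity, suggestion); it fires if span is in the draft.
    """
    def agent(source: str, draft: str) -> List[Annotation]:
        return [Annotation(category, sev, span, sug)
                for span, sev, sug in rules if span in draft]
    return agent


def synthesize(draft: str, annotations: List[Annotation]) -> str:
    """Apply suggestions in priority order, most severe first."""
    for ann in sorted(annotations, key=lambda a: SEVERITY_RANK[a.severity]):
        draft = draft.replace(ann.span, ann.suggestion, 1)
    return draft


def refine(source: str, draft: str, agents: List[Agent], max_rounds: int = 3) -> str:
    """Iterate: collect annotations from every agent, then synthesize."""
    for _ in range(max_rounds):
        annotations = [a for agent in agents for a in agent(source, draft)]
        if not annotations:  # no agent reports an error: stop early
            break
        draft = synthesize(draft, annotations)
    return draft


agents = [
    make_stub_agent("Accuracy", [("cat", "critical", "dog")]),
    make_stub_agent("Fluency", [("lie on", "minor", "lies on")]),
]
print(refine("Der Hund liegt auf der Couch.", "The cat lie on the couch.", agents))
# prints "The dog lies on the couch."
```

Keeping each agent behind the same `(source, draft)` interface is what makes the design modular: a category can be added or swapped without touching the synthesis step.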

Quantitative results show that MAATS consistently outperforms the zero-shot and single-agent baselines across a range of language pairs and LLMs. The gains are statistically significant in both automatic metrics, such as BLEU and COMET, and human assessments, and they are most pronounced for linguistically distant language pairs. These improvements point to the effectiveness of MAATS at detecting and correcting critical errors, notably in semantic accuracy and locale adaptation.
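The summary does not say which significance test the authors used. As one common option for comparing two translation systems on the same test set, the sketch below shows paired bootstrap resampling over segment-level scores. The function name `paired_bootstrap` and the toy scores are illustrative assumptions, not data or code from the paper.

```python
import random
from statistics import mean
from typing import Sequence


def paired_bootstrap(scores_a: Sequence[float], scores_b: Sequence[float],
                     n_resamples: int = 1000, seed: int = 0) -> float:
    """Return the fraction of resamples in which system B's mean beats A's.

    scores_a[i] and scores_b[i] must be scores for the same segment i.
    A fraction close to 1.0 suggests B's advantage is not a sampling artifact.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample segment indices with replacement, keeping the pairing.
        idx = [rng.randrange(n) for _ in range(n)]
        if mean(scores_b[i] for i in idx) > mean(scores_a[i] for i in idx):
            wins += 1
    return wins / n_resamples


# Toy segment-level scores (made up for illustration only).
baseline = [0.61, 0.55, 0.70, 0.48, 0.66]
candidate = [0.64, 0.59, 0.71, 0.55, 0.69]
print(paired_bootstrap(baseline, candidate))
```

Resampling indices rather than scores keeps each segment's two scores paired, which is what lets the test account for some segments being harder than others for both systems.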

The qualitative analysis highlights the strengths of MAATS in multi-layered error diagnosis, detection of omissions across agent perspectives, and context-aware refinement. Human translators ranked MAATS output above that of the baseline systems, which supports its performance against human standards. Confusion matrices computed against human-labeled MQM reference data further show that MAATS yields more true positives and fewer false negatives than the baselines.
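To make the confusion-matrix comparison concrete, the following sketch counts true positives, false negatives, false positives, and true negatives for a system's error flags against human MQM labels. The helper names and the toy labels are hypothetical and chosen only to illustrate the calculation; they are not results from the paper.

```python
from collections import Counter
from typing import Sequence


def confusion_counts(human: Sequence[bool], system: Sequence[bool]) -> Counter:
    """Count tp/fn/fp/tn, where True means 'segment flagged with an error'."""
    if len(human) != len(system):
        raise ValueError("label lists must have equal length")
    counts = Counter()
    for h, s in zip(human, system):
        if h and s:
            counts["tp"] += 1  # error present and detected
        elif h and not s:
            counts["fn"] += 1  # error present but missed
        elif s:
            counts["fp"] += 1  # flagged, but humans saw no error
        else:
            counts["tn"] += 1  # correctly left unflagged
    return counts


def recall(counts: Counter) -> float:
    """Share of human-labeled errors that the system detected."""
    denom = counts["tp"] + counts["fn"]
    return counts["tp"] / denom if denom else 0.0


# Toy labels for six segments (illustrative only).
human_labels = [True, True, True, False, False, True]
single_agent = [True, False, False, False, False, True]
multi_agent = [True, True, False, False, True, True]

for name, flags in [("single-agent", single_agent), ("multi-agent", multi_agent)]:
    c = confusion_counts(human_labels, flags)
    print(name, dict(c), "recall:", recall(c))
```

In this toy data the multi-agent flags raise recall from 0.5 to 0.75 at the cost of one false positive, which is the kind of trade-off a confusion matrix exposes and a single aggregate score hides.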

MAATS has both practical and theoretical implications for AI-driven translation. By narrowing the gap between black-box LLM output and human translation workflows, it supports high-quality translations that are held to the same criteria human translators apply. The framework could also pave the way for multi-agent systems in other structured NLP tasks.

Despite its promising results, MAATS has limitations, chiefly in capturing the emotional tone and pragmatic nuance of natural language. The MQM framework has no dimensions for evaluating these aspects, which marks an area for further research and refinement. Future work may involve extending MQM or integrating complementary methods, such as discourse modeling, to better assess affective content.

In conclusion, MAATS represents meaningful progress in aligning AI translation systems with professional human workflows. Its use of specialized agents built around the MQM framework provides a basis for improving machine translation quality, and its multi-agent design offers a blueprint for better interpretability and performance that may carry over to other AI domains.