Explanation-Assisted Poetry Machine Translation
- The paper introduces EAPMT, a novel two-step approach that uses monolingual explanations to guide poetic translation.
- The methodology first generates an explanatory scaffold to capture semantic, stylistic, and cultural nuances before translation.
- Evaluation demonstrates that EAPMT outperforms traditional MT systems in preserving poetic structure, metaphor, and cultural context.
Explanation-Assisted Poetry Machine Translation (EAPMT) is a methodology in literary machine translation that uses monolingual explanation of a poem's semantic, stylistic, and cultural content to guide its translation. The defining characteristic of EAPMT is its two-step workflow: the source poem is first explained or paraphrased in its own language, and this explanation then serves as explicit guidance for generating the translation in the target language. This approach addresses key limitations of direct neural and statistical MT systems, especially the difficulty of preserving poeticity, line structure, metaphor, and cultural context across languages.
1. Motivation and Problem Setting
Poetry translation presents intrinsic challenges, including ambiguity, polysemy, and the preservation of structure and figurative language. Conventional machine translation systems (NMT, transformer-based LLMs) typically fail to maintain the deep semantics, imagery, rhythm, line-breaking, and cultural resonance crucial to poetic texts (Wang et al., 5 Jun 2024). EAPMT is motivated by the observation that LLMs, such as ChatGPT and GPT-4, excel at explicative literary analysis but often produce superficial, literal renderings when prompted to translate poems directly.
The EAPMT paradigm leverages the strengths of LLMs in reading comprehension and semantic explanation, using monolingual explanations as an explicit semantic scaffold for subsequent translation. This design enables enhanced preservation of poeticity and nuanced meaning, surpassing traditional prompt engineering and classical MT approaches.
2. Core Methodology and Workflow
The EAPMT workflow is characterized by a two-step prompt-based pipeline, applicable across language pairs and genres:
- Explanation Generation: The model receives a prompt to produce an explanation of the source poem—covering both literal and implied meaning—in the source language. The explanation includes semantic, stylistic, and cultural analyses, addressing ambiguity, metaphor, and poetic structure. For example:

      Please provide an explanation for this poem:
      Poem: {source text}

- Translation with Explanation Guidance: The model is then prompted to generate the translation using both the source poem and its explanation as input. Notably, the prompt explicitly instructs the model to base its translation on the explanation:

      Please provide the [target language] translation for this poem based on its explanation:
      Explanation: {model-generated explanation}
      Poem: {source text}
Zero-shot scenarios (no example translations provided in the prompt) outperform few-shot configurations for modern poetry, confirming that allowing maximal creative flexibility yields optimal translations in this open domain (Wang et al., 5 Jun 2024).
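To make the workflow concrete, the following is a minimal sketch of the two-step pipeline, assuming a generic `llm` callable that wraps whatever chat model is used. The function name, signature, and return structure are illustrative rather than taken from the cited work; the prompt wording follows the templates above.

```python
from typing import Callable


def eapmt_translate(poem: str, target_language: str,
                    llm: Callable[[str], str]) -> dict:
    """Two-step EAPMT pipeline: explain the poem monolingually,
    then translate it with the explanation as guidance."""
    # Step 1: generate a monolingual explanation of the source poem.
    explanation_prompt = (
        "Please provide an explanation for this poem:\n"
        f"Poem: {poem}"
    )
    explanation = llm(explanation_prompt)

    # Step 2: translate the poem, explicitly grounding the output
    # in the explanation produced above (zero-shot, no example translations).
    translation_prompt = (
        f"Please provide the {target_language} translation for this poem "
        "based on its explanation:\n"
        f"Explanation: {explanation}\n"
        f"Poem: {poem}"
    )
    translation = llm(translation_prompt)

    # Returning both outputs keeps the explanation available for auditing.
    return {"explanation": explanation, "translation": translation}
```

Returning the explanation alongside the translation supports the traceability discussed in Section 6.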
3. Evaluation Frameworks and Metrics
Standard MT evaluation metrics (BLEU, BERTScore, COMET) do not capture poetic quality or adherence to form. EAPMT research refines these evaluation protocols to suit the nuances of poetry translation:
- Human Evaluation Criteria (6-point scale):
- Overall Impression (OI)
- Similarity (Sim)
- Fidelity (Fide: intent and deeper meaning)
- Line-breaking (Line)
- Meaningfulness (Mean)
- Poeticity (Poet)
- Accuracy (Acc)
- Errors (Erro)
- Automated Evaluation:
- SacreBLEU (BLEU variant), BERTScore, COMET—used for benchmarking but generally considered non-reflective of poetic merit.
- LLM-based scoring: GPT-4 is supplied with definitions for each criterion and produces robust ratings, exhibiting high agreement with professional poet assessments (Wang et al., 5 Jun 2024, Chen et al., 19 Aug 2024).
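As an illustration of LLM-based criterion scoring, the sketch below asks a judge model to rate a translation on the eight criteria above using a six-point scale. The criterion definitions, prompt wording, and JSON output format are paraphrased assumptions, not the exact rubric supplied to GPT-4 in the cited studies.

```python
import json
from typing import Callable

# Criterion names follow the human evaluation list above; the one-line
# definitions are illustrative paraphrases, not the official rubric.
CRITERIA = {
    "Overall Impression": "Holistic quality of the translated poem.",
    "Similarity": "Closeness to the source poem's content and tone.",
    "Fidelity": "Preservation of intent and deeper meaning.",
    "Line-breaking": "Adherence to the source line structure.",
    "Meaningfulness": "Coherence and depth of meaning.",
    "Poeticity": "Poetic quality of the target-language text.",
    "Accuracy": "Correctness of lexical and semantic transfer.",
    "Errors": "Absence of mistranslations and omissions.",
}


def llm_judge(source: str, translation: str,
              llm: Callable[[str], str]) -> dict:
    """Ask a judge model to rate a translation per criterion on a 1-6 scale."""
    rubric = "\n".join(f"- {name}: {desc}" for name, desc in CRITERIA.items())
    prompt = (
        "Rate the poem translation on each criterion from 1 (worst) to 6 (best).\n"
        f"Criteria:\n{rubric}\n\n"
        f"Source poem:\n{source}\n\n"
        f"Translation:\n{translation}\n\n"
        "Answer with a JSON object mapping criterion names to integer scores."
    )
    # Assumes the judge model returns valid JSON; production use would add
    # output validation and retries.
    return json.loads(llm(prompt))
```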
Specialized genre constraints are enforced using custom metrics. For example, in Vietnamese "luc bat" translation (Huynh et al., 2 Jan 2024), the scoring schema quantifies length conformity, tonal harmony, and rhyme adherence via dedicated length, tone, and rhyme sub-scores that are combined into an overall genre score.
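The exact formulas of the luc bat schema are not reproduced here; the sketch below is an illustrative composite that averages hypothetical length, tone, and rhyme sub-scores with equal weights. Only the length check is implemented (luc bat alternates six- and eight-syllable lines); the tone and rhyme checks are stubs standing in for the form's position-specific constraints.

```python
def length_score(lines: list[str]) -> float:
    """Fraction of lines matching the luc bat 6/8 syllable alternation.
    Syllable counting is naive whitespace splitting (written Vietnamese is
    largely one syllable per word), used here only for illustration."""
    if not lines:
        return 0.0
    expected = [6 if i % 2 == 0 else 8 for i in range(len(lines))]
    hits = sum(1 for line, n in zip(lines, expected) if len(line.split()) == n)
    return hits / len(lines)


def tone_score(lines: list[str]) -> float:
    """Stub for tonal harmony; the real schema checks position-specific
    flat/sharp tone constraints."""
    return 1.0


def rhyme_score(lines: list[str]) -> float:
    """Stub for rhyme adherence between the six- and eight-syllable lines."""
    return 1.0


def luc_bat_score(poem: str) -> float:
    """Illustrative composite: equal-weight average of the three sub-scores;
    the weighting in the cited work may differ."""
    lines = [line for line in poem.splitlines() if line.strip()]
    return (length_score(lines) + tone_score(lines) + rhyme_score(lines)) / 3
```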
4. Comparative Performance and Key Empirical Results
EAPMT methods consistently surpass direct LLM translation, online machine translation platforms (Google, Baidu, Sogou), and other prompt-based approaches across multiple poetry genres and languages. For modern English–Chinese poetry (Wang et al., 5 Jun 2024):
| System | OI | Sim | Fide | Line | Mean | Poet | Acc | Erro |
|---|---|---|---|---|---|---|---|---|
| Online System | 3.00 | 3.02 | 3.15 | 3.85 | 3.27 | 3.08 | 3.12 | 3.00 |
| GPT3.5-Best | 3.58 | 3.50 | 3.60 | 4.50 | 4.02 | 3.82 | 3.75 | 3.70 |
| EAPMT-3.5 | 3.82 | 3.58 | 3.67 | 4.70 | 4.08 | 3.93 | 4.10 | 3.80 |
| GPT4-Best | 3.77 | 3.53 | 3.65 | 4.25 | 3.97 | 3.82 | 3.82 | 3.70 |
| EAPMT-4.0 | 4.00 | 3.60 | 3.80 | 4.58 | 4.15 | 4.05 | 4.13 | 3.87 |
In Vietnamese luc bat translation (Huynh et al., 2 Jan 2024), fine-tuned GPT-3 achieves a genre score of 0.805 on text-to-poem and 0.781 on paraphrase-to-poem tasks, substantially above BLOOM-7B (0.678) and zero-shot ChatGPT (0.44 for luc bat).
Qualitative analyses confirm that EAPMT preserves metaphorical resonance, line structure, and aesthetic impact, markedly reducing the frequency of hallucinated or flat translations common in standard LLM or NMT outputs.
5. Relationship to Related Approaches and Knowledge-Rich Extensions
EAPMT fundamentally differs from retrieval-augmented machine translation (RAT) (Chen et al., 19 Aug 2024) in that it leverages model-generated or monolingual explanations (semantic, cultural, analytic) rather than externally retrieved, structured knowledge. RAT pipelines integrate diverse knowledge sources—historical, analytical, authorial—and prompt the model to generate and select among translation candidates informed by these perspectives. Empirically, RAT with GPT-4 achieves higher adequacy (LLM-BM=4.1, LLM-Avg=4.0) than EAPMT (LLM-Avg=3.7). This suggests that knowledge-rich retrieval may further augment explanation-based strategies, particularly for classical poetry.
Semantic frameworks such as FrameNet annotations (Chen, 2022) and paratextual explicitation modules (Shen et al., 27 Sep 2025) provide complementary mechanisms for making translations interpretable and culturally mediated. FrameNet maps event structure and role alignment, supporting transparent comparison and informed rewriting. Paratextual apparatus (footnotes, glosses) enables explanation beyond linguistic equivalence, especially for culture-bound terms, a paradigm highly compatible with EAPMT workflows for audience-centered mediation.
6. Implementation Considerations, Pitfalls, and Current Limitations
- Prompt Engineering and Model Selection: EAPMT’s effectiveness depends on the careful design of explanation and translation prompts, attuned to both poetic genre and content. Zero-shot approaches are preferred; excessive prompt examples can constrain creativity in open poetic domains (Wang et al., 5 Jun 2024).
- Fine-tuning and Data Quality: Fine-tuning on genre-specific, structurally rich poetic datasets is critical for maintaining synchrony between explanation and translation. Filtering outputs by custom scores (e.g., retaining only luc bat poems with scores > 0.9) ensures high-quality training and generation (Huynh et al., 2 Jan 2024); a minimal filtering sketch follows this list.
- Interpretability and Auditability: Because explanations can be explicitly tied to translation output, EAPMT offers traceable, controllable generation, vital for research and professional use.
- Limitations: Model-generated explanations may themselves misinterpret deep poetic intent, especially in culturally loaded or highly ambiguous texts. Because explanation is an open-ended task, automatic metrics (BLEU, ROUGE, BERTScore) systematically underestimate quality: many divergent yet valid explanations and translations exist (Shen et al., 27 Sep 2025).
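The filtering step mentioned under fine-tuning and data quality can be expressed as a one-line selector. The sketch below is illustrative: it keeps only candidates whose genre score exceeds a threshold (e.g., 0.9), with the scoring function passed in, such as the `luc_bat_score` sketch from Section 3.

```python
from typing import Callable


def filter_high_quality(candidates: list[str],
                        score_fn: Callable[[str], float],
                        threshold: float = 0.9) -> list[str]:
    """Keep only candidate poems whose genre score exceeds the threshold,
    mirroring the >0.9 luc bat filtering described above."""
    return [poem for poem in candidates if score_fn(poem) > threshold]
```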
7. Broader Implications and Future Research Directions
EAPMT broadens the scope of computational literary translation by incorporating explicative and audience-guided mechanisms. It enables not only linguistic equivalence but also cultural mediation, interpretability, and personal adaptation—displacing prescriptive translation in favor of user-contextualized explanation. Future work is likely to focus on integrating retrieval-augmented and audience-adaptive modules, improving domain and genre coverage, and developing robust, aspect-based evaluation metrics capturing the true richness of poetic translation (Chen et al., 19 Aug 2024, Shen et al., 27 Sep 2025).
Key Processes Table
| EAPMT Stage | Technique | Purpose |
|---|---|---|
| Explanation Generation | Monolingual LLM analysis | Semantic and cultural scaffold |
| Prompted Translation | Guided LLM generation | Poetic structure and nuance |
| Evaluation | Custom human + LLM metrics | Quality and adherence |
| Genre/Style Enforcement | Scoring schema, prompts | Structural fidelity |
| Integration with Knowledge | Retrieval, paratext | Cultural mediation |
EAPMT represents a robust, explanation-driven paradigm, achieving measurable improvements in poetic translation quality and transparency, and interfaces naturally with knowledge augmentation, semantic annotation, and paratextual contextualization. This positions EAPMT as central to ongoing advances in literary machine translation research.