- The paper finds that human translation achieves higher creativity scores than post-editing and MT in literary texts.
- The study employs controlled experiments with English-Catalan and English-Dutch translations, using error analysis to assess acceptability and creative shift counts to assess novelty.
- The interviews reveal that reliance on MT constrains creative problem-solving, shifting the translator’s role from creator to evaluator.
The paper investigates the impact of neural machine translation (NMT) on translator creativity in literary texts, focusing on novelty and acceptability. The authors quantitatively analyze creativity in translations from English to Catalan and Dutch, comparing machine translation (MT), post-editing (PE), and human translation (HT) modalities. The paper builds upon previous work analyzing the creativity of translations of a short fictional text and expands upon it by using a text with higher creative potential, testing in multiple translation directions, and using multiple professional translators as judges.
The authors start by defining creativity based on prior work in psychology and translation studies, describing it as a combination of novelty and acceptability. They mention Bayer-Hohenwarter's operationalization of creativity, which involves acceptability, flexibility, novelty, and fluency. The paper also reviews recent research in NMT and literary translation, which investigates the usability of NMT in translating literary texts, its effect on translators and translation students, and professionals' opinions on technology.
The methodology revolves around translating Kurt Vonnegut's "2BR02B" from English to Catalan and Dutch using MT, PE, and HT. The MT modality leverages state-of-the-art literary-adapted neural MT systems based on the Transformer architecture (Vaswani et al., 2017). The English-to-Catalan system was trained with over 130 novels in English and their Catalan translations, around 1000 books in Catalan, and over 4 million sentence pairs. The English-to-Dutch system was trained with 500 English novels and their Dutch translations, totaling approximately 5 million sentence pairs.
Four professional literary translators provided the HT and PE versions using the PET post-editing tool (Aziz et al., 2012), segmented by paragraph. Translators were split such that they translated half the text from scratch and post-edited the other half in order to control for translator style. Five professional literary translators then reviewed the translated texts, unaware of the translation modality used for each text.
The authors quantify creativity by combining acceptability and novelty into a single score. Acceptability is measured using the harmonized DQF-MQM Framework, which classifies errors into categories such as Accuracy, Fluency, Terminology, Style, Design, Locale Convention, Verity, and Other, with severity levels of Neutral, Minor, Major, and Critical. Reviewers also annotate exceptionally good translation solutions with Kudos. Novelty (or flexibility) is assessed by identifying Units of Creative Potential (UCPs) in the source text (ST) and classifying how they are handled in the target texts (TT). The number of creative shifts (CSs) is then counted.
The creativity score is then calculated using the following formula:
$$\text{creativity score} = \left(\frac{\#\text{CSs}}{\#\text{UCPs}} - \frac{\#\text{error points} - \#\text{Kudos}}{\#\text{words in ST}}\right) \times 100$$
Where:
- #CSs is the number of creative shifts
- #UCPs is the number of units of creative potential
- #error points is the number of error points
- #Kudos is the number of kudos given
- #words in ST is the number of words in the source text
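As a minimal sketch, the score can be read as the share of UCPs handled with a creative shift, minus the net error points (error points less Kudos) per source word, scaled by 100. The function name, parameter names, and the input numbers below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def creativity_score(creative_shifts, ucps, error_points, kudos, source_words):
    """Combine novelty and acceptability into one score.

    novelty = #CSs / #UCPs
    penalty = (#error points - #Kudos) / #words in ST
    score   = (novelty - penalty) * 100
    """
    if ucps <= 0 or source_words <= 0:
        raise ValueError("ucps and source_words must be positive")
    novelty = creative_shifts / ucps
    penalty = (error_points - kudos) / source_words
    return (novelty - penalty) * 100


if __name__ == "__main__":
    # Made-up counts for a single translation, for illustration only.
    score = creativity_score(
        creative_shifts=25, ucps=100, error_points=30, kudos=10, source_words=2000
    )
    print(round(score, 2))
```

Under this reading, Kudos offset error points one for one, so a translation with more Kudos than error points has a negative penalty, which raises its score.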
Post-task semi-structured interviews were conducted to gather more information about the translation and reviewing activities.
The results section covers the translation process, the review process, and the analysis of interviews. The final translations are analyzed with the Human-Targeted Translation Error Rate (HTER), an automatic score that reflects the number of edits performed on the MT output, normalized by the number of words in the sentence. The HTER results indicate that HT is further from the MT output than PE in both languages, and that the Catalan translators made more changes than the Dutch translators when post-editing. The PET data also show differences between modalities and languages. The average edit time is similar across modalities, but Dutch translators took longer to translate and post-edit than Catalan translators. The average number of keystrokes and the average number of pauses are both higher in HT than in PE. The authors hypothesize that the additional pauses indicate that translators 'think harder' for a more creative solution, since pauses mark the 'incubation' period in creative translation.
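To make the idea of an edit-based distance concrete, here is a rough approximation of an HTER-style score: a word-level edit distance between the MT output and the final translation, divided by the length of the final translation. This is a simplified sketch, not the tool used in the paper. It omits the block-shift operation that TER-style metrics include, it assumes whitespace tokenization, and it assumes normalization by the number of words in the final text. The function names are hypothetical.

```python
def word_edit_distance(hyp_words, ref_words):
    """Levenshtein distance over words (insertions, deletions, substitutions)."""
    prev = list(range(len(ref_words) + 1))
    for i, hyp_word in enumerate(hyp_words, 1):
        curr = [i]
        for j, ref_word in enumerate(ref_words, 1):
            substitution = prev[j - 1] + (0 if hyp_word == ref_word else 1)
            deletion = prev[j] + 1       # drop a word from the MT output
            insertion = curr[j - 1] + 1  # add a word found only in the final text
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def approx_hter(mt_output, final_translation):
    """Edits needed to turn the MT output into the final text, per final word."""
    hyp_words = mt_output.split()
    ref_words = final_translation.split()
    if not ref_words:
        raise ValueError("final_translation must contain at least one word")
    return word_edit_distance(hyp_words, ref_words) / len(ref_words)


if __name__ == "__main__":
    # One missing word out of six in the final text.
    print(approx_hter("the cat sat on mat", "the cat sat on the mat"))
```

A higher value means the final text is further from the MT output, which is the sense in which HT is reported as further from MT than PE.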
In the review process, the reviewers rated HT as "Extremely good", MT as "Extremely bad", and PE as "Neither good nor bad". The error analysis shows that MT contains more errors than the other two modalities combined; the categories with the most errors are Accuracy, Fluency, and Style. Inter-annotator agreement (Fleiss' kappa) on the marked errors was computed for each modality and language. The reviewers agreed on the overall quality of the translations, but they did not always agree on whether a given sentence contained an error or on the type of error. HT has the highest number of CSs, followed by PE and MT. Fleiss' kappa was also computed for each modality and language to see whether reviewers agreed on the UCPs where a CS occurred. That agreement was fair to good in all cases except for Dutch MT, for which it was poor. The creativity scores show that HT has the highest creativity score in both languages.
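For readers unfamiliar with Fleiss' kappa, the sketch below computes it from a table of counts, where each row is one item (for example, one UCP) and each column is one category (for example, "creative shift" and "no creative shift"). The data layout, function name, and toy ratings are assumptions made for illustration; the paper's actual annotation data and tooling are not reproduced here.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of raters (>= 2)")
    n_categories = len(counts[0])

    # Proportion of all assignments that fall into each category.
    total = n_items * n_raters
    category_share = [
        sum(row[j] for row in counts) / total for j in range(n_categories)
    ]

    # Observed agreement per item, then averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Agreement expected by chance.
    expected = sum(p * p for p in category_share)
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy table: 4 items, 5 raters, 2 categories, perfect agreement.
    print(fleiss_kappa([[5, 0], [0, 5], [5, 0], [0, 5]]))
```

Values near 1 indicate agreement well above chance, values near 0 indicate chance-level agreement, and negative values indicate agreement below chance.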
The analysis of the interviews with translators and reviewers revealed several themes, including: MT can only be partially useful in literary translation; creativity is a problem-solving selection process; the proposals act as a constraint to creativity; the delicate equilibrium of reviewing; the ideal tool is human-centered; the technology or modality impact depends on the type of reader; the final translation as a product of many collaborative steps; and the grim future of translation and technology.
The authors conclude that HT scores higher for creativity than PE and MT, and that using MT as part of the translation process (the PE modality) results in a less creative literary translation. They define creative translation as the process of identifying and understanding a problem in the source text, generating several new and elegant solutions that depart from the source text, and choosing the one that best fits the target text and culture, so that the reader has the same experience as the source reader. They suggest that using MT hinders the effectiveness of the translation process because the translator becomes the evaluator rather than the creator.