Reference-less Analysis of Context Specificity in Translation with Personalised Language Models (2303.16618v3)
Abstract: Sensitising language models (LMs) to external context helps them to more effectively capture the speaking patterns of individuals with specific characteristics or in particular environments. This work investigates to what extent rich character and film annotations can be leveraged to personalise LMs in a scalable manner. We then explore the use of such models in evaluating context specificity in machine translation. We build LMs which leverage rich contextual information to reduce perplexity by up to 6.5% compared to a non-contextual model, and generalise well to a scenario with no speaker-specific data, relying on combinations of demographic characteristics expressed via metadata. Our findings are consistent across two corpora, one of which (Cornell-rich) is also a contribution of this paper. We then use our personalised LMs to measure the co-occurrence of extra-textual context and translation hypotheses in a machine translation setting. Our results suggest that the degree to which professional translations in our domain are context-specific can be preserved to a better extent by a contextual machine translation model than a non-contextual model, which is also reflected in the contextual model's superior reference-based scores.
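To make the perplexity figure in the abstract concrete, the following is a minimal sketch of how a relative perplexity reduction between a non-contextual and a contextual LM could be computed. The function names and the per-token probabilities are hypothetical illustrations, not taken from the paper, and the paper's own evaluation setup may differ.

```python
import math


def perplexity(token_log_probs):
    """Perplexity: exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(baseline_ppl, contextual_ppl):
    """Fraction by which the contextual model lowers perplexity relative to the baseline."""
    return (baseline_ppl - contextual_ppl) / baseline_ppl


# Hypothetical per-token log-probabilities for one utterance (assumed values,
# not results from the paper): the contextual model is slightly more confident.
baseline_lp = [math.log(0.20), math.log(0.10), math.log(0.25)]
contextual_lp = [math.log(0.22), math.log(0.11), math.log(0.26)]

base = perplexity(baseline_lp)
ctx = perplexity(contextual_lp)
print(f"baseline={base:.2f} contextual={ctx:.2f} "
      f"reduction={relative_reduction(base, ctx):.1%}")
```

Under this definition, a drop from a baseline perplexity of 100 to 93.5 would correspond to a 6.5% reduction.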
Authors: Sebastian Vincent, Alice Dowek, Rowanne Sumner, Charlotte Blundell, Emily Preston, Chris Bayliss, Chris Oakley, Carolina Scarton