Pluggable Neural Machine Translation Models via Memory-augmented Adapters (2307.06029v3)
Abstract: Although neural machine translation (NMT) models perform well in the general domain, it remains challenging to control their generation behavior to satisfy the requirements of different users. Given the expensive training cost and the data scarcity involved in learning a new model from scratch for each user requirement, we propose a memory-augmented adapter to steer pretrained NMT models in a pluggable manner. Specifically, we construct a multi-granular memory based on user-provided text samples and propose a new adapter architecture to combine the model representations with the retrieved results. We also propose a training strategy using memory dropout to reduce spurious dependencies between the NMT model and the memory. We validate our approach in both style- and domain-specific experiments, and the results indicate that our method outperforms several representative pluggable baselines.
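The abstract only outlines the architecture, so the following is a minimal, hypothetical PyTorch sketch of what a memory-augmented adapter with memory dropout could look like. The module names, dimensions, gating scheme, and the exact way retrieved memory is combined with the model representations are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a memory-augmented adapter (not the paper's code).
# Assumptions: retrieved memory entries arrive as dense vectors, the adapter
# reads them with cross-attention, and "memory dropout" randomly masks whole
# memory entries during training to reduce spurious dependencies.
import torch
import torch.nn as nn


class MemoryAugmentedAdapter(nn.Module):
    def __init__(self, d_model: int, d_bottleneck: int = 64, mem_drop: float = 0.3):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)   # standard adapter down-projection
        self.up = nn.Linear(d_bottleneck, d_model)     # standard adapter up-projection
        self.mem_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)    # fuse hidden states with memory readout
        self.mem_drop = mem_drop

    def forward(self, hidden: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model); memory: (batch, n_entries, d_model)
        if self.training and self.mem_drop > 0:
            # Memory dropout: randomly ignore whole retrieved entries so the
            # model does not rely too heavily on any single memory item.
            keep = torch.rand(memory.shape[:2], device=memory.device) > self.mem_drop
            mask = ~keep                      # True = ignore this entry in attention
            mask[keep.sum(dim=1) == 0] = False  # never mask every entry of an example
            readout, _ = self.mem_attn(hidden, memory, memory, key_padding_mask=mask)
        else:
            readout, _ = self.mem_attn(hidden, memory, memory)
        # Gated combination of model representations and retrieved results.
        fused = torch.sigmoid(self.gate(torch.cat([hidden, readout], dim=-1))) * readout
        bottleneck = self.up(torch.relu(self.down(hidden + fused)))
        return hidden + bottleneck            # residual connection, as in standard adapters


if __name__ == "__main__":
    adapter = MemoryAugmentedAdapter(d_model=512)
    h = torch.randn(2, 10, 512)   # decoder hidden states
    m = torch.randn(2, 16, 512)   # retrieved memory vectors (e.g., from user-provided samples)
    print(adapter(h, m).shape)    # torch.Size([2, 10, 512])
```

In this sketch the adapter stays pluggable: the frozen NMT model's hidden states pass through unchanged when the residual branch is removed, and only the small adapter parameters need to be trained per user.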