Towards Accurate Translation via Semantically Appropriate Application of Lexical Constraints (2306.12089v1)

Published 21 Jun 2023 in cs.CL

Abstract: Lexically-constrained NMT (LNMT) aims to incorporate user-provided terminology into translations. Despite its practical advantages, existing work has not evaluated LNMT models under challenging real-world conditions. In this paper, we focus on two important but under-studied issues in the current evaluation process of LNMT studies. The model needs to cope with challenging lexical constraints that are "homographs" or "unseen" during training. To this end, we first design a homograph disambiguation module to differentiate the meanings of homographs. Moreover, we propose PLUMCOT, which integrates contextually rich information about unseen lexical constraints from pre-trained language models and strengthens the copy mechanism of the pointer network via direct supervision of a copying score. We also release HOLLY, an evaluation benchmark for assessing the ability of a model to cope with "homographic" and "unseen" lexical constraints. Experiments on HOLLY and the previous test setup show the effectiveness of our method. The effects of PLUMCOT are particularly pronounced on "unseen" constraints. Our dataset is available at https://github.com/papago-lab/HOLLY-benchmark
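
The most concrete mechanism named in the abstract is the pointer-network copy mechanism trained with direct supervision of a copying score. The paper's exact formulation is not reproduced here; the sketch below is a minimal PyTorch illustration of one common way such a head can be built, in which a learned gate p_copy mixes the decoder's generation distribution with an attention-based copy distribution over constraint tokens, and an auxiliary loss supervises p_copy directly. All class, function, and argument names (SupervisedCopyHead, loss_with_copy_supervision, should_copy, alpha) are illustrative assumptions, not identifiers from the paper or the HOLLY repository.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SupervisedCopyHead(nn.Module):
    """Pointer-network-style copy head with a supervised copy gate (illustrative sketch)."""

    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.generator = nn.Linear(hidden_dim, vocab_size)  # standard softmax generator
        self.copy_gate = nn.Linear(2 * hidden_dim, 1)       # produces the copying score p_copy

    def forward(self, dec_state, constraint_states, constraint_token_ids):
        # dec_state:            (batch, hidden)         decoder state at the current step
        # constraint_states:    (batch, n_con, hidden)  encoded lexical-constraint tokens
        # constraint_token_ids: (batch, n_con)          vocabulary ids of the constraint tokens
        attn_logits = torch.einsum("bh,bnh->bn", dec_state, constraint_states)
        copy_dist = F.softmax(attn_logits, dim=-1)                         # (batch, n_con)

        # A context vector over the constraints plus the decoder state decide how strongly to copy.
        context = torch.einsum("bn,bnh->bh", copy_dist, constraint_states)
        p_copy = torch.sigmoid(self.copy_gate(torch.cat([dec_state, context], dim=-1)))

        gen_dist = F.softmax(self.generator(dec_state), dim=-1)            # (batch, vocab)

        # Scatter the copy distribution into vocabulary space and mix the two distributions.
        copy_in_vocab = torch.zeros_like(gen_dist).scatter_add(1, constraint_token_ids, copy_dist)
        mixed = (1.0 - p_copy) * gen_dist + p_copy * copy_in_vocab         # (batch, vocab)
        return mixed, p_copy


def loss_with_copy_supervision(mixed, p_copy, target_ids, should_copy, alpha=1.0):
    # Standard NLL on the mixed distribution, plus an auxiliary binary cross-entropy term
    # that pushes p_copy toward 1 at steps whose reference token belongs to a lexical
    # constraint (should_copy = 1.0) and toward 0 elsewhere.
    nll = F.nll_loss(torch.log(mixed + 1e-9), target_ids)
    copy_loss = F.binary_cross_entropy(p_copy.squeeze(-1), should_copy)
    return nll + alpha * copy_loss

The point of the auxiliary term is that the gate is not learned only implicitly through the translation loss: the binary label should_copy tells the model, step by step, when copying a constraint token is the desired action, which is one plausible reading of "direct supervision of a copying score" in the abstract.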

Authors (6)
  1. Yujin Baek (3 papers)
  2. Koanho Lee (3 papers)
  3. Dayeon Ki (10 papers)
  4. Hyoung-Gyu Lee (1 paper)
  5. Cheonbok Park (20 papers)
  6. Jaegul Choo (161 papers)
Citations (2)
