Efficient Machine Translation with a BiLSTM-Attention Approach (2410.22335v2)
Abstract: With the rapid development of NLP technology, the accuracy and efficiency of machine translation have become central research topics. This paper proposes a novel Seq2Seq model aimed at improving translation quality while reducing the storage space required by the model. The model employs a Bidirectional Long Short-Term Memory network (Bi-LSTM) as the encoder to capture the contextual information of the input sequence; the decoder incorporates an attention mechanism, enhancing the model's ability to focus on key information during the translation process. Compared to the current mainstream Transformer model, our model achieves superior performance on the WMT14 machine translation dataset while maintaining a smaller size. The study first introduces the design principles and innovations of the model architecture, followed by a series of experiments to verify the effectiveness of the model. The experiments include an assessment of the model's performance on different language pairs, as well as a comparative analysis with traditional Seq2Seq models. The results show that, while maintaining translation accuracy, our model significantly reduces storage requirements, which is of great significance for translation applications in resource-constrained scenarios. Our code is available at https://github.com/mindspore-lab/models/tree/master/research/arxiv_papers/miniformer. We thank the MindSpore Community for its support.
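The abstract describes a Bi-LSTM encoder paired with an attention-equipped decoder. The sketch below illustrates that general architecture; it is a minimal PyTorch-style rendering for clarity, while the authors' released implementation uses MindSpore. Every dimension, layer name, and the dot-product (Luong-style) attention shown here are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a Bi-LSTM encoder with an attention-based decoder.
# Assumptions: PyTorch instead of the authors' MindSpore code; illustrative sizes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional LSTM reads the source left-to-right and right-to-left.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, src):                        # src: (batch, src_len)
        outputs, state = self.lstm(self.embed(src))
        return outputs, state                      # outputs: (batch, src_len, 2*hidden_dim)

class AttnDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        # Project bidirectional encoder states down to the decoder's hidden size.
        self.attn_proj = nn.Linear(2 * hidden_dim, hidden_dim)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, tgt, enc_outputs, state=None):
        # tgt: (batch, tgt_len); enc_outputs: (batch, src_len, 2*hidden_dim)
        dec_out, state = self.lstm(self.embed(tgt), state)         # (batch, tgt_len, hidden)
        keys = self.attn_proj(enc_outputs)                         # (batch, src_len, hidden)
        scores = torch.bmm(dec_out, keys.transpose(1, 2))          # (batch, tgt_len, src_len)
        weights = F.softmax(scores, dim=-1)                        # attention over source tokens
        context = torch.bmm(weights, keys)                         # (batch, tgt_len, hidden)
        logits = self.out(torch.cat([dec_out, context], dim=-1))   # (batch, tgt_len, vocab)
        return logits, state
```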