
Efficient Machine Translation with a BiLSTM-Attention Approach (2410.22335v2)

Published 29 Oct 2024 in cs.CL

Abstract: With the rapid development of NLP technology, the accuracy and efficiency of machine translation have become hot research topics. This paper proposes a novel Seq2Seq model aimed at improving translation quality while reducing the storage space required by the model. The model employs a Bidirectional Long Short-Term Memory network (Bi-LSTM) as the encoder to capture the context information of the input sequence; the decoder incorporates an attention mechanism, enhancing the model's ability to focus on key information during the translation process. Compared to the current mainstream Transformer model, our model achieves superior performance on the WMT14 machine translation dataset while maintaining a smaller size. The study first introduces the design principles and innovative points of the model architecture, followed by a series of experiments to verify the effectiveness of the model. The experiments include an assessment of the model's performance on different language pairs, as well as a comparative analysis with traditional Seq2Seq models. The results show that while maintaining translation accuracy, our model significantly reduces storage requirements, which is of great significance for translation applications in resource-constrained scenarios. Our code is available at https://github.com/mindspore-lab/models/tree/master/research/arxiv_papers/miniformer. Thanks for the support provided by MindSpore Community.
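The architecture the abstract describes (a Bi-LSTM encoder feeding a decoder with an attention mechanism) can be illustrated with a minimal sketch. The code below is an assumption-laden PyTorch illustration, not the authors' MindSpore implementation (which is in the linked repository); the layer sizes, the dot-product (Luong-style) attention, and all class and variable names are chosen here purely for clarity.

```python
# Minimal sketch of a Bi-LSTM encoder + attention decoder.
# Illustrative only; the paper's actual MindSpore code is in the linked repo.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional LSTM captures left and right context of the source sentence.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, src_ids):
        # src_ids: (batch, src_len) -> outputs: (batch, src_len, 2*hidden_dim)
        outputs, state = self.lstm(self.embed(src_ids))
        return outputs, state

class AttentionDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Decoder hidden size matches the concatenated bidirectional encoder states.
        self.lstm = nn.LSTM(emb_dim, 2 * hidden_dim, batch_first=True)
        self.out = nn.Linear(4 * hidden_dim, vocab_size)

    def forward(self, tgt_ids, enc_outputs, state=None):
        # tgt_ids: (batch, tgt_len); enc_outputs: (batch, src_len, 2*hidden_dim)
        dec_out, state = self.lstm(self.embed(tgt_ids), state)
        # Dot-product attention: score each decoder step against all encoder states.
        scores = torch.bmm(dec_out, enc_outputs.transpose(1, 2))
        weights = F.softmax(scores, dim=-1)        # (batch, tgt_len, src_len)
        context = torch.bmm(weights, enc_outputs)  # (batch, tgt_len, 2*hidden_dim)
        logits = self.out(torch.cat([dec_out, context], dim=-1))
        return logits, state

# Toy usage with random token ids, just to show the tensor shapes line up.
enc = BiLSTMEncoder(vocab_size=8000)
dec = AttentionDecoder(vocab_size=8000)
src = torch.randint(0, 8000, (2, 12))
tgt = torch.randint(0, 8000, (2, 9))
enc_outputs, _ = enc(src)
logits, _ = dec(tgt, enc_outputs)
print(logits.shape)  # torch.Size([2, 9, 8000])
```

The storage savings claimed in the abstract come from the recurrent encoder-decoder having fewer parameters than a comparable multi-layer Transformer; the exact layer widths and depths used in the paper are not given here, so the dimensions above should be treated as placeholders.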
