Efficient Machine Translation with a BiLSTM-Attention Approach (2410.22335v2)
Abstract: With the rapid development of NLP technology, the accuracy and efficiency of machine translation have become central research topics. This paper proposes a novel Seq2Seq model aimed at improving translation quality while reducing the storage space required by the model. The model employs a Bidirectional Long Short-Term Memory network (Bi-LSTM) as the encoder to capture the contextual information of the input sequence; the decoder incorporates an attention mechanism, enhancing the model's ability to focus on key information during the translation process. Compared to the current mainstream Transformer model, our model achieves superior performance on the WMT14 machine translation dataset while maintaining a smaller size. The study first introduces the design principles and innovations of the model architecture, followed by a series of experiments to verify the effectiveness of the model. The experiments include an assessment of the model's performance on different language pairs, as well as a comparative analysis with traditional Seq2Seq models. The results show that, while maintaining translation accuracy, our model significantly reduces storage requirements, which is of great significance for translation applications in resource-constrained scenarios. Our code is available at https://github.com/mindspore-lab/models/tree/master/research/arxiv_papers/miniformer. We thank the MindSpore Community for its support.
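The abstract describes a Bi-LSTM encoder paired with an attention-equipped decoder. The sketch below illustrates that general architecture; it is a minimal PyTorch-style rendering for clarity, while the authors' released implementation uses MindSpore. Every dimension, layer name, and the dot-product (Luong-style) attention shown here are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a Bi-LSTM encoder with an attention-based decoder.
# Assumptions: PyTorch instead of the authors' MindSpore code; illustrative sizes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional LSTM reads the source left-to-right and right-to-left.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, src):                        # src: (batch, src_len)
        outputs, state = self.lstm(self.embed(src))
        return outputs, state                      # outputs: (batch, src_len, 2*hidden_dim)

class AttnDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        # Project bidirectional encoder states down to the decoder's hidden size.
        self.attn_proj = nn.Linear(2 * hidden_dim, hidden_dim)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, tgt, enc_outputs, state=None):
        # tgt: (batch, tgt_len); enc_outputs: (batch, src_len, 2*hidden_dim)
        dec_out, state = self.lstm(self.embed(tgt), state)         # (batch, tgt_len, hidden)
        keys = self.attn_proj(enc_outputs)                         # (batch, src_len, hidden)
        scores = torch.bmm(dec_out, keys.transpose(1, 2))          # (batch, tgt_len, src_len)
        weights = F.softmax(scores, dim=-1)                        # attention over source tokens
        context = torch.bmm(weights, keys)                         # (batch, tgt_len, hidden)
        logits = self.out(torch.cat([dec_out, context], dim=-1))   # (batch, tgt_len, vocab)
        return logits, state
```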