
Training Neural Machine Translation (NMT) Models using Tensor Train Decomposition on TensorFlow (T3F) (1911.01933v1)

Published 5 Nov 2019 in cs.LG, cs.CL, and stat.ML

Abstract: We implement a Tensor Train layer in the TensorFlow Neural Machine Translation (NMT) model using the t3f library. We perform training runs on the IWSLT English-Vietnamese '15 and WMT German-English '16 datasets with learning rates $\in \{0.0004, 0.0008, 0.0012\}$, maximum ranks $\in \{2, 4, 8, 16\}$ and a range of core dimensions. We compare against a target BLEU test score of 24.0, obtained by our benchmark run. For the IWSLT English-Vietnamese training, we obtain BLEU test/dev scores of 24.0/21.9 and 24.2/21.9 using core dimensions $(2, 2, 256) \times (2, 2, 512)$ with learning rate 0.0012 and rank distributions $(1,4,4,1)$ and $(1,4,16,1)$ respectively. These runs use 113% and 397% of the flops of the benchmark run respectively. We find that, of the parameters surveyed, a higher learning rate and more 'rectangular' core dimensions generally produce higher BLEU scores. For the WMT German-English dataset, we obtain BLEU scores of 24.0/23.8 using core dimensions $(4, 4, 128) \times (4, 4, 256)$ with learning rate 0.0012 and rank distribution $(1,2,2,1)$. We discuss the potential for future optimization and application of Tensor Train decomposition to other NMT models.
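The core idea in the abstract is to replace a dense weight matrix in the NMT model with a Tensor Train (TT) factorization built with the t3f library. The sketch below is a minimal illustration of one such layer under the abstract's $(2, 2, 256) \times (2, 2, 512)$, rank-4 configuration; it assumes t3f's `glorot_initializer`, `get_variable`, and `matmul` helpers, and the variable and function names are hypothetical rather than taken from the authors' implementation.

```python
import tensorflow as tf
import t3f

# Illustrative sketch (not the authors' code): a dense layer whose weight
# matrix is stored in Tensor Train (TT) format, using one configuration
# surveyed in the abstract: core dimensions (2, 2, 256) x (2, 2, 512)
# with maximum TT-rank 4.

input_dims = [2, 2, 256]    # factorization of the 1024-dim input
output_dims = [2, 2, 512]   # factorization of the 2048-dim output

# Glorot-style initializer for a TT-matrix of shape (1024, 2048), rank 4.
init = t3f.glorot_initializer([input_dims, output_dims], tt_rank=4)
W_tt = t3f.get_variable('tt_dense_weight', initializer=init)
b = tf.Variable(tf.zeros([2 * 2 * 512]), name='tt_dense_bias')

def tt_dense(x):
    """Apply the TT-decomposed dense layer to x of shape [batch, 1024]."""
    # t3f.matmul handles products of mixed dense / TT-format matrices,
    # so the TT cores are never materialized as a full 1024 x 2048 matrix.
    return t3f.matmul(x, W_tt) + b
```

Because only the small TT cores are stored and multiplied, the parameter count and flop count of the layer are governed by the core dimensions and the rank distribution, which is why the paper sweeps these settings against the benchmark run's BLEU score and flops.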

Authors (2)
  1. Amelia Drew (8 papers)
  2. Alexander Heinecke (21 papers)