Finnish Language Modeling with Deep Transformer Models (2003.11562v2)
Published 14 Mar 2020 in cs.CL, cs.LG, cs.SD, eess.AS, and stat.ML
Abstract: Transformers have recently taken center stage in language modeling after LSTMs were considered the dominant model architecture for a long time. In this project, we investigate the performance of two Transformer architectures, BERT and Transformer-XL, for the language modeling task. We use a sub-word model setting with the Finnish language and compare it to the previous state-of-the-art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which to our knowledge is the first such measure reported for Finnish. Transformer-XL improves the perplexity score to 73.58, which is 27% better than the LSTM model.
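
The pseudo-perplexity mentioned for BERT differs from ordinary perplexity: since a masked language model does not factorize left-to-right, each token is masked in turn and scored conditioned on the rest of the sentence. The sketch below illustrates that computation with the Hugging Face `transformers` library; the checkpoint name and example sentence are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of pseudo-perplexity (PPPL) for a masked LM such as BERT.
# The checkpoint "TurkuNLP/bert-base-finnish-cased-v1" is an assumed Finnish
# BERT model, not necessarily the one trained in the paper.
import math
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "TurkuNLP/bert-base-finnish-cased-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def pseudo_perplexity(sentence: str) -> float:
    """Mask each token in turn, accumulate its negative log-likelihood,
    and exponentiate the per-token average."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    nll, n_tokens = 0.0, 0
    # Skip the [CLS] (first) and [SEP] (last) special tokens.
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        nll -= log_probs[input_ids[i]].item()
        n_tokens += 1
    return math.exp(nll / n_tokens)

# Example usage with an arbitrary Finnish sentence.
print(pseudo_perplexity("Helsinki on Suomen pääkaupunki."))
```

Note that pseudo-perplexity scores are not directly comparable to the autoregressive perplexities reported for Transformer-XL and the LSTM baseline, which is why the paper reports them separately.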
- Abhilash Jain (2 papers)
- Aku Ruohe (1 paper)
- Stig-Arne Grönroos (11 papers)
- Mikko Kurimo (27 papers)