Finnish Language Modeling with Deep Transformer Models (2003.11562v2)
Published 14 Mar 2020 in cs.CL, cs.LG, cs.SD, eess.AS, and stat.ML
Abstract: Transformers have recently taken center stage in language modeling after LSTMs were considered the dominant model architecture for a long time. In this project, we investigate the performance of two Transformer architectures, BERT and Transformer-XL, for the language modeling task. We use a sub-word model setting with the Finnish language and compare it to the previous state-of-the-art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which to our knowledge is the first such measure reported for Finnish. Transformer-XL improves the perplexity score to 73.58, which is 27% better than the LSTM model.
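
The pseudo-perplexity mentioned for BERT differs from ordinary perplexity: since a masked language model does not factorize left-to-right, each token is masked in turn and scored conditioned on the rest of the sentence. The sketch below illustrates that computation with the Hugging Face `transformers` library; the checkpoint name and example sentence are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of pseudo-perplexity (PPPL) for a masked LM such as BERT.
# The checkpoint "TurkuNLP/bert-base-finnish-cased-v1" is an assumed Finnish
# BERT model, not necessarily the one trained in the paper.
import math
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "TurkuNLP/bert-base-finnish-cased-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def pseudo_perplexity(sentence: str) -> float:
    """Mask each token in turn, accumulate its negative log-likelihood,
    and exponentiate the per-token average."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    nll, n_tokens = 0.0, 0
    # Skip the [CLS] (first) and [SEP] (last) special tokens.
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        nll -= log_probs[input_ids[i]].item()
        n_tokens += 1
    return math.exp(nll / n_tokens)

# Example usage with an arbitrary Finnish sentence.
print(pseudo_perplexity("Helsinki on Suomen pääkaupunki."))
```

Note that pseudo-perplexity scores are not directly comparable to the autoregressive perplexities reported for Transformer-XL and the LSTM baseline, which is why the paper reports them separately.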
- Abhilash Jain (2 papers)
- Aku Ruohe (1 paper)
- Stig-Arne Grönroos (11 papers)
- Mikko Kurimo (27 papers)