Finnish Language Modeling with Deep Transformer Models (2003.11562v2)

Published 14 Mar 2020 in cs.CL, cs.LG, cs.SD, eess.AS, and stat.ML

Abstract: Transformers have recently taken center stage in language modeling after LSTMs were considered the dominant model architecture for a long time. In this project, we investigate the performance of two Transformer architectures, BERT and Transformer-XL, on the language modeling task. We use a sub-word model setting with the Finnish language and compare the results to the previous state-of-the-art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which, to our knowledge, is the first such measure reported for this task. Transformer-XL improves the perplexity score to 73.58, which is 27% better than the LSTM model.
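
Because BERT is a masked language model rather than a left-to-right one, it is scored with pseudo-perplexity: each token is masked in turn and the model's probability of the true token at that position is accumulated. The sketch below illustrates that computation under assumed conditions; it uses the HuggingFace `transformers` API with a placeholder multilingual checkpoint, not the paper's own Finnish BERT, and is not the authors' evaluation code.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Placeholder checkpoint; the paper trains its own Finnish sub-word BERT.
MODEL = "bert-base-multilingual-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
model.eval()

def pseudo_perplexity(sentence: str) -> float:
    """Mask each token in turn and score the true token under the MLM.

    Pseudo-perplexity = exp(mean negative log-likelihood over positions).
    """
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    nlls = []
    # Skip the [CLS] (first) and [SEP] (last) special tokens.
    for pos in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        true_id = masked[pos].item()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, pos], dim=-1)
        nlls.append(-log_probs[true_id].item())
    return float(torch.exp(torch.tensor(sum(nlls) / len(nlls))))

print(pseudo_perplexity("Hyvää huomenta kaikille"))
```

Note that pseudo-perplexity is not directly comparable to the ordinary perplexity reported for Transformer-XL and the LSTM, since it conditions on bidirectional context at every position.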

Authors (4)
  1. Abhilash Jain (2 papers)
  2. Aku Ruohe (1 paper)
  3. Stig-Arne Grönroos (11 papers)
  4. Mikko Kurimo (27 papers)