
A Character-Word Compositional Neural Language Model for Finnish (1612.03266v1)

Published 10 Dec 2016 in cs.CL

Abstract: Inspired by recent research, we explore ways to model the highly morphological Finnish language at the level of characters while maintaining the performance of word-level models. We propose a new Character-to-Word-to-Character (C2W2C) compositional language model that uses characters as input and output while still internally processing word-level embeddings. Our preliminary experiments, using the Finnish Europarl V7 corpus, indicate that C2W2C can respond well to the challenges of morphologically rich languages, such as high out-of-vocabulary rates, the prediction of novel words, and growing vocabulary size. Notably, the model is able to correctly score inflectional forms that are not present in the training data and to sample grammatically and semantically correct Finnish sentences character by character.
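
The out-of-vocabulary problem mentioned in the abstract can be illustrated with a small sketch. This is not the paper's model or data: the word lists are a made-up toy example, and `oov_rate` and `char_coverage` are hypothetical helper names. The sketch only shows why unseen inflected forms defeat a fixed word vocabulary while the same text stays fully covered at the character level.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens whose exact word form never occurs in training."""
    vocab = set(train_tokens)
    unseen = [tok for tok in test_tokens if tok not in vocab]
    return len(unseen) / len(test_tokens)


def char_coverage(train_tokens, test_tokens):
    """Fraction of test characters that occur somewhere in the training text."""
    alphabet = set("".join(train_tokens))
    test_chars = "".join(test_tokens)
    return sum(ch in alphabet for ch in test_chars) / len(test_chars)


# Toy data, invented for illustration: inflected forms of
# "talo" (house) and "auto" (car).
train = ["talo", "talossa", "talon", "auto", "autossa"]
test = ["talo", "talosta", "auton", "autosta"]

print(oov_rate(train, test))       # 0.75: three of four test forms are unseen
print(char_coverage(train, test))  # 1.0: every test character was seen in training
```

In this toy setting a word-level model would have to map three of the four test words to an unknown token, whereas a model that reads and writes characters can still represent all of them.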

Authors (5)
  1. Matti Lankinen (1 paper)
  2. Hannes Heikinheimo (4 papers)
  3. Pyry Takala (3 papers)
  4. Tapani Raiko (17 papers)
  5. Juha Karhunen (5 papers)
Citations (7)
