
Exploring Swedish & English fastText Embeddings for NER with the Transformer (2007.16007v2)

Published 23 Jul 2020 in cs.CL and cs.LG

Abstract: In this paper, our main contributions are showing that embeddings from relatively small corpora can outperform those from larger corpora, and making a new Swedish analogy test set publicly available. To achieve good network performance on downstream NLP tasks, several factors play important roles: dataset size, the right hyper-parameters, and well-trained embeddings. We show that, with the right set of hyper-parameters, good network performance can be reached even on smaller datasets. We evaluate the embeddings at both the intrinsic and extrinsic levels. The embeddings are deployed with the Transformer in a named entity recognition (NER) task, and significance tests are conducted. This is done for both Swedish and English. We obtain better performance in both languages on the downstream task with smaller training data, compared with recently released Common Crawl versions; and character n-grams appear useful for Swedish, a morphologically rich language.
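The abstract's two embedding-side ingredients are subword-aware fastText training (character n-grams, which the authors find useful for Swedish) and intrinsic evaluation on an analogy test set. Below is a minimal sketch of both steps using gensim, not the authors' exact pipeline; the corpus and analogy file names are hypothetical placeholders, and the hyper-parameters are illustrative defaults rather than the ones tuned in the paper.

```python
# Sketch only: illustrative hyper-parameters and hypothetical file names.
from gensim.models import FastText
from gensim.models.word2vec import LineSentence

# Train subword-aware embeddings; min_n/max_n set the character n-gram range
# that helps with morphologically rich languages such as Swedish.
corpus = LineSentence("swedish_corpus.txt")  # hypothetical corpus file
model = FastText(
    corpus,
    vector_size=300,
    window=5,
    min_count=5,
    min_n=3,   # shortest character n-gram
    max_n=6,   # longest character n-gram
    epochs=5,
)

# Intrinsic evaluation: accuracy on an analogy set in the standard
# questions-words format, e.g. lines like "man kvinna kung drottning".
score, sections = model.wv.evaluate_word_analogies("swedish_analogy.txt")
print(f"analogy accuracy: {score:.3f}")
```

The trained vectors could then be fed to a Transformer-based NER model as the extrinsic evaluation; that downstream step is omitted here since the abstract does not specify the architecture's configuration.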

Citations (4)
