Generalizing Word Embeddings using Bag of Subwords (1809.04259v1)

Published 12 Sep 2018 in cs.CL and cs.LG

Abstract: We approach the problem of generalizing pre-trained word embeddings beyond fixed-size vocabularies without using additional contextual information. We propose a subword-level word vector generation model that views words as bags of character $n$-grams. The model is simple, fast to train, and provides good vectors for rare or unseen words. Experiments show that our model achieves state-of-the-art performance on an English word similarity task and on the joint prediction of part-of-speech tags and morphosyntactic attributes in 23 languages, suggesting the model's ability to capture the relationship between words' textual representations and their embeddings.
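
The core idea lends itself to a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the n-gram length range (3 to 6), averaging rather than summing n-gram vectors, and plain SGD on squared reconstruction error are assumptions, and the `BagOfSubwords` class name and the toy `pretrained` table are hypothetical stand-ins for a real pre-trained embedding set.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    # All character n-grams of "<word>"; boundary markers let the
    # model distinguish prefixes and suffixes from word-internal grams.
    w = "<" + word + ">"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

class BagOfSubwords:
    """Sketch: a word's vector is the average of its character n-gram
    vectors, trained to reconstruct pre-trained word embeddings."""

    def __init__(self, dim, n_min=3, n_max=6, seed=0):
        self.dim, self.n_min, self.n_max = dim, n_min, n_max
        self.rng = np.random.default_rng(seed)
        self.ngram_vecs = {}  # n-gram -> np.ndarray of shape (dim,)

    def _vec(self, gram):
        # Lazily initialize an embedding for each new n-gram.
        if gram not in self.ngram_vecs:
            self.ngram_vecs[gram] = self.rng.normal(0.0, 0.1, self.dim)
        return self.ngram_vecs[gram]

    def embed(self, word):
        grams = char_ngrams(word, self.n_min, self.n_max)
        return np.mean([self._vec(g) for g in grams], axis=0)

    def train_step(self, word, target_vec, lr=0.05):
        # One SGD step on ||generated - pretrained||^2; duplicated
        # n-grams are updated once per occurrence, matching the mean.
        grams = char_ngrams(word, self.n_min, self.n_max)
        pred = np.mean([self._vec(g) for g in grams], axis=0)
        grad = 2.0 * (pred - target_vec) / len(grams)
        for g in grams:
            self.ngram_vecs[g] -= lr * grad
        return float(np.sum((pred - target_vec) ** 2))

# Usage: fit n-gram vectors to a toy "pre-trained" table, then
# generate a vector for an out-of-vocabulary word from shared n-grams.
pretrained = {"cat": np.ones(50), "cats": np.ones(50)}
model = BagOfSubwords(dim=50)
for _ in range(100):
    for w, v in pretrained.items():
        model.train_step(w, v)
oov_vec = model.embed("catlike")  # built purely from its n-grams
```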

Authors (3)
  1. Jinman Zhao (20 papers)
  2. Sidharth Mudgal (6 papers)
  3. Yingyu Liang (107 papers)
Citations (44)
