
Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database (1610.01520v2)

Published 5 Oct 2016 in cs.CL and cs.IR

Abstract: Word embeddings have been extensively studied in large text datasets. However, only a few studies analyze semantic representations of small corpora, which are particularly relevant in single-person text production studies. In the present paper, we compare Skip-gram and LSA capabilities in this scenario, and we test both techniques to extract relevant semantic patterns in single-series dream reports. LSA showed better performance than Skip-gram on small training corpora in two semantic tests. As a case study, we show that LSA can capture relevant word associations in dream report series, even in cases of a small number of dreams or low-frequency words. We propose that LSA can be used to explore word associations in dream reports, which could bring new insight into this classic research area of psychology.

Authors (4)
  1. Edgar Altszyler (10 papers)
  2. Mariano Sigman (16 papers)
  3. Sidarta Ribeiro (5 papers)
  4. Diego Fernández Slezak (1 paper)
Citations (71)
