
A Novel Method of Extracting Topological Features from Word Embeddings (2003.13074v2)

Published 29 Mar 2020 in cs.LG, cs.CL, math.AT, and stat.ML

Abstract: In recent years, topological data analysis has been applied to a wide range of problems involving high-dimensional, noisy data. While text representations are often high-dimensional and noisy, there has been little work on applying topological data analysis to natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representations of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the high-dimensional embedding space and discover the relations among different embedding dimensions. We use persistent homology, the most commonly used tool from topological data analysis, for our experiments. Examining our topological algorithm on long textual documents, we show that our defined topological features may outperform conventional text mining features.
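
For readers unfamiliar with persistent homology, the following is a minimal sketch of its simplest case: 0-dimensional persistence (connected components) of a Vietoris-Rips filtration over a point cloud. It is not the paper's algorithm, which the abstract does not spell out. The function name `h0_persistence` and the toy vectors are hypothetical, chosen only for illustration, and the sketch assumes Euclidean distance and Python 3.8+ (for `math.dist`).

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. As the distance
    threshold grows, components merge; each merge "kills" one component.
    The returned list holds the n - 1 finite death scales in increasing
    order (one component never dies and is omitted).
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # This edge joins two components, so one of them dies here.
            parent[root_i] = root_j
            deaths.append(dist)
    return deaths


# Hypothetical toy "embedding" vectors: three close together, one far away.
toy_vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
deaths = h0_persistence(toy_vectors)
```

A long-lived component (a large death scale relative to the others) indicates a well-separated cluster in the point cloud. Summaries of such lifetimes are one common way to turn persistence information into numeric features for a classifier.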

Authors (3)
  1. Shafie Gholizadeh (5 papers)
  2. Armin Seyeditabari (5 papers)
  3. Wlodek Zadrozny (20 papers)
Citations (3)
