A novel sentence embedding based topic detection method for micro-blog (2006.09977v1)

Published 10 Jun 2020 in cs.IR, cs.CL, cs.LG, and stat.ML

Abstract: Topic detection is a challenging task, especially when the exact number of topics is unknown. In this paper, we present a novel neural-network-based approach to detecting topics in a micro-blogging dataset. We use an unsupervised neural sentence embedding model to map blog posts into an embedding space. Our model is a weighted power mean word embedding model, in which the weights are calculated by an attention mechanism. Experimental results show that our embedding method outperforms the baselines in sentence clustering. In addition, we propose an improved clustering algorithm referred to as relationship-aware DBSCAN (RADBSCAN). It can discover topics in a micro-blogging dataset, and the number of topics is determined by the characteristics of the dataset itself. Moreover, to address the sensitivity of the clustering to its parameters, we use blog forwarding relationships as bridges between two otherwise independent clusters. Finally, we validate our approach on a dataset from Sina micro-blog. The results show that our method detects all the topics successfully and extracts keywords for each topic.
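The abstract does not spell out how the attention weights are scored, which power-mean exponents are used, or exactly how forwarding links bridge clusters. The following is therefore only a minimal pure-Python sketch under stated assumptions, not the authors' implementation. Assumptions: attention scores are the dot product of each word vector with the unweighted mean, normalized by softmax; the sentence embedding concatenates weighted power means for p in {1, 3, +inf, -inf} (a choice borrowed from concatenated power mean embeddings); RADBSCAN is approximated as plain DBSCAN followed by merging any two clusters joined by at least `min_links` forwarding edges. All function names (`attention_weights`, `weighted_power_mean`, `embed`, `dbscan`, `merge_by_forwarding`) are hypothetical.

```python
import math
from collections import defaultdict


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(word_vecs):
    # Hypothetical scoring: dot product of each word vector with the plain mean.
    d = len(word_vecs[0])
    mean = [sum(v[k] for v in word_vecs) / len(word_vecs) for k in range(d)]
    return softmax([sum(a * b for a, b in zip(v, mean)) for v in word_vecs])


def weighted_power_mean(word_vecs, weights, p):
    # Elementwise (sum_i w_i * x_i^p)^(1/p); weights are assumed to sum to 1.
    d = len(word_vecs[0])
    if p == math.inf:
        return [max(v[k] for v in word_vecs) for k in range(d)]
    if p == -math.inf:
        return [min(v[k] for v in word_vecs) for k in range(d)]
    if not (isinstance(p, int) and p > 0 and p % 2 == 1):
        raise ValueError("sketch supports only positive odd integers or +/-inf")
    out = []
    for k in range(d):
        m = sum(w * v[k] ** p for w, v in zip(weights, word_vecs))
        # Sign-preserving real root, valid for odd integer p.
        out.append(math.copysign(abs(m) ** (1.0 / p), m))
    return out


def embed(word_vecs, ps=(1, 3, math.inf, -math.inf)):
    # Concatenate one weighted power mean per exponent (assumed exponent set).
    weights = attention_weights(word_vecs)
    emb = []
    for p in ps:
        emb.extend(weighted_power_mean(word_vecs, weights, p))
    return emb


def dbscan(points, eps, min_pts):
    # Plain DBSCAN with Euclidean distance; label -1 marks noise.
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:  # only core points expand the cluster
                queue.extend(jn)
        cluster += 1
    return labels


def merge_by_forwarding(labels, forwards, min_links=1):
    # forwards: (i, j) index pairs meaning post j forwarded post i (assumed format).
    links = defaultdict(int)
    for i, j in forwards:
        a, b = labels[i], labels[j]
        if a != -1 and b != -1 and a != b:
            links[(min(a, b), max(a, b))] += 1
    parent = {c: c for c in set(labels) if c != -1}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), count in links.items():
        if count >= min_links:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)
    return [find(c) if c != -1 else -1 for c in labels]
```

For example, with two tight groups of points and one isolated point, `dbscan(points, eps=0.5, min_pts=2)` yields two clusters plus noise, and a single forwarding edge between the groups lets `merge_by_forwarding` join them into one topic. Raising `min_links` is one hypothetical way to make the bridge less eager to merge; the paper may use a different criterion.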

Authors (7)
  1. Cong Wan (4 papers)
  2. Shan Jiang (61 papers)
  3. Cuirong Wang (2 papers)
  4. Cong Wang (310 papers)
  5. Changming Xu (7 papers)
  6. Xianxia Chen (1 paper)
  7. Ying Yuan (95 papers)
Citations (5)