
TopicsRanksDC: Distance-based Topic Ranking applied on Two-Class Data (2105.07826v1)

Published 17 May 2021 in cs.IR, cs.AI, and cs.LG

Abstract: In this paper, we introduce a novel approach named TopicsRanksDC for topic ranking based on the distance between the two clusters that each topic generates. We assume that our data consists of text documents associated with two classes. Our approach ranks each topic contained in these documents by its significance for separating the two classes. First, the algorithm detects topics using Latent Dirichlet Allocation (LDA). The words defining each topic are represented as two clusters, each associated with one of the classes. We compute four distance metrics: single linkage, complete linkage, average linkage, and the distance between the centroids. We compare the results for LDA topics and for random topics, and the results show that LDA topics rank much higher than random topics. The results of the TopicsRanksDC tool are promising for future work on enabling search engines to suggest related topics.
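The four cluster distances named in the abstract can be illustrated with a minimal sketch, assuming each topic's words are already embedded as numeric vectors and split into two per-class clusters; the function name and the use of SciPy's cdist are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def cluster_distances(cluster_a, cluster_b):
    """Compute the four distances used to score a topic: single linkage,
    complete linkage, average linkage, and centroid distance, between the
    two per-class clusters built from the topic's words."""
    d = cdist(cluster_a, cluster_b)  # all pairwise Euclidean distances
    return {
        "single_linkage": d.min(),      # closest pair across the clusters
        "complete_linkage": d.max(),    # farthest pair across the clusters
        "average_linkage": d.mean(),    # mean of all pairwise distances
        "centroid": np.linalg.norm(
            cluster_a.mean(axis=0) - cluster_b.mean(axis=0)
        ),
    }

# Toy example with hypothetical 2-D word embeddings for one topic
cluster_a = np.array([[0.0, 0.0], [1.0, 0.5]])
cluster_b = np.array([[3.0, 3.0], [4.0, 2.5]])
print(cluster_distances(cluster_a, cluster_b))
```

Larger distances indicate that the topic's words separate the two classes more cleanly, which is the basis for ranking topics higher than randomly assembled ones.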

Authors (4)
  1. Malik Yousef (1 paper)
  2. Jamal Al Qundus (5 papers)
  3. Silvio Peikert (4 papers)
  4. Adrian Paschke (48 papers)
Citations (5)
