
On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling (2401.14113v2)

Published 25 Jan 2024 in cs.CL

Abstract: Hierarchical topic modeling aims to discover latent topics from a corpus and organize them into a hierarchy to understand documents with desirable semantic granularity. However, existing work tends to produce topic hierarchies with low affinity, rationality, and diversity, which hampers document understanding. To overcome these challenges, in this paper we propose the Transport Plan and Context-aware Hierarchical Topic Model (TraCo). Instead of the simple topic dependencies used in earlier work, we propose a transport plan dependency method. It constrains dependencies to ensure their sparsity and balance, and also uses them to regularize the building of the topic hierarchy. This improves the affinity and diversity of hierarchies. We further propose a context-aware disentangled decoder. Rather than the entangled decoding of previous work, it uses disentangled decoding to assign different semantic granularities to topics at different levels. This improves the rationality of hierarchies. Experiments on benchmark datasets demonstrate that our method surpasses state-of-the-art baselines, effectively improving the affinity, rationality, and diversity of hierarchical topic modeling, with better performance on downstream tasks.
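The idea of modeling parent-child topic dependencies as a transport plan can be illustrated with a minimal sketch. Everything below is an assumption rather than TraCo's actual implementation: topics are represented by hypothetical 2-D embeddings, the transport cost is squared Euclidean distance, both marginals are uniform, and the plan is computed with standard entropic Sinkhorn iterations (Cuturi, 2013). The helper names `sinkhorn_plan` and `topic_dependencies` are illustrative only.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def sinkhorn_plan(cost, a, b, eps=0.1, n_iters=1000):
    """Entropic optimal transport plan via Sinkhorn-Knopp scaling.

    cost: n x m matrix (list of lists); a (length n) and b (length m) are
    marginals that each sum to 1. The returned plan P has row sums close
    to a and column sums equal to b after the final column update.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: a smaller eps concentrates mass on low-cost pairs.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows to match a, then columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def topic_dependencies(children, parents, eps=0.1):
    """Hypothetical child-to-parent dependency matrix from a transport plan.

    Uniform marginals (assumed): every child topic sends the same mass and
    every parent topic receives the same mass, which enforces balance.
    Rows are rescaled so each child's weights over parents sum to 1.
    """
    n, m = len(children), len(parents)
    cost = [[sq_dist(c, p) for p in parents] for c in children]
    plan = sinkhorn_plan(cost, [1.0 / n] * n, [1.0 / m] * m, eps)
    return plan, [[n * plan[i][j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    # Two hypothetical parent topics and four child topics.
    parents = [(0.0, 0.0), (1.0, 1.0)]
    children = [(0.0, 0.1), (0.1, 0.0), (1.0, 0.9), (0.9, 1.0)]
    _, dep = topic_dependencies(children, parents)
    for row in dep:
        print([round(w, 3) for w in row])
```

In this toy setup each child attaches almost entirely to its nearest parent, so a small `eps` yields near-one-hot rows (sparse dependencies). Because the column marginals are fixed, each parent still receives the same total mass even if most children sit close to one parent; that parent cannot absorb all children, which is the balance property.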

