KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation (2307.01878v2)

Published 4 Jul 2023 in cs.CL and cs.AI

Abstract: In text classification tasks, fine-tuning pretrained language models like BERT and GPT-3 yields competitive accuracy; however, both approaches require pretraining on large text datasets. In contrast, general topic modeling methods have the advantage of analyzing documents to extract meaningful patterns of words without the need for pretraining. To leverage topic modeling's unsupervised insight extraction for text classification tasks, we develop Knowledge Distillation Semi-supervised Topic Modeling (KDSTM). KDSTM requires no pretrained embeddings, needs only a few labeled documents, and is efficient to train, making it ideal under resource-constrained settings. Across a variety of datasets, our method outperforms existing supervised topic modeling methods in classification accuracy, robustness, and efficiency, and achieves performance comparable to state-of-the-art weakly supervised text classification methods.
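The abstract does not spell out the training objective, but the recipe it describes, distilling guidance from a small set of labeled documents into an otherwise unsupervised neural topic model, can be sketched as a combined loss. The snippet below is a minimal, hypothetical PyTorch illustration only: the names `student_topic_logits`, `teacher_logits`, the temperature `T`, and the weight `alpha` are assumptions for exposition and do not reflect the authors' actual architecture or loss.

```python
import torch
import torch.nn.functional as F

def kd_semi_supervised_loss(student_topic_logits, teacher_logits,
                            recon_loss, labels=None, T=2.0, alpha=0.5):
    """Hypothetical combined objective for a distilled, semi-supervised topic model.

    student_topic_logits: [batch, n_topics] topic scores from the topic model.
    teacher_logits:       [batch, n_topics] soft targets from a teacher signal
                          (e.g. derived from the few labeled documents).
    recon_loss:           the topic model's usual unsupervised reconstruction/ELBO term.
    labels:               optional hard labels for the small labeled subset.
    """
    # Soft-target distillation term (Hinton et al., 2015 style),
    # scaled by T^2 as is conventional for temperature-softened targets.
    kd = F.kl_div(
        F.log_softmax(student_topic_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Optional supervised term on the labeled subset only.
    ce = torch.tensor(0.0, device=student_topic_logits.device)
    if labels is not None:
        ce = F.cross_entropy(student_topic_logits, labels)

    # Unsupervised topic-model objective plus distillation and (weak) supervision.
    return recon_loss + alpha * kd + (1.0 - alpha) * ce
```

In this sketch the topic model keeps learning its usual unsupervised objective, while the distillation term nudges document-topic distributions toward the teacher's soft assignments; how KDSTM actually forms the teacher signal and balances the terms is detailed in the paper, not here.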
