Lightweight Conceptual Dictionary Learning for Text Classification Using Information Compression (2405.01584v1)

Published 28 Apr 2024 in cs.CL, cs.LG, and eess.SP

Abstract: We propose a novel, lightweight supervised dictionary learning framework for text classification based on data compression and representation. This two-phase algorithm initially employs the Lempel-Ziv-Welch (LZW) algorithm to construct a dictionary from text datasets, focusing on the conceptual significance of dictionary elements. Subsequently, dictionaries are refined using label data, optimizing dictionary atoms to enhance discriminative power based on mutual information and class distribution. This process generates discriminative numerical representations that facilitate the training of simple classifiers such as SVMs and neural networks. We evaluate the algorithm's information-theoretic performance using information bottleneck principles and introduce the information plane area rank (IPAR) as a novel metric to quantify it. Tested on six benchmark text datasets, our algorithm closely matches top-performing models on limited-vocabulary datasets, deviating by only about 2% while using just 10% of their parameters. However, it falls short on diverse-vocabulary datasets, likely due to the LZW algorithm's constraints with low-repetition data. This contrast highlights its efficiency and limitations across different dataset types.
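The sketch below illustrates the two-phase idea described in the abstract; it is not the authors' implementation. Phase 1 grows an LZW-style phrase dictionary over the training texts; phase 2 scores dictionary atoms by mutual information with the class labels, keeps the most discriminative ones, and represents each document as counts of those atoms, which can then feed a simple classifier such as an SVM. All function names, thresholds, and the toy data are illustrative assumptions.

```python
# Hedged sketch of LZW dictionary construction + mutual-information-based atom
# selection for text classification. Not the paper's code; an illustration only.
from collections import Counter
import math

def lzw_dictionary(texts):
    """Phase 1: grow an LZW-style phrase dictionary over a list of strings."""
    dictionary = {chr(i): i for i in range(256)}
    for text in texts:
        w = ""
        for c in text:
            wc = w + c
            if wc in dictionary:
                w = wc
            else:
                dictionary[wc] = len(dictionary)  # add the new phrase (atom)
                w = c
    return dictionary

def atom_label_mutual_information(texts, labels, atom):
    """Phase 2 (scoring): MI between 'atom occurs in document' and the label."""
    n = len(texts)
    occurs = [atom in t for t in texts]
    mi = 0.0
    for o in (True, False):
        p_o = sum(x == o for x in occurs) / n
        for y in set(labels):
            p_y = sum(l == y for l in labels) / n
            p_oy = sum(x == o and l == y for x, l in zip(occurs, labels)) / n
            if p_oy > 0:
                mi += p_oy * math.log(p_oy / (p_o * p_y))
    return mi

def featurize(text, atoms):
    """Phase 2 (representation): count selected atoms to get a numeric vector."""
    return [text.count(a) for a in atoms]

# Toy usage with hypothetical data; a real run would use the benchmark datasets.
texts = ["cheap pills buy now", "meeting moved to friday",
         "buy cheap pills", "see you friday"]
labels = [1, 0, 1, 0]
atoms = sorted((a for a in lzw_dictionary(texts) if len(a) > 2),
               key=lambda a: atom_label_mutual_information(texts, labels, a),
               reverse=True)[:20]          # keep the most discriminative atoms
X = [featurize(t, atoms) for t in texts]   # features for an SVM or small MLP
```

The atom-selection step here uses a simple occurrence/label mutual information as a stand-in for the paper's refinement based on mutual information and class distribution; the actual optimization may differ.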
