Lightweight Conceptual Dictionary Learning for Text Classification Using Information Compression (2405.01584v1)

Published 28 Apr 2024 in cs.CL, cs.LG, and eess.SP

Abstract: We propose a novel, lightweight supervised dictionary learning framework for text classification based on data compression and representation. This two-phase algorithm initially employs the Lempel-Ziv-Welch (LZW) algorithm to construct a dictionary from text datasets, focusing on the conceptual significance of dictionary elements. Subsequently, dictionaries are refined using label data, optimizing dictionary atoms to enhance discriminative power based on mutual information and class distribution. This process generates discriminative numerical representations that facilitate the training of simple classifiers such as SVMs and neural networks. We evaluate the algorithm's information-theoretic performance using information bottleneck principles and introduce the information plane area rank (IPAR) as a novel metric to quantify it. Tested on six benchmark text datasets, our algorithm closely matches top-performing models on limited-vocabulary datasets, deviating by only about 2% while using just 10% of their parameters. However, it falls short on diverse-vocabulary datasets, likely due to the LZW algorithm's constraints with low-repetition data. This contrast highlights its efficiency and limitations across different dataset types.
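The sketch below illustrates the two-phase idea described in the abstract; it is not the authors' implementation. Phase 1 grows an LZW-style phrase dictionary over the training texts; phase 2 scores dictionary atoms by mutual information with the class labels, keeps the most discriminative ones, and represents each document as counts of those atoms, which can then feed a simple classifier such as an SVM. All function names, thresholds, and the toy data are illustrative assumptions.

```python
# Hedged sketch of LZW dictionary construction + mutual-information-based atom
# selection for text classification. Not the paper's code; an illustration only.
from collections import Counter
import math

def lzw_dictionary(texts):
    """Phase 1: grow an LZW-style phrase dictionary over a list of strings."""
    dictionary = {chr(i): i for i in range(256)}
    for text in texts:
        w = ""
        for c in text:
            wc = w + c
            if wc in dictionary:
                w = wc
            else:
                dictionary[wc] = len(dictionary)  # add the new phrase (atom)
                w = c
    return dictionary

def atom_label_mutual_information(texts, labels, atom):
    """Phase 2 (scoring): MI between 'atom occurs in document' and the label."""
    n = len(texts)
    occurs = [atom in t for t in texts]
    mi = 0.0
    for o in (True, False):
        p_o = sum(x == o for x in occurs) / n
        for y in set(labels):
            p_y = sum(l == y for l in labels) / n
            p_oy = sum(x == o and l == y for x, l in zip(occurs, labels)) / n
            if p_oy > 0:
                mi += p_oy * math.log(p_oy / (p_o * p_y))
    return mi

def featurize(text, atoms):
    """Phase 2 (representation): count selected atoms to get a numeric vector."""
    return [text.count(a) for a in atoms]

# Toy usage with hypothetical data; a real run would use the benchmark datasets.
texts = ["cheap pills buy now", "meeting moved to friday",
         "buy cheap pills", "see you friday"]
labels = [1, 0, 1, 0]
atoms = sorted((a for a in lzw_dictionary(texts) if len(a) > 2),
               key=lambda a: atom_label_mutual_information(texts, labels, a),
               reverse=True)[:20]          # keep the most discriminative atoms
X = [featurize(t, atoms) for t in texts]   # features for an SVM or small MLP
```

The atom-selection step here uses a simple occurrence/label mutual information as a stand-in for the paper's refinement based on mutual information and class distribution; the actual optimization may differ.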
