Open-world Multi-label Text Classification with Extremely Weak Supervision (2407.05609v1)

Published 8 Jul 2024 in cs.CL

Abstract: We study open-world multi-label text classification under extremely weak supervision (XWS), where the user only provides a brief description of the classification objective without any labels or a ground-truth label space. Similar single-label XWS settings have been explored recently; however, these methods cannot be easily adapted to the multi-label case. We observe that (1) most documents have a dominant class covering the majority of content and (2) long-tail labels appear in some documents as the dominant class. Therefore, we first use the user description to prompt an LLM for dominant keyphrases over a subset of raw documents, and then construct an initial label space via clustering. We further apply a zero-shot multi-label classifier to locate documents with small top predicted scores, so we can revisit their dominant keyphrases for more long-tail labels. We iterate this process to discover a comprehensive label space and construct a multi-label classifier, yielding a novel method, X-MLClass. X-MLClass exhibits a remarkable increase in ground-truth label space coverage on various datasets, for example, a 40% improvement on the AAPD dataset over topic modeling and keyword extraction methods. Moreover, X-MLClass achieves the best end-to-end multi-label classification accuracy.
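The iterative discovery loop described in the abstract can be sketched as follows. This is a minimal, self-contained illustration, not the authors' implementation: `dominant_keyphrase` is a stand-in for prompting an LLM with the user's objective, `top_score` is a stand-in for a zero-shot multi-label classifier, and the clustering step is collapsed to deduplication of keyphrases. All function names and the threshold value are hypothetical.

```python
from collections import Counter

def dominant_keyphrase(doc):
    # Stand-in for prompting an LLM for the document's dominant
    # keyphrase; here we simply take the most frequent word.
    words = doc.lower().split()
    return Counter(words).most_common(1)[0][0]

def top_score(doc, label_space):
    # Stand-in for a zero-shot multi-label classifier: score each
    # candidate label by whether it occurs in the document, and
    # return the highest label score.
    if not label_space:
        return 0.0
    words = set(doc.lower().split())
    return max(1.0 if label in words else 0.0 for label in label_space)

def discover_label_space(docs, threshold=0.5, max_iters=5):
    # Step 1: initial label space from dominant keyphrases of a
    # subset of raw documents (deduplication stands in for clustering).
    subset = docs[: max(len(docs) // 2, 1)]
    labels = {dominant_keyphrase(d) for d in subset}
    for _ in range(max_iters):
        # Step 2: find documents whose top predicted score is small,
        # i.e. documents the current label space does not cover.
        uncovered = [d for d in docs if top_score(d, labels) < threshold]
        if not uncovered:
            break
        # Step 3: revisit their dominant keyphrases to surface
        # long-tail labels, then iterate.
        new_labels = {dominant_keyphrase(d) for d in uncovered}
        if new_labels <= labels:
            break
        labels |= new_labels
    return labels
```

In this toy form, a document whose dominant topic is absent from the current label space scores low and contributes its keyphrase as a new long-tail label on the next pass, which mirrors the paper's coverage-expansion idea at a much smaller scale.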

Authors (6)
  1. Xintong Li
  2. Jinya Jiang
  3. Ria Dharmani
  4. Jayanth Srinivasa
  5. Gaowen Liu
  6. Jingbo Shang
Citations (1)
