Description-Enhanced Label Embedding Contrastive Learning for Text Classification (2306.08817v1)
Abstract: Text classification is one of the fundamental tasks in natural language processing, requiring a model to determine the most appropriate category for input sentences. Recently, deep neural networks have achieved impressive performance in this area, especially Pre-trained Language Models (PLMs). These methods usually concentrate on input sentences and the generation of their semantic embeddings. However, for another essential component, labels, most existing works either treat them as meaningless one-hot vectors or learn label representations with vanilla embedding methods during model training, underestimating the semantic information and guidance that labels carry. To alleviate this problem and better exploit label information, in this paper, we employ Self-Supervised Learning (SSL) in the model learning process and design a novel self-supervised Relation of Relation (R2) classification task that exploits labels from a one-hot perspective. We then propose a novel Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as joint optimization targets. Meanwhile, a triplet loss is employed to enhance the analysis of differences and connections among labels. Moreover, since one-hot usage still falls short of exploiting label information, we incorporate external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning and extend R2-Net to a novel Description-Enhanced Label Embedding network (DELE) from a label embedding perspective. ...
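To make the abstract's auxiliary objectives concrete, below is a minimal sketch (PyTorch/NLTK), not the authors' released implementation: (i) an R2 head that predicts whether two encoded texts share the same label, combined with a triplet loss alongside the main classification loss, and (ii) retrieving multi-aspect label descriptions from WordNet. Module names, matching features, and the equal loss weighting are illustrative assumptions rather than the paper's exact recipe.

```python
# Hedged sketch of the R2 task, triplet loss, and WordNet label descriptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')


class R2Head(nn.Module):
    """Binary classifier: do two encoded texts belong to the same category?"""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.classifier = nn.Linear(4 * hidden_dim, 2)

    def forward(self, h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
        # Common matching features: concatenation, absolute difference, element-wise product.
        feats = torch.cat([h1, h2, torch.abs(h1 - h2), h1 * h2], dim=-1)
        return self.classifier(feats)


def joint_loss(cls_logits, labels, r2_logits, r2_labels,
               anchor_emb, pos_emb, neg_emb, margin: float = 0.5):
    """Main classification loss + self-supervised R2 loss + triplet loss (equal weights assumed)."""
    loss_cls = F.cross_entropy(cls_logits, labels)    # text classification objective
    loss_r2 = F.cross_entropy(r2_logits, r2_labels)   # same-label vs. different-label pairs
    loss_tri = F.triplet_margin_loss(anchor_emb, pos_emb, neg_emb, margin=margin)
    return loss_cls + loss_r2 + loss_tri


def label_descriptions(label_word: str, max_senses: int = 3):
    """Fetch a few WordNet glosses as multi-aspect descriptions for a label word."""
    return [s.definition() for s in wn.synsets(label_word)[:max_senses]]
```

In the DELE extension, such gloss strings would be encoded (e.g., by the same PLM) into label embeddings that the text representation is contrasted against; the sketch above only shows the description-retrieval step.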
Authors: Kun Zhang, Le Wu, Guangyi Lv, Enhong Chen, Shulan Ruan, Jing Liu, Zhiqiang Zhang, Jun Zhou, Meng Wang