Learning Label Hierarchy with Supervised Contrastive Learning (2402.00232v1)
Abstract: Supervised contrastive learning (SCL) frameworks treat each class as independent and thus consider all classes to be equally important. This neglects the common scenario in which a label hierarchy exists, where fine-grained classes under the same category are more similar to one another than to classes under different categories. This paper introduces a family of Label-Aware SCL methods (LASCL) that incorporate hierarchical information into SCL by leveraging similarities between classes, producing a better-structured and more discriminative feature space. This is achieved by first adjusting the distance between instances according to the proximity of their classes, via a scaled instance-to-instance contrastive loss. An additional instance-to-center contrastive loss moves within-class examples closer to their class centers, which are represented by a set of learnable label parameters. The learned label parameters can be used directly as a nearest-neighbor classifier without further finetuning. In this way, the method yields better feature representations with improved intra-cluster compactness and inter-cluster separation. Experiments on three datasets show that the proposed LASCL works well on text classification tasks that require distinguishing a single label among many similar labels, outperforming baseline supervised approaches. Our code is publicly available.
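The abstract describes two losses: an instance-to-instance contrastive term whose pairwise similarities are scaled by class proximity, and an instance-to-center contrastive term over learnable label parameters that afterwards serve as a nearest-neighbor classifier. Below is a minimal PyTorch sketch of that recipe. It is an illustration under stated assumptions, not the paper's exact formulation: the module name `LabelAwareSCL`, the precomputed class-proximity matrix `label_sim` (in [0, 1], with ones on the diagonal), the single shared temperature, and the unweighted sum of the two losses are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelAwareSCL(nn.Module):
    """Sketch of a LASCL-style objective: a class-proximity-scaled
    instance-instance SupCon term plus an instance-center term over
    learnable label parameters (the class centers)."""

    def __init__(self, num_classes: int, dim: int, temperature: float = 0.1):
        super().__init__()
        # Learnable label parameters; after training they double as a
        # nearest-neighbor classifier (see predict()).
        self.centers = nn.Parameter(torch.randn(num_classes, dim))
        self.temperature = temperature

    def forward(self, feats, labels, label_sim):
        # feats: (B, D) encoder outputs; labels: (B,) class ids;
        # label_sim: (C, C) class-proximity matrix derived from the label
        # hierarchy, in [0, 1] with ones on the diagonal (an assumption).
        z = F.normalize(feats, dim=-1)
        B = z.size(0)
        eye = torch.eye(B, dtype=torch.bool, device=z.device)

        # Scaled instance-instance contrastive: pairwise similarities are
        # scaled by class proximity, so fine-grained classes under the same
        # parent repel each other less than unrelated classes do.
        logits = (z @ z.t()) * label_sim[labels][:, labels] / self.temperature
        logits = logits.masked_fill(eye, float('-inf'))  # drop self-pairs

        pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
        log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
        n_pos = pos.sum(1).clamp(min=1)  # guard anchors with no positives
        l_ii = -log_prob.masked_fill(~pos, 0.0).sum(1) / n_pos

        # Instance-center contrastive: pull each instance toward its own
        # learnable class center and away from the other centers.
        c = F.normalize(self.centers, dim=-1)
        l_ic = F.cross_entropy(z @ c.t() / self.temperature, labels,
                               reduction='none')

        return (l_ii + l_ic).mean()

    @torch.no_grad()
    def predict(self, feats):
        # Nearest-neighbor classification with the learned label parameters,
        # used directly without further finetuning.
        z = F.normalize(feats, dim=-1)
        c = F.normalize(self.centers, dim=-1)
        return (z @ c.t()).argmax(dim=-1)
```

In practice, `label_sim` could encode, for example, tree distance between classes in the hierarchy; the paper's specific proximity measure and the relative weighting of the two losses may differ from this sketch.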