Structured Probabilistic Coding (2312.13933v5)
Abstract: This paper presents structured probabilistic coding (SPC), a new supervised representation learning framework for learning compact and informative representations of the input that are relevant to the target task. SPC is an encoder-only probabilistic coding technique with a structured regularization from the target space, and it can enhance the generalization ability of pre-trained language models for better language understanding. Specifically, the probabilistic coding performs information encoding and task prediction simultaneously in a single module, making fuller use of the effective information in the input data, and it applies variational inference in the output space to reduce randomness and uncertainty. In addition, to better control the learning of the probabilistic representations, a structured regularization is proposed that promotes uniformity across classes in the latent space. With this regularization term, SPC preserves the Gaussian structure of the latent code and achieves more class-uniform coverage of the hidden space. Experimental results on 12 natural language understanding tasks demonstrate that SPC effectively improves the performance of pre-trained language models on classification and regression. Extensive experiments further show that SPC enhances generalization, robustness to label noise, and the clustering quality of the output representations.
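The abstract describes an objective with three ingredients: a task loss computed on latents drawn from a Gaussian posterior, a variational (KL) term that regularizes that posterior, and a structured term that pushes the batch-level class distribution toward uniform. The sketch below is a hypothetical, simplified rendering of such an objective in NumPy, not the paper's exact formulation; the function name `spc_loss`, the weights `beta` and `gamma`, and the choice of KL-to-uniform as the structured term are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_z(mu, logvar):
    """Reparameterization trick: draw z ~ N(mu, exp(logvar)) differentiably."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def spc_loss(mu, logvar, logits, labels, beta=1e-3, gamma=1.0):
    """Hypothetical SPC-style objective (a sketch, not the paper's code):
    task cross-entropy
    + beta  * KL(q(z|x) || N(0, I))           # variational term
    + gamma * KL(mean class dist. || uniform)  # structured regularization
    """
    # Softmax over the task head's logits (numerically stabilized)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Cross-entropy of the gold labels under the predicted distribution
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    # KL between a diagonal Gaussian q(z|x) and the standard normal prior
    kl = 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar).sum(axis=1).mean()
    # Structured term: batch-averaged prediction vs. the uniform distribution
    p_bar = probs.mean(axis=0)
    k = probs.shape[1]
    uniform_kl = (p_bar * np.log(p_bar * k + 1e-12)).sum()
    return ce + beta * kl + gamma * uniform_kl
```

Under this reading, encoding and prediction share one module: the same sampled `z` that incurs the KL cost also feeds the task head, so the latent is shaped directly by the supervision signal rather than by a separate decoder.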