InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings (2210.06432v3)

Published 8 Oct 2022 in cs.CL

Abstract: Contrastive learning has been extensively studied in sentence embedding learning, which assumes that the embeddings of different views of the same sentence are closer. The constraint brought by this assumption is weak, and a good sentence representation should also be able to reconstruct the original sentence fragments. Therefore, this paper proposes an information-aggregated contrastive learning framework for learning unsupervised sentence embeddings, termed InfoCSE. InfoCSE forces the representation of [CLS] positions to aggregate denser sentence information by introducing an additional masked language model (MLM) task and a well-designed network. We evaluate the proposed InfoCSE on several benchmark datasets w.r.t. the semantic textual similarity (STS) task. Experimental results show that InfoCSE outperforms SimCSE by an average Spearman correlation of 2.60% on BERT-base, and 1.77% on BERT-large, achieving state-of-the-art results among unsupervised sentence representation learning methods. Our code is available at https://github.com/caskcsg/sentemb/tree/main/InfoCSE.

InfoCSE: A New Direction in Contrastive Sentence Embeddings

The paper presents a novel approach to unsupervised sentence representation learning through a contrastive learning framework, InfoCSE. The framework addresses limitations of previous models such as SimCSE by adding a sentence reconstruction task on top of the base contrastive objective. This integration yields richer semantic representations by aggregating information through an auxiliary network specifically designed to avoid the performance degradation observed when reconstruction objectives are combined with contrastive training.

Core Contributions and Methodology

InfoCSE introduces an auxiliary network to tackle the over-update issue caused by directly optimizing the masked language model (MLM) objective within a contrastive learning framework. Unlike SimCSE, which suffered performance drops on semantic textual similarity (STS) tasks when MLM objectives were incorporated, InfoCSE uses a specialized network that limits the gradient updates propagated back to the primary BERT encoder. The auxiliary network back-propagates only through the [CLS] representation, disentangling the reconstructive and contrastive objectives; this stabilizes the encoder's parameter updates and makes the [CLS] embedding useful for both tasks.
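
To make the gradient-limiting idea concrete, the following PyTorch sketch combines a SimCSE-style contrastive loss with an auxiliary MLM loss whose gradient can reach the encoder only through the [CLS] vector. This is a minimal illustration, not the authors' implementation: `encoder`, `aux_net`, the `num_layers` "stop early" argument, and the batch layout are hypothetical placeholders (the auxiliary network itself is sketched after the next paragraph).

```python
import torch
import torch.nn.functional as F

def infocse_style_loss(encoder, aux_net, input_ids, masked_input_ids, mlm_labels,
                       temperature=0.05, mlm_weight=1.0):
    # Two forward passes with dropout act as two "views" of the same sentence
    # (the SimCSE-style augmentation); the [CLS] state is the sentence embedding.
    z1 = encoder(input_ids)[:, 0]          # [batch, hidden]
    z2 = encoder(input_ids)[:, 0]

    # Contrastive (InfoNCE) term: in-batch negatives, matched pairs on the diagonal.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    targets = torch.arange(sim.size(0), device=sim.device)
    contrastive = F.cross_entropy(sim, targets)

    # Auxiliary MLM term: the masked sentence is encoded by lower encoder layers
    # under no_grad (mirroring the frozen/detached states described above), so
    # the only gradient path from the reconstruction loss back into the encoder
    # is the full-depth [CLS] vector z1.
    with torch.no_grad():
        lower_states = encoder(masked_input_ids, num_layers=6)   # hypothetical flag
    mlm_logits = aux_net(cls_from_top=z1, lower_states=lower_states)  # [batch, seq, vocab]
    mlm = F.cross_entropy(mlm_logits.reshape(-1, mlm_logits.size(-1)),
                          mlm_labels.reshape(-1), ignore_index=-100)

    return contrastive + mlm_weight * mlm
```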

InfoCSE's architecture includes an auxiliary eight-layer transformer that integrates outputs from a frozen six-layer subset of the BERT encoder with the [CLS] vector from its twelfth layer. This design lets the sentence representation absorb dense semantic information without exposing the primary model to the variance introduced by MLM optimization. Joint learning relies on a controlled back-propagation mechanism, implemented via gradient detachment at certain layers, which is critical for precise updates that do not destabilize the embedding framework.
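
A rough sketch of such an auxiliary network is shown below, following the layer counts described above. The class name, constructor defaults, and the interface for passing in the detached lower-layer states are illustrative assumptions rather than the paper's actual module.

```python
import torch
import torch.nn as nn

class AuxNetwork(nn.Module):
    """Illustrative auxiliary network: a small transformer stack with an MLM head.
    It prepends the full-depth [CLS] vector to detached lower-layer token states
    and predicts the masked tokens, so the reconstruction signal can only reach
    the encoder through [CLS]."""

    def __init__(self, hidden=768, heads=12, layers=8, vocab_size=30522):
        super().__init__()
        block = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(block, num_layers=layers)
        self.mlm_head = nn.Linear(hidden, vocab_size)

    def forward(self, cls_from_top, lower_states):
        # cls_from_top: [batch, hidden]      -- keeps its gradient path to the encoder.
        # lower_states: [batch, seq, hidden] -- detached intermediate-layer states.
        x = torch.cat([cls_from_top.unsqueeze(1), lower_states.detach()], dim=1)
        x = self.transformer(x)
        return self.mlm_head(x[:, 1:])  # logits aligned with the original token positions
```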

Experimental Validation

The experimental evaluation demonstrates that InfoCSE achieves state-of-the-art performance in unsupervised sentence representation tasks. In semantic textual similarity evaluations, InfoCSE surpassed SimCSE, achieving an average Spearman correlation improvement of 2.60% on the BERT-base model and 1.77% on the BERT-large model across seven STS datasets. Moreover, InfoCSE outperformed competing models on the BEIR benchmark, indicating superior generalization to diverse retrieval scenarios.
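
For context, the STS protocol behind these numbers scores each sentence pair by the cosine similarity of its embeddings and reports the Spearman correlation against human judgments. The snippet below is a simplified sketch of that evaluation, assuming a Hugging Face-style model and tokenizer and ignoring dataset-specific aggregation details.

```python
import torch
import torch.nn.functional as F
from scipy.stats import spearmanr

def sts_spearman(model, tokenizer, sentence_pairs, gold_scores):
    # Score each pair by the cosine similarity of its [CLS] embeddings,
    # then correlate the predicted scores with human similarity judgments.
    predictions = []
    with torch.no_grad():
        for s1, s2 in sentence_pairs:
            e1 = model(**tokenizer(s1, return_tensors="pt")).last_hidden_state[:, 0]
            e2 = model(**tokenizer(s2, return_tensors="pt")).last_hidden_state[:, 0]
            predictions.append(F.cosine_similarity(e1, e2).item())
    rho, _ = spearmanr(predictions, gold_scores)
    return rho
```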

Extensive ablation studies confirm the auxiliary network's crucial role, underscoring the importance of pre-training this component to initialize the system effectively and ensure robust joint learning. The ablations also show that the MLM mask rate and the use of gradient detachment are impactful design choices that govern the delicate balance required for optimal semantic encoding, as illustrated by the sketch below.
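
As a concrete illustration of the mask-rate hyperparameter, a masking helper might look like the following. This is a simplified assumption about the setup, not the paper's exact recipe: it omits BERT's 80/10/10 token-replacement scheme and any special-token handling.

```python
import torch

def mask_for_mlm(input_ids, mask_token_id, mask_rate=0.15, ignore_index=-100):
    # Choose positions to mask at the configured rate; the MLM loss is computed
    # only on those positions (all others are ignored via ignore_index).
    probs = torch.full(input_ids.shape, mask_rate)
    masked = torch.bernoulli(probs).bool()
    labels = input_ids.masked_fill(~masked, ignore_index)
    masked_input_ids = input_ids.masked_fill(masked, mask_token_id)
    return masked_input_ids, labels
```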

Implications and Future Directions

The InfoCSE framework represents a significant advance in contrastive sentence representation learning by harmonizing reconstructive and discriminative objectives within a unified semantic space. This design not only improves the quality of sentence embeddings but also opens new avenues for integrating auxiliary objectives. The demonstrated compatibility with diverse learning objectives, such as MLM and Replaced Token Detection (RTD), suggests collaborative optimization strategies that could further strengthen semantic encoding in contrastive settings.

Future research may expand on this framework by exploring different auxiliary objectives and refining pre-training techniques to improve transfer learning applications. The principles introduced by InfoCSE could also guide the adaptation of supervised sentence embedding frameworks, allowing them to exploit nuanced semantic interrelations without compromising the independently trained objectives used in supervised settings.

Thus, InfoCSE offers a robust blueprint for improving unsupervised sentence embeddings through careful architectural design and a learning process that captures and maintains semantically rich, task-agnostic sentence representations.

Authors (6)
  1. Xing Wu (69 papers)
  2. Chaochen Gao (10 papers)
  3. Zijia Lin (43 papers)
  4. Jizhong Han (48 papers)
  5. Zhongyuan Wang (105 papers)
  6. Songlin Hu (80 papers)
Citations (29)