Deep Fusion: Capturing Dependencies in Contrastive Learning via Transformer Projection Heads (2403.18681v2)
Abstract: Contrastive Learning (CL) has emerged as a powerful method for training feature-extraction models on unlabeled data. Recent studies suggest that adding a linear projection head after the backbone significantly enhances model performance. In this work, we investigate the use of a transformer model as the projection head within the CL framework, aiming to exploit the transformer's capacity for capturing long-range dependencies across embeddings to further improve performance. Our key contributions are fourfold. First, we introduce a novel application of transformers in the projection-head role for contrastive learning, the first endeavor of its kind. Second, our experiments reveal a compelling "Deep Fusion" phenomenon, in which the attention mechanism of deeper layers progressively captures the correct relational dependencies among samples from the same class. Third, we provide a theoretical framework that explains and supports this "Deep Fusion" behavior. Finally, we demonstrate experimentally that our model outperforms the existing approach of using a feed-forward layer.
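To make the idea concrete, below is a minimal sketch of the setup the abstract describes: a transformer encoder in place of the usual MLP projection head, so that embeddings within a batch can attend to one another, trained with a SimCLR-style NT-Xent loss. All dimensions, hyperparameters, and the loss choice are illustrative assumptions, not the authors' exact configuration; the `same_class_attention_mass` diagnostic is likewise only one plausible way to quantify the "Deep Fusion" effect, not the paper's own metric.

```python
# Illustrative sketch only: assumes a SimCLR-style pipeline (NT-Xent loss,
# L2-normalized projections) with a transformer encoder standing in for the
# usual MLP projection head. Every name and hyperparameter is a placeholder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransformerProjectionHead(nn.Module):
    """Projection head in which embeddings within a batch attend to one another."""

    def __init__(self, embed_dim=512, num_heads=8, num_layers=3, proj_dim=128):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=2 * embed_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.fc = nn.Linear(embed_dim, proj_dim)

    def forward(self, h):
        # h: (batch, embed_dim) backbone outputs. The batch is treated as one
        # sequence so self-attention can model dependencies across samples.
        z = self.encoder(h.unsqueeze(0)).squeeze(0)   # (batch, embed_dim)
        return F.normalize(self.fc(z), dim=-1)        # unit-norm projections


def nt_xent(z1, z2, temperature=0.5):
    """Standard NT-Xent (SimCLR) loss over two augmented views."""
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                    # (2N, d), already normalized
    sim = z @ z.t() / temperature                     # cosine-similarity logits
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device),
                     float('-inf'))                   # exclude self-similarities
    # The positive for sample i is its other augmented view at (i + n) mod 2n.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)


def same_class_attention_mass(attn, labels):
    """Mean attention mass placed on same-class samples, given a (batch, batch)
    attention matrix (rows summing to 1) extracted from one layer of the head.
    Rising values in deeper layers would be consistent with "Deep Fusion"."""
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (batch, batch) mask
    return (attn * same.float()).sum(dim=1).mean()


if __name__ == "__main__":
    # Stand-ins for backbone (e.g., ResNet) features of two augmented views.
    h1, h2 = torch.randn(32, 512), torch.randn(32, 512)
    head = TransformerProjectionHead()
    print("NT-Xent loss:", nt_xent(head(h1), head(h2)).item())

    # Toy check of the diagnostic on a random attention matrix.
    attn = torch.softmax(torch.randn(32, 32), dim=1)
    labels = torch.randint(0, 10, (32,))
    print("Same-class attention mass:", same_class_attention_mass(attn, labels).item())
```

In this sketch the whole batch passes through self-attention as a single sequence, which is what lets the projection head capture cross-sample dependencies; a per-sample MLP head, by contrast, processes each embedding independently.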
Authors: Huanran Li, Daniel Pimentel-Alarcón