Assess LLM2Vec on Benchmarks Free of Pre-Training Contamination
Investigate the performance of LLM2Vec-transformed decoder-only large language models on newly designed evaluation benchmarks that are guaranteed not to overlap with the models' pre-training corpora, to quantify and mitigate potential test-set contamination effects.
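One way to make the "guaranteed not to overlap" criterion operational is an n-gram overlap check between a candidate benchmark and a sample of the pre-training corpus, in the spirit of the 13-gram contamination heuristic used in the GPT-3 analysis. The sketch below is illustrative, not part of LLM2Vec: the function names, the 13-gram threshold, and the whitespace tokenization are assumptions, and a real audit would need access to (or a representative sample of) the actual pre-training data.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(text: str, n: int = 13) -> Set[Tuple[str, ...]]:
    """Word-level n-grams of `text` (13-gram overlap is a common
    contamination heuristic; whitespace tokenization is a simplification)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(benchmark: Iterable[str],
                       pretraining_sample: Iterable[str],
                       n: int = 13) -> float:
    """Fraction of benchmark examples sharing at least one n-gram with the
    pre-training sample; 0.0 is the target for a contamination-free set."""
    examples: List[str] = list(benchmark)
    corpus_grams: Set[Tuple[str, ...]] = set()
    for doc in pretraining_sample:
        corpus_grams |= ngrams(doc, n)
    flagged = sum(1 for ex in examples if ngrams(ex, n) & corpus_grams)
    return flagged / len(examples) if examples else 0.0


# Hypothetical usage: `new_benchmark` and `corpus_sample` stand in for a
# freshly authored evaluation set and pre-training documents, respectively.
new_benchmark = ["A freshly written query about a post-training-cutoff event."]
corpus_sample = ["Documents sampled from the model's pre-training corpus."]
print(f"contaminated: {contamination_rate(new_benchmark, corpus_sample):.1%}")
```

A rate of zero under this check does not prove absence of contamination (paraphrases and near-duplicates slip through n-gram matching), which is why the proposal above emphasizes newly designed benchmarks rather than filtering alone.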
References
"We leave it to future work to investigate the performance of these models on newly designed benchmarks that are not part of their pre-training data."
— LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders (BehnamGhader et al., arXiv:2404.05961, 9 Apr 2024), Appendix, Section "Limitations" (Data contamination from pre-training)