Evading Data Contamination Detection for Language Models is (too) Easy (2402.02823v2)

Published 5 Feb 2024 in cs.LG, cs.AI, cs.CL, and cs.CR

Abstract: LLMs are widespread, with their performance on benchmarks frequently guiding user preferences for one model over another. However, the vast amount of data these models are trained on can inadvertently lead to contamination with public benchmarks, thus compromising performance measurements. While recently developed contamination detection methods try to address this issue, they overlook the possibility of deliberate contamination by malicious model providers aiming to evade detection. We argue that this setting is of crucial importance as it casts doubt on the reliability of public benchmarks. To more rigorously study this issue, we propose a categorization of both model providers and contamination detection methods. This reveals vulnerabilities in existing methods that we exploit with EAL, a simple yet effective contamination technique that significantly inflates benchmark performance while completely evading current detection methods.
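
The abstract does not spell out how EAL operates, but the failure mode it exploits can be illustrated: many contamination checks look for verbatim overlap between training data and benchmark items, and training on rephrased benchmark samples defeats such surface-level checks. The following is a minimal, hypothetical sketch of that idea; the n-gram detector, the example strings, and all function names are illustrative assumptions, not the paper's code or method.

```python
# Hedged sketch (assumed, not from the paper): why verbatim n-gram overlap
# detection can be evaded by training on paraphrased benchmark samples.

def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_detector(train_doc: str, benchmark_item: str, n: int = 8) -> bool:
    """Flag contamination if any benchmark n-gram appears verbatim in the
    training document -- a common surface-level heuristic."""
    return bool(ngrams(train_doc, n) & ngrams(benchmark_item, n))

benchmark_item = (
    "Question: What is the capital of France? "
    "Answer: The capital of France is Paris."
)

# Verbatim contamination: the benchmark item is copied into the training data.
verbatim_train = "Some web text. " + benchmark_item + " More web text."

# Evasive contamination: a paraphrase conveys the same answer knowledge
# but shares no long verbatim n-gram with the benchmark item.
paraphrased_train = (
    "Some web text. Paris serves as the French capital city, "
    "a fact often asked in geography quizzes. More web text."
)

print(overlap_detector(verbatim_train, benchmark_item))     # True  -> detected
print(overlap_detector(paraphrased_train, benchmark_item))  # False -> evades the check
```

A model fine-tuned on such paraphrases can still gain an unfair advantage on the benchmark, which is why the paper argues detection methods must look beyond surface overlap.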

Authors (5)
  1. Jasper Dekoninck (8 papers)
  2. Maximilian Baader (20 papers)
  3. Marc Fischer (30 papers)
  4. Martin Vechev (103 papers)
  5. Mark Niklas Müller (23 papers)
Citations (14)