Potential data contamination in the LeetCode Contest benchmark evaluation
Determine whether the released 180-problem LeetCode Contest benchmark (collected July 2023–January 2024), which was used to evaluate DeepSeek-Coder, overlaps with the DeepSeek-Coder pretraining corpus in a way that would constitute data contamination, and therefore whether the reported evaluation results can be trusted.
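One common way to look for such overlap is a word-level n-gram match between each benchmark problem and the corpus documents. The following is a minimal sketch of that idea, not the procedure used by the DeepSeek-Coder authors. The function names, the tokenization rule, and the default n = 10 are all assumptions made for illustration, and a real check would need access to the pretraining corpus, which is not part of the released data.

```python
import re
from typing import Dict, Iterable, Set, Tuple


def ngrams(text: str, n: int = 10) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word-level n-grams in `text`.

    Tokenization (runs of word characters) is an assumed, simplistic choice.
    """
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlapping_problems(
    problems: Dict[str, str],
    corpus_docs: Iterable[str],
    n: int = 10,  # assumed default, not a value taken from the text above
) -> Dict[str, int]:
    """Map each problem id to the number of its distinct n-grams found in the corpus.

    Problems with no shared n-gram are omitted from the result.
    """
    problem_grams = {pid: ngrams(text, n) for pid, text in problems.items()}
    shared: Dict[str, Set[Tuple[str, ...]]] = {pid: set() for pid in problems}
    for doc in corpus_docs:
        doc_grams = ngrams(doc, n)
        for pid, grams in problem_grams.items():
            shared[pid] |= grams & doc_grams
    return {pid: len(grams) for pid, grams in shared.items() if grams}
```

A non-empty result only flags candidates for manual review. Exact n-gram matching misses paraphrased or translated problem statements, so an empty result does not prove the benchmark is clean.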
References
It is important to acknowledge that despite our diligent efforts to gather the most recent code questions for model testing, the possibility of data contamination cannot be entirely ruled out. We observed that the GPT-4-Turbo and DeepSeek-Coder models achieved higher scores in the LeetCode Contest held in July and August. We encourage the research community to consider the potential issue of data contamination when evaluating models in future studies using our released LeetCode data.
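The observation about July and August suggests a simple temporal check: group per-problem outcomes by contest month and compare pass rates across months. The sketch below assumes a hypothetical input of (contest month, solved) pairs. Neither that record format nor the function name comes from the released LeetCode data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rate_by_month(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the fraction of solved problems per contest month.

    `results` holds (month, solved) pairs, with month as a 'YYYY-MM' string
    (an assumed format). Months are returned in chronological order.
    """
    counts = defaultdict(lambda: [0, 0])  # month -> [solved, total]
    for month, solved in results:
        counts[month][0] += int(solved)
        counts[month][1] += 1
    return {month: s / t for month, (s, t) in sorted(counts.items())}
```

A markedly higher pass rate in the earliest months than in later ones is consistent with contamination but does not establish it, since problem difficulty can also vary from contest to contest.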