
Mapping global dynamics of benchmark creation and saturation in artificial intelligence (2203.04592v4)

Published 9 Mar 2022 in cs.AI, cs.CL, and cs.CV

Abstract: Benchmarks are crucial to measuring and steering progress in AI. However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To facilitate monitoring of the health of the AI benchmarking ecosystem, we introduce methodologies for creating condensed maps of the global dynamics of benchmark creation and saturation. We curated data for 3765 benchmarks covering the entire domains of computer vision and natural language processing, and show that a large fraction of benchmarks quickly trended towards near-saturation, that many benchmarks fail to find widespread utilization, and that benchmark performance gains for different AI tasks were prone to unforeseen bursts. We analyze attributes associated with benchmark popularity, and conclude that future benchmarks should emphasize versatility, breadth and real-world utility.

Authors (5)
  1. Simon Ott (12 papers)
  2. Adriano Barbosa-Silva (2 papers)
  3. Kathrin Blagec (8 papers)
  4. Jan Brauner (9 papers)
  5. Matthias Samwald (36 papers)
Citations (29)
