SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos (1804.04527v2)

Published 12 Apr 2018 in cs.CV

Abstract: In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at a one minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution). As such, the dataset is easily scalable. These annotations are manually refined to a one second resolution by anchoring them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in the realm of generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances $\delta$ ranging from 5 to 60 seconds. Our dataset and models are available at https://silviogiancola.github.io/SoccerNet.

Authors (4)
  1. Silvio Giancola (47 papers)
  2. Mohieddine Amine (1 paper)
  3. Tarek Dghaily (1 paper)
  4. Bernard Ghanem (256 papers)
Citations (161)

Summary

  • The paper introduces a scalable benchmark dataset for action spotting by annotating 500 full soccer games with 6,637 refined temporal events.
  • It leverages state-of-the-art feature representations and pooling techniques, achieving 67.8% mAP for classification and 49.7% Average-mAP for spotting.
  • The dataset’s design enables minimal human refinement, promoting automated sports analytics and inspiring advanced research in sparse event detection.

SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

SoccerNet introduces a new benchmark for action spotting in soccer videos, contributing to the fields of sports analytics and video understanding. The dataset comprises 500 complete soccer games from six major European leagues, spanning three seasons from 2014 to 2017, for a total duration of 764 hours. The paper addresses the challenge of localizing very sparse events within long videos: on average, only one event occurs every 6.9 minutes.

Dataset and Methodology

The SoccerNet dataset includes 6,637 temporal annotations across three primary event classes: Goal, Yellow/Red Card, and Substitution. These annotations are first parsed automatically from online match reports at a one-minute resolution, then manually refined to a one-second resolution by anchoring each event at a single timestamp following well-defined soccer rules. Because the coarse annotations are obtained automatically and only the anchoring step requires human effort, the dataset scales easily.
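The two-stage annotation pipeline described above can be sketched as a simple record type: an automatically parsed minute-level event that later receives a one-second anchor during manual refinement. This is an illustrative sketch only; the field names are hypothetical and do not reflect the dataset's released schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Annotation:
    game_id: str                        # identifier for one of the 500 games
    label: str                          # "goal", "card", or "substitution"
    minute: int                         # coarse timestamp parsed from the match report
    anchor_sec: Optional[float] = None  # one-second anchor, set during manual refinement

def refine(ann: Annotation, anchor_sec: float) -> Annotation:
    """Attach a one-second anchor to a minute-level annotation; the anchor
    should land near the reported minute (match reports can be slightly off)."""
    ann.anchor_sec = anchor_sec
    return ann
```

For example, `refine(Annotation("game-001", "goal", 63), 63 * 60 + 42.0)` records a goal anchored at 3822 seconds into the broadcast.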

Recent advances in generic action recognition and detection are leveraged to establish baselines for detecting soccer events. The authors combine state-of-the-art video feature representations with pooling techniques to build classifiers for two tasks: classifying one-minute video chunks and spotting event anchors. The best classification model reaches a mean Average Precision (mAP) of 67.8%, and the spotting baseline reaches an Average-mAP of 49.7% over temporal tolerances δ ranging from 5 to 60 seconds.
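The spotting metric works as follows: a predicted timestamp counts as a true positive if it falls within a tolerance δ of a not-yet-matched ground-truth anchor; AP is computed per class at each tolerance, averaged over classes into an mAP, and the Average-mAP averages those values over δ from 5 to 60 seconds. The sketch below assumes the matching convention |t − a| ≤ δ, which is one reading of the paper's tolerance; the official evaluation code may differ in detail.

```python
import numpy as np

def spotting_ap(pred_times, pred_scores, gt_times, delta):
    """AP for one class at tolerance `delta` (seconds). A prediction is a
    true positive if it lies within `delta` of a not-yet-matched anchor."""
    if len(gt_times) == 0:
        return 0.0
    gt = np.asarray(gt_times, dtype=float)
    order = np.argsort(-np.asarray(pred_scores))   # rank by descending confidence
    matched = np.zeros(len(gt), dtype=bool)
    tp = np.zeros(len(order))
    for rank, i in enumerate(order):
        dists = np.abs(gt - pred_times[i])
        dists[matched] = np.inf                    # each anchor matches at most once
        j = int(np.argmin(dists))
        if dists[j] <= delta:
            matched[j] = True
            tp[rank] = 1.0
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)
    recall = cum_tp / len(gt)
    ap, prev_recall = 0.0, 0.0
    for p, r, hit in zip(precision, recall, tp):
        if hit:                                    # area under the precision-recall curve
            ap += p * (r - prev_recall)
            prev_recall = r
    return ap

def average_map(preds, gts, deltas=range(5, 65, 5)):
    """preds[c] = (times, scores) and gts[c] = anchor times, per class c.
    Returns the class-mean AP averaged over the tolerance values."""
    per_delta = []
    for d in deltas:
        aps = [spotting_ap(preds[c][0], preds[c][1], gts[c], d) for c in gts]
        per_delta.append(float(np.mean(aps)))
    return float(np.mean(per_delta))
```

For instance, with a single Goal anchor at 12 s, a confident prediction at 10 s is matched at every tolerance from 5 to 60 s, so the Average-mAP is 1.0; a second anchor with no nearby prediction would cap recall, and hence the AP, at 0.5.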

Implications and Future Directions

The introduction of SoccerNet has practical implications for automatic sports analytics and highlights generation from soccer broadcasts. The scalability of the dataset provides an opportunity for widespread use among researchers and commercial entities, particularly those focused on automated sports understanding.

From a theoretical standpoint, SoccerNet poses new challenges for sparse event detection in lengthy untrimmed videos. This introduces the potential for novel methods in action localization, encouraging advancements in deep learning techniques for understanding complex video data.

Future research could explore models that incorporate richer semantic context and causal relationships between events. Furthermore, leveraging audio tracks from broadcasts could enrich video analysis with sentiment cues, aiding a more holistic understanding of game dynamics.

Overall, SoccerNet stands as a pivotal dataset for the sports analytics domain, offering a comprehensive benchmark for evaluating methodologies aimed at sparse event spotting in soccer videos. With possibilities for expansion and enhancement, SoccerNet will likely stimulate continued innovation in video-based sports analysis and broader multimedia understanding.