From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction (2504.06357v1)

Published 8 Apr 2025 in cs.CV and cs.LG

Abstract: Game State Reconstruction (GSR), a critical task in Sports Video Understanding, involves precise tracking and localization of all individuals on the football field (players, goalkeepers, referees, and others) in real-world coordinates. This capability enables coaches and analysts to derive actionable insights into player movements, team formations, and game dynamics, ultimately optimizing training strategies and enhancing competitive advantage. Achieving accurate GSR using a single-camera setup is highly challenging due to frequent camera movements, occlusions, and dynamic scene content. In this work, we present a robust end-to-end pipeline for tracking players across an entire match using a single-camera setup. Our solution integrates a fine-tuned YOLOv5m for object detection, a SegFormer-based camera parameter estimator, and a DeepSORT-based tracking framework enhanced with re-identification, orientation prediction, and jersey number recognition. By ensuring both spatial accuracy and temporal consistency, our method delivers state-of-the-art game state reconstruction, securing first place in the SoccerNet Game State Reconstruction Challenge 2024 and significantly outperforming competing methods.

Summary

The paper introduces a novel pipeline integrating YOLOv5m, SegFormer, and DeepSORT to accurately reconstruct soccer game states from broadcast footage.
It employs advanced techniques in camera calibration and team detection, achieving a GS-HOTA score of 63.81 in the SoccerNet GSR Challenge.
The methodology offers practical applications in professional sports analytics by enabling precise player tracking and tactical evaluation.

Introduction

The paper "From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction" presents a comprehensive solution for Game State Reconstruction (GSR) in soccer analytics. The task involves tracking and identifying players, goalkeepers, referees, and other participants on the field, and localizing them in real-world coordinates, from single-camera broadcast footage. The proposed pipeline achieves state-of-the-art performance, as evidenced by its first-place finish in the 2024 SoccerNet Game State Reconstruction Challenge.

GSR from video streams poses distinctive challenges, including occlusions, camera movements, and distinguishing between visually similar players. The authors address them by combining several components: a fine-tuned YOLOv5m object detector, a SegFormer-based camera parameter estimator, and a DeepSORT-based tracking framework coupled with re-identification (ReID) embeddings and jersey number recognition.

Methodology

The methodology is structured into three primary stages: raw tracking, team detection, and post-processing. Each stage contributes distinct elements essential for effective game state reconstruction.

Raw Tracking Stage

The raw tracking stage uses a fine-tuned YOLOv5m detector optimized for soccer-specific objects, i.e., players and the ball. It then produces initial player tracks in real time, drawing on pitch localization and jersey number recognition.

Figure 1: The raw tracking stage performs object detection and pitch localization, collects the information about player teams required by subsequent stages, extracts Re-ID embeddings and jersey numbers, and then merges all collected data into preliminary object tracks using DeepSORT-based tracking.
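A minimal sketch of the DeepSORT-style association that links per-frame detections to existing tracks is shown below. It is an illustration under stated assumptions, not the authors' code: tracks and detections are matched on the cosine distance between their ReID embeddings, pairs whose pitch positions lie more than max_dist_m apart are gated out (a simplification of DeepSORT's Kalman-filter Mahalanobis gate), and the assignment is solved exactly by brute force, which only scales to toy examples. All function names and thresholds here are hypothetical.

```python
import itertools
import numpy as np

GATED = 1e6  # cost given to pairs that fail the position gate


def cosine_distance(tracks: np.ndarray, dets: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance between track and detection embeddings (rows)."""
    t = tracks / np.linalg.norm(tracks, axis=1, keepdims=True)
    d = dets / np.linalg.norm(dets, axis=1, keepdims=True)
    return 1.0 - t @ d.T


def optimal_assignment(cost: np.ndarray) -> list:
    """Exact minimum-cost matching by brute force (toy sizes only).

    Stand-in for the Hungarian algorithm that DeepSORT uses.
    """
    if cost.shape[0] > cost.shape[1]:
        # Solve the transposed problem, then swap indices back to (row, col)
        return [(r, c) for c, r in optimal_assignment(cost.T)]
    best_total, best_pairs = np.inf, []
    for cols in itertools.permutations(range(cost.shape[1]), cost.shape[0]):
        total = sum(cost[r, c] for r, c in enumerate(cols))
        if total < best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_pairs


def associate(track_emb, track_pos, det_emb, det_pos,
              max_cos_dist=0.3, max_dist_m=3.0):
    """Return accepted (track_index, detection_index) pairs."""
    cost = cosine_distance(track_emb, det_emb)
    # Gate: forbid matches whose pitch positions are too far apart (hypothetical rule)
    dist = np.linalg.norm(track_pos[:, None, :] - det_pos[None, :, :], axis=2)
    cost = np.where(dist > max_dist_m, GATED, cost)
    pairs = optimal_assignment(cost)
    # Drop assignments that were gated or whose appearance is too dissimilar
    return [(t, d) for t, d in pairs if cost[t, d] <= max_cos_dist]
```

In a real tracker the brute-force search would be replaced by the Hungarian algorithm (for example scipy.optimize.linear_sum_assignment), unmatched detections would start new tracks, and track embeddings would be updated over time.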

The pipeline employs a customized camera parameter estimation model based on the SegFormer architecture. This model predicts the camera parameters (position, orientation, and field of view), which are then refined using detected keypoints (field markings) so that objects are mapped more accurately from image to real-world coordinates.
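The mapping from image to pitch coordinates can be illustrated with a simple pinhole camera. The sketch below assumes a world frame in meters with the pitch on the plane z = 0 and z pointing up, square pixels, the principal point at the image center, and no lens distortion; the helper names (intrinsics, look_at, pixel_to_pitch, foot_point) are hypothetical, and the authors' SegFormer-based estimator may parameterize the camera differently. A player's position is taken as the bottom center of the bounding box, back-projected as a ray and intersected with the ground plane.

```python
import numpy as np


def intrinsics(width: int, height: int, hfov_deg: float) -> np.ndarray:
    """Pinhole intrinsics from image size and horizontal field of view."""
    f = (width / 2) / np.tan(np.radians(hfov_deg) / 2)
    return np.array([[f, 0.0, width / 2],
                     [0.0, f, height / 2],
                     [0.0, 0.0, 1.0]])


def look_at(cam_pos, target, up=(0.0, 0.0, 1.0)) -> np.ndarray:
    """World-to-camera rotation; camera axes are x right, y down, z forward."""
    fwd = np.asarray(target, float) - np.asarray(cam_pos, float)
    fwd /= np.linalg.norm(fwd)
    right = np.cross(fwd, up)
    right /= np.linalg.norm(right)
    down = np.cross(fwd, right)
    return np.stack([right, down, fwd])


def project(K, R, C, point_w):
    """World point -> pixel (u, v)."""
    C = np.asarray(C, float)
    p = K @ (R @ (np.asarray(point_w, float) - C))
    return p[:2] / p[2]


def pixel_to_pitch(K, R, C, u, v):
    """Back-project pixel (u, v) onto the pitch plane z = 0 and return (x, y)."""
    C = np.asarray(C, float)
    ray_w = R.T @ np.linalg.solve(K, np.array([u, v, 1.0]))
    if ray_w[2] >= 0:
        raise ValueError("ray does not hit the ground plane")
    t = -C[2] / ray_w[2]
    return (C + t * ray_w)[:2]


def foot_point(x1, y1, x2, y2):
    """Bottom center of a bounding box, where the player touches the grass."""
    return (x1 + x2) / 2, y2
```

Using the foot point rather than the box center matters because only the feet lie on the z = 0 plane; back-projecting the box center would place the player too far from the camera.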

Team Detection Stage

Team detection aggregates information about player uniforms and roles and assigns players to teams by clustering their embeddings with unsupervised methods such as k-means.

Figure 2: Team Detection Process. (a) Frames are clustered into three main groups: the two largest clusters (left and right teams) and the referee cluster. (b) Goalkeeper detection is performed separately by identifying athletes inside the penalty area and clustering them based on embeddings.
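The team split in Figure 2(a) can be sketched with plain k-means on per-player appearance embeddings: three clusters, with the two largest taken as the teams and the smallest as the referees. This is a minimal illustration, not the authors' implementation; the embeddings are assumed to be given, and the deterministic farthest-point initialization is chosen only to keep the example reproducible.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, n_iter: int = 100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far
        d = np.linalg.norm(x[:, None, :] - np.array(centers)[None], axis=2).min(axis=1)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.linalg.norm(x[:, None, :] - centers[None], axis=2).argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.linalg.norm(x[:, None, :] - centers[None], axis=2).argmin(axis=1)
    return labels, centers


def split_teams_and_referees(emb: np.ndarray) -> np.ndarray:
    """Label embeddings 0 or 1 (two largest clusters, the teams) or 2 (referees)."""
    labels, _ = kmeans(emb, k=3)
    sizes = np.bincount(labels, minlength=3)
    order = np.argsort(-sizes, kind="stable")  # cluster ids, largest first
    rank = np.empty(3, dtype=int)
    rank[order] = np.arange(3)
    return rank[labels]
```

Farthest-point initialization is sensitive to outliers, so on real embeddings k-means++ with several restarts (as in scikit-learn's KMeans) is the more robust choice.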

During this stage, the pipeline distinguishes team affiliations via uniform-specific ReID embeddings enhanced with role prediction.
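Figure 2(b) restricts goalkeeper detection to athletes inside the penalty area. A minimal sketch of that spatial filter follows, assuming pitch coordinates in meters with the origin at the center spot, x along a 105 m length, and the standard penalty area of 16.5 m depth and 40.32 m width. The thresholds min_fraction and min_frames are hypothetical, and the subsequent embedding-based clustering of the candidates is not shown.

```python
PITCH_HALF_LENGTH = 52.5    # assumed 105 m pitch, origin at the center spot
PENALTY_DEPTH = 16.5        # distance from the goal line into the field
PENALTY_HALF_WIDTH = 20.16  # half of the 40.32 m penalty-area width


def in_penalty_area(x: float, y: float) -> bool:
    """True if pitch point (x, y) lies inside either penalty area."""
    return (PITCH_HALF_LENGTH - PENALTY_DEPTH <= abs(x) <= PITCH_HALF_LENGTH
            and abs(y) <= PENALTY_HALF_WIDTH)


def goalkeeper_candidates(tracks: dict, min_fraction: float = 0.8,
                          min_frames: int = 25) -> list:
    """Track ids that spend most of their (long enough) lifetime in a penalty area.

    tracks maps track id -> list of (x, y) pitch positions, one per frame.
    """
    candidates = []
    for tid, positions in tracks.items():
        if len(positions) < min_frames:
            continue  # too short to judge reliably
        inside = sum(in_penalty_area(x, y) for x, y in positions)
        if inside / len(positions) >= min_fraction:
            candidates.append(tid)
    return sorted(candidates)
```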

Post-Processing Stage

Post-processing merges fragmented player tracks into coherent trajectories. It uses jersey numbers, team labels, and ReID vectors to correct tracking errors, eliminate identity swaps, and ensure temporal consistency.
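Tracklet merging can be sketched as greedy linking of fragments that do not overlap in time. In this minimal illustration, an assumption-laden stand-in rather than the authors' algorithm, a later fragment may continue an earlier one only if the frame gap is short, the team labels agree, the jersey numbers do not conflict, and the mean ReID embeddings are similar; matching jersey numbers raise the link score. The Tracklet fields, thresholds, and the 0.5 bonus are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Tracklet:
    tid: int
    start: int              # first frame index
    end: int                # last frame index
    team: int
    jersey: Optional[int]   # None if no confident reading
    emb: np.ndarray         # mean ReID embedding of the fragment


def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def merge_tracklets(tracklets, max_gap=150, min_sim=0.7) -> dict:
    """Greedily link fragments (a ends before b starts); return {tid: identity}."""
    candidates = []
    for a in tracklets:
        for b in tracklets:
            gap = b.start - a.end
            if not (0 < gap <= max_gap) or a.team != b.team:
                continue
            if a.jersey is not None and b.jersey is not None and a.jersey != b.jersey:
                continue  # conflicting jersey numbers veto the link
            sim = cos_sim(a.emb, b.emb)
            if sim >= min_sim:
                bonus = 0.5 if a.jersey is not None and a.jersey == b.jersey else 0.0
                candidates.append((sim + bonus, a.tid, b.tid))
    # Accept the best-scoring links; each fragment gets at most one successor/predecessor
    successor, predecessor = {}, {}
    for _, a, b in sorted(candidates, reverse=True):
        if a not in successor and b not in predecessor:
            successor[a], predecessor[b] = b, a
    # Links only go forward in time, so chains have no cycles; walk each from its head
    identity = {}
    for t in tracklets:
        if t.tid in predecessor:
            continue
        cur = t.tid
        while cur is not None:
            identity[cur] = t.tid
            cur = successor.get(cur)
    return identity
```

Because the compatibility checks are pairwise, a longer chain could still join fragments with conflicting jersey numbers through an unlabeled middle fragment; a fuller version would check each candidate against the whole chain.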

This stage is critical to the pipeline's success: it fuses ReID feature vectors with jersey number recognition to keep identities consistent in transient situations where players are hard to tell apart visually.
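Jersey numbers are legible only in some frames, so a per-track number is typically obtained by fusing readings across the frames of a track. The following is a minimal confidence-weighted voting sketch, assuming the recognizer returns a number (or None) and a confidence per frame; min_conf and min_support are hypothetical parameters, and the paper's actual fusion with ReID features may differ.

```python
from collections import defaultdict


def vote_jersey(readings, min_conf=0.5, min_support=3):
    """Fuse per-frame (number or None, confidence) readings into one jersey number.

    Returns None when no number has enough confident supporting frames.
    """
    scores = defaultdict(float)
    counts = defaultdict(int)
    for number, conf in readings:
        if number is None or conf < min_conf:
            continue  # unreadable or low-confidence frame
        scores[number] += conf
        counts[number] += 1
    if not scores:
        return None
    best = max(scores, key=scores.get)  # highest total confidence wins
    return best if counts[best] >= min_support else None
```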

Evaluation and Results

The proposed method was evaluated on the SoccerNet Game State Reconstruction (GSR) Challenge dataset. Results are reported with the GS-HOTA metric, which extends the standard HOTA metric by also taking roles, team affiliations, and jersey numbers into account, making it a stricter measure of tracking quality.
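The GS-HOTA building blocks can be sketched as follows, as we understand the SoccerNet definition: a localization similarity that decays with Euclidean distance on the pitch (1 at zero distance, 0.05 at a distance of tau, assumed here to be 5 m), set to zero whenever role, team, or jersey number disagree; and, as in standard HOTA, a per-threshold score equal to the geometric mean of detection accuracy (DetA) and association accuracy (AssA), averaged over thresholds alpha. Computing AssA itself requires matching across the whole sequence and is omitted; the dictionary field names are hypothetical.

```python
import math

TAU_M = 5.0  # assumed distance scale in meters


def loc_sim(pred_xy, gt_xy, tau=TAU_M) -> float:
    """Localization similarity: 1 at zero distance, 0.05 at distance tau."""
    d2 = (pred_xy[0] - gt_xy[0]) ** 2 + (pred_xy[1] - gt_xy[1]) ** 2
    return math.exp(math.log(0.05) * d2 / tau ** 2)


def gs_similarity(pred: dict, gt: dict, tau=TAU_M) -> float:
    """Similarity is zero unless role, team, and jersey all match (hypothetical keys)."""
    if any(pred[k] != gt[k] for k in ("role", "team", "jersey")):
        return 0.0
    return loc_sim(pred["xy"], gt["xy"], tau)


def hota_alpha(tp: int, fn: int, fp: int, ass_a: float) -> float:
    """HOTA at one threshold alpha: geometric mean of DetA and AssA."""
    det_a = tp / (tp + fn + fp)
    return math.sqrt(det_a * ass_a)


def hota(per_alpha_scores) -> float:
    """Average HOTA_alpha over thresholds (standard HOTA uses 0.05, 0.10, ..., 0.95)."""
    return sum(per_alpha_scores) / len(per_alpha_scores)
```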

The authors' solution (team Constructor Tech) achieved the highest score in the challenge, a GS-HOTA of 63.81, significantly outperforming competing teams and the baselines.

Conclusion and Implications

This paper effectively combines multiple advanced techniques to address the complexities of GSR from soccer broadcast footage. With its modular design and carefully optimized camera estimation and tracking components, it sets a new state of the art for game state reconstruction on SoccerNet.

In future work, the authors plan to refine how the models are integrated, unifying camera calibration and field detection under a single, more comprehensive architecture. They also expect improvements in orientation prediction and jersey number association to further enhance the pipeline.

Overall, the research has clear potential applications in professional sports analytics, giving coaches and analysts a sophisticated toolset for player performance evaluation and tactical decision-making. Its success in a competitive challenge underscores its practical viability and lays the groundwork for further work on AI-driven sports analysis.

Authors (9)
