
GTATrack: Hierarchical Soccer Player Tracking

Updated 7 February 2026
  • GTATrack is a hierarchical multi-object tracking system that integrates Deep-EIoU for frame-level association and GTA-Link for trajectory-level refinement.
  • It employs iterative geometric expansion combined with deep appearance matching to overcome occlusion, distortion, and rapid motion in fisheye soccer scenarios.
  • Global tracklet clustering and semi-supervised pseudo-labeling enhance identity preservation and reduce false positives, driving HOTA scores up to 0.60.

GTATrack is a hierarchical multi-object tracking (MOT) system that integrates Deep Expansion IoU (Deep-EIoU) for frame-level association with Global Tracklet Association (GTA) for trajectory-level refinement. It targets soccer player tracking in fisheye camera scenarios, which are characterized by occlusion, rapid player motion, extreme geometric distortion, and target appearance ambiguity. As the winning solution to the SoccerTrack 2025 Challenge, GTATrack achieved a primary HOTA score of 0.60 and demonstrated substantially improved identity preservation and false positive control compared to previous approaches (Jian et al., 31 Jan 2026).

1. System Architecture and Workflow

GTATrack employs a two-stage tracking stack:

  • Stage 1: Online, real-time association leverages Deep-EIoU, which combines iterative geometric matching and deep appearance similarity while omitting explicit motion prediction or filtering. Association is implemented via a Hungarian solver minimizing a composite cost matrix for each frame.
  • Stage 2: After initial online tracklet construction, an offline global refinement module (GTA-Link) clusters fragmented short-term tracklets into identity-consistent long trajectories by hierarchical clustering over deep appearance embeddings with spatial and temporal constraints.

The high-level pseudocode is as follows:

Inputs: video frames {I_1, ..., I_T}
Outputs: final trajectories

// Pre-load detector 𝒟 (YOLOv11x) and ReID model ℛ (OSNet)
initialize activeTracklets = ∅
for t = 1..T do
    detections O_t = 𝒟(I_t)
    for each detection o_i in O_t:
        crop c_i from I_t using o_i's bbox
        f_i = ℛ(c_i)  // D-dimensional L2-normalized vector
    C = buildCostMatrix(activeTracklets, O_t)  // Deep-EIoU cost terms
    X* = Hungarian(C)  // minimum-cost tracklet-to-detection assignment
    update activeTracklets using X*
// After all frames:
finalTrajectories = GTA_Link(activeTracklets)
return finalTrajectories
This modular structure enables both robust frame-level recovery in distorted environments and globally-optimal identity maintenance over long sequences (Jian et al., 31 Jan 2026).
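
The per-frame assignment in Stage 1 can be illustrated with a small Python sketch. It is not the authors' code: the function name `min_cost_assignment` is hypothetical, and a brute-force search over permutations stands in for a real Hungarian solver, so it is only suitable for tiny cost matrices. It assumes a non-empty rectangular matrix in which `cost[i][j]` is the cost of matching tracklet `i` to detection `j`.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Return (total_cost, pairs) for the cheapest one-to-one matching.

    cost[i][j] is the cost of matching tracklet i to detection j.
    Brute force over permutations: a stand-in for a Hungarian solver,
    practical only for very small matrices.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    if n_rows > n_cols:
        # More tracklets than detections: solve the transposed problem
        # and swap the indices back.
        transposed = [[cost[i][j] for i in range(n_rows)] for j in range(n_cols)]
        total, pairs = min_cost_assignment(transposed)
        return total, sorted((i, j) for j, i in pairs)

    best_total, best_pairs = None, None
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[i][j] for i, j in enumerate(cols))
        if best_total is None or total < best_total:
            best_total = total
            best_pairs = [(i, j) for i, j in enumerate(cols)]
    return best_total, best_pairs


# Two tracklets, three detections: the third detection stays unmatched.
example_cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
]
total, pairs = min_cost_assignment(example_cost)
print(pairs)  # [(0, 0), (1, 1)]
```

Unmatched detections (here, detection 2) would typically start new tracklets; how GTATrack handles them is not specified in this summary.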

2. Deep Expansion IoU for Frame-Level Association

In contrast to motion-based predictors such as Kalman filters that are brittle under erratic sports motion and fisheye distortion, Deep-EIoU fuses spatial and appearance cues:

  • Expansion IoU (EIoU): For each candidate association, a query bounding box is iteratively expanded (scaling $w, h$ by $(1+s)$ for $K$ steps), and the IoU against the target box is computed at each expansion:

$$EIoU(b_i, b_\tau) = \max_{k=0,\dots,K} IoU(b_i^k, b_\tau)$$

  • Deep Appearance Cost: L2-normalized ReID features $f_i, f_\tau$ are compared via cosine distance:

$$d_{app}(i, \tau) = 1 - (f_i \cdot f_\tau)$$

  • Composite Association Cost:

$$C(i, \tau) = \lambda \, (1 - EIoU(b_i, b_\tau)) + (1 - \lambda) \, d_{app}(i, \tau)$$

The settings reported to discriminate true matches best in the SoccerTrack protocol are $\lambda = 0.5$, $s = 0.2$, $K = 3$ expansion stages, and a proximity threshold of $\tau_{prox} = 0.9$ (Jian et al., 31 Jan 2026).

This design robustly associates detections even when the initial IoU is low due to lens distortion or abrupt shifts, provided appearance consistency is preserved.
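
The three cost terms above can be sketched in plain Python. This is an illustrative reading of the formulas rather than the released implementation: the function names are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, only the query box is expanded (as described here), and step $k$ is assumed to scale width and height by $(1+s)^k$ about the box centre, which the text does not spell out.

```python
def iou(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def expand(box, factor):
    """Scale width and height by `factor`, keeping the centre fixed."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * factor / 2
    half_h = (box[3] - box[1]) * factor / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def eiou(query, target, s=0.2, K=3):
    """Max IoU over K+1 expansion levels of the query box.

    Assumption: level k uses scale (1 + s) ** k; k = 0 is the original box.
    """
    return max(iou(expand(query, (1 + s) ** k), target) for k in range(K + 1))


def appearance_distance(f_i, f_tau):
    """Cosine distance for L2-normalized feature vectors."""
    return 1.0 - sum(x * y for x, y in zip(f_i, f_tau))


def association_cost(b_i, b_tau, f_i, f_tau, lam=0.5, s=0.2, K=3):
    """Composite cost: lam * (1 - EIoU) + (1 - lam) * d_app."""
    return lam * (1.0 - eiou(b_i, b_tau, s, K)) + (1.0 - lam) * appearance_distance(f_i, f_tau)


# Two boxes separated by a one-pixel gap: plain IoU is zero,
# but the expanded query overlaps the target.
query, target = (0, 0, 10, 10), (11, 0, 21, 10)
print(iou(query, target) == 0.0, eiou(query, target) > 0.0)  # True True
```

The example shows the intended effect: a pair with zero initial overlap still receives a finite spatial score, so a matching appearance feature can pull the composite cost low enough for association.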

3. Global Tracklet Association (GTA-Link)

The GTA-Link module addresses three principal tracking errors: intra-tracklet ID switches, fragmented trajectories due to occlusions/re-entries, and false merges.

  • Pairwise Tracklet Distance: For two tracklets $\tau_i, \tau_j$ of lengths $L_i, L_j$, the average appearance distance is

$$D_{app}(\tau_i, \tau_j) = \frac{1}{L_i L_j} \sum_{m=1}^{L_i} \sum_{n=1}^{L_j} \left(1 - f_{i,m}^\top f_{j,n}\right)$$

  • Temporal Constraint: Merges are permitted only if $0 < t_{start}(\tau_j) - t_{end}(\tau_i) \leq \Delta_{max}$ (with $\Delta_{max} = 50$), preventing non-causal associations.
  • Clustering Objective: A graph $G = (\mathcal{T}_{init}, E)$ is built with edge weights $w_{ij} = D_{app}(\tau_i, \tau_j)$. Hierarchical single-linkage or DBSCAN-style clustering (eps=0.5, min_samples=7) assembles tracklets into identity-consistent clusters, optimizing

$$\min \sum_{clusters\ C} \sum_{i < j \in C} w_{ij}$$

subject to the temporal constraints. Cycle-free, one-to-one linkages are enforced.
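
The pairwise distance and temporal gate can be sketched as follows. This is a deliberately simplified stand-in, not the hierarchical or DBSCAN-style clustering described above: candidate links are accepted greedily in order of increasing $D_{app}$, with at most one successor and one predecessor per tracklet. The `Tracklet` class, the function names, and the reuse of 0.5 as a distance cut-off are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tracklet:
    tid: int        # tracklet identifier
    t_start: int    # first frame index
    t_end: int      # last frame index
    feats: list     # L2-normalized appearance vectors, one per detection


def avg_appearance_distance(a, b):
    """D_app: mean cosine distance over all feature pairs of two tracklets."""
    total = 0.0
    for f in a.feats:
        for g in b.feats:
            total += 1.0 - sum(x * y for x, y in zip(f, g))
    return total / (len(a.feats) * len(b.feats))


def can_follow(prev, nxt, delta_max=50):
    """Temporal gate: nxt must start after prev ends, within delta_max."""
    gap = nxt.t_start - prev.t_end
    return 0 < gap <= delta_max


def greedy_link(tracklets, max_dist=0.5, delta_max=50):
    """Return {predecessor_tid: successor_tid} links.

    Simplified greedy linking (an assumption, not the paper's clustering):
    cheapest admissible pairs first, one-to-one links only. The strictly
    positive time gap rules out cycles.
    """
    candidates = []
    for a in tracklets:
        for b in tracklets:
            if a.tid != b.tid and can_follow(a, b, delta_max):
                d = avg_appearance_distance(a, b)
                if d <= max_dist:
                    candidates.append((d, a.tid, b.tid))
    candidates.sort()

    successor, has_predecessor = {}, set()
    for _, a_tid, b_tid in candidates:
        if a_tid not in successor and b_tid not in has_predecessor:
            successor[a_tid] = b_tid
            has_predecessor.add(b_tid)
    return successor


tracklets = [
    Tracklet(0, 0, 10, [[1.0, 0.0]]),     # player A, first fragment
    Tracklet(1, 20, 30, [[1.0, 0.0]]),    # same appearance, 10 frames later
    Tracklet(2, 15, 25, [[0.0, 1.0]]),    # different appearance
    Tracklet(3, 100, 110, [[1.0, 0.0]]),  # same appearance, gap too large
]
print(greedy_link(tracklets))  # {0: 1}
```

Only fragments 0 and 1 are linked: tracklet 2 fails the appearance cut-off, and tracklet 3 fails the temporal gate because it starts 70 frames after tracklet 1 ends.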

This global association step is credited with an increase of roughly 3–4 HOTA points and with halving the number of ID switches relative to strong baselines (Jian et al., 31 Jan 2026, Sun et al., 2024).

4. Semi-Supervised Pseudo-Labeling for Detector Training

Because missed detections and false positives undermine trajectory continuity, GTATrack augments the detector’s training with a pseudo-labeling scheme:

  • Generation: YOLOv11x is initially trained on official ground-truth annotations, then run on unlabeled frames. Detections with confidence $\geq 0.9$ are retained as pseudo-ground-truth.
  • Training Integration: Batch composition is 1:1 real:pseudo, using standard YOLO losses ($L_{det} = L_{cls} + L_{box} + L_{obj}$); pseudo losses are down-weighted ($w_p = 0.5$), accounting for possible label noise.

This strategy improves recall for small and distant players and cuts false positives by approximately 90%: FP drops from 4913 to 494, and HOTA improves from 0.38 to 0.49 (Table 3; Jian et al., 31 Jan 2026).
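
The selection rule and loss weighting can be sketched in a few lines of Python. The detection format (dicts with a `conf` key), the function names, and the additive form `real + w_p * pseudo` are assumptions; the summary states only the 0.9 threshold and the 0.5 down-weighting. The last lines check the quoted false-positive reduction.

```python
def select_pseudo_labels(detections, threshold=0.9):
    """Keep detections whose confidence is at least `threshold`."""
    return [d for d in detections if d["conf"] >= threshold]


def combined_loss(real_loss, pseudo_loss, w_p=0.5):
    """Assumed combination: real-label loss plus down-weighted pseudo-label loss."""
    return real_loss + w_p * pseudo_loss


detections = [
    {"bbox": (10, 20, 30, 60), "conf": 0.95},
    {"bbox": (200, 40, 215, 70), "conf": 0.90},
    {"bbox": (400, 90, 410, 110), "conf": 0.55},  # dropped: below threshold
]
print(len(select_pseudo_labels(detections)))  # 2

# Relative reduction in false positives reported in the text.
fp_before, fp_after = 4913, 494
reduction = (fp_before - fp_after) / fp_before
print(round(reduction, 3))  # 0.899
```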

5. Performance Metrics and Experimental Evaluation

Tracking quality is reported with standard multi-object tracking metrics:

  • HOTA ($\sqrt{DetA \cdot AssA}$)
  • IDSW (identity switches, lower better)
  • LocA (localization accuracy)
  • DetA (detection accuracy)
  • AssA (association accuracy)
  • FN/FP (false negatives/positives)

On the SoccerTrack 2025 test set (Table 7; Jian et al., 31 Jan 2026); "-" marks values not given here:

Method     HOTA  IDSW  LocA  DetA  AssA  FN      FP
GTATrack   0.60  331   0.84  0.76  0.47  5454.5  982
ByteTrack  0.42  630   -     -     -     -       -

Ablation studies show that Deep-EIoU improves HOTA by 12 points over ByteTrack and that GTA-Link with pseudo-labeling delivers the best overall performance.
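
As a quick consistency check, the HOTA formula listed above can be applied to the GTATrack row of the table. The function name is hypothetical. Note that the full HOTA metric averages this geometric mean over a range of localization thresholds, so combining the already-averaged DetA and AssA is only an approximation.

```python
import math


def hota_from_components(det_a, ass_a):
    """Geometric mean of detection and association accuracy."""
    return math.sqrt(det_a * ass_a)


# GTATrack row: DetA = 0.76, AssA = 0.47.
print(round(hota_from_components(0.76, 0.47), 2))  # 0.6
```

The result agrees with the reported HOTA of 0.60 to two decimal places.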

6. Implementation and Open-Source Availability

GTATrack is implemented on a single NVIDIA RTX 3090 GPU. Key components include:

  • Detection: YOLOv11x, input size 1280 px, batch 12.
  • ReID Backbone: OSNet, D=512, L2-normalized features.
  • Training: AdamW, 200 epochs, $lr = 1\text{e}{-4}$, with multi-scale augmentations.
  • Deep-EIoU Parameters: $s = 0.2$, $K = 3$, $\lambda = 0.5$, $\tau_{prox} = 0.9$.
  • GTA-Link Parameters: $\Delta_{max} = 50$, DBSCAN eps=0.5, min_samples=7.
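
For reference, the settings listed above can be gathered into a single configuration object. The class and field names below are hypothetical and do not mirror the repository's actual configuration files; only the values come from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GTATrackConfig:
    # Detection (YOLOv11x)
    input_size: int = 1280
    batch_size: int = 12
    # ReID backbone (OSNet)
    reid_dim: int = 512
    # Training (AdamW)
    epochs: int = 200
    learning_rate: float = 1e-4
    # Deep-EIoU
    expansion_step: float = 0.2   # s
    expansion_levels: int = 3     # K
    cost_weight: float = 0.5      # lambda
    proximity_threshold: float = 0.9  # tau_prox
    # GTA-Link
    delta_max: int = 50
    dbscan_eps: float = 0.5
    dbscan_min_samples: int = 7


config = GTATrackConfig()
print(config.expansion_levels, config.delta_max)  # 3 50
```

Freezing the dataclass keeps the reported values from being modified accidentally when the object is passed between pipeline stages.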

The full pipeline, including code for detection, Deep-EIoU, GTA-Link, and all training/inference scripts, is available at https://github.com/ron941/GTATrack-STC2025 (Jian et al., 31 Jan 2026).

7. Context, Limitations, and Generalization Potential

GTATrack is tailored for single-camera, fixed-view sports scenarios—specifically, soccer with static fisheye cameras. Spatial constraints within GTA-Link rely on fixed-field geometry, and hyperparameters (for clustering, temporal windows, pseudo-label threshold) were empirically set for SoccerTrack data. The split-and-merge paradigm demonstrated here has applicability to domains with strong visual ReID signals and challenging appearance/geometry (e.g., small distant objects, highly dynamic scenes) but would require adaptation for online or multi-camera settings (Sun et al., 2024). The synergy of geometric expansion, global appearance clustering, and semi-supervised detection refinement underpins its state-of-the-art performance in the competitive SoccerTrack 2025 context.


Key references:

  • "GTATrack: Winner Solution to SoccerTrack 2025 with Deep-EIoU and Global Tracklet Association" (Jian et al., 31 Jan 2026)
  • "GTA: Global Tracklet Association for Multi-Object Tracking in Sports" (Sun et al., 2024)