
TacEleven: Open-Play Tactic Generation

Updated 4 July 2026
  • TacEleven is a generative framework that models football open-play tactics as spatiotemporal graphs conditioned on natural-language instructions.
  • It combines a language-controlled tactical generator with a multimodal LLM-based tactical critic to produce and rank diverse tactical proposals.
  • Empirical evaluations on counterfactual and multi-step discovery show promising improvements in tactical realism, performance metrics, and expert adoption rates.

TacEleven is a generative framework for football open-play tactic discovery developed in close collaboration with domain experts from AJ Auxerre to assist coaches and analysts in tactical decision-making. It is designed for settings in which the potential tactic space grows exponentially as the sequence progresses, making automated tactic discovery difficult. The framework consists of two core components: a language-controlled tactical generator that produces diverse tactical proposals, and a multimodal LLM-based tactical critic that selects the optimal proposal aligned with a high-level stylistic tactical instruction. It is evaluated on counterfactual exploration, single-step discovery, and multi-step discovery, with both quantitative metrics and questionnaire-based qualitative assessment (Zhao et al., 17 Nov 2025).

1. Scope and problem formulation

TacEleven formalizes open-play tactic generation as a sequence-prediction problem over spatiotemporal graphs controlled by natural-language instructions. A graph instance is written as

$$\mathcal G=(\mathcal V,\mathcal E,\mathbf X,A,T,s),$$

where $\mathcal V$ is the set of $P+1$ entities on the pitch, $\mathcal E\subseteq\mathcal V\times\mathcal V$ is the fully connected edge set, $\mathbf X\in\mathbb R^{|T|\times|\mathcal V|\times d}$ is the node-feature tensor, $A\in\{0,1\}^{|\mathcal V|\times|\mathcal V|}$ is the adjacency matrix, $T$ is the set of discrete time indices, and $s$ is a text description such as “Pass to Neymar” (Zhao et al., 17 Nov 2025).
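As a rough illustration of this graph tuple, the following sketch stores the feature tensor, adjacency matrix, and text description in a small container. The class name `TacticGraph`, the helper `make_graph`, the feature dimension of 4, and the inclusion of self-loops in the adjacency matrix are all assumptions made for the example, not details taken from the paper.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TacticGraph:
    """Hypothetical container for one graph instance G = (V, E, X, A, T, s)."""
    X: np.ndarray  # node features, shape (|T|, |V|, d)
    A: np.ndarray  # adjacency matrix, shape (|V|, |V|), entries in {0, 1}
    s: str         # natural-language description

    @property
    def num_steps(self) -> int:
        return self.X.shape[0]

    @property
    def num_entities(self) -> int:
        return self.X.shape[1]


def make_graph(num_steps: int, num_entities: int, feat_dim: int, text: str) -> TacticGraph:
    """Build an empty, fully connected graph (self-loops included by assumption)."""
    X = np.zeros((num_steps, num_entities, feat_dim), dtype=np.float32)
    A = np.ones((num_entities, num_entities), dtype=np.int8)
    return TacticGraph(X=X, A=A, s=text)


# 23 entities and 10 timesteps follow the data description; feat_dim=4 is made up.
g = make_graph(num_steps=10, num_entities=23, feat_dim=4, text="Pass to Neymar")
print(g.X.shape, g.A.shape, g.s)
```

Keeping the adjacency matrix explicit, even when it is all ones, leaves room for sparser relational structures without changing the container.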

Within this formulation, a tactic is represented as a sequence of atomic meta-actions drawn from a vocabulary $\mathbb A$:

$$\mathbf T=\langle a_1,a_2,\dots,a_N\rangle,\quad a_n\in\mathbb A.$$

At each step, TacEleven takes an input graph describing the observed history together with a language condition, then predicts the graph for the subsequent timesteps.

This setup places natural-language tactical control directly inside a graph-sequence forecasting problem rather than treating tactics as isolated labels or post hoc annotations (Zhao et al., 17 Nov 2025).

The framework distinguishes three task settings. In counterfactual exploration, the language condition is selected from a small set of alternative event descriptions, such as “pass to left wing-back” or “carry through center,” and the generator is trained with an MSE objective over paired ground-truth trajectories. In single-step discovery, TacEleven generates a set of candidates from a high-level instruction and uses a critic to select the best one. In multi-step discovery, the single-step loop is repeated autoregressively over several steps, with each selected proposal serving as history for generating the next (Zhao et al., 17 Nov 2025).

2. Language-controlled tactical generator

The language-controlled tactical generator, abbreviated LTG, is the proposal engine of TacEleven. Its input representation combines temporal positional encoding with language encoding. For each tokenized instruction, TacEleven uses pretrained BERT to produce token embeddings, then applies self-attention to obtain a sentence-level representation. This is concatenated with the temporal positional encoding to produce a temporal-language (TL) embedding, which is replicated across nodes and times (Zhao et al., 17 Nov 2025).

The core network uses spatiotemporal attention blocks. Each layer receives its input features together with the TL-embedding, computes spatial attention across nodes and temporal attention across time, and then combines the two through gated fusion. Stacking several such blocks yields an encoder, followed by a mirrored decoder with a cross-time attention module that maps history onto future times. This architecture directly couples language control with relational spatiotemporal modeling, rather than appending text to a downstream ranking stage (Zhao et al., 17 Nov 2025).
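The exact fusion formula is not reproduced in this article, so the sketch below assumes one common form of gated fusion: a sigmoid gate that blends the spatial and temporal attention outputs element-wise. The function names, weight shapes, and random inputs are illustrative assumptions and should not be read as the paper's implementation.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(h_spatial, h_temporal, w_s, w_t, b):
    """Blend two attention outputs with a learned gate (assumed form).

    z = sigmoid(h_spatial @ w_s + h_temporal @ w_t + b)
    out = z * h_spatial + (1 - z) * h_temporal
    """
    z = sigmoid(h_spatial @ w_s + h_temporal @ w_t + b)
    return z * h_spatial + (1.0 - z) * h_temporal


# Toy shapes: 5 timesteps, 23 entities, hidden size 8 (hidden size is made up).
rng = np.random.default_rng(0)
T, V, D = 5, 23, 8
h_s = rng.normal(size=(T, V, D))   # stand-in for spatial attention output
h_t = rng.normal(size=(T, V, D))   # stand-in for temporal attention output
w_s = 0.1 * rng.normal(size=(D, D))
w_t = 0.1 * rng.normal(size=(D, D))
b = np.zeros(D)

fused = gated_fusion(h_s, h_t, w_s, w_t, b)
print(fused.shape)
```

Because the gate lies strictly between 0 and 1, each fused value is a convex combination of the corresponding spatial and temporal values.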

TacEleven adds a VAE head to model diversity. The encoder output defines a posterior over a latent variable, and the model is optimized with an ELBO.

A plausible implication is that TacEleven treats tactical discovery as a controlled many-to-many generation problem: multiple plausible futures can satisfy the same high-level instruction, and the latent variable is the mechanism used to preserve that diversity (Zhao et al., 17 Nov 2025).
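To make the role of the latent variable concrete, here is a generic VAE sketch under stated assumptions: a diagonal Gaussian posterior, a standard normal prior, and a squared-error reconstruction term. None of these choices, nor the `beta` weight, is confirmed by the article; they are the textbook defaults.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per batch row."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def negative_elbo(x_true, x_pred, mu, log_var, beta=1.0):
    """Squared-error reconstruction plus beta-weighted KL, averaged over the batch."""
    recon = np.sum((x_true - x_pred) ** 2, axis=tuple(range(1, x_true.ndim)))
    return float(np.mean(recon + beta * kl_to_standard_normal(mu, log_var)))


rng = np.random.default_rng(0)
mu = np.zeros((2, 3))        # batch of 2, latent size 3 (sizes are made up)
log_var = np.zeros((2, 3))
z = reparameterize(mu, log_var, rng)

x = rng.normal(size=(2, 5, 23, 4))   # (batch, future steps, entities, features)
print(z.shape, negative_elbo(x, x, mu, log_var))
```

Drawing several `z` samples for the same history and instruction is what would yield several distinct candidate futures in such a setup.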

3. Multimodal LLM-based tactical critic

The second core component is the multimodal LLM-based tactical critic, abbreviated MTC. The critic is a pre-trained multimodal LLM, with Qwen-QVQ-Max given as an example, and it is prompted with four modalities: a historical sketch image, candidate counterfactual sketch images, a high-level tactical instruction, and a style directive chosen from aggressive, neutral, or conservative. Candidate selection is then framed as a tree search: at each node, the LTG generates candidate graphs via counterfactual descriptions, the LLM reads the sketches and instruction, returns a utility score for each candidate, and the framework chooses the max-score branch (Zhao et al., 17 Nov 2025).

For a single step, the decision rule selects the candidate with the highest critic score.

For multi-step discovery, this procedure is repeated autoregressively. The result is not only a generator of trajectories but a generator–critic loop in which language conditions operate at two distinct levels: low-level counterfactual descriptions for the generator and high-level stylistic instructions for the critic (Zhao et al., 17 Nov 2025).
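The generator–critic loop can be sketched as a greedy search over candidates. In the code below, `generate` and `score` are hypothetical callables standing in for the LTG and the MTC, and the toy versions operate on a single number (a ball position) rather than on graphs, purely so the example runs on its own.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

G = TypeVar("G")  # stands in for a spatiotemporal graph


def single_step(history: G, descriptions: Sequence[str], instruction: str, style: str,
                generate: Callable[[G, str], G],
                score: Callable[[G, G, str, str], float]) -> Tuple[G, float]:
    """Generate one candidate per description and keep the highest-scoring one."""
    candidates = [generate(history, d) for d in descriptions]
    scores = [score(history, c, instruction, style) for c in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], scores[best]


def multi_step(history: G, descriptions: Sequence[str], instruction: str, style: str,
               generate: Callable[[G, str], G],
               score: Callable[[G, G, str, str], float], num_steps: int) -> List[G]:
    """Repeat the single-step loop, feeding each selected candidate back as history."""
    chosen: List[G] = []
    for _ in range(num_steps):
        best, _ = single_step(history, descriptions, instruction, style, generate, score)
        chosen.append(best)
        history = best  # simplification: a real system would extend the history
    return chosen


# Toy stand-ins: the "graph" is just the ball's position along the pitch.
MOVES = {"pass forward": 10.0, "carry": 3.0, "pass back": -5.0}


def toy_generate(history: float, description: str) -> float:
    return history + MOVES[description]


def toy_score(history: float, candidate: float, instruction: str, style: str) -> float:
    gain = candidate - history
    return gain if style == "aggressive" else -abs(gain)


aggressive_plan = multi_step(0.0, list(MOVES), "progress the ball", "aggressive",
                             toy_generate, toy_score, num_steps=3)
conservative_plan = multi_step(0.0, list(MOVES), "progress the ball", "conservative",
                               toy_generate, toy_score, num_steps=3)
print(aggressive_plan, conservative_plan)
```

The search here is greedy, keeping only the best branch at each step, which matches the max-score selection described above but ignores any lookahead a fuller tree search might use.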

A common misconception is that the critic is fine-tuned jointly with the generator. The reported implementation states the opposite: in practice the LLM is not fine-tuned. Instead, TacEleven uses a single large prompt that defines the LLM’s role, explains how to interpret red, blue, and yellow trajectories, lists candidate events as JSON, and asks for a per-candidate micro-analysis and a final pick. This makes prompt engineering, rather than learned end-to-end alignment, the operative mechanism for tactical ranking in the published system (Zhao et al., 17 Nov 2025).
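A prompt of this kind could be assembled roughly as follows. The wording of each section, the function name `build_critic_prompt`, the candidate field names, and the example instruction are all hypothetical; the sketch images that accompany the text in a multimodal call are omitted, and the meaning of each trajectory colour is left to the caller because the article does not state it.

```python
import json
from typing import Dict, List

STYLES = ("aggressive", "neutral", "conservative")


def build_critic_prompt(instruction: str, style: str, legend: str,
                        candidates: List[Dict[str, str]]) -> str:
    """Assemble the text part of a single critic prompt (illustrative wording)."""
    if style not in STYLES:
        raise ValueError(f"style must be one of {STYLES}, got {style!r}")
    sections = [
        "Role: you are a football tactical analyst ranking candidate tactics.",
        f"Sketch legend: {legend}",
        f"Tactical instruction: {instruction}",
        f"Style: {style}",
        "Candidate events (JSON):",
        json.dumps(candidates, indent=2),
        "For each candidate, give a short analysis, then name the id of your final pick.",
    ]
    return "\n".join(sections)


candidates = [
    {"id": "c1", "event": "pass to left wing-back"},
    {"id": "c2", "event": "carry through center"},
]
prompt = build_critic_prompt(
    instruction="Break the defensive line",  # made-up instruction
    style="aggressive",
    legend="red, blue, and yellow trajectories as defined by the caller",
    candidates=candidates,
)
print(prompt)
```

Serializing the candidates as JSON gives the model stable identifiers to refer back to in its final pick.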

4. Data pipeline and training configuration

TacEleven is built on football tracking and event data. The raw data comprises 10 Hz player/ball tracking for 23 entities, event logs such as pass, carry, and shot, player profiles, and match metadata. Timestamp alignment proceeds by detecting local extrema in ball acceleration and, within a fixed time window around the event-log time, choosing the anchor where ball–player distance is minimal, which reduces the average misalignment between event and tracking timestamps. After filtering incomplete cases, the preprocessing pipeline yields 1 076 258 valid meta-actions, and for each event TacEleven samples 5 historical and 5 future timesteps uniformly, so that each sample spans 10 timesteps (Zhao et al., 17 Nov 2025).
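The alignment step can be sketched as below, assuming per-frame arrays for time, ball acceleration, and the distance between the ball and the acting player. The function names, the strict-extremum test, the window value, and the toy signal are assumptions for illustration; the paper's exact procedure may differ.

```python
import numpy as np


def local_extrema(signal) -> np.ndarray:
    """Indices where the signal is a strict local maximum or minimum."""
    s = np.asarray(signal, dtype=float)
    left = s[1:-1] - s[:-2]
    right = s[1:-1] - s[2:]
    # Same sign on both sides means the point is above or below both neighbours.
    return np.where(left * right > 0)[0] + 1


def align_event(event_time, times, ball_accel, ball_player_dist, window):
    """Pick the frame index to anchor an event on, or None if nothing qualifies."""
    cand = local_extrema(ball_accel)
    cand = cand[np.abs(times[cand] - event_time) <= window]
    if cand.size == 0:
        return None
    return int(cand[np.argmin(ball_player_dist[cand])])


# Toy data: 2 seconds of 10 Hz tracking with two acceleration spikes.
times = np.arange(20) / 10.0
ball_accel = np.zeros(20)
ball_accel[5] = 3.0
ball_accel[12] = -4.0
ball_player_dist = np.full(20, 5.0)
ball_player_dist[5] = 2.0
ball_player_dist[12] = 0.3

anchor = align_event(event_time=1.0, times=times, ball_accel=ball_accel,
                     ball_player_dist=ball_player_dist, window=0.6)
print(anchor, times[anchor])
```

In this toy case both spikes fall inside the window, and the later one wins because the ball is closer to the player at that frame.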

The LTG training description reports 1.08 M text–trajectory pairs after timestamp alignment, with a 70/30 stratified train/test split. Optimization uses Adam with weight decay, and training follows cosine decay plus warm-up. The reported model is a multi-layer network with a parameter count stated in billions, trained on NVIDIA RTX A6000 hardware (Zhao et al., 17 Nov 2025).
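A cosine-decay schedule with warm-up can be written in a few lines. The version below assumes a linear warm-up and uses placeholder values for step counts and the base learning rate; none of these numbers comes from the paper.

```python
import math


def lr_at(step: int, total_steps: int, warmup_steps: int,
          base_lr: float, min_lr: float = 0.0) -> float:
    """Learning rate at a given step: linear warm-up, then cosine decay."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Placeholder configuration: 100 steps, 10 of them warm-up, base rate 1.0.
schedule = [lr_at(s, total_steps=100, warmup_steps=10, base_lr=1.0) for s in range(101)]
print(schedule[0], schedule[9], schedule[55], schedule[100])
```

The rate climbs to its peak at the end of warm-up, passes through half the peak midway through the decay phase, and reaches `min_lr` at the final step.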

These design choices indicate that TacEleven is not a lightweight heuristic layer on top of tactical metadata. It is a large-scale spatiotemporal generative model trained on aligned text–trajectory pairs, with language control built into the representation and multimodal scoring built into inference. This suggests that the system is intended for high-capacity tactical search over open-play situations rather than for isolated event classification (Zhao et al., 17 Nov 2025).

5. Evaluation tasks and empirical findings

TacEleven is evaluated on three tasks with progressive tactical complexity. Counterfactual exploration uses a factual test set of 322 877 pairs and reports Factual Trajectory Error, Counterfactual Alignment Error, and Consistency. Single-step discovery generates a set of proposals for each test history and compares the selected proposal with the factual continuation using ΔxG, ΔxT, and ΔPC. Multi-step discovery builds 3-step sequences under aggressive, neutral, or conservative instructions and reports average Δ-metrics at the final step (Zhao et al., 17 Nov 2025).
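The size of the factual test set is consistent with the 70/30 split applied to the 1 076 258 valid meta-actions, as a quick integer check shows. The variable names are illustrative, and rounding down is an assumption about how the split was taken.

```python
valid_meta_actions = 1_076_258
test_share_percent = 30

# Integer arithmetic avoids floating-point rounding; the result is floored.
expected_test_pairs = valid_meta_actions * test_share_percent // 100
print(expected_test_pairs)  # 322877
```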

The quantitative results are reported separately for counterfactual, single-step, and multi-step settings. In counterfactual evaluation, the factual row is compared against a random-proposal baseline. In single-step discovery, reranked proposals show positive gains over factual across field zones: results are given for the entire team on ΔxG, ΔxT(A), ΔxT(D), ΔPC(A), and ΔPC(D), for the midfield on ΔxG, and for the front line on ΔxT(A) and ΔPC(A). In multi-step discovery, aggressive and neutral instructions produce positive ΔxG, ΔxT, and ΔPC gains, whereas the conservative instruction preserves the existing advantage (Zhao et al., 17 Nov 2025).

The qualitative assessment is questionnaire-based and involves 36 domain experts: 3 club analysts, 9 coaches, and 24 players. In single-step realism, experts were asked to distinguish factual from discovered tactics in 50 side-by-side cases; accuracy was 43.49%, with mean judgment approximately 0, indicating that discovered tactics were nearly indistinguishable from real. In single-step effectiveness on the same 50 cases, 44.46% of discovered tactics were rated better, 27% worse, and 28% no change; even experts, described as the strictest subgroup, marked 34% as improvements. In multi-step effectiveness on 30 cases, 41.11% were rated better, 31% worse, and 28% no change. In the 5-shot adoption-rate evaluation on 10 PSG vs Monaco failure cases, the average adoptable count was 2.63 out of 5, corresponding to a 52.50% adoption rate (Zhao et al., 17 Nov 2025).

Taken together, these findings position TacEleven as a system for generating and ranking alternatives that are not only realistic under blind comparison but also frequently judged tactically preferable. The reported pattern is especially notable in long-sequence open-play situations, where the paper states that the positive rate is higher in multi-step evaluation than in single-step evaluation (Zhao et al., 17 Nov 2025).

6. Limitations, interpretation, and relation to TacticGen

The published limitations are explicit. The LTG is trained on a narrow data source, specified as Ligue 1 plus UEFA Champions League (UCL) quarter-final, semi-final, and final matches, so domain shift may occur in other competitions or stylistic environments. The MLLM critic relies on prompt engineering rather than learned alignment, and the paper identifies end-to-end fine-tuning as future work. Defense dynamics are only implicitly captured via pitch control, and explicit opponent-role modeling is proposed as a possible improvement. Inference latency is reported as approximately 100 ms per step, which may limit in-match usage without further optimization (Zhao et al., 17 Nov 2025).

The same paper identifies several extensions: expanding beyond offensive open play to defensive and off-ball tactics and counterattacks, incorporating richer semantics such as player fatigue and team formation, learning the critic’s scoring via supervised fine-tuning on expert-annotated tactical ratings, and transferring the generation–critique paradigm to other sports or non-sport domains such as autonomous driving maneuvers and robotics swarm planning. These directions reinforce that TacEleven is currently framed as a tactical discovery system rather than a complete game-theoretic simulator (Zhao et al., 17 Nov 2025).

TacEleven also sits within a broader line of football tactic generation research connected to TacticGen. TacticGen presents a generative model for adaptable and scalable tactic generation that formulates tactics as sequences of multi-agent movements and interactions conditioned on game context, using a multi-agent diffusion transformer with agent-wise self-attention and context-aware cross-attention. It is trained with over 3.3 million events and 100 million tracking frames from top-tier leagues, supports inference-time objective steering through classifier guidance specified via rules, natural language, or neural models, and reports expert validation of realistic and strategically valuable tactics (Xu et al., 20 Apr 2026). A related technical reference describes TacticGen as “aka ‘TacEleven’ in follow-on work” (Xu et al., 20 Apr 2026). This suggests a research trajectory in which the name TacEleven is associated not only with open-play tactic discovery through a generator–critic architecture, but also with a broader program of controlled multi-agent tactical generation.
