TS-Agent: Autonomous Forecasting Synthesis
- TS-Agent is an autonomous system that synthesizes, evaluates, and iteratively refines time series forecasting algorithms using self-evolving code generation and review.
- It employs a Metric-Advantage Monte Carlo Tree Search to distinguish marginal improvements from breakthrough advances with statistically normalized rewards.
- The system integrates automated code review, global comparative analysis, and MAP-Elites archives to preserve diversity and enhance forecast architecture performance.
A TS-Agent is an autonomous system designed for end-to-end synthesis, evaluation, and refinement of time series forecasting algorithms. The SEA-TS (Self-Evolving Agent for Time Series Algorithms) framework establishes TS-Agent as a full-stack engineering entity: it plans, generates, evaluates, debugs, and iteratively improves code for forecasting tasks, closing the loop between code generation, automated review, knowledge distillation, and architectural exploration. TS-Agent not only reproduces established best practices but is capable of discovering and validating novel architectural motifs that generalize across varying time series domains (Xu et al., 5 Mar 2026).
1. Agent Architecture and Self-Evolution Mechanism
TS-Agent organizes algorithmic development as a sequential decision process over a search tree in which each node corresponds to a complete Python forecasting solution. The process starts from a reference implementation and recursively expands leaves chosen by Upper Confidence bounds applied to Trees (UCT) selection. Candidate expansions are generated from a composite prompt consisting of the fixed task description, a dynamically maintained running prompt encoding historical review insights, local code context, global best/worst comparisons, and representative elite exemplars from a MAP-Elites archive.
Child code is synthesized via LLM-driven generation, executed in sandboxed environments, and scored on downstream forecast metrics. Code and logical outcomes are fed into automated review engines, with findings continually propagated to update the running prompt and internal state, thereby instantiating an iterative self-evolution loop (Xu et al., 5 Mar 2026).
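The selection step of this loop can be illustrated with a minimal sketch. It assumes the textbook UCT formula (mean reward plus an exploration bonus); the `Node` fields, the exploration constant `c`, and the function names are hypothetical and are not taken from the SEA-TS implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node; `code` stands in for a complete forecasting script."""
    code: str
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT: mean reward plus an exploration bonus (c is an assumed constant)."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)
    return exploit + explore


def select_leaf(root: Node) -> Node:
    """Descend from the root, following the highest-UCT child until a leaf is reached."""
    node = root
    while node.children:
        parent_visits = node.visits
        node = max(node.children, key=lambda ch: uct_score(ch, parent_visits))
    return node
```

In the described system, the selected leaf would then be expanded by an LLM call, executed in a sandbox, and reviewed; those steps are omitted here because they depend on external services.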
2. Metric-Advantage Monte Carlo Tree Search (MA-MCTS)
Traditional MCTS approaches in program synthesis apply fixed or binary reward signals, which discriminate poorly between marginal and breakthrough advances in model performance. TS-Agent instead uses a statistically normalized advantage signal: each candidate's metric is standardized against the set of metric values observed so far, using their mean and standard deviation. The backpropagated reward also incorporates a binary bug flag determined by the code review.
Rewards are aggregated through the tree to inform subsequent UCT-guided selections. This reward normalization sharpens the distinction between marginal gains and genuine breakthroughs, and accelerates the agent's shift from exploration to exploitation as the metric variance diminishes (Xu et al., 5 Mar 2026).
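A rough sketch of such a reward is given below. It assumes a plain z-score with the sign flipped for error metrics (lower is better), and it assumes that a flagged bug replaces the advantage with a fixed penalty; the exact functional form used by SEA-TS is not reproduced here, and `bug_penalty` and all function names are hypothetical.

```python
import statistics


def metric_advantage(metric, observed, lower_is_better=True):
    """Z-score of `metric` against the observed metrics, oriented so better is positive."""
    if len(observed) < 2:
        return 0.0  # not enough history to normalize
    mu = statistics.fmean(observed)
    sigma = statistics.pstdev(observed)
    if sigma == 0:
        return 0.0  # all observed metrics identical; no signal
    z = (metric - mu) / sigma
    return -z if lower_is_better else z


def node_reward(metric, observed, has_bug, bug_penalty=-1.0):
    """Assumption: a bug flagged by the reviewer overrides the advantage with a fixed penalty."""
    if has_bug:
        return bug_penalty
    return metric_advantage(metric, observed)


def backpropagate(path, reward):
    """Add the reward to every node on the path from the root to the evaluated leaf."""
    for node in path:
        node["visits"] += 1
        node["total_reward"] += reward
```

With observed MAE values of 2.9, 2.8, 2.7, and 2.6, a candidate at 2.7 earns a small positive advantage while a candidate at 1.757 earns a much larger one, which is the separation between marginal and breakthrough results that a binary reward cannot express.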
3. Automated Code Review and Prompt Refinement
TS-Agent employs LLM-powered code reviewers to evaluate every candidate solution for logical integrity, targeting issues such as future leakage in feature engineering, train/test contamination, and inference-stage inconsistencies. If a logical flaw is detected, the candidate is penalized and corrective "pattern fixes" are distilled (e.g., "always apply .shift(1) before rolling statistics"). These insights, along with global analysis comparing best and worst solutions, are integrated into the evolving running prompt, which accumulates prescriptive safeguards, error-avoidance heuristics, and positive design templates. The running prompt thus functions as a persistent, self-updating knowledge base that protects the agent against recurring failure modes (Xu et al., 5 Mar 2026).
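The quoted pattern fix can be made concrete with a small, dependency-free sketch. The two functions below are illustrative only: `lagged_rolling_mean` mirrors what `series.shift(1).rolling(window).mean()` computes in pandas, while `leaky_rolling_mean` mirrors `series.rolling(window).mean()`, whose window includes the value being predicted.

```python
def lagged_rolling_mean(values, window):
    """Rolling mean over the `window` values strictly before each position (no leakage)."""
    out = []
    for t in range(len(values)):
        past = values[max(0, t - window):t]  # excludes values[t]
        out.append(sum(past) / window if len(past) == window else None)
    return out


def leaky_rolling_mean(values, window):
    """Rolling mean whose window ends at the current position, so the target leaks in."""
    out = []
    for t in range(len(values)):
        current = values[max(0, t - window + 1):t + 1]  # includes values[t]
        out.append(sum(current) / window if len(current) == window else None)
    return out


series = [1.0, 2.0, 3.0, 4.0, 5.0]
safe_feature = lagged_rolling_mean(series, window=2)
leaky_feature = leaky_rolling_mean(series, window=2)
```

For this series the safe feature at the last position is 3.5, computed only from the two earlier values, whereas the leaky feature is 4.5 because it already contains the value 5.0 that a forecaster would be trying to predict.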
4. Global Steerable Reasoning and Cross-Branch Knowledge Transfer
Rather than constraining decision-making to strictly local context, TS-Agent maintains global awareness by prompting explicit comparisons with the current global best and worst solutions. Structured prompts encapsulate this context, presenting the code, associated plans, and metrics of those two solutions. Auxiliary LLMs generate structured comparative analyses, advising emulation of effective strategies and avoidance of deficient tactics. These are appended to the node's review context and propagated into the running prompt, enabling cross-trajectory information transfer and facilitating "jumps" to promising algorithmic regions beyond incremental local optimization (Xu et al., 5 Mar 2026).
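One way to picture the structured comparison context is the sketch below. The `Solution` record, the section labels, and the instruction wording are all hypothetical; the source states only that code, plans, and metrics of the global best and worst solutions are supplied. The sketch also assumes an error metric where lower is better.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    code: str
    plan: str
    metric: float  # assumed to be an error metric: lower is better


def global_extremes(solutions):
    """Return the (best, worst) solutions across every branch of the search."""
    best = min(solutions, key=lambda s: s.metric)
    worst = max(solutions, key=lambda s: s.metric)
    return best, worst


def build_comparison_prompt(best, worst):
    """Assemble a comparison prompt for an auxiliary reviewer model (layout is illustrative)."""
    return "\n".join([
        "## GLOBAL BEST",
        f"metric: {best.metric}",
        f"plan: {best.plan}",
        f"code:\n{best.code}",
        "## GLOBAL WORST",
        f"metric: {worst.metric}",
        f"plan: {worst.plan}",
        f"code:\n{worst.code}",
        "## TASK",
        "Explain which strategies of the best solution to emulate "
        "and which tactics of the worst solution to avoid.",
    ])
```

The resulting text would be sent to an auxiliary LLM, and its analysis appended to the review context of the node being expanded.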
5. Quality-Diversity Preservation via MAP-Elites Archive
To circumvent mode collapse onto a narrow set of architectural paradigms, TS-Agent adopts a MAP-Elites archive indexed along axes salient to forecasting: architecture type (tree-based, attention, hybrid), feature engineering sophistication, and training regimen complexity. Each cell records only the highest-performing solutions, with periodic migration across neighboring cells. When constructing prompts for new code generation, the agent samples exemplars from the archive, guaranteeing that both diversity and high-quality innovations inform subsequent explorations (Xu et al., 5 Mar 2026).
| MAP-Elites Axis | Archive Value Examples |
|---|---|
| Architecture Type | Tree-based, Attention, Hybrid |
| Feature Engineering | None, Moderate, Extensive |
| Training Sophistication | Basic, Standard, Advanced |
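The archive logic can be sketched as follows, under simplifying assumptions: each cell keeps a single elite rather than several, scores are error metrics where lower is better, and the periodic migration between neighboring cells is omitted. The class and method names are hypothetical.

```python
import random


class MapElitesArchive:
    """Grid archive keyed by (architecture, feature engineering, training) descriptors."""

    def __init__(self):
        self.cells = {}  # descriptor tuple -> (solution, score)

    def add(self, descriptor, solution, score):
        """Store the solution if its cell is empty or it beats the incumbent; return True if stored."""
        incumbent = self.cells.get(descriptor)
        if incumbent is None or score < incumbent[1]:
            self.cells[descriptor] = (solution, score)
            return True
        return False

    def sample(self, k, rng=random):
        """Draw up to k elites from distinct cells to use as prompt exemplars."""
        keys = list(self.cells)
        chosen = rng.sample(keys, min(k, len(keys)))
        return [self.cells[key][0] for key in chosen]


archive = MapElitesArchive()
archive.add(("tree-based", "moderate", "basic"), "gbm_v1", 2.9)
archive.add(("attention", "extensive", "advanced"), "attn_v1", 2.1)
exemplars = archive.sample(2)
```

Because sampling draws from distinct cells, a weaker but structurally different solution can still appear in the next prompt, which is how the archive counteracts collapse onto a single architecture family.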
6. Empirical Performance and Benchmarking
On the Solar-Energy public benchmark (10-minute resolution, 137 PV plants), TS-Agent, via SEA-TS, reduced test MAE from TimeMixer's 2.929 to 1.757, a 40% relative reduction. On proprietary solar PV data, WAPE was reduced from 25.75% to 17.12%, and on residential load forecasting, baseline WAPE of 47.47% was improved to 39.74%; MAPE was reduced from 29.34% (TimeMixer) to 26.17%. Ablation studies indicate that MA-MCTS, running prompt refinement, and global reasoning each measurably improve search efficiency and final accuracy relative to the ablated variants and human-designed baselines (Xu et al., 5 Mar 2026).
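The relative reductions implied by these figures can be checked with a few lines of arithmetic. The dictionary labels below are shorthand introduced for this sketch; the numbers are the ones reported above.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to its baseline value."""
    return (baseline - improved) / baseline


reported = {
    "Solar-Energy MAE": (2.929, 1.757),
    "Proprietary PV WAPE (%)": (25.75, 17.12),
    "Residential load WAPE (%)": (47.47, 39.74),
    "MAPE vs. TimeMixer (%)": (29.34, 26.17),
}

for name, (baseline, improved) in reported.items():
    print(f"{name}: {relative_reduction(baseline, improved):.1%} relative reduction")
```

The relative reductions come out to roughly 40.0%, 33.5%, 16.3%, and 10.8%, so the headline 40% figure applies to the public benchmark and the gains on the other datasets are smaller.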
7. Novel Algorithmic Innovations Discovered Autonomously by TS-Agent
TS-Agent autonomously discovered forecast head architectures not previously described in the literature:
- Physics-Informed Monotonic Decay Head: Encodes the monotonic post-meridian decline in solar irradiance through a decay term with three learnable parameters, supplemented by a positivity-penalty regularizer.
- Per-Station Diurnal Residual Profiles: Learns a station-specific daily cycle for each site, adapting to unique intra-day demand or generation dynamics.
- Learnable Hourly Bias Correction: Applies scale-sensitive, hour-conditioned corrections with trainable parameters for each hour.
All heads are integrated via soft attentive gating over station and hour embeddings.
This suggests that end-to-end self-evolving agents can surpass manual domain engineering by uncovering and validating domain-informed, high-performing motifs (Xu et al., 5 Mar 2026).
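The gating idea can be sketched in a few lines. This is only an illustration under strong assumptions: the gate is taken to be a linear map from concatenated station and hour embeddings to one logit per head, followed by a softmax, and the head outputs are combined as a weighted sum. None of these specifics, nor the function names, are taken from the source.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_logits(station_emb, hour_emb, gate_weights):
    """Hypothetical linear gate: one weight row per head over the concatenated embeddings."""
    features = station_emb + hour_emb  # list concatenation
    return [sum(w * f for w, f in zip(row, features)) for row in gate_weights]


def gated_forecast(head_outputs, station_emb, hour_emb, gate_weights):
    """Blend the head outputs with softmax weights conditioned on station and hour."""
    weights = softmax(gate_logits(station_emb, hour_emb, gate_weights))
    return sum(w * h for w, h in zip(weights, head_outputs))
```

With all gate weights set to zero the three heads are averaged equally; training the gate would let each station and hour of day favor whichever head explains its residuals best.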
8. Contextualization within Agent Testing and Evaluation
Evaluation of TS-Agent systems can leverage methodologies such as the Agent-Testing Agent (ATA), which combines static code analysis, literature mining, and adaptive adversarial testing for reproducible and robust assessment. ATA orchestrates evidence-grounded reasoning modules, identifies failure modes (e.g., unsatisfiable constraints, ambiguity), and outputs severity metrics, failure diversity, and test coverage, ensuring that the tested TS-Agent's weak points are systematically surfaced and addressed (Komoravolu et al., 24 Aug 2025). The integration of such meta-evaluation frameworks is essential for closing the loop between agent generation and deployment-level reliability.
TS-Agent, as instantiated by frameworks like SEA-TS, represents a paradigm for autonomous algorithmic innovation in time series forecasting, coupling search, review, and learning in a unified loop. This enables measurable advances in accuracy, efficiency, and novelty, and establishes a generalizable methodology for self-evolving engineering agents in scientific computing (Xu et al., 5 Mar 2026, Komoravolu et al., 24 Aug 2025).