Trading-R1 Terminal: Evidence-Based Trading
- Trading-R1 Terminal is a large language model–driven trading system that uses structured, XML-tagged investment theses for risk-sensitive and interpretable financial decisions.
- It employs a multi-stage curriculum combining supervised fine-tuning and reinforcement learning to ensure evidence-backed output and disciplined trading strategies.
- The system leverages multi-modal financial data with expert modules to generate automated, transparent investment theses aligned with key performance metrics like Sharpe Ratio and cumulative returns.
The Trading-R1 Terminal is an LLM-driven financial trading system that integrates structured, evidence-based reasoning with volatility-adjusted decision-making to support risk-sensitive, interpretable, and disciplined trading. Developed through a progressive curriculum that spans supervised fine-tuning (SFT) and reinforcement learning (RL), the system is designed to bridge the gap between the explainability of financial analysis and the systematic rigor required for real-world trading decisions (Xiao et al., 14 Sep 2025).
1. Model Architecture and Structured Reasoning
Trading-R1 is built on a transformer backbone (Qwen3-4B in the reported experiments) that has been pre-adapted for reasoning, and it incorporates expert-designed modules for professional, multi-step financial analysis. The architecture relies on a “reverse reasoning distillation” process: recommendations from high-performing but non-interpretable API models are passed to a lighter planning LLM, which reconstructs detailed, step-by-step thesis traces leading to those recommendations.
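The decision-making pipeline is further reinforced via Group Relative Policy Optimization (GRPO), a reinforcement learning method that computes a group-relative advantage over $G$ candidate completions sampled for the same input. In the standard GRPO formulation this is

$$A_i = \frac{r_i - \operatorname{mean}(\{r_j\}_{j=1}^{G})}{\operatorname{std}(\{r_j\}_{j=1}^{G})},$$

where $r_i$ is the realized reward (such as risk-adjusted return) for candidate $i$. The total RL objective regularizes for policy drift:

$$\mathcal{J}(\theta) = \mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G} \min\left(\rho_i A_i,\ \operatorname{clip}(\rho_i, 1-\epsilon, 1+\epsilon)\, A_i\right)\right] - \beta\, D_{\mathrm{KL}}\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right),$$

where $\rho_i = \pi_\theta(o_i \mid q) / \pi_{\theta_{\mathrm{old}}}(o_i \mid q)$ is the probability ratio of candidate $o_i$ given input $q$, $\epsilon$ is the clipping range, $\beta$ weights the KL penalty, and $\pi_{\mathrm{ref}}$ is the supervised reference policy. This objective encourages outputs that achieve superior group-normalized returns while penalizing excessive deviation from the supervised reference policy.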
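The group-relative advantage can be illustrated with a minimal sketch that normalizes the rewards of one group of candidate completions by the group mean and standard deviation. The function name `group_relative_advantages`, the `eps` guard against a zero standard deviation, and the example rewards are assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own group."""
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    # eps (an assumption) avoids division by zero when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate completions of the same input.
advantages = group_relative_advantages([0.2, -0.1, 0.5, 0.0])
```

Candidates that beat the group average receive a positive advantage and the rest a negative one, so the update depends only on relative quality within the group rather than on a separately learned value baseline.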
A signature characteristic is the disciplined transformation of plain language or tabular market data into XML-tagged, multi-sectioned, professional-grade thesis outputs (e.g., with <fundamentals>, <technical>, <news>, <risk_assessment>, <conclusion>).
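To make the tagged layout concrete, the sketch below checks that a thesis contains every section named above. The enclosing `<thesis>` root element, the helper name `missing_sections`, and the placeholder sentences in `sample` are assumptions for illustration; the source does not specify the exact schema.

```python
import xml.etree.ElementTree as ET

# Section tags named in the text; the <thesis> wrapper is an assumption.
REQUIRED_SECTIONS = ("fundamentals", "technical", "news",
                     "risk_assessment", "conclusion")


def missing_sections(thesis_xml):
    """Return required section tags that are absent or empty."""
    root = ET.fromstring(thesis_xml)
    missing = []
    for tag in REQUIRED_SECTIONS:
        node = root.find(tag)  # direct children of the root only
        if node is None or not (node.text or "").strip():
            missing.append(tag)
    return missing


# Placeholder content, invented purely to exercise the check.
sample = """<thesis>
  <fundamentals>Revenue grew year over year.</fundamentals>
  <technical>Price is above its 50-day moving average.</technical>
  <news>No material headlines this week.</news>
  <risk_assessment>Earnings volatility remains elevated.</risk_assessment>
</thesis>"""

print(missing_sections(sample))  # the sample omits <conclusion>
```

A check of this kind is one plausible way a structure-oriented reward could be scored, since a missing or empty section is detectable without reading the prose.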
2. Training Methodology and Curriculum Design
Trading-R1 uses a multi-stage easy-to-hard curriculum:
- Stage I ("Structure"): Supervised fine-tuning (SFT) teaches reliably formatted, sectioned, and tagged thesis outputs; reinforcement learning (RL) reinforces that structure and prevents collapse into unstructured or generic completions.
- Stage II ("Claims"): Both SFT and RL reward explicit, evidentiary claims: every nontrivial assertion must be linked to a direct citation or in-text quote, which reduces hallucinations.
- Stage III ("Decision"): SFT and RL jointly align final model outputs to a five-point market action space (Strong Sell, Sell, Hold, Buy, Strong Buy), with supervision labels derived from volatility-adjusted, multi-horizon realized returns that emulate professional decision thresholds.
In each stage, SFT sets the target template and RL—via the GRPO objective—fine-tunes for optimal risk-return tradeoffs along entire output trajectories, with explicit regularization to maintain learned structure.
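The labeling idea behind the five-point action space can be sketched as follows: forward returns over several horizons are divided by trailing volatility, combined into one score, and bucketed. The horizons, weights, volatility window, and fixed cutoffs below are hypothetical defaults chosen for illustration; the source states only that labels come from volatility-adjusted, multi-horizon realized returns.

```python
from statistics import pstdev

LABELS = ["Strong Sell", "Sell", "Hold", "Buy", "Strong Buy"]


def forward_return(prices, t, horizon):
    """Simple return from day t to day t + horizon."""
    return prices[t + horizon] / prices[t] - 1.0


def trailing_volatility(prices, t, window):
    """Std of the last `window` one-day returns ending at day t."""
    rets = [prices[i] / prices[i - 1] - 1.0
            for i in range(t - window + 1, t + 1)]
    return pstdev(rets)


def score_to_label(score, cutoffs=(-1.0, -0.3, 0.3, 1.0)):
    """Bucket a score into five actions (cutoffs are assumptions)."""
    return LABELS[sum(score > c for c in cutoffs)]


def label_for(prices, t, horizons=(3, 7, 15), weights=(0.3, 0.5, 0.2),
              window=20):
    """Weighted, volatility-adjusted multi-horizon label for day t."""
    vol = trailing_volatility(prices, t, window) or 1e-8  # guard zero vol
    score = sum(w * forward_return(prices, t, h) / vol
                for w, h in zip(weights, horizons))
    return score_to_label(score)


# Synthetic prices: a quiet stretch followed by a steady rally.
prices = [100.0 + (i % 2) for i in range(21)] + \
         [100.0 + 5 * k for k in range(1, 16)]
label = label_for(prices, t=20)
```

Dividing by trailing volatility means the same raw return counts as a stronger signal for a calm asset than for a turbulent one, which is the sense in which the thresholds are risk-sensitive.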
3. Data Corpus and Modalities
Training is performed on Tauric-TR1-DB, a dataset comprising 100,000 carefully curated instances covering 18 months, 14 major equities/ETFs, and five heterogeneous financial data sources:
- Technical features: time-series data (prices, volatility, moving averages, MACD, Bollinger Bands)
- Fundamental signals: balance sheets, income statements, cash flow, 10-K/10-Q SEC filings
- Textual news: newswire articles aggregated and aligned by recency
- Sentiment/insider data: transaction records, analyst ratings, buyback announcements
- Macroeconomic indicators: interest and inflation rates, labor statistics
Each training example is context-rich (20–30K tokens), with the model tasked to condense and explicate cross-modal information into a full investment thesis and actionable decision.
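As an illustration of the technical features listed above, the following sketch computes a simple moving average, MACD, and Bollinger Bands from a price series using only the standard library. The parameter values (12/26/9 for MACD, a 20-period window with 2 standard deviations for the bands) are conventional defaults and are assumptions here; the source does not state which settings the dataset uses.

```python
from statistics import mean, pstdev


def sma(values, window):
    """Simple moving average over each full trailing window."""
    return [mean(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]


def ema(values, span):
    """Exponential moving average seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(values, fast=12, slow=26, signal=9):
    """MACD line (fast EMA minus slow EMA) and its signal line."""
    line = [f - s for f, s in zip(ema(values, fast), ema(values, slow))]
    return line, ema(line, signal)


def bollinger(values, window=20, k=2.0):
    """(lower, middle, upper) bands for each full trailing window."""
    bands = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1:i + 1]
        mid, sd = mean(chunk), pstdev(chunk)
        bands.append((mid - k * sd, mid, mid + k * sd))
    return bands


# Synthetic, steadily rising prices used only to exercise the functions.
prices = [float(100 + i) for i in range(40)]
macd_line, signal_line = macd(prices)
bands = bollinger(prices)
```

In a pipeline like the one described, values such as these would be serialized into the model's input context alongside the fundamental, news, sentiment, and macroeconomic data.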
4. Evaluation Metrics and Experimental Benchmarking
Performance is quantified with both profitability and risk-adjusted metrics:
- Cumulative Return (CR): aggregate portfolio growth over test period
- Sharpe Ratio (SR): mean excess return divided by realized volatility
- Hit Rate (HR): proportion of correct directional predictions
- Maximum Drawdown (MDD): largest peak-to-trough portfolio loss
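The four metrics can be written down as a short sketch over a list of per-period strategy returns. The annualization factor of 252 periods, the zero risk-free rate, and the choice to exclude Hold predictions (encoded as 0) from the hit rate are assumptions for illustration rather than the paper's stated evaluation protocol.

```python
from math import sqrt
from statistics import mean, stdev


def cumulative_return(returns):
    """Compounded growth over the whole period, minus one."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized mean excess return divided by its sample std."""
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)
    return 0.0 if sd == 0 else mean(excess) / sd * sqrt(periods_per_year)


def hit_rate(predicted_directions, realized_returns):
    """Share of non-zero directional calls whose sign matched."""
    pairs = [(p, r) for p, r in zip(predicted_directions, realized_returns)
             if p != 0]  # 0 encodes Hold and is skipped (an assumption)
    if not pairs:
        return 0.0
    return sum(1 for p, r in pairs if p * r > 0) / len(pairs)


def max_drawdown(returns):
    """Largest peak-to-trough loss as a fraction of the peak."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


# Made-up returns: +10%, then -50%, then +20%.
example = [0.10, -0.50, 0.20]
mdd = max_drawdown(example)
cr = cumulative_return(example)
```

Here the equity curve goes 1.10, 0.55, 0.66, so the drawdown from the 1.10 peak to 0.55 is one half even though the final period is positive.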
In backtests spanning NVDA, AAPL, MSFT, AMZN, META, and SPY, Trading-R1 consistently produces improved cumulative returns and reduced drawdowns relative to both open-source LLMs and commercial/proprietary reasoning models. The reported empirical ordering of model families is: small language models (SLM) < reasoning language models (RLM) < large language models (LLM) < Trading-SFT ≈ Trading-RFT < Trading-R1.
5. Investment Thesis Generation and Interpretability
A central feature of Trading-R1 is the ability to generate machine-verifiable, evidence-based, structured theses that align with professional analyst standards. Each output is XML-tagged into analytical sections, with claims grounded in direct facts (e.g., quoted earnings results, regulatory events, technical indicator thresholds) and accompanied by explicit risk assessment. This format supports both interpretability in automated trading systems and compliance in institutional research workflows. The output can be directly utilized for generating analyst-style reports or buy-side/portfolio management support.
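One way to read "machine-verifiable" is that quoted evidence in a claim can be checked against the input context. The sketch below measures the fraction of claims containing at least one double-quoted span that appears verbatim in the context. The quoting convention, the function name `evidence_coverage`, and the sample text are all hypothetical; the source does not describe the actual verification procedure.

```python
import re

# Assumed convention: evidence appears inside straight double quotes.
QUOTE_PATTERN = re.compile(r'"([^"]+)"')


def evidence_coverage(claims, context):
    """Fraction of claims with a quote found verbatim in the context."""
    if not claims:
        return 0.0
    supported = 0
    for claim in claims:
        quotes = QUOTE_PATTERN.findall(claim)
        if any(q in context for q in quotes):
            supported += 1
    return supported / len(claims)


# Invented context and claims, used only to exercise the check.
context = ("Quarterly revenue rose 12% year over year. "
           "The board approved a new buyback.")
claims = [
    'Growth is intact: "revenue rose 12% year over year".',
    'Capital return is improving: "approved a new buyback".',
    'Margins will expand next quarter.',
]
coverage = evidence_coverage(claims, context)
```

In this example two of the three claims carry a quote that can be located in the context, while the third is an unsupported assertion of the kind an evidence-oriented reward would discourage.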
6. Integration, Release, and Practical Significance
Trading-R1 Terminal is scheduled for public release at https://github.com/TauricResearch/Trading-R1 and is engineered for local deployment on commercial GPUs, supporting privacy-preserving workflows.
Potential applications include:
- Institutional quantitative research support for high-throughput, interpretable screening
- Automated sell-side financial report generation with traceable analyst logic
- Buy-side use for backtesting, risk scenario analysis, and model-driven trade ticketing
The system is designed for modular extensibility—users can re-train or adapt the model to further markets or additional data sources as needed.
7. Relationship to Preceding Research and Position in the Literature
Trading-R1 synthesizes advancements in LLM reasoning, group-based policy optimization, and structured output generation. By explicitly interleaving supervised and reinforcement learning phases across a staged curriculum, and by grounding outputs in multi-modal, evidence-based analyses, it addresses both the lack of interpretability in time-series models and the insufficient trading discipline of natural-language LLMs. The approach contrasts with classic quant models and pure LLM prompt engineering, and the reported results show gains in both returns and risk management (Xiao et al., 14 Sep 2025).
The system explicitly aims to match or exceed the reasoning traceability and structured trade discipline of human professionals, providing a comprehensive toolchain for interpretable, structured, and high-throughput financial trading.