TradExpert Framework Overview
- TradExpert here names two unrelated frameworks: the TARexp platform for reproducible technology-assisted review, and a mixture-of-experts LLM system for quantitative trading.
- TARexp employs modular components, declarative workflows, and stateful ledger management to streamline large-scale information retrieval experiments.
- The TradExpert LLM system uses specialized experts for multimodal financial data, reporting higher accuracy (0.64) and MCC (0.19) than single-LLM baselines on S&P500 movement prediction and a Sharpe ratio of 5.01 in DOW30 trading simulation.
TradExpert denotes two unrelated frameworks: (1) TARexp (“TradExpert”), a Python platform for technology-assisted review (TAR) experiments in information retrieval and machine learning (Yang et al., 2022); and (2) TradExpert, a mixture-of-experts LLM system for multimodal quantitative trading (Ding et al., 2024). This entry distinguishes both, reflecting their distinct design and application domains.
1. TARexp (“TradExpert”): Software Framework for Technology-Assisted Review
TARexp (“TradExpert”) is an open-source Python framework enabling researchers to declare, run, and analyze large-scale, reproducible experiments for TAR algorithms (Yang et al., 2022). It is architected for modularity, workflow transparency, and extensibility, targeting industrial applications in information retrieval and machine learning.
1.1 System Architecture and Modules
TARexp’s architecture is composed of four core modules:
| Module | Primary Function | Notable Features |
|---|---|---|
| tarexp.components | Abstract roles: samplers, rankers, labelers, stopping rules | Component “combining”; multi-role support |
| tarexp.workflow | Iterative workflows (OnePhase, TwoPhase) | Workflow definition and orchestration |
| tarexp.ledger | Labeling/model ledger | Persistent state, dumping & replay |
| tarexp.experiment | Experiment plans and execution engine | Declarative experiment grid, parallelism |
A standard workflow instantiates a combination of these components. Each experimental round executes: sampler → labeler → trainer → ranker → stopping_rule, with decisions and intermediate state written to the ledger.
1.2 Declarative Workflow and Experiment Plan Design
Workflows and experiment plans are specified at the Python declaration layer, not via static configuration files:
- Component “combining” enables flexible assignment of roles (e.g., the same class may act as both sampler and stopping rule).
- Workflows accept explicit parameterization: dataset, component-setting, batch_size, seed_doc, random_seed.
- TARExperiment objects permit grid declaration over metrics, topics, workflow classes, and hyperparameter sweeps. The run(…) interface supports parallel execution, checkpointing, and resumption.
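To make the idea of a declared experiment grid concrete, here is a minimal sketch of expanding a few settings into one run specification per combination. `RunSpec` and `expand_grid` are hypothetical names invented for this illustration; they are not part of the TARexp API, and the topic names are placeholders.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RunSpec:
    """One point in the experiment grid (hypothetical, for illustration only)."""
    topic: str
    workflow: str
    batch_size: int
    random_seed: int


def expand_grid(topics, workflows, batch_sizes, random_seed=123):
    """Return one RunSpec per point in the Cartesian product of the settings."""
    return [
        RunSpec(topic, workflow, batch_size, random_seed)
        for topic, workflow, batch_size in product(topics, workflows, batch_sizes)
    ]


plan = expand_grid(
    topics=["topic-A", "topic-B"],
    workflows=["OnePhase", "TwoPhase"],
    batch_sizes=[100, 200],
)
print(len(plan))  # 2 topics x 2 workflows x 2 batch sizes
```

Because each `RunSpec` is immutable and hashable, a runner could use it as a key for checkpointing and for skipping runs that have already finished.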
1.3 Component Abstractions and IR/ML Design Patterns
TARexp adapts established IR/ML toolkit patterns, e.g.:
- From libact: separation of sampler, labeler, and model; extended so a single object may implement multiple roles.
- From scikit-learn: wrapper components for use of any estimator conforming to .decision_function or .predict_proba.
- From pyTerrier: TaskFeeder and pipeline composition model for flexible declarative pipelines.
- From ir-measures: metric computation with correct tiebreaking for both evaluation and stopping criteria.
A single component may serve as Sampler (batch selection), Labeler (simulated or human), Trainer/Ranker (model updating and scoring), StoppingRule (termination control), or Assessor/Estimator (interim recall/precision inference).
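The multi-role idea can be illustrated with structural typing: a single object satisfies two role interfaces at once. The sketch below uses `typing.Protocol`; the role names mirror the prose, but the method names (`sample`, `should_stop`) and the `BudgetSampler` class are assumptions made for this example rather than TARexp's actual interfaces.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Sampler(Protocol):
    def sample(self, scores: dict, k: int) -> list: ...


@runtime_checkable
class StoppingRule(Protocol):
    def should_stop(self, n_reviewed: int) -> bool: ...


class BudgetSampler:
    """Hypothetical component acting as both sampler and stopping rule."""

    def __init__(self, budget: int):
        self.budget = budget

    def sample(self, scores: dict, k: int) -> list:
        # Relevance sampling: pick the k highest-scored documents.
        return sorted(scores, key=scores.get, reverse=True)[:k]

    def should_stop(self, n_reviewed: int) -> bool:
        # Stop once the review budget is exhausted.
        return n_reviewed >= self.budget


component = BudgetSampler(budget=4)
batch = component.sample({"d1": 0.2, "d2": 0.9, "d3": 0.5}, k=2)
```

A workflow could pass the same `component` instance wherever a sampler or a stopping rule is expected, which lets the two roles share state such as the remaining budget.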
1.4 State Management, Restart, and “Frozen Mode”
The workflow ledger records each round’s state: round number, doc IDs selected, assigned labels, scores (optional). Key features:
- State is periodically dumped based on configurable intervals.
- Experiments may be paused and resumed, with the random generator and all state deterministically reloaded from the ledger.
- “Frozen mode” permits replaying a completed ledger without retraining for efficient benchmarking of new stopping rules or estimators.
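The replay idea behind frozen mode can be sketched without any model code: given the per-round records of a finished run, a new stopping rule is evaluated by walking the records in order. The ledger layout (a list of dicts with `round`, `doc_ids`, `labels`) and the rule signature are assumptions for illustration, not the tarexp.ledger format.

```python
def replay(ledger_rounds, stopping_rule):
    """Return the first round at which `stopping_rule` fires, or None.

    No model is retrained: only the labels recorded in the ledger are used.
    """
    reviewed, positives = 0, 0
    for record in ledger_rounds:
        reviewed += len(record["labels"])
        positives += sum(record["labels"])
        if stopping_rule(reviewed, positives):
            return record["round"]
    return None


# A completed run, as it might be reloaded from disk (toy data).
ledger_rounds = [
    {"round": 0, "doc_ids": ["d1", "d2"], "labels": [1, 0]},
    {"round": 1, "doc_ids": ["d3", "d4"], "labels": [1, 1]},
    {"round": 2, "doc_ids": ["d5", "d6"], "labels": [0, 0]},
]


def budget_rule(reviewed, positives):
    return reviewed >= 6  # stop after six reviewed documents


def found_rule(reviewed, positives):
    return positives >= 3  # stop after three relevant documents


stop_budget = replay(ledger_rounds, budget_rule)
stop_found = replay(ledger_rounds, found_rule)
```

Since the ledger is fixed, any number of candidate rules can be compared on exactly the same review sequence.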
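At a high level, the workflow repeats the round described in Section 1.1 (sampler → labeler → trainer → ranker → stopping_rule), writing each round to the ledger until the stopping rule fires.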
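A minimal sketch of such a round loop follows. It is a toy stand-in rather than the source's pseudocode: the `Ledger` class, the fixed `scores` (in place of a retrained ranker), and the positives-found stopping rule are all assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Hypothetical per-round record store (not the tarexp.ledger API)."""
    rounds: list = field(default_factory=list)

    def record(self, round_no, doc_ids, labels):
        self.rounds.append({"round": round_no, "doc_ids": doc_ids, "labels": labels})

    def labeled(self):
        # Merge all rounds into one doc_id -> label mapping.
        return {d: lab for r in self.rounds for d, lab in zip(r["doc_ids"], r["labels"])}


def run_workflow(scores, truth, batch_size, target_positives):
    """Toy one-phase loop: sample -> label -> record -> check stopping rule."""
    ledger = Ledger()
    round_no = 0
    while True:
        known = ledger.labeled()
        pool = [d for d in scores if d not in known]
        if not pool:
            break  # collection exhausted
        # Sampler: highest-scored unlabeled documents (relevance sampling).
        batch = sorted(pool, key=lambda d: -scores[d])[:batch_size]
        # Labeler: simulated from ground truth.
        labels = [truth[d] for d in batch]
        ledger.record(round_no, batch, labels)
        # Trainer/ranker omitted: a real run would refit and rescore here.
        # Stopping rule: enough relevant documents found.
        if sum(ledger.labeled().values()) >= target_positives:
            break
        round_no += 1
    return ledger


scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
truth = {"a": 1, "b": 0, "c": 0, "d": 1}
ledger = run_workflow(scores, truth, batch_size=2, target_positives=2)
```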
1.5 Mathematical Formulation and Key Metrics
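TARexp incorporates standard IR metrics such as precision, recall, and F₁-score, together with sampling and cost quantities:
- Precision: $P = \frac{TP}{TP + FP}$, the fraction of documents judged relevant by the system that are truly relevant.
- Recall: $R = \frac{TP}{TP + FN}$, the fraction of truly relevant documents that have been found.
- Uncertainty sampling (margin): $m(x) = p(\hat{y}_1 \mid x) - p(\hat{y}_2 \mid x)$ for the two top predicted classes $\hat{y}_1$ and $\hat{y}_2$; documents with the smallest margin are the most uncertain.
- Cost structure for a target recall, expressed in terms of per-step costs.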
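The standard definitions of precision, recall, F₁, and the margin score can be written out directly. This is a plain-Python sketch of the textbook formulas, not TARexp's own metric code; the function names are chosen for this example.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); defined as 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN); defined as 0.0 when there are no positives."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def margin(class_probs) -> float:
    """Gap between the two highest class probabilities (small = uncertain)."""
    top_two = sorted(class_probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


p = precision(tp=8, fp=2)
r = recall(tp=8, fn=8)
score = f1(p, r)
gap = margin([0.5, 0.3, 0.2])
```

For binary relevance, the margin reduces to the absolute difference between the relevant and non-relevant probabilities, so the documents closest to 0.5 are sampled first.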
1.6 Extensibility and Experimental Workflow
Researchers may subclass existing components, or combine them into new component types, to conduct custom studies.
Results integration uses Pandas DataFrames with multi-level indices (component, batch size, topic, round), supporting further aggregation and visualization.
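As an illustration of working with such a multi-level index, the sketch below builds a small results frame and averages final-round recall per component. The index level names follow the prose; the `recall` column and all values are made-up toy data, not TARexp output.

```python
import pandas as pd

# Toy rows: (component, batch_size, topic, round, recall).
rows = [
    ("relevance", 100, "topic-A", 0, 0.20),
    ("relevance", 100, "topic-A", 1, 0.55),
    ("relevance", 100, "topic-B", 0, 0.10),
    ("relevance", 100, "topic-B", 1, 0.35),
    ("uncertainty", 100, "topic-A", 0, 0.15),
    ("uncertainty", 100, "topic-A", 1, 0.40),
]
columns = ["component", "batch_size", "topic", "round", "recall"]
results = pd.DataFrame(rows, columns=columns).set_index(columns[:4])

# Recall at the last recorded round of each (component, batch_size, topic) run.
final_recall = results.groupby(level=["component", "batch_size", "topic"])["recall"].last()

# Average over topics to compare components.
mean_final = final_recall.groupby(level="component").mean()
print(mean_final)
```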
1.7 Best Practices
Recommended usage includes: fixing random seeds and dump intervals for reproducibility, keeping experiment specifications under version control, unit-testing custom components in isolation, utilizing Colab for rapid prototyping, and leveraging cost-structure visualizations for workflow comparison. The official GitHub repository is the recommended source for up-to-date reference implementations.
2. TradExpert: Mixture-of-Expert LLMs for Quantitative Trading
TradExpert is a framework for quantitative trading based on a Mixture-of-Experts (MoE) LLM architecture, developed for structured and unstructured multimodal financial information synthesis (Ding et al., 2024). It is designed to address data heterogeneity in trading by specializing LLM “agents” for different modalities, and integrating via a generalist fusion mechanism.
2.1 Architecture: MoE and Expert Specialization
TradExpert’s architecture comprises two tiers:
| Expert LLM | Data Modality | Specialization |
|---|---|---|
| News Analyst | Raw news articles | Unstructured text |
| Market Analyst | OHLCV time series | Quantitative market |
| Alpha Expert | Expression-based alpha factors | Symbolic, numerical |
| Fundamental Analyst | Earnings transcripts, financial ratios | Text, numeric |
| General Expert | Synthesis of above | Decision fusion |
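Given input $x$, each expert $i$ generates an embedding $h_i$ and a summary $s_i$. The General Expert computes normalized weights $w_i$ from per-expert gating scores $g_i$ via a softmax:
$$w_i = \frac{\exp(g_i)}{\sum_j \exp(g_j)}$$
and fuses expert outputs: $$h = \sum_i w_i h_i$$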
The General Expert’s transformer layers implicitly realize gating and fusion—no separate network is instantiated, although gating is made explicit in the formulation.
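Although the gating is implicit in the General Expert's transformer layers, the explicit formulation can be illustrated numerically. The sketch below assumes a softmax over per-expert gating scores followed by a weighted sum of expert embeddings; the softmax form, the function names, and the toy vectors are assumptions for illustration, not the paper's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(embeddings, gate_scores):
    """Weight each expert embedding by its softmax gate and sum them."""
    weights = softmax(gate_scores)
    dim = len(embeddings[0])
    fused = [
        sum(w * emb[k] for w, emb in zip(weights, embeddings))
        for k in range(dim)
    ]
    return weights, fused


# Four experts with 2-dimensional toy embeddings and equal gate scores.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights, fused = fuse(embeddings, gate_scores=[0.0, 0.0, 0.0, 0.0])
```

With equal gate scores the fusion reduces to a plain average; raising one expert's score shifts the fused vector toward that expert's embedding.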
2.2 Multimodal Data Processing
Each expert is fine-tuned on its pertinent modality:
- News Analyst uses tokenized, truncated articles (<2K tokens) and outputs CoT reasoning with a binary prediction.
- Market Analyst applies reprogramming to OHLCV time series: data is patch-embedded, aligned with text-prototype embeddings from the LLM vocabulary, and further summarized using TSFresh statistics appended as prompts.
- Alpha Expert selects the top-$k$ alphas from 108 generic formulae using LightGBM importance scores; each prompt presents the factor's description and value.
- Fundamental Analyst processes up to 4K-token earnings transcripts, appending normalized (z-score) financial ratios as numeric tokens, and outputs a five-way movement label plus justification.
2.3 Prediction and Ranking Modes
TradExpert flexibly supports two major modes:
- Prediction (binary classification): General Expert receives all expert summaries. Prompt includes instruction to produce a “Rise/Fall” outcome for the next prediction window.
- Ranking (pairwise): Each pair of stocks $A$ and $B$ is presented via their summaries, and the model is prompted to indicate which will outperform over the prediction horizon. The ranking is derived from each stock's count of pairwise “wins”, which requires $O(N^2)$ comparisons for $N$ stocks. This relaxed BubbleSort-style aggregation mitigates inconsistencies due to non-transitive LLM judgments.
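The win-count aggregation can be sketched in a few lines. Here `prefer` stands in for the LLM's pairwise judgment and is driven by a hand-written lookup that deliberately contains a cycle; the tickers and outcomes are invented for the example.

```python
from itertools import combinations


def rank_by_wins(stocks, prefer):
    """Compare every pair once, count wins, and sort by win count.

    `prefer(a, b)` returns whichever of the two is judged to outperform.
    Ties keep input order because Python's sort is stable.
    """
    wins = {s: 0 for s in stocks}
    n_comparisons = 0
    for a, b in combinations(stocks, 2):
        wins[prefer(a, b)] += 1
        n_comparisons += 1
    ranking = sorted(stocks, key=lambda s: -wins[s])
    return ranking, wins, n_comparisons


# Hypothetical judgments as (winner, loser); BBB > CCC > DDD > BBB is a cycle.
beats = {
    ("AAA", "BBB"), ("AAA", "CCC"), ("AAA", "DDD"),
    ("BBB", "CCC"), ("CCC", "DDD"), ("DDD", "BBB"),
}


def prefer(a, b):
    return a if (a, b) in beats else b


ranking, wins, n_comparisons = rank_by_wins(["AAA", "BBB", "CCC", "DDD"], prefer)
```

A comparison sort would need a consistent ordering to terminate meaningfully, whereas counting wins always yields a ranking, at the price of N(N-1)/2 comparisons.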
2.4 Large-Scale Financial Benchmark
TradExpert is accompanied by a benchmark dataset spanning all S&P500 stocks from 2020 to 2023:
| Modality | Count | Features/Notes |
|---|---|---|
| News Articles | 524,995 | Avg. 596.4 words, ticker links |
| OHLCV Records | 481,484 | 5 fields per trading day |
| Alpha Factors | 108 factors | Formula + GPT-4 description per factor |
| Transcripts | 500 × 16 | Quarterly earning calls |
| Financial Ratios | ~5 per call | EPS, P/E, BVPS, etc. |
- Training: 2020-01-01–2022-06-30; Validation: 2022-07-01–2022-12-31; Test: 2023-01-01–2023-12-31.
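The chronological split can be expressed as a small lookup over the date ranges listed above. The `SPLITS` table and `assign_split` helper are hypothetical names for this sketch; only the boundary dates come from the text.

```python
from datetime import date

# Inclusive date ranges for each split, as listed above.
SPLITS = {
    "train": (date(2020, 1, 1), date(2022, 6, 30)),
    "validation": (date(2022, 7, 1), date(2022, 12, 31)),
    "test": (date(2023, 1, 1), date(2023, 12, 31)),
}


def assign_split(day: date):
    """Return the split containing `day`, or None if it lies outside all ranges."""
    for name, (start, end) in SPLITS.items():
        if start <= day <= end:
            return name
    return None


print(assign_split(date(2022, 9, 15)))
```

Splitting strictly by time, rather than at random, keeps later market information out of the training data.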
2.5 Empirical Results
Stock Movement Prediction
TradExpert-NM (News+Market only) delivers state-of-the-art performance over single-model LLMs on S&P500 binary movement prediction, using Accuracy (Acc) and Matthews Correlation Coefficient (MCC):
| Method | Acc ↑ | MCC ↑ |
|---|---|---|
| InternLM-7B | 0.60 | 0.06 |
| Gemini | 0.59 | 0.11 |
| TradExpert-NM | 0.64 | 0.19 |
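For reference, both metrics in the table follow from the binary confusion matrix. The sketch below uses the standard definitions of accuracy and MCC; the confusion counts are invented for the example and are unrelated to the reported results.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of correct predictions among all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


acc_value = accuracy(tp=40, tn=30, fp=20, fn=10)
mcc_value = mcc(tp=40, tn=30, fp=20, fn=10)
```

MCC ranges from -1 to 1 and stays near 0 for a classifier that always predicts the majority class, which is why it is a useful companion to accuracy when rises and falls are imbalanced.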
Trading Simulation on DOW30
Measured by Annualized Return (AR), Annualized Volatility (AV), Sharpe Ratio (SR), and Max Drawdown (MD):
| Method | AR ↑ | AV ↓ | SR ↑ | MD ↓ |
|---|---|---|---|---|
| DeepTrader | 32.45% | 17.86% | 1.82 | 15.32% |
| TradExpert | 49.79% | 9.95% | 5.01 | 6.56% |
TradExpert’s MoE substantially improves both return and risk-adjusted performance relative to baseline traditional and deep RL-based strategies.
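The four simulation metrics can be computed from a series of daily returns. Conventions vary, so this sketch states its assumptions: 252 trading days per year, arithmetic annualization, and a zero risk-free rate, under which the Sharpe ratio is simply AR divided by AV (the table's figures are consistent with that ratio: 49.79 / 9.95 ≈ 5.0 and 32.45 / 17.86 ≈ 1.82). The function name and sample returns are illustrative only.

```python
import math
from statistics import mean, stdev


def trading_metrics(daily_returns, periods_per_year=252):
    """Return AR, AV, SR, and MD for a sequence of simple daily returns."""
    ar = mean(daily_returns) * periods_per_year
    av = stdev(daily_returns) * math.sqrt(periods_per_year)
    sr = ar / av if av else float("nan")  # risk-free rate assumed to be zero

    # Max drawdown: largest peak-to-trough loss of the cumulative wealth curve.
    wealth, peak, max_drawdown = 1.0, 1.0, 0.0
    for r in daily_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        max_drawdown = max(max_drawdown, 1.0 - wealth / peak)

    return {"AR": ar, "AV": av, "SR": sr, "MD": max_drawdown}


metrics = trading_metrics([0.10, -0.50, 0.10])
```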
2.6 Insights, Limitations, and Future Work
Specialization of LLM agents enables explicit division of labor across textual, time-series, symbolic, and numeric modalities. Synthesis in the General Expert allows for nuanced weighting of complementary signals. Reasoning is enabled through chain-of-thought prompting and data reprogramming.
Reported limitations include per-stock inference latency (~4.7 seconds on an Nvidia A5000), which makes the system unsuitable for high-frequency contexts, and context-length constraints imposed by the LLaMA-2 token window, which require the experts to pass summaries to the General Expert.
Anticipated directions include reduction of latency for higher-frequency trading, expanding to global multi-market universes, and instantiating dedicated learning-based gating networks for adaptive expert weighting.
3. Context, Significance, and Comparative Summary
Both frameworks, TARexp and TradExpert, emphasize modular design in their respective fields: document review (IR/ML) and financial time-series prediction and trading. TARexp advances reproducibility and rapid experimentation for technology-assisted review, lowering implementation barriers with component-based design and replayable experiment ledgers (Yang et al., 2022). TradExpert applies LLMs to heterogeneous financial data and, in the results reported above, outperforms single-LLM and deep RL baselines through explicit domain specialization and fusion (Ding et al., 2024).
4. References
- TARexp (“TradExpert”): (Yang et al., 2022)
- TradExpert (Mixture-of-Expert LLMs for Trading): (Ding et al., 2024)