TradExpert Framework Overview

Updated 1 June 2026

The name TradExpert covers two separate frameworks: TARexp, a platform for reproducible technology-assisted review, and a mixture-of-experts LLM system for quantitative trading.
TARexp employs modular components, declarative workflows, and stateful ledger management to streamline large-scale information retrieval experiments.
The TradExpert LLM system assigns a specialized expert to each modality of financial data and reports higher prediction accuracy and risk-adjusted trading performance than the baselines it is compared against.

TradExpert denotes two unrelated frameworks: (1) TARexp (“TradExpert”), a Python platform for technology-assisted review (TAR) experiments in information retrieval and machine learning (Yang et al., 2022); and (2) TradExpert, a mixture-of-experts LLM system for multimodal quantitative trading (Ding et al., 2024). This entry distinguishes both, reflecting their distinct design and application domains.

1. TARexp (“TradExpert”): Software Framework for Technology-Assisted Review

TARexp (“TradExpert”) is an open-source Python framework enabling researchers to declare, run, and analyze large-scale, reproducible experiments for TAR algorithms (Yang et al., 2022). It is architected for modularity, workflow transparency, and extensibility, targeting industrial applications in information retrieval and machine learning.

1.1 System Architecture and Modules

TARexp’s architecture is composed of four core modules:

Module	Primary Function	Notable Features
tarexp.components	Abstract roles: samplers, rankers, labelers, stopping rules	Component “combining”; multi-role support
tarexp.workflow	Iterative workflows (OnePhase, TwoPhase)	Workflow definition and orchestration
tarexp.ledger	Labeling/model ledger	Persistent state, dumping & replay
tarexp.experiment	Experiment plans and execution engine	Declarative experiment grid, parallelism

A standard workflow instantiates a combination of these components. Each experimental round executes: sampler → labeler → trainer → ranker → stopping_rule, with decisions and intermediate state written to the ledger.
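The round structure above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the tarexp API: `run_workflow`, `DOCS`, `TRUTH`, the term-overlap "model", and the oracle stopping rule are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random

# Hypothetical toy collection: doc_id -> text, and doc_id -> relevance label.
DOCS = {
    "d1": "merger contract dispute",
    "d2": "lunch menu",
    "d3": "contract breach claim",
    "d4": "holiday party",
    "d5": "merger approval",
}
TRUTH = {"d1": 1, "d2": 0, "d3": 1, "d4": 0, "d5": 1}


def run_workflow(docs, truth, batch_size=2, target_recall=1.0, seed=0):
    """Toy TAR loop; hypothetical names, not the tarexp API."""
    rng = random.Random(seed)                  # fixed seed keeps runs repeatable
    scores = {d: rng.random() for d in docs}   # no model yet: random ranking
    labeled, ledger = {}, []
    total_relevant = sum(truth.values())
    while len(labeled) < len(docs):
        # sampler: take the top-scored unlabeled documents
        pool = [d for d in docs if d not in labeled]
        batch = sorted(pool, key=lambda d: scores[d], reverse=True)[:batch_size]
        # labeler: simulated reviewer reading labels from ground truth
        labels = {d: truth[d] for d in batch}
        labeled.update(labels)
        # trainer: collect terms seen in documents labeled relevant
        positive_terms = set()
        for d, y in labeled.items():
            if y:
                positive_terms |= set(docs[d].split())
        # ranker: score every document by overlap with those terms
        scores = {d: len(positive_terms & set(docs[d].split())) for d in docs}
        # ledger: record this round's decisions
        ledger.append({"round": len(ledger), "doc_ids": batch, "labels": labels})
        # stopping rule: oracle recall check (only possible in simulation)
        if sum(labeled.values()) >= target_recall * total_relevant:
            break
    return ledger


ledger = run_workflow(DOCS, TRUTH)
```

Because the random generator is seeded and every decision is written to the ledger, two runs with the same arguments produce identical ledgers, which is the property the framework's reproducibility features rely on.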

1.2 Declarative Workflow and Experiment Plan Design

Workflows and experiment plans are specified at the Python declaration layer, not via static configuration files:

Component “combining” enables flexible assignment of roles (e.g., the same class may act as both sampler and stopping rule).
Workflows accept explicit parameterization: dataset, component-setting, batch_size, seed_doc, random_seed.
TARExperiment objects permit grid declaration over metrics, topics, workflow classes, and hyperparameter sweeps. The run(…) interface supports parallel execution, checkpointing, and resumption.
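A declared experiment grid amounts to a Cartesian product over the swept settings. The sketch below shows that expansion with the standard library only; `expand_grid` and the axis values (`topic_a`, `topic_b`, and so on) are hypothetical and do not reproduce the TARExperiment interface.

```python
from itertools import product


def expand_grid(**axes):
    """Return one settings dict per point in the Cartesian product of the axes."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


# Hypothetical sweep: 2 topics x 2 batch sizes x 3 seeds = 12 runs.
plan = expand_grid(
    topic=["topic_a", "topic_b"],
    batch_size=[100, 200],
    random_seed=[1, 2, 3],
)
```

Each dict in `plan` is an independent unit of work, which is what makes parallel execution and resumption straightforward: a finished run can be skipped by checking whether its settings already have a stored result.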

1.3 Component Abstractions and IR/ML Design Patterns

TARexp adapts established IR/ML toolkit patterns, e.g.:

From libact: separation of sampler, labeler, and model; extended so a single object may implement multiple roles.
From scikit-learn: wrapper components for use of any estimator conforming to .decision_function or .predict_proba.
From pyTerrier: TaskFeeder and pipeline composition model for flexible declarative pipelines.
From ir-measures: metric computation with correct tiebreaking for both evaluation and stopping criteria.

A single component may serve as Sampler (batch selection), Labeler (simulated or human), Trainer/Ranker (model updating and scoring), StoppingRule (termination control), or Assessor/Estimator (interim recall/precision inference).

1.4 State Management, Restart, and “Frozen Mode”

The workflow ledger records each round’s state: round number, doc IDs selected, assigned labels, scores (optional). Key features:

State is periodically dumped based on configurable intervals.
Experiments may be paused and resumed, with the random generator and all state deterministically reloaded from the ledger.
“Frozen mode” permits replaying a completed ledger without retraining for efficient benchmarking of new stopping rules or estimators.
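The idea behind replaying a completed ledger can be illustrated with a few lines of Python. This is a sketch under assumptions: the ledger layout (a list of per-round dicts), `replay`, `budget_rule`, and `dry_spell_rule` are hypothetical names, not tarexp's own structures.

```python
def replay(ledger, stopping_rule):
    """Walk recorded rounds without retraining; return the round where the rule fires."""
    seen_labels = []
    for record in ledger:
        seen_labels.extend(record["labels"])
        if stopping_rule(record["round"], seen_labels):
            return record["round"]
    return None  # the rule never fired on this ledger


def budget_rule(max_docs):
    """Stop once at least max_docs documents have been reviewed."""
    return lambda round_no, labels: len(labels) >= max_docs


def dry_spell_rule(n):
    """Stop once the last n reviewed documents were all non-relevant."""
    return lambda round_no, labels: len(labels) >= n and not any(labels[-n:])


# Hypothetical completed ledger: labels are 1 (relevant) or 0 (not relevant).
recorded = [
    {"round": 0, "doc_ids": ["d1", "d2"], "labels": [1, 1]},
    {"round": 1, "doc_ids": ["d3", "d4"], "labels": [1, 0]},
    {"round": 2, "doc_ids": ["d5", "d6"], "labels": [0, 0]},
    {"round": 3, "doc_ids": ["d7", "d8"], "labels": [0, 0]},
]

stop_budget = replay(recorded, budget_rule(4))
stop_dry = replay(recorded, dry_spell_rule(3))
```

Since no model is trained during replay, many candidate stopping rules can be compared against the same recorded run at negligible cost.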

The source specifies the high-level workflow as pseudocode (listing not preserved here); each round follows the sampler → labeler → trainer → ranker → stopping_rule sequence of Section 1.1, with decisions and state written to the ledger.

1.5 Mathematical Formulation and Key Metrics

TARexp incorporates standard IR metrics such as precision, recall, and F₁-score:

Precision $P = \frac{|\{\text{relevant}\}\cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}$
Recall $R = \frac{|\{\text{relevant}\}\cap \{\text{retrieved}\}|}{|\{\text{relevant}\}|}$
$F_1 = \frac{2PR}{P+R}$
Uncertainty sampling (margin): $x^{*} = \arg\min_{x\in\mathcal{U}} \bigl(p_{\theta}(y_1|x) - p_{\theta}(y_2|x)\bigr)$, where $y_1$ and $y_2$ are the two most probable classes for $x$ and $\mathcal{U}$ is the unlabeled pool.
Cost structure for target recall $\tau$ with per-step costs $[c_T, c_R, c_L, c_H]$ :

$\text{Cost}(\tau) = \min_{r: R(r)\ge \tau} \bigl(c_T r + c_R R(r) N + c_L L(r) + c_H H(r)\bigr)$
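The set-based metrics and the margin criterion listed in this section translate directly into code. The following is a minimal sketch; the function names and the example data are illustrative and are not taken from TARexp.

```python
def precision_recall_f1(relevant, retrieved):
    """Compute P, R, and F1 from two sets of document IDs."""
    hits = len(relevant & retrieved)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def margin_sample(probs):
    """Return the doc whose two highest class probabilities are closest together."""
    def margin(ps):
        top_two = sorted(ps, reverse=True)[:2]
        return top_two[0] - top_two[1]
    return min(probs, key=lambda doc_id: margin(probs[doc_id]))


p, r, f1 = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "x"})
# Two of three retrieved are relevant (P = 2/3); two of four relevant are found (R = 1/2).

pick = margin_sample({"d1": [0.9, 0.1], "d2": [0.55, 0.45], "d3": [0.7, 0.3]})
# d2 has the smallest margin (0.10), so it is the most uncertain document.
```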

1.6 Extensibility and Experimental Workflow

Researchers may subclass or combine new component types and conduct custom studies; the source illustrates this with a code listing (not preserved here).

Results integration uses Pandas DataFrames with multi-level indices (component, batch size, topic, round), supporting further aggregation and visualization.
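Aggregating over such a multi-level index might look like the sketch below. The index level names follow the text; the recall values are made-up illustration numbers, and the component labels `relevance` and `uncertainty` are hypothetical.

```python
import pandas as pd

# Made-up illustration data: (component, batch_size, topic, round, recall).
rows = [
    ("relevance", 100, "topic_a", 0, 0.20),
    ("relevance", 100, "topic_a", 1, 0.55),
    ("relevance", 100, "topic_b", 0, 0.10),
    ("relevance", 100, "topic_b", 1, 0.45),
    ("uncertainty", 100, "topic_a", 0, 0.15),
    ("uncertainty", 100, "topic_a", 1, 0.40),
    ("uncertainty", 100, "topic_b", 0, 0.05),
    ("uncertainty", 100, "topic_b", 1, 0.35),
]
index_levels = ["component", "batch_size", "topic", "round"]
df = pd.DataFrame(rows, columns=index_levels + ["recall"]).set_index(index_levels)

# Take the cross-section at round 1, then average recall over topics per component.
round_one = df.xs(1, level="round")
mean_recall = round_one.groupby(level="component")["recall"].mean()
```

Keeping the experimental factors in the index rather than in ordinary columns means a cross-section or group-by over any factor is a single call, which suits per-topic and per-round comparisons.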

1.7 Best Practices

Recommended usage includes: fixing random seeds and dump intervals for reproducibility, keeping experiment specifications under version control, unit-testing custom components in isolation, utilizing Colab for rapid prototyping, and leveraging cost-structure visualizations for workflow comparison. The official GitHub repository is the recommended source for up-to-date reference implementations.

2. TradExpert: Mixture-of-Expert LLMs for Quantitative Trading

TradExpert is a framework for quantitative trading based on a Mixture-of-Experts (MoE) LLM architecture, developed for structured and unstructured multimodal financial information synthesis (Ding et al., 2024). It is designed to address data heterogeneity in trading by specializing LLM “agents” for different modalities, and integrating via a generalist fusion mechanism.

2.1 Architecture: MoE and Expert Specialization

TradExpert’s architecture comprises two tiers:

Expert LLM	Data Modality	Specialization
News Analyst	Raw news articles	Unstructured text
Market Analyst	OHLCV time series	Quantitative market
Alpha Expert	Expression-based alpha factors	Symbolic, numerical
Fundamental Analyst	Earnings transcripts, financial ratios	Text, numeric
General Expert	Synthesis of above	Decision fusion

Given input $x=(x_\mathrm{news}, x_\mathrm{market}, x_\mathrm{alpha}, x_\mathrm{fund})$, each expert generates an embedding $E_i(x)$ and summary $h_i$. The General Expert computes weights $\alpha_i$ via a softmax over per-expert scores $s_i(x)$:

$\alpha_i = \frac{\exp(s_i(x))}{\sum_j \exp(s_j(x))}$

and fuses expert outputs: $h = \sum_i \alpha_i h_i$

The General Expert’s transformer layers implicitly realize gating and fusion; no separate network is instantiated, although gating is made explicit in the formulation.
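To make the explicit gating view concrete, the sketch below applies a softmax gate to per-expert scores and forms a weighted sum of expert output vectors. It is an assumption-laden illustration of generic mixture-of-experts fusion: the score values, the vector outputs, and the names `softmax` and `fuse` are hypothetical, and the actual system realizes fusion inside the General Expert's transformer layers rather than in a separate routine like this.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(scores, outputs):
    """Weight each expert's output vector by its gate value and sum them."""
    weights = softmax(scores)
    dim = len(outputs[0])
    fused = [sum(w * out[k] for w, out in zip(weights, outputs)) for k in range(dim)]
    return weights, fused


# Hypothetical scores for four experts and their 2-dimensional outputs.
weights, fused = fuse(
    scores=[1.0, 0.5, 0.0, 2.0],
    outputs=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]],
)
```

With equal scores the gate reduces to a plain average of the expert outputs; larger score gaps push the fused result toward the highest-scoring expert.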

2.2 Multimodal Data Processing

Each expert is fine-tuned on its pertinent modality:

News Analyst uses tokenized, truncated articles (<2K tokens) and outputs CoT reasoning with a binary prediction.
Market Analyst applies reprogramming to OHLCV time series: data is patch-embedded, aligned with text-prototype embeddings from the LLM vocabulary, and further summarized using TSFresh statistics appended as prompts.
Alpha Expert selects the top-$K$ alphas from 108 generic formulae using LightGBM importance scores; each prompt presents the factor description and value.
Fundamental Analyst processes up to 4K-token earnings transcripts, appending normalized (z-score) financial ratios as numeric tokens, and outputs a five-way movement label plus justification.
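The z-score normalization mentioned for financial ratios can be sketched as follows. The source does not say which population the ratios are normalized over, so normalizing one ratio across a small set of values is an assumption here, as are the function name and the sample values.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # all values identical: no spread to scale by
    return [(v - mu) / sigma for v in values]


# Hypothetical values of one ratio across three reporting periods.
normalized = z_scores([2.0, 4.0, 6.0])
```

Standardizing puts ratios with very different natural scales (for example EPS and P/E) on a comparable footing before they are serialized into a prompt.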

2.3 Prediction and Ranking Modes

TradExpert flexibly supports two major modes:

Prediction (binary classification): General Expert receives all expert summaries. Prompt includes instruction to produce a “Rise/Fall” outcome for the next prediction window.
Ranking (pairwise): Each pair of stocks $s_i$, $s_j$ is presented via their summaries; the model is prompted to indicate which will outperform over the prediction horizon. The ranking is derived from each stock's count of pairwise “wins” (requiring $O(N^2)$ comparisons for $N$ stocks). This relaxed BubbleSort-style aggregation mitigates inconsistencies due to non-transitive LLM judgments.
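Win-count aggregation over all pairs can be sketched in a few lines. In this illustration the LLM comparison is replaced by a hypothetical `prefer` function with one deliberately inconsistent judgment; the stock names and strengths are invented.

```python
from itertools import combinations


def rank_by_wins(stocks, prefer):
    """Compare every pair once, count wins, and sort by win count (descending)."""
    wins = {s: 0 for s in stocks}
    for a, b in combinations(stocks, 2):
        wins[prefer(a, b)] += 1
    # sorted() is stable, so tied stocks keep their input order.
    return sorted(stocks, key=lambda s: wins[s], reverse=True), wins


# Hypothetical judge: follows a latent strength, except it says E beats A.
STRENGTH = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def prefer(a, b):
    if {a, b} == {"A", "E"}:
        return "E"  # a non-transitive judgment
    return a if STRENGTH[a] > STRENGTH[b] else b


ranking, wins = rank_by_wins(["A", "B", "C", "D", "E"], prefer)
```

A single contradictory comparison only shifts one win, so the overall order degrades gracefully instead of depending on the order in which a sort happened to visit the pair.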

2.4 Large-Scale Financial Benchmark

TradExpert is accompanied by a benchmark dataset spanning all S&P500 stocks from 2020 to 2023:

Modality	Count	Features/Notes
News Articles	524,995	Avg. 596.4 words, ticker links
OHLCV Records	481,484	5 fields per trading day
Alpha Factors	108 factors	Formula + GPT-4 description per factor
Transcripts	500 × 16	Quarterly earnings calls
Financial Ratios	~5 per call	EPS, P/E, BVPS, etc.

Training: 2020-01-01–2022-06-30; Validation: 2022-07-01–2022-12-31; Test: 2023-01-01–2023-12-31.
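The chronological split above can be expressed as a simple date lookup. The boundaries come from the text; the `SPLITS` table and `split_of` helper are illustrative names.

```python
from datetime import date

# Inclusive date ranges taken from the split described above.
SPLITS = {
    "train": (date(2020, 1, 1), date(2022, 6, 30)),
    "validation": (date(2022, 7, 1), date(2022, 12, 31)),
    "test": (date(2023, 1, 1), date(2023, 12, 31)),
}


def split_of(day):
    """Return the split name for a date, or None if it falls outside all ranges."""
    for name, (start, end) in SPLITS.items():
        if start <= day <= end:
            return name
    return None
```

Splitting by date rather than at random keeps every test observation strictly later than every training observation, which avoids look-ahead leakage in a forecasting task.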

2.5 Empirical Results

Stock Movement Prediction

TradExpert-NM (News+Market only) outperforms the single-model LLM baselines on S&P500 binary movement prediction, measured by Accuracy (Acc) and Matthews Correlation Coefficient (MCC):

Method	Acc ↑	MCC ↑
InternLM-7B	0.60	0.06
Gemini	0.59	0.11
TradExpert-NM	0.64	0.19
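For reference, both metrics in the table can be computed from a binary confusion matrix. The sketch below uses the standard definitions; the label vectors are invented for illustration and are unrelated to the reported results.

```python
import math


def accuracy_and_mcc(y_true, y_pred):
    """Compute accuracy and Matthews Correlation Coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Invented example: 1 = Rise, 0 = Fall.
acc, mcc = accuracy_and_mcc(
    y_true=[1, 1, 1, 0, 0, 0, 1, 0],
    y_pred=[1, 1, 0, 0, 0, 1, 1, 0],
)
```

MCC is reported alongside accuracy because it stays near zero for a classifier that simply predicts the majority class, which accuracy alone would not reveal when rises and falls are imbalanced.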

Trading Simulation on DOW30

Measured by Annualized Return (AR), Annualized Volatility (AV), Sharpe Ratio (SR), and Max Drawdown (MD):

Method	AR ↑	AV ↓	SR ↑	MD ↓
DeepTrader	32.45%	17.86%	1.82	15.32%
TradExpert	49.79%	9.95%	5.01	6.56%

TradExpert’s MoE substantially improves both return and risk-adjusted performance relative to baseline traditional and deep RL-based strategies.
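The four simulation metrics can be computed from a series of periodic returns as sketched below. The conventions are assumptions, since the source does not state them: 252 trading days per year, arithmetic annualization of the mean return, sample standard deviation, and a zero risk-free rate. Under a zero risk-free rate the Sharpe Ratio is simply AR/AV, and the table's figures are close to that ratio (49.79/9.95 ≈ 5.00 and 32.45/17.86 ≈ 1.82).

```python
import math


def trading_metrics(returns, periods_per_year=252):
    """Return AR, AV, SR, and MD from a list of per-period simple returns."""
    n = len(returns)
    avg = sum(returns) / n
    var = sum((r - avg) ** 2 for r in returns) / (n - 1)   # sample variance
    ar = avg * periods_per_year                            # arithmetic annualization
    av = math.sqrt(var) * math.sqrt(periods_per_year)
    sr = ar / av if av else 0.0                            # assumes zero risk-free rate
    # Max drawdown: largest peak-to-trough loss of the compounded equity curve.
    equity, peak, md = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        md = max(md, (peak - equity) / peak)
    return {"AR": ar, "AV": av, "SR": sr, "MD": md}


# Invented returns: +10%, -50%, +20% gives an equity path of 1.10, 0.55, 0.66.
metrics = trading_metrics([0.10, -0.50, 0.20])
```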

2.6 Insights, Limitations, and Future Work

Specialization of LLM agents enables explicit division of labor across textual, time-series, symbolic, and numeric modalities. Synthesis in the General Expert allows for nuanced weighting of complementary signals. Reasoning is enabled through chain-of-thought prompting and data reprogramming.

Reported limitations include per-stock latency of about 4.7 seconds on an Nvidia A5000, which makes the system unsuitable for high-frequency contexts, and the context-length constraints of LLaMA-2 token windows, which require the experts to pass summaries rather than raw inputs.

Anticipated directions include reduction of latency for higher-frequency trading, expanding to global multi-market universes, and instantiating dedicated learning-based gating networks for adaptive expert weighting.

3. Context, Significance, and Comparative Summary

Both frameworks, TARexp and TradExpert, are exemplars of modular design in their respective fields: document review (IR/ML) and financial time-series prediction and trading. TARexp advances reproducibility and rapid experimentation for technology-assisted review, lowering implementation barriers with component-based design and persistent, replayable experiment ledgers (Yang et al., 2022). TradExpert applies LLMs to heterogeneous financial data, achieving robust, interpretable synthesis and outperforming its reported baselines via explicit domain specialization and fusion (Ding et al., 2024).

4. References

Yang et al. (2022). TARexp: A Python Framework for Technology-Assisted Review Experiments.
Ding et al. (2024). TradExpert: Revolutionizing Trading with Mixture of Expert LLMs.
