- The paper presents a novel multi-agent framework that automates joint optimization of factors and models to enhance quantitative strategy performance.
- It employs iterative modules for hypothesis generation, implementation, and validation, achieving superior predictive metrics and robust trading outcomes.
- The framework demonstrates cost-effective, transparent R&D automation with strong empirical results, paving the way for scalable applications in quantitative finance.
Financial markets present significant challenges for asset return prediction due to their complex, high-dimensional, non-stationary, and volatile nature. Traditional quantitative research pipelines suffer from limitations in automation, interpretability, and coordinated optimization across key components like factor mining and model innovation. Existing methods, despite recent advances, often require extensive human intervention, lack transparency, or operate in silos, which hinders rapid adaptation to dynamic market conditions.
The paper "R&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization" (2505.15155) proposes RD-Agent(Q), a novel data-centric multi-agent framework designed to automate the full-stack research and development of quantitative strategies. It tackles the limitations of existing approaches by implementing a coordinated factor-model co-optimization loop.
RD-Agent(Q) decomposes the quantitative research process into five interconnected units operating in a continuous iterative cycle:
- Specification Unit: This unit dynamically configures the task context and constraints for downstream modules, such as data schemas, execution environments, and output formats. It formalizes the requirements for both factors and models, ensuring consistency and reproducibility across the research pipeline. This can be conceptualized as defining a tuple S=(B,D,F,M) encoding background knowledge, data interfaces, output formats, and the execution environment.
- Synthesis Unit: Simulating human reasoning, this unit generates new hypotheses for factor mining or model innovation based on historical experimental outcomes. It maintains a set of State-of-the-Art (SOTA) solutions and applies a generative mapping G that produces the next hypothesis h_{t+1} from action-specific subsets of historical hypotheses H_t^(a) and feedback F_t^(a). The process adaptively explores diverse ideas while refining promising ones, creating an "idea forest". Hypotheses are then mapped to concrete, executable tasks.
- Implementation Unit: This unit translates the tasks generated by the Synthesis Unit into functional code using a specialized agent called Co-STEER. Co-STEER is designed for data-centric tasks and incorporates systematic scheduling and code-generation strategies.
- Co-STEER Mechanism: For factor development, it constructs a Directed Acyclic Graph (DAG) to manage task dependencies and uses adaptive task complexity scores (α_j) for scheduling. Code generation itself is feedback-driven: Co-STEER maintains a knowledge base K of past task-code-feedback triples and retrieves solutions for similar tasks to improve efficiency and success rate, iteratively refining code based on execution feedback (see the scheduling sketch after this list).
- Validation Unit: This unit evaluates the practical effectiveness of implemented factors and models. For factors, it first performs de-duplication using the cross-sectional Information Coefficient (IC) to filter out redundant signals that are highly correlated with existing SOTA factors (see the de-duplication sketch after this list). Surviving factor candidates, along with candidate models, are then evaluated through real-market backtests on the Qlib platform, assessing performance under realistic trading conditions.
- Analysis Unit: This unit serves as both evaluator and analyst. After each experiment, it assesses the hypothesis, tasks, and results against the current SOTA metrics; if an experiment surpasses the SOTA for its action type (factor or model), the new result is added to the SOTA set. The unit diagnoses the causes of failed experiments and generates targeted feedback f_t for the Synthesis Unit to guide future hypothesis generation. Crucially, the Analysis Unit also determines the next optimization direction (factor refinement or model optimization) using a contextual two-armed bandit based on Linear Thompson Sampling: an 8-dimensional performance state vector x_t encoding key metrics lets it adaptively balance exploration and exploitation between the two optimization paths (a minimal bandit sketch follows this list).
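To make the Co-STEER mechanism more concrete, the sketch below illustrates dependency-aware scheduling over a task DAG with per-task complexity scores α_j and a simple task-code-feedback knowledge base. The class names, the similarity rule, and the ordering heuristic are illustrative assumptions rather than the paper's implementation; only the underlying ideas (DAG ordering, complexity scores, knowledge reuse) come from the description above.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # stdlib topological ordering (Python 3.9+)

@dataclass
class FactorTask:
    name: str
    depends_on: list[str] = field(default_factory=list)
    complexity: float = 1.0  # adaptive complexity score alpha_j, tuned from feedback

class KnowledgeBase:
    """Stores (task, code, feedback) triples so similar future tasks can reuse code."""
    def __init__(self):
        self.triples: list[tuple[FactorTask, str, str]] = []

    def add(self, task: FactorTask, code: str, feedback: str) -> None:
        self.triples.append((task, code, feedback))

    def retrieve(self, task: FactorTask) -> list[str]:
        # Toy similarity rule (exact name match); a real agent would use embeddings.
        return [code for t, code, fb in self.triples
                if t.name == task.name and fb == "pass"]

def schedule(tasks: list[FactorTask]):
    """Yield tasks in dependency order; within each ready batch,
    attempt lower-complexity tasks first."""
    by_name = {t.name: t for t in tasks}
    sorter = TopologicalSorter({t.name: set(t.depends_on) for t in tasks})
    sorter.prepare()
    while sorter.is_active():
        for name in sorted(sorter.get_ready(), key=lambda n: by_name[n].complexity):
            yield by_name[name]
            sorter.done(name)
```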
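The Validation Unit's IC-based de-duplication can be sketched as follows: compute each candidate factor's average per-date cross-sectional correlation with every SOTA factor and discard candidates above a redundancy threshold. The 0.9 threshold and the date-by-stock DataFrame layout are assumptions made for illustration, not values taken from the paper.

```python
import pandas as pd

def mean_cross_sectional_corr(a: pd.DataFrame, b: pd.DataFrame) -> float:
    """Average per-date correlation between two factor panels,
    each indexed by date with one column per stock."""
    return a.corrwith(b, axis=1).mean()

def deduplicate(candidates: dict[str, pd.DataFrame],
                sota: dict[str, pd.DataFrame],
                threshold: float = 0.9) -> dict[str, pd.DataFrame]:
    """Keep only candidates that are not highly correlated with any SOTA factor."""
    kept = {}
    for name, panel in candidates.items():
        if all(abs(mean_cross_sectional_corr(panel, s)) < threshold
               for s in sota.values()):
            kept[name] = panel
    return kept
```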
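Finally, the Analysis Unit's scheduling decision can be sketched as a contextual two-armed bandit with Linear Thompson Sampling: each arm ("factor" or "model") maintains a Bayesian linear-regression posterior over the 8-dimensional state vector x_t, samples a weight vector from that posterior, and the arm with the higher sampled reward is selected. The prior and noise scales and the reward-update rule below are standard textbook choices, not the paper's exact configuration.

```python
import numpy as np

class LinearThompsonArm:
    """Bayesian linear reward model (reward ~ w @ x) with a Gaussian posterior over w."""
    def __init__(self, dim: int, prior_var: float = 1.0, noise_var: float = 1.0):
        self.A = np.eye(dim) / prior_var   # posterior precision matrix
        self.b = np.zeros(dim)             # precision-weighted mean term
        self.noise_var = noise_var

    def sample_reward(self, x: np.ndarray) -> float:
        cov = np.linalg.inv(self.A)
        w = np.random.multivariate_normal(cov @ self.b, cov)  # draw from posterior
        return float(w @ x)

    def update(self, x: np.ndarray, reward: float) -> None:
        self.A += np.outer(x, x) / self.noise_var
        self.b += reward * x / self.noise_var

arms = {"factor": LinearThompsonArm(dim=8), "model": LinearThompsonArm(dim=8)}

def choose_action(x_t: np.ndarray) -> str:
    """Pick the optimization direction whose sampled reward is larger."""
    return max(arms, key=lambda a: arms[a].sample_reward(x_t))

# After running the chosen experiment and scoring it:
#   arms[action].update(x_t, observed_reward)
```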
The framework establishes a closed hypothesis–implementation–validation–feedback loop, supporting continuous, goal-directed evolution of quantitative strategies. Outputs are persistently stored, enabling cumulative knowledge growth.
The paper highlights several key contributions:
- End-to-end automation with transparency: Automates the entire R&D process for quant strategies, producing verifiable code outputs that enhance interpretability and reduce hallucination risks compared to end-to-end LLM-based trading signals.
- High-performance R&D tools: Introduces a structured knowledge forest for hypothesis generation (mimicking analyst workflows) and Co-STEER, a knowledge-evolving agent tailored for data-centric code generation tasks in quantitative finance.
- Strong empirical performance: Achieves significantly higher annualized returns (ARR) than classical factor libraries with fewer factors and outperforms state-of-the-art deep time-series models on real market data under smaller resource budgets. Joint factor-model optimization balances predictive accuracy and strategy robustness.
The experimental evaluation is conducted on the CSI 300 stock universe, split into training (2008-2014), validation (2015-2016), and testing (2017-2020) periods. The paper compares RD-Agent(Q) against a wide range of baselines, including traditional factor libraries (Alpha101, Alpha158, Alpha360, AutoAlpha) and various machine learning and deep learning models (Linear, MLP, LightGBM, XGBoost, CatBoost, DoubleEnsemble, Transformer, GRU, LSTM, ALSTM, GATs, PatchTST, iTransformer, Mamba, TRA, MASTER). Evaluation metrics cover both factor predictive power (IC, ICIR, Rank IC, Rank ICIR) and strategy performance (ARR, IR, MDD, CR), based on a simulated daily long-short trading strategy with realistic transaction costs; a minimal sketch of the factor metrics follows below.
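As a reference for these factor metrics: IC is the daily cross-sectional Pearson correlation between predicted and realized next-period returns, Rank IC uses Spearman (rank) correlation, and ICIR (resp. Rank ICIR) is the mean of the daily series divided by its standard deviation. The sketch below assumes predictions and returns are pandas DataFrames indexed by date with one column per stock; it is a standard formulation rather than code from the paper.

```python
import pandas as pd

def ic_series(pred: pd.DataFrame, ret: pd.DataFrame, rank: bool = False) -> pd.Series:
    """Per-date cross-sectional correlation between predictions and realized
    returns (Pearson for IC; rank-transform first for Spearman / Rank IC)."""
    if rank:
        pred, ret = pred.rank(axis=1), ret.rank(axis=1)
    return pred.corrwith(ret, axis=1)

def summarize(pred: pd.DataFrame, ret: pd.DataFrame) -> dict:
    ic = ic_series(pred, ret)
    ric = ic_series(pred, ret, rank=True)
    return {
        "IC": ic.mean(),
        "ICIR": ic.mean() / ic.std(),
        "Rank IC": ric.mean(),
        "Rank ICIR": ric.mean() / ric.std(),
    }
```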
Key experimental findings include:
- Factor Optimization (RD-Factor): Outperforms static factor libraries in ARR and IC/ICIR, demonstrating that dynamic hypothesis refinement and factor screening yield more informative signals.
- Model Optimization (RD-Model): Achieves better Rank IC, MDD, and IR than baseline models, highlighting the benefit of adaptive model configuration guided by automated hypothesis evaluation, particularly compared to generic time-series models.
- Joint Optimization (RD-Agent(Q)): Achieves the highest overall performance across metrics (IC, ARR, IR), demonstrating that co-optimizing factors and models unlocks complementary improvements superior to optimizing either component in isolation.
- Research Dynamics: Analysis of factor hypotheses shows a pattern of local refinement, strategic revisitation, and exploration across diverse conceptual clusters, leading to compact, diverse, and high-performing factor libraries.
- Development Dynamics: Co-STEER exhibits efficient self-correction in code generation, with success rates converging quickly through iterative refinement, especially for complex tasks.
- Ablation Study: The contextual bandit scheduler significantly outperforms random and LLM-based scheduling strategies, confirming its effectiveness in prioritizing promising optimization targets under limited computational budgets.
- Backend Comparisons: The framework demonstrates robustness across different LLM backends, though absolute performance varies with the choice of backend.
- Cost Efficiency: The framework is shown to be cost-effective, with total API costs remaining under \$10 for the experimental setup.
- Real-World Validation: The framework demonstrated adaptability and strong performance in a real-world quantitative competition (Optiver Realized Volatility Prediction on Kaggle), successfully identifying effective factors based on temporal bid-ask spread dynamics.
The paper positions RD-Agent(Q) as a significant advancement over traditional siloed quantitative pipelines and existing LLM agents in finance, which often lack transparency, interpretability, or mechanisms for joint factor-model optimization.
Despite its strengths, the authors identify limitations and suggest future work, including integrating more diverse multimodal data, incorporating structured domain expertise using techniques like Retrieval-Augmented Generation (RAG), and developing mechanisms for real-time market adaptation.
The broader impacts highlight the framework's potential for generalizable R&D automation beyond finance, its ability to produce reproducible and deployable code outputs, and its contribution toward a new paradigm for interpretable and adaptive financial AI systems. A disclaimer is included emphasizing the research nature of the framework and the necessity of rigorous validation before real-world deployment.