Expert Investment Teams via Multi-Agent LLMs with Fine-Grained Trading Tasks
This presentation explores a groundbreaking multi-agent LLM system for equity trading that achieves superior risk-adjusted returns through fine-grained task decomposition. Unlike traditional coarse-grained approaches, this architecture assigns concrete, low-level financial tasks to specialist agents (Quantitative, Technical, News, and Qualitative) whose outputs flow through sector and macro layers to a Portfolio Manager. Rigorous backtesting on Japan's TOPIX 100 demonstrates statistically significant performance gains, with technical analysis emerging as the primary driver. The work establishes that explicit task engineering, not just role diversity, unlocks both alpha generation and interpretability in LLM-based finance systems.
Script
A trading system powered by language models just outperformed coarse-grained multi-agent setups by engineering tasks, not just roles. The secret lies in how you ask the AI to think, not how many agents you deploy.
The researchers decompose investment analysis into precise, low-level tasks mirroring how professionals actually work. Quantitative and Technical agents receive structured indicators like normalized momentum and oscillators. News and Qualitative agents extract risk catalysts from unstructured data. This granularity isn't cosmetic; it operationalizes reasoning at the level where financial edge lives.
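To make "structured indicators" concrete, here is a minimal sketch of two such features and how they might be placed into a task prompt. The function names, the lookback windows, the simple-average RSI variant, and the prompt wording are all illustrative assumptions, not the indicator definitions or prompts used in the work.

```python
from statistics import pstdev


def normalized_momentum(prices, lookback=20):
    """Total return over `lookback` periods divided by the volatility of
    per-period returns in the same window (assumed definition)."""
    if len(prices) < lookback + 1:
        raise ValueError("need at least lookback + 1 prices")
    window = prices[-(lookback + 1):]
    returns = [window[i + 1] / window[i] - 1.0 for i in range(lookback)]
    vol = pstdev(returns)
    total_return = window[-1] / window[0] - 1.0
    return 0.0 if vol == 0 else total_return / vol


def rsi(prices, period=14):
    """Relative Strength Index using plain sums of gains and losses
    (not Wilder smoothing). Returns a value in [0, 100]."""
    if len(prices) < period + 1:
        raise ValueError("need at least period + 1 prices")
    window = prices[-(period + 1):]
    changes = [window[i + 1] - window[i] for i in range(period)]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        # No down moves: fully overbought if there were gains, neutral if flat.
        return 100.0 if gains > 0 else 50.0
    return 100.0 - 100.0 / (1.0 + gains / losses)


def technical_task_prompt(ticker, prices):
    """Hypothetical fine-grained prompt: the agent receives computed
    indicators and an explicit task instead of raw price history."""
    return (
        f"Ticker: {ticker}\n"
        f"Normalized 20-period momentum: {normalized_momentum(prices):.2f}\n"
        f"14-period RSI: {rsi(prices):.1f}\n"
        "Task: assess trend strength and overbought/oversold risk, "
        "then give a score from -1 (bearish) to 1 (bullish)."
    )
```

The design point is that the arithmetic happens in code, so the model only has to interpret the numbers.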
Let's see how these agents coordinate to build portfolios.
Four specialist agents process company-level signals. Their outputs are aligned by sector, benchmarked, and adjusted by a Sector Agent. A Macro Agent independently handles economic conditions. The Portfolio Manager synthesizes everything to allocate long and short positions across the TOPIX 100 universe. Strict look-ahead control prevents data leakage, and GPT-4o powers all reasoning. The hierarchy mirrors institutional investment committee structure, but with task-level precision baked into every prompt.
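The sector benchmarking and long/short allocation steps can be sketched as follows. This is an assumption-laden illustration: it stands in for the Sector Agent with a plain within-sector z-score and for the Portfolio Manager with equal-weight top and bottom picks, whereas in the described system both steps are performed by LLM agents. All names here are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def sector_neutral_scores(scores, sectors):
    """Z-score each stock's score against peers in the same sector."""
    by_sector = defaultdict(list)
    for ticker, score in scores.items():
        by_sector[sectors[ticker]].append(score)
    adjusted = {}
    for ticker, score in scores.items():
        peers = by_sector[sectors[ticker]]
        mu, sd = mean(peers), pstdev(peers)
        adjusted[ticker] = 0.0 if sd == 0 else (score - mu) / sd
    return adjusted


def long_short_weights(adjusted, n_each):
    """Equal-weight long the top `n_each` names and short the bottom `n_each`."""
    if n_each < 1 or 2 * n_each > len(adjusted):
        raise ValueError("n_each must satisfy 1 <= n_each <= len(adjusted) / 2")
    ranked = sorted(adjusted, key=adjusted.get, reverse=True)
    weights = {ticker: 0.0 for ticker in adjusted}
    for ticker in ranked[:n_each]:
        weights[ticker] = 1.0 / n_each
    for ticker in ranked[-n_each:]:
        weights[ticker] = -1.0 / n_each
    return weights
```

Benchmarking within a sector keeps a stock from ranking highly just because its whole sector scores well, which is one plausible reading of "aligned by sector, benchmarked, and adjusted."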
The difference is night and day. Coarse-grained prompts hand the agent raw data and vague instructions, yielding superficial language. Fine-grained prompts deliver computed indicators and explicit tasks, producing reasoning grounded in financial semantics. Vector similarity analysis confirms fine-grained outputs align better across agent layers, meaning the hierarchy actually propagates signal instead of noise.
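The vector similarity check rests on a standard measure, cosine similarity between embedding vectors. Below is a minimal sketch; it assumes each agent's text output has already been turned into a numeric vector by some embedding model, and the choice of model and the exact comparison protocol are not specified here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.
    Returns 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

A higher score between a specialist's output embedding and the embedding of the downstream layer's output would suggest that the specialist's content is being carried forward rather than lost.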
The architecture stands or falls on out-of-sample performance.
Rigorous backtesting reveals fine-grained task agents deliver persistent, statistically significant Sharpe ratio gains. The Technical Agent is the star: leave-one-out ablation shows removing it destroys risk-adjusted returns, while dropping Quantitative, Qualitative, Macro, or News agents often improves results. This means effective signal transmission is task-dependent, not a simple function of agent diversity. More agents don't help if they inject redundancy or noise.
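A leave-one-out ablation of this kind can be sketched in a few lines. The annualization factor of 12 (monthly returns), the zero risk-free rate, and the `run_backtest` callback are assumptions for illustration; the callback is a hypothetical stand-in for the full pipeline that returns a list of periodic portfolio returns for a given set of agents.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, periods_per_year=12, risk_free=0.0):
    """Annualized Sharpe ratio from periodic returns (sample std dev)."""
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)
    if sd == 0:
        return 0.0
    return mean(excess) / sd * math.sqrt(periods_per_year)


def leave_one_out(agents, run_backtest):
    """Change in Sharpe ratio when each agent is removed in turn.
    A negative value means the removed agent was helping."""
    full = sharpe_ratio(run_backtest(agents))
    deltas = {}
    for agent in agents:
        subset = [a for a in agents if a != agent]
        deltas[agent] = sharpe_ratio(run_backtest(subset)) - full
    return deltas
```

Read against the finding above, the Technical Agent would show a clearly negative delta, while an agent whose removal improves results would show a positive one.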
The system isn't just academically interesting. Blending the agent composite with the market index via risk parity allocation beats either alone, net of costs. Low correlation means real diversification. Crucially, the hierarchical design admits full interpretability: every decision can be traced through specialist rationale, sector adjustment, and macro overlay. That auditability is non-negotiable for institutional deployment.
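For two return streams, equal risk contribution reduces to weighting each one in inverse proportion to its volatility, regardless of their correlation. The sketch below applies that to the agent composite and the index. The full-sample volatility estimate and the function names are assumptions, and transaction costs are left out; the exact risk parity scheme, lookback, and cost model used in the work are not specified here.

```python
from statistics import stdev


def inverse_vol_weights(returns_a, returns_b):
    """Two-asset risk parity: weights proportional to 1 / volatility."""
    vol_a, vol_b = stdev(returns_a), stdev(returns_b)
    if vol_a == 0 or vol_b == 0:
        raise ValueError("both return series must have non-zero volatility")
    inv_a, inv_b = 1.0 / vol_a, 1.0 / vol_b
    total = inv_a + inv_b
    return inv_a / total, inv_b / total


def blend(returns_a, returns_b, w_a, w_b):
    """Per-period returns of the weighted combination of two streams."""
    if len(returns_a) != len(returns_b):
        raise ValueError("return series must have the same length")
    return [w_a * a + w_b * b for a, b in zip(returns_a, returns_b)]
```

Because the two streams have low correlation, the blended series can have lower volatility than either input, which is where the diversification benefit comes from.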
The lesson is architectural. Gains come from injecting robust, expert-designed feature engineering into prompts, not from stacking more agents or abstractions. Future work should address temporal generalization limits and test the framework across markets and models. But the principle is established: if you want language models to trade, teach them the tasks traders actually perform.
Fine-grained task decomposition turns language models into systematic alpha engines.