MARS: A Meta-Adaptive Reinforcement Learning Framework for Risk-Aware Multi-Agent Portfolio Management (2508.01173v1)

Published 2 Aug 2025 in cs.LG and cs.MA

Abstract: Reinforcement Learning (RL) has shown significant promise in automated portfolio management; however, effectively balancing risk and return remains a central challenge, as many models fail to adapt to dynamically changing market conditions. In this paper, we propose Meta-controlled Agents for a Risk-aware System (MARS), a novel RL framework designed to explicitly address this limitation through a multi-agent, risk-aware approach. Instead of a single monolithic model, MARS employs a Heterogeneous Agent Ensemble where each agent possesses a unique, intrinsic risk profile. This profile is enforced by a dedicated Safety-Critic network and a specific risk-tolerance threshold, allowing agents to specialize in behaviors ranging from capital preservation to aggressive growth. To navigate different market regimes, a high-level Meta-Adaptive Controller (MAC) learns to dynamically orchestrate the ensemble. By adjusting its reliance on conservative versus aggressive agents, the MAC effectively lowers portfolio volatility during downturns and seeks higher returns in bull markets, thus minimizing maximum drawdown and enhancing overall stability. This two-tiered structure allows MARS to generate a disciplined and adaptive portfolio that is robust to market fluctuations. The framework achieves a superior balance between risk and return by leveraging behavioral diversity rather than explicit market-feature engineering. Experiments on major international stock indexes, including periods of significant financial crisis, demonstrate the efficacy of our framework on risk-adjusted criteria, significantly reducing maximum drawdown and volatility while maintaining competitive returns.

Summary

  • The paper presents a novel meta-adaptive DRL framework (MARS) integrating a heterogeneous agent ensemble and a Meta-Adaptive Controller for dynamic risk management.
  • The methodology leverages multiple Safety-Critic Agents with distinct risk profiles, aggregating actions through a risk management overlay to ensure compliance with trading constraints.
  • Experimental results on DJI and HSI data show that MARS outperforms baselines in capital preservation and risk-adjusted returns, validating its adaptive strategy.

MARS: A Meta-Adaptive Reinforcement Learning Framework for Risk-Aware Multi-Agent Portfolio Management

This paper introduces MARS, a meta-learning-controlled multi-agent DRL framework designed for risk-aware portfolio management. MARS addresses the challenges of non-stationarity and risk management in financial markets by employing a heterogeneous ensemble of Safety-Critic Agents, orchestrated by a Meta-Adaptive Controller (MAC). The framework distinguishes itself by integrating risk preferences directly into the agent design, enabling adaptive strategies that balance risk and return across varying market conditions.

Methodology: MARS Framework

The MARS framework (Figure 1) consists of a Heterogeneous Agent Ensemble (HAE) and a Meta-Adaptive Controller (MAC). The HAE comprises multiple Safety-Critic agents, each characterized by a unique risk profile defined by a risk tolerance threshold ($\theta_i$) and a risk aversion penalty ($\lambda_i$). Each agent consists of an Actor, a Critic, and a Safety-Critic network. The MAC learns to dynamically assign weights to each agent based on the current market state, enabling the framework to adapt its strategy from conservative to aggressive depending on market dynamics. The final action is determined by aggregating the actions of individual agents, weighted by the MAC's output, and subsequently refined by a risk management overlay to ensure compliance with real-world trading constraints.

Figure 1: The MARS framework architecture. The system processes the Market State ($s_t$) through two parallel components. The Meta-Adaptive Controller (MAC) produces agent weights ($w_t$), while the Heterogeneous Agent Ensemble (HAE) generates proposed actions ($a_t^i$). These outputs are aggregated and passed through a Risk Management Overlay to produce the final executed action ($A'_t$).

Heterogeneous Agent Ensemble (HAE)

The HAE is composed of $N$ distinct agents, each implementing a Deep Deterministic Policy Gradient (DDPG) architecture extended with a Safety-Critic network. The Actor network maps the state $s_t$ to a deterministic action $a_t^i$ and is updated using a policy gradient that includes a Conditional Safety Penalty (CSP). The CSP penalizes the policy when the risk $C_{\xi_i}$ that the Safety-Critic network predicts for a proposed action exceeds the agent's risk tolerance $\theta_i$. The Critic network approximates the state-action value function by minimizing the Temporal Difference (TD) error. The Safety-Critic network predicts the extrinsic risk of an action by learning an environment risk function ($\mathcal{C}_{env}$) that considers portfolio concentration, leverage, and simulated volatility.
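
The effect of the Conditional Safety Penalty on two differently configured agents can be illustrated with a minimal sketch. This summary does not give the exact functional form of the penalty, so the hinge form `lam * max(0, C - theta)` below is an assumption, as are the function names and the example values of `theta` and `lam`.

```python
def conditional_safety_penalty(predicted_risk, theta, lam):
    """Hinge-style penalty (assumed form): zero at or below theta, linear above it."""
    return lam * max(0.0, predicted_risk - theta)


def actor_loss(q_values, predicted_risks, theta, lam):
    """Batch-mean actor loss: reward high Q, charge for risk above the threshold."""
    terms = [
        -q + conditional_safety_penalty(c, theta, lam)
        for q, c in zip(q_values, predicted_risks)
    ]
    return sum(terms) / len(terms)


# Two hypothetical agents scoring the same batch of proposed actions.
q_batch = [1.0, 1.0]        # Critic estimates for each action
risk_batch = [0.1, 0.5]     # Safety-Critic estimates for each action

# A low threshold and a large penalty weight make the second action costly.
conservative = actor_loss(q_batch, risk_batch, theta=0.2, lam=5.0)
# A high threshold leaves both actions unpenalized.
aggressive = actor_loss(q_batch, risk_batch, theta=0.6, lam=0.5)
```

With identical Critic estimates, the conservative configuration ends up with a higher loss on the riskier action, which is the mechanism by which distinct ($\theta_i$, $\lambda_i$) pairs would push agents toward different behaviors.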

Meta-Adaptive Controller (MAC)

The MAC serves as a high-level orchestrator, learning a meta-policy $\pi_\omega(\mathbf{w}_t | s_t)$ that dynamically assigns weights to the agents in the HAE based on the current market state $s_t$. The controller outputs a vector of logits, which are then passed through a softmax function to generate the weight distribution $\mathbf{w}_t$. The aggregated action $A_t$ combines the individual agents' proposed actions, weighted by the MAC's output. The MAC is trained to maximize a risk-adjusted utility function that balances the mean and standard deviation of the ensemble's predicted Q-values while penalizing the ensemble's predicted risk.
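
The softmax weighting and aggregation step can be sketched in a few lines of plain Python. The utility function shown is only one plausible reading of "mean minus standard deviation minus risk": the weighted-moment form and the coefficients `kappa` and `beta` are assumptions, not the paper's stated objective.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_actions(weights, actions):
    """Weighted sum of per-agent action vectors (one row per agent)."""
    n_assets = len(actions[0])
    return [
        sum(w * a[j] for w, a in zip(weights, actions))
        for j in range(n_assets)
    ]


def meta_utility(weights, q_values, risks, kappa=0.5, beta=1.0):
    """Assumed utility: weighted mean Q - kappa * weighted std Q - beta * weighted risk."""
    mean_q = sum(w * q for w, q in zip(weights, q_values))
    var_q = sum(w * (q - mean_q) ** 2 for w, q in zip(weights, q_values))
    expected_risk = sum(w * c for w, c in zip(weights, risks))
    return mean_q - kappa * math.sqrt(var_q) - beta * expected_risk


# Three hypothetical agents proposing allocations over two assets.
agent_weights = softmax([0.0, 0.0, 0.0])          # equal logits give equal weights
proposals = [[0.6, 0.4], [0.5, 0.5], [0.1, 0.9]]
aggregated = aggregate_actions(agent_weights, proposals)
```

Because the weights come from a softmax, they are non-negative and sum to one, so the aggregated action is a convex combination of the agents' proposals.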

Trading Procedure and Risk Management

At each time step, the agents propose individual actions, and the MAC generates corresponding agent weights. The final system action is computed as a weighted average of the proposed actions, which is then refined by a risk management overlay. This overlay enforces rules such as limits on position concentration, maintenance of a cash buffer, and a ban on short-selling, ensuring that all actions comply with institutional standards.
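
A rule-based overlay of this kind might look like the following sketch. The three rules come from the description above, but the order in which they are applied, the default limits (`max_position`, `cash_buffer`), and the choice to leave clipped excess in cash rather than redistribute it are all assumptions made for illustration.

```python
def apply_risk_overlay(weights, max_position=0.3, cash_buffer=0.05):
    """Project raw target weights onto long-only, capped, cash-buffered weights.

    Returns (asset_weights, cash_weight); together they sum to 1.
    """
    # 1. Ban on short-selling: negative targets are floored at zero.
    long_only = [max(0.0, w) for w in weights]

    # 2. Cash buffer: scale down so invested capital is at most 1 - cash_buffer.
    budget = 1.0 - cash_buffer
    total = sum(long_only)
    if total > budget:
        long_only = [w * budget / total for w in long_only]

    # 3. Concentration limit: clip each position; the clipped excess stays in cash.
    capped = [min(w, max_position) for w in long_only]

    return capped, 1.0 - sum(capped)


# A hypothetical aggregated action that is over-invested and contains a short.
assets, cash = apply_risk_overlay([0.7, 0.5, -0.2])
```

Clipping after scaling can only reduce positions, so the cash buffer still holds once the concentration limit has been applied.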

Experimental Results

The MARS framework was evaluated on historical daily data from the Dow Jones Industrial Average (DJI) and the Hang Seng Index (HSI), comparing its performance against a passive investment strategy and state-of-the-art DRL models such as DeepTrader, HRPM, and AlphaStock.

Performance Across Diverse Markets

In the Dow Jones Industrial Average (DJI) environments, MARS consistently delivered strong performance (Figure 2 and Figure 3). During the challenging 2022 bear market, it demonstrated superior performance across all metrics, achieving the smallest loss (Cumulative Return of -0.86%) and the shallowest Maximum Drawdown (-16.77%). During the more favorable 2024 bull market, MARS maintained its lead, achieving the highest Cumulative Return (29.50%), Annual Return (31.19%), and Sharpe Ratio (2.84), and the shallowest Maximum Drawdown (-5.39%).

Figure 2: Performance comparison on the DJI 2022 dataset. MARS (red) shows superior capital preservation with a significantly shallower drawdown compared to baselines.

Figure 3: Performance comparison on the DJI 2024 dataset. MARS (red) achieves the highest return while maintaining a competitive drawdown profile.

In the 2022 bear market for the Hang Seng Index (HSI), MARS again excelled in capital preservation, securing the best Cumulative Return (-14.50%), lowest volatility (22.56%), and smallest Maximum Drawdown (-32.72%). In the 2024 bull market, MARS achieved the highest Sharpe Ratio across all models, demonstrating the most effective balance between risk and reward.
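
For readers less familiar with the metrics quoted above, the sketch below computes them from a series of portfolio values using their standard textbook definitions. The conventions here (252 trading days per year, zero risk-free rate by default, sample standard deviation, drawdown reported as a negative fraction) are assumptions; this summary does not state which conventions the paper uses.

```python
import math


def cumulative_return(values):
    """Total return over the whole period."""
    return values[-1] / values[0] - 1.0


def max_drawdown(values):
    """Largest peak-to-trough decline, reported as a negative fraction."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)
    return worst


def period_returns(values):
    """Simple returns between consecutive observations."""
    return [b / a - 1.0 for a, b in zip(values, values[1:])]


def annualized_volatility(values, periods_per_year=252):
    """Sample standard deviation of period returns, scaled to a year."""
    r = period_returns(values)
    mean = sum(r) / len(r)
    var = sum((x - mean) ** 2 for x in r) / (len(r) - 1)
    return math.sqrt(var * periods_per_year)


def sharpe_ratio(values, risk_free_rate=0.0, periods_per_year=252):
    """Annualized mean excess return divided by annualized volatility."""
    r = period_returns(values)
    excess = sum(r) / len(r) - risk_free_rate / periods_per_year
    return excess * periods_per_year / annualized_volatility(values, periods_per_year)


# A short, made-up value series for illustration only.
portfolio = [100.0, 110.0, 99.0, 120.0]
cr = cumulative_return(portfolio)
mdd = max_drawdown(portfolio)
```

Under this sign convention a shallower drawdown is a value closer to zero, which matches how the figures above are reported.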

Ablation Studies

Ablation studies (Figure 4) were conducted to assess the necessity of MARS's core architectural components. The MARS-Static variant, which removes MAC’s dynamic agent weighting, performed markedly worse than the full MARS framework. The MARS-Homogeneous variant, which removes agent heterogeneity, underperformed the full model, underscoring the importance of diverse risk profiles in the HAE for effectively managing downside risk. Varying the number of agents from 10 to 5 (MARS-Div5) and 15 (MARS-Div15) indicated that an ensemble of 10 agents offers the best balance between strategic diversity and model complexity.

Figure 4: Ablation study performance on the DJI 2024 dataset. The main MARS model (red) outperforms all variants, validating its architectural components.

Analysis of Adaptive Strategy

Analysis of the Meta-Adaptive Controller’s (MAC) behavior under contrasting market conditions (Figure 5) revealed two distinct meta-strategies. In the turbulent 2022 bear market, the MAC adopted a highly reactive, defensive posture. In contrast, during the 2024 bull market, the MAC settled into a remarkably stable and confident meta-strategy.

Figure 5: Comparison of agent allocation strategies under the Meta-Adaptive Controller (MAC) during the 2022 bear market (top) and the 2024 bull market (bottom) for the DJI portfolio.

Conclusion

The MARS framework demonstrates a robust and effective solution for risk-aware portfolio management. The two-tier architecture, comprising a Heterogeneous Agent Ensemble (HAE) and a high-level Meta-Adaptive Controller (MAC), enables MARS to leverage behavioral diversity to navigate changing market conditions. Experimental results confirm that both the MAC and the heterogeneity of the agent ensemble are critical to the framework's success.
