- The paper presents a novel meta-adaptive DRL framework (MARS) integrating a heterogeneous agent ensemble and a Meta-Adaptive Controller for dynamic risk management.
- The methodology leverages multiple Safety-Critic Agents with distinct risk profiles, aggregating actions through a risk management overlay to ensure compliance with trading constraints.
- Experimental results on DJI and HSI data show that MARS outperforms baselines in capital preservation and risk-adjusted returns, validating its adaptive strategy.
This paper introduces MARS, a meta-learning-controlled multi-agent DRL framework designed for risk-aware portfolio management. MARS addresses the challenges of non-stationarity and risk management in financial markets by employing a heterogeneous ensemble of Safety-Critic Agents, orchestrated by a Meta-Adaptive Controller (MAC). The framework distinguishes itself by integrating risk preferences directly into the agent design, enabling adaptive strategies that balance risk and return across varying market conditions.
Methodology: MARS Framework
The MARS framework (Figure 1) consists of a Heterogeneous Agent Ensemble (HAE) and a Meta-Adaptive Controller (MAC). The HAE comprises multiple Safety-Critic agents, each characterized by a unique risk profile defined by a risk tolerance threshold (θi) and a risk aversion penalty (λi). Each agent consists of an Actor, Critic, and SafetyCritic network. The MAC learns to dynamically assign weights to each agent based on the current market state, enabling the framework to adapt its strategy from conservative to aggressive depending on market dynamics. The final action is determined by aggregating the actions of individual agents, weighted by the MAC's output, and subsequently refined by a risk management overlay to ensure compliance with real-world trading constraints.
Figure 1: The MARS framework architecture. The system processes the Market State (st) through two parallel components. The Meta-Adaptive Controller (MAC) produces agent weights (wt), while the Heterogeneous Agent Ensemble (HAE) generates proposed actions (ati). These outputs are aggregated and passed through a Risk Management Overlay to produce the final executed action (At′).
Heterogeneous Agent Ensemble (HAE)
The HAE is composed of N distinct agents, each implementing a DDPG architecture extended with a Safety-Critic network. The Actor network maps the state st to a deterministic action ati and is updated using a policy gradient that includes a Conditional Safety Penalty (CSP). This CSP penalizes the policy when the predicted risk Cξi of a proposed action exceeds the agent's risk tolerance θi, as calculated by the Safety-Critic network. The Critic network approximates the state-action value function by minimizing the Temporal Difference (TD) error. The Safety-Critic network predicts the extrinsic risk of an action by learning an environment risk function (Cenv) that considers portfolio concentration, leverage, and simulated volatility.
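The Conditional Safety Penalty described above can be illustrated with a small sketch. The summary does not give the exact functional form, so the hinge-style penalty below, the function names, and the way the penalty is added to the actor objective are all assumptions made for illustration; only the symbols θi (risk tolerance) and λi (risk aversion penalty) come from the text.

```python
def conditional_safety_penalty(predicted_risk: float, theta: float, lam: float) -> float:
    """Assumed hinge form: zero while the predicted risk stays within tolerance,
    then grows linearly with the excess, scaled by the risk aversion penalty."""
    return lam * max(0.0, predicted_risk - theta)


def actor_loss(q_value: float, predicted_risk: float, theta: float, lam: float) -> float:
    """Hypothetical per-sample actor loss: maximize Q (hence -Q) and add the penalty."""
    return -q_value + conditional_safety_penalty(predicted_risk, theta, lam)


# A cautious agent (low theta, high lam) is penalized for an action
# that a more risk-tolerant agent (high theta, low lam) accepts freely.
cautious = actor_loss(q_value=1.0, predicted_risk=0.5, theta=0.3, lam=2.0)
tolerant = actor_loss(q_value=1.0, predicted_risk=0.5, theta=0.6, lam=0.5)
```

Under this assumed form, agents that share the same Critic estimate still receive different gradients whenever the Safety-Critic's prediction crosses their individual threshold, which is one way distinct risk profiles could produce distinct policies.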
Meta-Adaptive Controller (MAC)
The MAC serves as a high-level orchestrator, learning a meta-policy πω(wt∣st) that dynamically assigns weights to the agents in the HAE based on the current market state st. The controller outputs a vector of logits, which are then passed through a softmax function to generate the weight distribution wt. The final action At is an aggregation of the individual agents' proposed actions, weighted by the MAC's output. The MAC is trained to maximize a risk-adjusted utility function that balances the mean and standard deviation of the ensemble's predicted Q-values while penalizing the ensemble's predicted risk.
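The weighting and aggregation step can be sketched as follows. The softmax and the weighted sum follow the description above. The utility function is only one plausible reading: the weighted mean and standard deviation, and the coefficients `kappa` and `beta`, are assumptions, since the summary does not state the exact formula.

```python
import math


def softmax(logits):
    """Turn the controller's logits into agent weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_actions(weights, actions):
    """Weighted sum of per-agent action vectors (one entry per asset)."""
    n_assets = len(actions[0])
    return [sum(w * a[j] for w, a in zip(weights, actions)) for j in range(n_assets)]


def mac_utility(weights, q_values, risks, kappa=0.5, beta=1.0):
    """Assumed utility: weighted mean of Q, minus kappa times the weighted
    standard deviation of Q, minus beta times the weighted predicted risk."""
    mu = sum(w * q for w, q in zip(weights, q_values))
    var = sum(w * (q - mu) ** 2 for w, q in zip(weights, q_values))
    risk = sum(w * r for w, r in zip(weights, risks))
    return mu - kappa * math.sqrt(var) - beta * risk


weights = softmax([0.0, 0.0])                        # two agents, equal logits
action = aggregate_actions(weights, [[1.0, 0.0], [0.0, 1.0]])
```

With equal logits the two agents receive equal weight, so the aggregated action splits the allocation evenly between their proposals.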
Trading Procedure and Risk Management
At each time step, the agents propose individual actions, and the MAC generates corresponding agent weights. The final system action is computed as a weighted average of the proposed actions, which is then refined by a risk management overlay. This overlay enforces rules such as limits on position concentration, maintenance of a cash buffer, and a ban on short-selling, ensuring that all actions comply with institutional standards.
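A minimal sketch of such an overlay is shown below, assuming the action is a vector of target portfolio weights. The rule order and the limits `max_position` and `cash_buffer` are hypothetical values chosen for illustration; the summary names the rules but not their thresholds.

```python
def risk_overlay(weights, max_position=0.2, cash_buffer=0.05):
    """Apply three illustrative rules to a vector of target asset weights."""
    # 1. Ban on short-selling: negative weights are set to zero.
    w = [max(0.0, x) for x in weights]
    # 2. Position concentration limit: cap each asset at max_position.
    w = [min(x, max_position) for x in w]
    # 3. Cash buffer: if the invested fraction exceeds 1 - cash_buffer,
    #    scale all positions down proportionally (this never breaks the cap).
    budget = 1.0 - cash_buffer
    total = sum(w)
    if total > budget:
        w = [x * budget / total for x in w]
    return w


refined = risk_overlay([0.6, -0.1, 0.3, 0.2])
```

Because the final step only ever scales positions down, the concentration cap and the short-selling ban established by the earlier steps remain satisfied.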
Experimental Results
The MARS framework was evaluated on historical daily data from the Dow Jones Industrial Average (DJI) and the Hang Seng Index (HSI), comparing its performance against a passive investment strategy and state-of-the-art DRL models such as DeepTrader, HRPM, and AlphaStock.
In the DJI environments, MARS consistently delivered strong performance (Figure 2 and Figure 3). During the challenging 2022 bear market, it outperformed the baselines across all metrics, achieving the smallest loss (Cumulative Return of -0.86%) and the shallowest Maximum Drawdown (-16.77%). During the more favorable 2024 bull market, MARS maintained its lead, achieving the highest Cumulative Return (29.50%), Annual Return (31.19%), and Sharpe Ratio (2.84), along with the shallowest Maximum Drawdown (-5.39%).
Figure 2: Performance comparison on the DJI 2022 dataset. MARS (red) shows superior capital preservation with a significantly shallower drawdown compared to baselines.
Figure 3: Performance comparison on the DJI 2024 dataset. MARS (red) achieves the highest return while maintaining a competitive drawdown profile.
In the 2022 bear market for the Hang Seng Index (HSI), MARS again excelled in capital preservation, securing the best Cumulative Return (-14.50%), lowest volatility (22.56%), and smallest Maximum Drawdown (-32.72%). In the 2024 bull market, MARS achieved the highest Sharpe Ratio across all models, demonstrating the most effective balance between risk and reward.
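For reference, the reported metrics can be computed from a series of portfolio values roughly as follows. These are the common textbook definitions, not necessarily the paper's exact ones: the 252-day annualization, the zero risk-free rate, and the function names are assumptions.

```python
import math
from statistics import mean, stdev


def cumulative_return(values):
    """Total return over the period, e.g. 0.295 for +29.5%."""
    return values[-1] / values[0] - 1.0


def max_drawdown(values):
    """Largest peak-to-trough decline, reported as a negative number."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)
    return worst


def sharpe_ratio(values, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio computed from per-period simple returns."""
    rets = [b / a - 1.0 for a, b in zip(values, values[1:])]
    excess = [r - risk_free / periods_per_year for r in rets]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


portfolio = [100.0, 110.0, 99.0, 120.0]   # toy value series, not real data
cr = cumulative_return(portfolio)
mdd = max_drawdown(portfolio)
```

In this toy series the portfolio ends 20% above its starting value, and the deepest decline is the 10% fall from 110 to 99.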
Ablation Studies
Ablation studies (Figure 4) were conducted to assess the necessity of MARS's core architectural components. The MARS-Static variant, which removes the MAC's dynamic agent weighting, performed markedly worse than the full MARS framework. The MARS-Homogeneous variant, which removes agent heterogeneity, also underperformed the full model, underscoring the importance of diverse risk profiles in the HAE for effectively managing downside risk. Changing the ensemble size from 10 agents to 5 (MARS-Div5) or 15 (MARS-Div15) indicated that an ensemble of 10 agents offers the best balance between strategic diversity and model complexity.
Figure 4: Ablation study performance on the DJI 2024 dataset. The main MARS model (red) outperforms all variants, validating its architectural components.
Analysis of Adaptive Strategy
Analysis of the Meta-Adaptive Controller’s (MAC) behavior under contrasting market conditions (Figure 5) revealed two distinct meta-strategies. In the turbulent 2022 bear market, the MAC adopted a highly reactive, defensive posture. In contrast, during the 2024 bull market, the MAC settled into a remarkably stable and confident meta-strategy.
Figure 5: Comparison of agent allocation strategies under the Meta-Adaptive Controller (MAC) during the 2022 bear market (top) and the 2024 bull market (bottom) for the DJI portfolio.
Conclusion
The MARS framework demonstrates a robust and effective solution for risk-aware portfolio management. The two-tier architecture, comprising a Heterogeneous Agent Ensemble (HAE) and a high-level Meta-Adaptive Controller (MAC), enables MARS to leverage behavioral diversity to navigate changing market conditions. Experimental results confirm that both the MAC and the heterogeneity of the agent ensemble are critical to the framework's success.