
Generalized Mean Absolute Directional Loss (GMADL)

Updated 5 September 2025
  • GMADL is a differentiable loss function that extends mean absolute deviation concepts by incorporating directional and magnitude considerations for risk-sensitive applications.
  • It replaces non-differentiable sign functions with a smooth sigmoid and employs parametric smoothing to optimize directional losses in regression tasks.
  • Empirical studies in high-frequency trading show GMADL yields superior risk-weighted returns, reduced drawdowns, and improved optimization stability.

The Generalized Mean Absolute Directional Loss (GMADL) is a differentiable loss function designed for machine learning estimation tasks where directional accuracy and magnitude-weighted assessment are critical to application efficacy—most notably in algorithmic trading with high-frequency financial data. The GMADL formalism extends core ideas from classical mean absolute deviations about the mean by generalizing across directions and introducing parametric smoothing, allowing for robust and interpretable learning objectives closely aligned with real-world outcomes such as risk-weighted returns and transaction cost mitigation.

1. Mathematical Definition and Differentiability

GMADL is formalized as follows:

$$\mathrm{GMADL} = \frac{1}{N} \sum_{i=1}^{N} \left[ -\left(\frac{1}{1 + \exp(-a R_i \hat{R}_i)} - 0.5\right) |R_i|^{b} \right]$$

where $R_i$ is the observed return at the $i$-th interval, $\hat{R}_i$ is the model's predicted return, $N$ denotes the sample size, $a$ is a slope parameter controlling the sigmoid sharpness (and hence gradient sensitivity around the zero-crossing), and $b$ is a magnitude exponent rewarding larger true returns.

By replacing the non-differentiable sign function present in the earlier Mean Absolute Directional Loss (MADL) with a smooth sigmoid, GMADL is fully differentiable everywhere. This property makes GMADL amenable to contemporary gradient-based optimization routines required for training architectures such as Transformers, LSTMs, and Informer-based models in financial time series and other regression tasks (Michańków et al., 24 Dec 2024, Stefaniuk et al., 23 Mar 2025).
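
Translated into a modern autodiff framework, the loss is only a few lines. The following is a minimal PyTorch sketch; the function name and the default values of $a$ and $b$ are illustrative placeholders rather than settings from the cited papers.

```python
import torch

def gmadl_loss(r_true: torch.Tensor, r_pred: torch.Tensor,
               a: float = 100.0, b: float = 1.0) -> torch.Tensor:
    """Minimal GMADL sketch (illustrative defaults for a and b)."""
    # Smooth directional agreement in (-0.5, 0.5): positive when predicted
    # and realized returns share a sign, negative otherwise.
    direction = torch.sigmoid(a * r_true * r_pred) - 0.5
    # Magnitude weight |R_i|^b rewards correct calls on large moves.
    weight = torch.abs(r_true) ** b
    # Negative sign: minimizing the loss maximizes directional gain.
    return torch.mean(-direction * weight)

# Aligned predictions drive the loss negative; opposed ones make it positive.
r = torch.tensor([0.004, -0.002, 0.010])
print(gmadl_loss(r, r).item(), gmadl_loss(r, -r).item())
```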

2. Directional Optimization and Connection to Cut Norms

GMADL's technical novelty is rooted in the broader concept of directional loss optimization over functional partitions. The relationship between mean absolute deviations about the mean ($d$) and optimized gains over directions is illustrated by:

$$d = \frac{1}{n} \sum_{i=1}^n |y_i - \bar{y}| = \max_{u \in \{-1, +1\}^n} \frac{x^T u}{n}, \quad x_i = y_i - \bar{y}$$

In this context, the optimal direction $u$ partitions the data so as to maximize the cut norm, and $d$ equals twice its value (Vartan et al., 2020). GMADL generalizes this concept by smoothing the partition via continuous parameterization and directionally weighted aggregation, moving beyond binary sign vectors to soft, differentiable partitions. The result is a robust, gain-maximizing aggregation that balances magnitude and sign over the data, which is critical in multidimensional applications and specifically in algorithmic asset allocation strategies.
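
The identity is easy to verify numerically: the maximizing sign vector is simply $u_i = \operatorname{sign}(x_i)$. The brute-force check below is a self-contained illustration, not code from the cited papers.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=8)
x = y - y.mean()                  # centered data, sum(x) == 0

d = np.abs(x).mean()              # mean absolute deviation about the mean

# Brute-force maximum of x^T u / n over all u in {-1, +1}^n.
n = len(x)
best = max(x @ np.array(u) / n
           for u in itertools.product((-1, 1), repeat=n))

assert np.isclose(d, best)        # optimum attained at u_i = sign(x_i)
```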

3. Robustness to Outliers, Overfitting, and Transaction Costs

GMADL inherits the robustness features of $d$ and MAE-type losses. In taxicab correspondence analysis and high-frequency algorithmic trading, this robustness is numerically evident in the bounded “relative contribution”:

$$RC_d(y_i) = \frac{|y_i - \bar{y}|}{n d}, \quad 0 \leq RC_d(y_i) \leq 0.5$$

No single error or asset movement can dominate the aggregate statistic, curbing the effect of outliers. By further raising $|R_i|$ to the power $b$ in GMADL and modulating with the parameter $a$, the function can penalize overtrading (reducing transaction cost impact) and encourage fewer, higher-impact signals (Michańków et al., 24 Dec 2024). The model learns to avoid frequent trading in periods of low return, a property especially beneficial in high-frequency trading environments where transaction costs threaten profitability.
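
The bound has a simple numerical illustration (a hypothetical example, not from the cited papers): because deviations above and below the mean balance exactly, even an extreme outlier contributes at most half of the aggregate statistic.

```python
import numpy as np

y = np.array([0.1, -0.2, 0.05, 50.0])   # one extreme outlier
dev = np.abs(y - y.mean())
rc = dev / dev.sum()                     # RC_d(y_i) = |y_i - ybar| / (n d)
print(rc.round(4))                       # outlier's share is capped at 0.5
assert rc.max() <= 0.5 + 1e-12           # tolerance for floating point
```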

4. Error Decomposition and Generalization Bounds

The transition from MAE/MADL to GMADL maintains compatibility with theoretical error decomposition frameworks. The overall error in deep vector-to-vector regression tasks can be decomposed as:

$$\mathcal{L}(\hat{f}_v) \leq \inf_{f_v \in \mathcal{F}} \mathcal{L}(f_v) + \text{Estimation} + \text{Optimization}$$

Estimation error (e.g., via the Rademacher complexity) benefits from the Lipschitz continuity retained under reasonable choices of $a$ and $b$. The optimization error is reduced compared to non-differentiable directional losses, allowing smooth and efficient stochastic gradient descent or Adam-based updates (Qi et al., 2020).

A plausible implication is that GMADL can facilitate high generalization performance without sacrificing optimization tractability or necessitating extensive hyperparameter regularization.

5. Empirical Performance in Financial Time Series Forecasting

Multiple studies demonstrate that GMADL-trained models produce superior risk-weighted returns compared to benchmarks (buy-and-hold, technical indicators, and RMSE/quantile loss-trained models) (Stefaniuk et al., 23 Mar 2025). Notably:

  • For high-frequency Bitcoin data, Informer models trained with GMADL outperformed all alternatives, delivering higher annualized returns, smaller drawdowns, and better information ratios.
  • The performance advantage grows with higher sampling frequency; RMSE-based models degrade as the time scale tightens, while GMADL benefits from increasing trading opportunities.

Such empirical findings confirm that the focus on directional correctness and magnitude weighting is not only theoretically sound but also practically crucial for real-world financial decision systems (Michańków et al., 24 Dec 2024, Michańków et al., 22 Jul 2025).

6. Implementation Practices and Scalability

GMADL has been integrated into state-of-the-art deep learning frameworks, supporting LSTM, Transformer, Informer, and hybrid architectures. Hyperparameter tuning for $a$ and $b$ is typically performed via grid search or walk-forward optimization, combined with architecture testing and cross-validation, as the sketch below illustrates. The flexibility in parameterization allows adaptation to daily, hourly, or tick-level trading, addressing strategy design under varied market regimes (Michańków et al., 24 Dec 2024).
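
A representative tuning loop is a plain grid search over $a$ and $b$ evaluated on chronological walk-forward folds. The sketch below shows only the shape of such a procedure; the `fit` and `score` callables stand in for the user's own training and backtesting code and are assumptions, not part of any published implementation.

```python
from itertools import product
from typing import Callable, Iterable, Sequence, Tuple

def walk_forward_gmadl_search(
    splits: Sequence[Tuple[object, object]],        # [(train, test), ...]
    fit: Callable[..., object],                     # trains a model with GMADL(a, b)
    score: Callable[[object, object], float],       # backtest metric, e.g. info ratio
    a_grid: Iterable[float] = (10.0, 100.0, 1000.0),
    b_grid: Iterable[float] = (1.0, 2.0, 3.0),
):
    """Return the (a, b) pair with the best average out-of-sample score."""
    best = None
    for a, b in product(a_grid, b_grid):
        scores = [score(fit(train, a=a, b=b), test) for train, test in splits]
        avg = sum(scores) / len(scores)
        if best is None or avg > best[1]:
            best = ((a, b), avg)
    return best
```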

The differentiable formulation also supports backpropagation, enabling stable convergence even in nonconvex settings where vanilla MADL would encounter optimization challenges due to the presence of sign discontinuities.
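
The contrast is visible directly in the gradients. In the illustrative comparison below (not code from the cited papers), a sign-based MADL term returns a zero gradient with respect to the prediction almost everywhere, whereas the GMADL sigmoid supplies a smooth, informative gradient around the zero-crossing.

```python
import torch

r_true = torch.tensor([0.01])
r_pred = torch.tensor([1e-4], requires_grad=True)

# Sign-based MADL term: the gradient of sign() is zero almost everywhere,
# so the optimizer receives no learning signal.
madl = -torch.sign(r_true * r_pred) * torch.abs(r_true)
madl.sum().backward()
print(r_pred.grad)    # tensor([0.])

r_pred.grad = None
# GMADL term: the sigmoid yields a usable gradient near the zero-crossing.
a, b = 100.0, 1.0
gmadl = -(torch.sigmoid(a * r_true * r_pred) - 0.5) * torch.abs(r_true) ** b
gmadl.sum().backward()
print(r_pred.grad)    # nonzero, pushing the prediction toward the right sign
```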

7. Future Applications and Generalization

The parametric flexibility and robustness of GMADL imply broader utility beyond algorithmic trading. Potential applications include portfolio optimization, derivatives pricing, and any domain where outcome-driven performance evaluation must combine magnitude and directional factors. The prospects for dynamic adjustment of $a$/$b$ (possibly via meta-learning or reinforcement mechanisms) permit further adaptation to regime shifts or multidimensional optimization contexts (e.g., balancing Sharpe ratio with drawdown constraints).

Research directions include integrating GMADL with dynamic transaction cost models, extending to constraint-aware optimization, or fusing with ensemble and Bayesian methods for improved uncertainty quantification.


GMADL represents an evolution in loss function design where differentiable, direction- and magnitude-sensitive optimization frameworks can be directly aligned with practical performance metrics. Its formal construction and empirical outcomes indicate its relevance not only within high-frequency trading but wherever directional and scale robustness are necessary for interpretable and effective regression model deployment.
