- The paper establishes MAE's Lipschitz continuity and derives an upper bound on the empirical Rademacher complexity for DNN regression.
- It shows that DNNs trained with MAE are more robust to additive noise than those trained with MSE.
- Empirical experiments in speech enhancement show that MAE leads to lower regression errors and enhanced perceptual quality scores.
Evaluation of Mean Absolute Error as a Loss Function in Deep Neural Network-Based Vector-to-Vector Regression
The paper under review explores the use of Mean Absolute Error (MAE) as a loss function for Deep Neural Network (DNN) based vector-to-vector regression. It aims to establish the theoretical underpinnings and practical advantages of MAE over the more commonly used Mean Squared Error (MSE) in this setting. The investigation is significant given the growing application of DNNs to large-scale regression tasks, such as speech enhancement, where the accuracy of the regression model is critical.
Theoretical Insights
Two main theoretical contributions are presented. First, the paper establishes the Lipschitz continuity of MAE: a loss is Lipschitz continuous when the change in its value is bounded by a constant multiple of the change in its input, so small perturbations of the prediction cannot produce arbitrarily large changes in the loss. This property is what enables the derivation of an upper bound on the empirical Rademacher complexity of the loss class, which is pivotal for characterizing the generalization behavior of DNN-based regression models. In contrast, the paper shows that MSE is not Lipschitz continuous over an unbounded domain, so the same contraction-based argument does not apply, which limits its utility for establishing robust generalization guarantees.
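As a rough sketch of the underlying argument, in generic notation rather than the paper's own: the absolute error is 1-Lipschitz by the reverse triangle inequality, the squared error has no global Lipschitz constant, and the standard contraction lemma then transfers the Rademacher complexity of the hypothesis class to the loss class only in the Lipschitz case.

```latex
% A loss \ell is L-Lipschitz if |\ell(u) - \ell(v)| \le L |u - v| for all u, v.
\begin{align*}
\big|\,|u-t| - |v-t|\,\big| &\le |u-v|
  && \text{(absolute error: } L = 1 \text{, reverse triangle inequality)}\\
\big|(u-t)^2 - (v-t)^2\big| &= |u+v-2t|\,|u-v|
  && \text{(squared error: the factor } |u+v-2t| \text{ is unbounded)}\\
\hat{\mathfrak{R}}_S(\ell \circ \mathcal{F}) &\le L\,\hat{\mathfrak{R}}_S(\mathcal{F})
  && \text{(contraction lemma, for an } L\text{-Lipschitz loss } \ell\text{)}
\end{align*}
```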
Second, the paper analyzes the robustness of MAE-trained DNNs to additive noise, deriving a generalized upper bound on the regression error. This bound indicates that MAE helps maintain model performance under noisy conditions because minimizing MAE corresponds to assuming Laplacian-distributed errors, whose heavier tails penalize large deviations linearly rather than quadratically. This contrasts with the Gaussian error model implicit in MSE and offers a fresh perspective on how the assumed error distribution shapes regression performance.
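To make the distributional link concrete, the standard maximum-likelihood correspondence (textbook material, not the paper's specific bound) is: assuming i.i.d. additive errors, a Laplacian error model yields the MAE objective and a Gaussian error model yields the MSE objective.

```latex
\begin{align*}
e &= y - f(\mathbf{x}) && \text{(per-coordinate prediction error)}\\
p_{\mathrm{Lap}}(e) &= \tfrac{1}{2b}\exp\!\big(-\tfrac{|e|}{b}\big)
  && \Rightarrow\ -\log p_{\mathrm{Lap}}(e) = \tfrac{|e|}{b} + \mathrm{const}
  \quad \text{(MLE } \Leftrightarrow \text{ minimizing MAE)}\\
p_{\mathcal{N}}(e) &= \tfrac{1}{\sqrt{2\pi}\,\sigma}\exp\!\big(-\tfrac{e^2}{2\sigma^2}\big)
  && \Rightarrow\ -\log p_{\mathcal{N}}(e) = \tfrac{e^2}{2\sigma^2} + \mathrm{const}
  \quad \text{(MLE } \Leftrightarrow \text{ minimizing MSE)}
\end{align*}
```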
Empirical Validation
To substantiate the theoretical claims, the paper presents a series of speech enhancement experiments on the Edinburgh noisy speech corpus. The empirical results underscore the advantages of MAE over MSE across a range of noisy conditions, with MAE consistently yielding lower regression errors and higher perceptual quality scores. Specifically, DNNs trained with MAE outperform those trained with MSE on both the MAE and MSE metrics, as well as on PESQ (Perceptual Evaluation of Speech Quality) and STOI (Short-Time Objective Intelligibility) scores. These results empirically support the claim that MAE offers superior robustness and generalization, consistent with the theoretical analysis of its distributional assumptions.
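To give a concrete, purely illustrative feel for this effect outside of speech enhancement, the sketch below trains the same small network with L1Loss (MAE) and MSELoss on a synthetic vector-to-vector task whose training targets contain occasional large outliers; the architecture, data generator, and outlier model are assumptions for the sketch, not the paper's experimental setup.

```python
# Toy sketch only (not the paper's setup): the architecture, synthetic data
# generator, and outlier model are illustrative assumptions. It contrasts a
# network trained with MAE (L1) against one trained with MSE when the
# training targets contain occasional large outliers.
import torch
import torch.nn as nn

torch.manual_seed(0)
D_IN, D_OUT = 16, 8
W = torch.randn(D_IN, D_OUT)  # hypothetical ground-truth linear map

def corrupt(y, outlier_prob=0.1, outlier_scale=10.0):
    """Add sparse, large additive outliers to the training targets."""
    mask = (torch.rand_like(y) < outlier_prob).float()
    return y + mask * outlier_scale * torch.randn_like(y)

def train(loss_fn, x, y, epochs=300, lr=1e-2):
    model = nn.Sequential(nn.Linear(D_IN, 64), nn.ReLU(), nn.Linear(64, D_OUT))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model

x_tr = torch.randn(2000, D_IN)
y_tr = corrupt(x_tr @ W)       # noisy training targets
x_te = torch.randn(500, D_IN)
y_te = x_te @ W                # clean test targets

for name, loss_fn in [("MAE", nn.L1Loss()), ("MSE", nn.MSELoss())]:
    model = train(loss_fn, x_tr, y_tr)
    with torch.no_grad():
        err = model(x_te) - y_te
        print(f"{name}-trained: test MAE={err.abs().mean().item():.3f}, "
              f"test MSE={(err ** 2).mean().item():.3f}")
```

Under these assumptions one would typically expect the MSE-trained model to be pulled more strongly toward the outliers, which is the qualitative behavior the paper's bound formalizes.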
Implications and Future Directions
This work makes a compelling case for MAE as an alternative to MSE in vector-to-vector regression, especially under conditions involving significant noise. The connection between MAE and the Laplacian distribution offers intriguing avenues for further exploration. Future research could apply the MAE framework to other domains where regression is critical and investigate other distributional assumptions to further enhance model robustness and performance.
In summary, the paper deepens the understanding of loss functions in DNN-based regression, providing both a theoretical and an empirical basis for adopting MAE over MSE. The demonstrated advantages in speech enhancement suggest broader applicability across machine learning and signal processing domains.