Derivative-based regularization for regression (2405.00555v1)
Abstract: In this work, we introduce a novel approach to regularization in multivariable regression problems. Our regularizer, called DLoss, penalises differences between the model's derivatives and derivatives of the data-generating function as estimated from the training data. We call these estimated derivatives data derivatives. The goal of our method is to align the model with the data, not only in terms of target values but also in terms of derivatives. To estimate data derivatives, we select (from the training data) 2-tuples of input-value pairs, using either nearest-neighbour or random selection. On synthetic and real datasets, we evaluate the effectiveness of adding DLoss, with different weights, to the standard mean squared error loss. The experimental results show that DLoss with nearest-neighbour selection obtains, on average, the best rank with respect to MSE on validation datasets, compared to no regularization, L2 regularization, and Dropout.
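As a rough illustration of the idea, the sketch below shows one plausible DLoss-style regularizer in PyTorch: each training point is paired with its nearest neighbour in the batch, a "data derivative" is estimated as the finite-difference slope of the targets along the direction connecting the pair, and the squared gap between that slope and the model's directional derivative (via autograd) is added to the MSE loss. The directional finite-difference formulation, the function name `dloss_mse`, and the weight `lambda_d` are assumptions made for illustration; the paper's exact definition may differ.

```python
import torch

def dloss_mse(model, x, y, lambda_d=0.1):
    """MSE plus a DLoss-style penalty: match the model's directional
    derivatives to finite-difference data derivatives estimated from
    nearest-neighbour 2-tuples in the batch (illustrative sketch only)."""
    # Standard mean squared error term (assumes scalar regression targets).
    pred = model(x).squeeze(-1)
    mse = torch.mean((pred - y) ** 2)

    # Pair each point with its nearest neighbour in the batch (excluding itself).
    dists = torch.cdist(x, x)
    dists.fill_diagonal_(float("inf"))
    nn_idx = dists.argmin(dim=1)
    x_nn, y_nn = x[nn_idx], y[nn_idx]

    # Unit direction and distance from each point to its neighbour.
    diff = x_nn - x
    dist = diff.norm(dim=1).clamp_min(1e-8)
    u = diff / dist.unsqueeze(1)

    # Data derivative: finite-difference slope of the targets along that direction.
    data_deriv = (y_nn - y) / dist

    # Model derivative: gradient of the output projected onto the same direction.
    x_req = x.detach().requires_grad_(True)
    out = model(x_req).squeeze(-1)
    grads = torch.autograd.grad(out.sum(), x_req, create_graph=True)[0]
    model_deriv = (grads * u).sum(dim=1)

    # Weighted combination of the two terms.
    return mse + lambda_d * torch.mean((model_deriv - data_deriv) ** 2)
```

Random selection could be sketched analogously by replacing the nearest-neighbour index with a random permutation of the batch.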