
Stationary Activations for Uncertainty Calibration in Deep Learning (2010.09494v1)

Published 19 Oct 2020 in cs.LG

Abstract: We introduce a new family of non-linear neural network activation functions that mimic the properties induced by the widely-used Matérn family of kernels in Gaussian process (GP) models. This class spans a range of locally stationary models of various degrees of mean-square differentiability. We show an explicit link to the corresponding GP models in the case that the network consists of one infinitely wide hidden layer. In the limit of infinite smoothness the Matérn family results in the RBF kernel, and in this case we recover RBF activations. Matérn activation functions result in similar appealing properties to their counterparts in GP models, and we demonstrate that the local stationarity property together with limited mean-square differentiability shows both good performance and uncertainty calibration in Bayesian deep learning tasks. In particular, local stationarity helps calibrate out-of-distribution (OOD) uncertainty. We demonstrate these properties on classification and regression benchmarks and a radar emitter classification task.

Citations (19)

Summary

Stationary Activations for Uncertainty Calibration in Deep Learning

The paper introduces a new class of non-linear activation functions designed to improve uncertainty calibration in deep learning models, specifically within Bayesian frameworks. These functions draw inspiration from the Matérn family of kernels traditionally used in Gaussian process (GP) models. By combining local stationarity with varying degrees of mean-square differentiability, the approach addresses a pivotal need in Bayesian deep learning: improved uncertainty quantification. The authors further show how the new activation functions help calibrate out-of-distribution uncertainty across diverse tasks, including classification, regression, and radar emitter classification.

Linking Neural Networks with Gaussian Processes

The paper begins by revisiting a foundational idea: neural networks with infinitely wide hidden layers converge to Gaussian processes under certain assumptions. Within this correspondence, the activation function takes on a kernel-like role. Prior research has established the equivalence for several common activations, such as ReLU, step functions, and the exponential function. This paper extends the paradigm by introducing Matérn-inspired activations whose induced kernels, in the single infinitely wide hidden layer setting, correspond to Matérn GP covariances, giving a precise probabilistic interpretation of the network's prior.
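
To make the correspondence concrete, the classical infinite-width argument (stated here in generic notation, not taken from the paper) is that a single hidden layer with i.i.d. Gaussian weights induces a GP whose covariance is an expectation over the hidden units:

```latex
% Single-hidden-layer network:
%   f(x) = b_0 + \sum_{i=1}^{N} w_i \, \phi(u_i^\top x + b_i),
% with w_i \sim \mathcal{N}(0, \sigma_w^2 / N), b_0 \sim \mathcal{N}(0, \sigma_b^2),
% and (u_i, b_i) drawn i.i.d. As N \to \infty, f converges to a GP with covariance
k(x, x') = \sigma_b^2
  + \sigma_w^2 \,\mathbb{E}_{u,b}\bigl[\phi(u^\top x + b)\,\phi(u^\top x' + b)\bigr].
```

Seen through this lens, the paper's contribution is the choice of the activation so that the induced covariance falls in the Matérn family rather than, for example, the arc-cosine-type kernels induced by ReLU.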

Matérn Activation Functions

In designing activation functions that emulate GP kernels, the Matérn class emerges as a versatile candidate. The authors construct activations that span a range of locally stationary models with different degrees of mean-square differentiability. Because of this local stationarity, the activations mitigate overconfidence in network predictions, particularly for inputs far from the training data. Theoretical analysis and numerical evidence support their efficacy in maintaining calibration, including on out-of-distribution inputs.
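
For reference, the kernels these activations are designed to mimic are the standard Matérn covariances from the GP literature. The snippet below is an illustrative evaluation of the common half-integer smoothness values and the RBF limit; it is not code from the paper.

```python
import numpy as np

def matern(r, nu=1.5, lengthscale=1.0, variance=1.0):
    """Standard Matérn kernel for the common half-integer smoothness values.

    r  : array of non-negative distances |x - x'|.
    nu : smoothness; nu -> infinity recovers the RBF kernel.
    """
    s = np.asarray(r, dtype=float) / lengthscale
    if nu == 0.5:        # exponential kernel: continuous but not differentiable
        k = np.exp(-s)
    elif nu == 1.5:      # once mean-square differentiable
        k = (1.0 + np.sqrt(3.0) * s) * np.exp(-np.sqrt(3.0) * s)
    elif nu == 2.5:      # twice mean-square differentiable
        k = (1.0 + np.sqrt(5.0) * s + 5.0 * s**2 / 3.0) * np.exp(-np.sqrt(5.0) * s)
    elif np.isinf(nu):   # RBF / squared-exponential limit
        k = np.exp(-0.5 * s**2)
    else:
        raise ValueError("only nu in {0.5, 1.5, 2.5, inf} implemented here")
    return variance * k

# Larger nu means a smoother process; the values approach the RBF limit.
r = np.linspace(0.0, 3.0, 7)
for nu in (0.5, 1.5, 2.5, np.inf):
    print(nu, np.round(matern(r, nu=nu), 3))
```

The smoothness parameter is what the summary refers to as the degree of mean-square differentiability: small values give rough, short-memory models, and the limit of infinite smoothness recovers the RBF kernel mentioned in the abstract.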

Empirical Evaluation

Empirical validation forms a core component of the paper. Benchmark comparisons cover standard classification and regression tasks as well as a radar emitter classification task. Bayesian neural networks using the Matérn activations show better uncertainty calibration than networks with conventional activation functions, suggesting practical gains in applications that depend on reliable uncertainty estimates.
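
The summary does not reproduce the reported numbers. As an illustration of how such calibration claims are typically quantified, a minimal sketch of the expected calibration error (ECE) is given below; this is a generic metric, not code or results from the paper.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """Expected calibration error for a classifier.

    probs : (N, C) array of predicted class probabilities.
    labels: (N,) array of integer class labels.
    Bins predictions by confidence and averages |accuracy - confidence|
    per bin, weighted by the fraction of samples in the bin.
    """
    probs, labels = np.asarray(probs), np.asarray(labels)
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return ece
```

A well-calibrated model, in this sense, is one whose predicted confidence matches its empirical accuracy, which is precisely the property the locally stationary activations are argued to preserve away from the training data.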

Implications and Future Directions

The implications of this research are multi-faceted. On a practical level, these activation functions promise improvements in fields where uncertainty management is critical, such as autonomous systems and financial forecasting. Theoretically, the work prompts a reconsideration of neural network design in light of GP approximations, encouraging a more unified view of probabilistic models and deep learning architectures.

Looking ahead, this approach could spur further exploration of kernel-inspired neural networks and their applications beyond uncertainty calibration. Adaptations to other kernel families and their influence on model interpretability represent intriguing research avenues. As computational resources grow, studying ever wider networks and their GP limits may also yield new insights.

In conclusion, the paper provides a mathematical and empirical foundation for using stationary Matérn-like activations in enhancing the uncertainty calibration capabilities of deep learning models, fostering a deeper integration between neural network architectures and Gaussian process principles.
