- The paper introduces novel GPU algorithms to compute logarithms of modified Bessel functions, overcoming underflow and overflow issues in traditional methods.
- The methods leverage asymptotic expressions and recurrence relations to achieve machine precision, with speed improvements up to 6150x over existing libraries.
- Empirical validation using high-dimensional CIFAR10 neural features demonstrates robust performance in fitting von Mises-Fisher distributions.
Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs
The paper "Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs" by Andreas Plesner, Hans Henrik Brandenborg Sørensen, and Søren Hauberg addresses critical limitations in existing numerical implementations of modified Bessel functions, particularly on GPUs. Modified Bessel functions of the first and second kinds, denoted Iv(x) and Kv(x), are prevalent in scientific computing applications ranging from machine learning to physics. Current libraries are insufficiently precise, fail for certain input ranges, or lack efficient GPU implementations. This paper introduces two novel algorithms that correct these deficiencies and demonstrates significant improvements in computational speed and numerical stability.
Numerical Stability and Precision
The authors propose algorithms that compute the modified Bessel functions directly on a logarithmic scale, avoiding the underflow and overflow that plague evaluation on the linear scale. Traditional approaches, including the library functions of SciPy and mpmath, are prone to failure in these regimes, particularly for large values of v. The proposed algorithms maintain relative errors on the order of machine precision across input ranges where traditional implementations fail.
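The failure mode is easy to reproduce with standard tools. The sketch below (a minimal Python illustration, not the paper's implementation) shows SciPy's `iv` overflowing for a moderate argument, the partial workaround of recovering log Iv(x) from the exponentially scaled variant `ive`, and the regime with v much larger than x where even the scaled form underflows:

```python
import numpy as np
from scipy.special import iv, ive

# Direct evaluation overflows already for moderate arguments:
print(iv(2.0, 800.0))              # inf in float64

# SciPy's exponentially scaled variant, ive(v, x) = iv(v, x) * exp(-|x|),
# recovers log I_v(x) = log ive(v, x) + |x| in this regime:
log_iv = np.log(ive(2.0, 800.0)) + 800.0
print(log_iv)                      # finite, ~795.7

# But when v is large relative to x, even the scaled form underflows to 0,
# so its logarithm is -inf -- the case the paper's algorithms address:
print(ive(500.0, 10.0))            # 0.0
```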
Key contributions include the introduction and detailed evaluation of algorithms that combine asymptotic expansions with recurrence relations. For instance, the series expression for Iv(x) and the integral expression for Kv(x) are reformulated with logarithmic operations so that intermediate quantities stay representable. The algorithms are also parallelized for GPU execution, with CUDA implementations that further reduce runtime.
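As a concrete illustration of the series route, the toy sketch below evaluates the ascending series for Iv(x) entirely on a log scale, combining the log-terms with a log-sum-exp so that no individual term can underflow or overflow. This is an illustrative assumption (fixed term count, scalar Python), not the paper's GPU algorithm, which additionally switches to asymptotic expansions and recurrences in other regimes:

```python
import math

def log_iv_series(v, x, n_terms=200):
    """Sketch: log I_v(x) via the ascending series, summed on a log scale.

    Each term t_k = (x/2)^(2k+v) / (k! * Gamma(k+v+1)) is represented as
    log t_k = (2k+v) log(x/2) - lgamma(k+1) - lgamma(k+v+1), and the terms
    are combined with a log-sum-exp.
    """
    log_half_x = math.log(x / 2.0)
    log_terms = [(2 * k + v) * log_half_x
                 - math.lgamma(k + 1) - math.lgamma(k + v + 1)
                 for k in range(n_terms)]
    m = max(log_terms)  # shift so the largest exponentiated term is 1
    return m + math.log(sum(math.exp(t - m) for t in log_terms))
```

For example, `log_iv_series(500.0, 10.0)` returns a large negative number (around -1800) where a linear-scale evaluation would underflow to zero and yield -inf after taking the logarithm.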
Performance evaluations against existing libraries highlight substantial speed improvements:
- The C++/CUDA implementations achieve median and maximum GPU speedups of 45x and 6150x, respectively.
- Compared to SciPy, the proposed methods achieve median and maximum GPU speedups of 77x and 300x.
These results underscore the significant computational gains, with the GPU implementations achieving runtimes one to two orders of magnitude faster than those reported by existing libraries.
Empirical Validation
A practical validation is achieved by fitting von Mises-Fisher (vMF) distributions to high-dimensional neural network features extracted from CIFAR10 training images using ResNet50's convolutional layers. These features span dimensions of 2048, 8192, and 32768. Traditional libraries were either unable to perform these computations or produced results with significant errors. In contrast, the proposed algorithms fitted vMF distributions successfully, emphasizing the utility and robustness of the new methods.
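To make the validation task concrete, the sketch below fits a vMF distribution to unit-norm feature vectors using the widely used closed-form concentration approximation of Banerjee et al.; this estimator and the function name are assumptions for illustration, not necessarily the paper's exact procedure. The point of contact with the paper is the log-normalizer: evaluating the vMF log-likelihood requires log I_{d/2-1}(kappa), which is precisely where a stable log-Bessel routine is needed at dimensions like 2048 to 32768.

```python
import numpy as np

def fit_vmf(X):
    """Hypothetical sketch: fit a von Mises-Fisher distribution to the
    rows of X (assumed unit-norm), using the Banerjee et al. closed-form
    approximation for the concentration kappa.

    Evaluating the resulting log-likelihood requires the normalizer
    C_d(kappa), whose computation needs log I_{d/2-1}(kappa) -- the
    quantity the paper's algorithms make tractable in high dimensions.
    """
    d = X.shape[1]
    mean = X.mean(axis=0)
    r_bar = np.linalg.norm(mean)               # mean resultant length, in (0, 1)
    mu = mean / r_bar                          # maximum-likelihood mean direction
    kappa = r_bar * (d - r_bar**2) / (1.0 - r_bar**2)
    return mu, kappa
```

For features concentrated around a common direction, `r_bar` approaches 1 and the estimated `kappa` grows large, pushing I_{d/2-1}(kappa) far outside the representable range of float64 unless it is handled on a log scale.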
Broader Implications
The accuracy and computational efficiency offered by these algorithms enable applications in Bayesian deep learning, protein structure modeling, robotics, and other domains where precise and fast computation of special functions is critical. Additionally, the implementation of these algorithms as open-source CUDA libraries facilitates broader adoption and integration within GPU-based scientific computing workflows.
Future Directions
Future work could focus on extending these algorithms to other classes of special functions, improving the speed for small input ranges of logKv(x), and further optimizing GPU utilization. Additionally, integrating these algorithms with automatic differentiation frameworks would enable their use in gradient-based optimization methods, expanding their applicability in machine learning and beyond.
In conclusion, the paper makes a substantial contribution to numerical computing by enabling robust and efficient evaluation of logarithmic modified Bessel functions. The proposed algorithms improve numerical stability and precision while sharply reducing computation time, especially on GPUs, opening up new possibilities for high-dimensional, precision-critical applications across scientific domains.