
Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs (2409.08729v1)

Published 13 Sep 2024 in cs.DC

Abstract: Bessel functions are critical in scientific computing for applications such as machine learning, protein structure modeling, and robotics. However, currently available routines lack precision or fail for certain input ranges, such as when the order $v$ is large, and GPU-specific implementations are limited. We address the precision limitations of current numerical implementations while dramatically improving the runtime. We propose two novel algorithms for computing the logarithm of modified Bessel functions of the first and second kinds by computing intermediate values on a logarithmic scale. Our algorithms are robust and never have issues with underflows or overflows while having relative errors on the order of machine precision, even for inputs where existing libraries fail. In C++/CUDA, our algorithms have median and maximum speedups of 45x and 6150x for GPU and 17x and 3403x for CPU, respectively, over the ranges of inputs and third-party libraries tested. Compared to SciPy, the algorithms have median and maximum speedups of 77x and 300x for GPU and 35x and 98x for CPU, respectively, over the tested inputs. The ability to robustly compute a solution and the low relative errors allow us to fit von Mises-Fisher, vMF, distributions to high-dimensional neural network features. This is, e.g., relevant for uncertainty quantification in metric learning. We obtain image feature data by processing CIFAR10 training images with the convolutional layers of a pre-trained ResNet50. We successfully fit vMF distributions to 2048-, 8192-, and 32768-dimensional image feature data using our algorithms. Our approach provides fast and accurate results while existing implementations in SciPy and mpmath fail to fit successfully. Our approach is readily implementable on GPUs, and we provide a fast open-source implementation alongside this paper.

Summary

  • The paper introduces novel GPU algorithms to compute logarithms of modified Bessel functions, overcoming underflow and overflow issues in traditional methods.
  • The methods leverage asymptotic expressions and recurrence relations to achieve machine precision, with speed improvements up to 6150x over existing libraries.
  • Empirical validation using high-dimensional CIFAR10 neural features demonstrates robust performance in fitting von Mises-Fisher distributions.

Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs

The paper "Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs" by Andreas Plesner, Hans Henrik Brandenborg Sørensen, and Søren Hauberg addresses critical limitations in the existing numerical implementations of modified Bessel functions, particularly when computed on GPUs. Modified Bessel functions, specifically those of the first and second kinds, denoted as $I_v(x)$ and $K_v(x)$, are prevalent in scientific computing applications ranging from machine learning to physics. Current libraries are either insufficiently precise, fail for certain input ranges, or lack efficient GPU implementations. This paper introduces two novel algorithms correcting these deficiencies and demonstrates significant improvements in computational speed and numerical stability.

Numerical Stability and Precision

The authors propose algorithms that compute the logarithm of the modified Bessel functions on a logarithmic scale to avoid common numerical issues such as underflow and overflow. Traditional approaches, including library functions from SciPy and mpmath, are prone to failing under these conditions, particularly for large values of $v$. The proposed algorithms exhibit relative errors on the scale of machine precision across a range of inputs where traditional implementations fail.
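The underflow/overflow problem is easy to reproduce. A common partial workaround (not the paper's method) uses SciPy's exponentially scaled variants, `ive(v, x) = I_v(x)e^{-x}` and `kve(v, x) = K_v(x)e^{x}`, to recover the logarithm; the sketch below shows the idea and where the direct functions fail. Note that this trick itself breaks down once `ive`/`kve` under- or overflow (e.g. for very large $v$), which is precisely the regime the paper's algorithms target.

```python
import math
from scipy.special import iv, ive, kv, kve

def log_iv(v, x):
    # log I_v(x) recovered from the scaled ive(v, x) = I_v(x) * exp(-x)
    return math.log(ive(v, x)) + x

def log_kv(v, x):
    # log K_v(x) recovered from the scaled kve(v, x) = K_v(x) * exp(x)
    return math.log(kve(v, x)) - x

# Direct evaluation overflows/underflows at x = 800; the log versions do not.
print(iv(2, 800.0))      # inf  (overflow)
print(kv(2, 800.0))      # 0.0  (underflow)
print(log_iv(2, 800.0))  # finite, roughly 795.7
print(log_kv(2, 800.0))  # finite, roughly -803.1
```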

Algorithm and Performance

Key contributions include the introduction and detailed evaluation of algorithms leveraging asymptotic expressions and recurrence relations. For instance, the series expression for Iv(x)I_v(x) and the integral expression for Kv(x)K_v(x) are augmented with logarithmic operations to maintain stability. Moreover, the algorithms are efficiently parallelized for GPU execution, leveraging CUDA to optimize runtime further.
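To illustrate the log-domain idea behind these series evaluations (this is a simplified sketch, not the paper's tuned algorithm, which also switches to asymptotic expansions depending on the input regime): each term of the power series for $I_v(x)$ can be formed on a log scale and the terms combined with a log-sum-exp, so no intermediate value can overflow or underflow.

```python
import math

def log_iv_series(v, x, n_terms=300):
    # Power series: I_v(x) = sum_k (x/2)^(2k+v) / (k! * Gamma(k+v+1)).
    # Each term is computed in log space via lgamma ...
    log_terms = [
        (2 * k + v) * math.log(x / 2.0)
        - math.lgamma(k + 1)
        - math.lgamma(k + v + 1)
        for k in range(n_terms)
    ]
    # ... and the terms are summed with log-sum-exp for stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))
```

A fixed term count is only adequate for moderate $x$; for $v = 500$, $x = 1$, where SciPy's `iv` underflows to zero, this log-domain sum still returns a finite, meaningful value (about $-2958$).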

Performance evaluations against existing libraries highlight substantial speed improvements:

  • C++/CUDA implementations exhibit median and maximum speedups of 45x and 6150x (GPU), respectively.
  • When compared to SciPy, the proposed methods demonstrate median and maximum speedups of 77x and 300x (GPU).

These results underscore the significant computational gains, with the GPU implementations achieving runtimes one to two orders of magnitude faster than those of existing libraries.

Empirical Validation

A practical validation is achieved by fitting von Mises-Fisher (vMF) distributions to high-dimensional neural network features extracted from CIFAR10 training images using ResNet50's convolutional layers. These features span dimensions of 2048, 8192, and 32768. Traditional libraries were either unable to perform these computations or produced results with significant errors. In contrast, the proposed algorithms fitted vMF distributions successfully, emphasizing the utility and robustness of the new methods.
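The role of the log-Bessel computation here is the vMF log-normalizer, $\log C_d(\kappa)$ with $C_d(\kappa) = \kappa^{d/2-1} / \big((2\pi)^{d/2} I_{d/2-1}(\kappa)\big)$, whose order $v = d/2 - 1$ is in the hundreds or thousands for these feature dimensions. A SciPy-based sketch (again a stand-in, not the paper's algorithm) makes the dependence explicit:

```python
import math
from scipy.special import ive

def log_vmf_normalizer(d, kappa):
    # log C_d(kappa) for the von Mises-Fisher density on the unit sphere in R^d:
    #   C_d(kappa) = kappa^(d/2-1) / ((2*pi)^(d/2) * I_{d/2-1}(kappa))
    v = d / 2.0 - 1.0
    # log I_v via the scaled ive; this step fails when ive itself
    # under-/overflows (large v), which is where the paper's method is needed.
    log_iv = math.log(ive(v, kappa)) + kappa
    return v * math.log(kappa) - (d / 2.0) * math.log(2.0 * math.pi) - log_iv
```

For $d = 3$ this matches the closed form $C_3(\kappa) = \kappa / (4\pi \sinh\kappa)$, and it remains finite for $d = 2048$ with large $\kappa$; for high enough $d$ relative to $\kappa$, however, `ive` underflows and the log-normalizer is lost, matching the SciPy failures reported in the paper.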

Broader Implications

The accuracy and computational efficiency offered by these algorithms enable applications in Bayesian deep learning, protein structure modeling, robotics, and other domains where precise and fast computation of special functions is critical. Additionally, the implementation of these algorithms as open-source CUDA libraries facilitates broader adoption and integration within GPU-based scientific computing workflows.

Future Directions

Future work could focus on extending these algorithms to other classes of special functions, improving the speed for small input ranges of $\log K_v(x)$, and further optimizing GPU utilization. Additionally, integrating these algorithms with automatic differentiation frameworks would enable their use in gradient-based optimization methods, expanding their applicability in machine learning and beyond.
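The autodiff integration is natural because the derivative of the log-Bessel function is a ratio in which the problematic exponential scale cancels: $\frac{d}{dx}\log I_v(x) = \frac{I_{v-1}(x) + I_{v+1}(x)}{2\,I_v(x)}$. A small sketch using SciPy's scaled `ive` (an illustration of the identity, not the paper's implementation):

```python
from scipy.special import ive

def dlog_iv_dx(v, x):
    # d/dx log I_v(x) = (I_{v-1}(x) + I_{v+1}(x)) / (2 * I_v(x)).
    # The exp(-x) factors in the scaled ive cancel in the ratio, so the
    # gradient stays finite even where I_v(x) itself overflows.
    return (ive(v - 1, x) + ive(v + 1, x)) / (2.0 * ive(v, x))
```

At $x = 800$, where `iv(2, x)` overflows, this ratio still evaluates to a finite value just below 1, consistent with the large-$x$ asymptotics of $\log I_v$.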

In conclusion, the paper presents a substantial contribution to the field of numerical computing by addressing key limitations in the robust and efficient evaluation of logarithmic modified Bessel functions. The provided algorithms not only enhance numerical stability and precision but also significantly reduce computational time, especially on GPU platforms, thereby opening up new possibilities for high-dimensional and precision-critical applications across various scientific domains.
