
A Quasi-Monte Carlo Data Structure for Smooth Kernel Evaluations (2401.02562v1)

Published 4 Jan 2024 in cs.DS

Abstract: In the kernel density estimation (KDE) problem one is given a kernel $K(x, y)$ and a dataset $P$ of points in a Euclidean space, and must prepare a data structure that can quickly answer density queries: given a point $q$, output a $(1+\epsilon)$-approximation to $\mu := \frac{1}{|P|}\sum_{p\in P} K(p, q)$. The classical approach to KDE is the celebrated fast multipole method of [Greengard and Rokhlin]. The fast multipole method combines a basic space partitioning approach with a multidimensional Taylor expansion, which yields a $\approx \log^d(n/\epsilon)$ query time (exponential in the dimension $d$). A recent line of work initiated by [Charikar and Siminelakis] achieved polynomial dependence on $d$ via a combination of random sampling and randomized space partitioning, with [Backurs et al.] giving an efficient data structure with query time $\approx \mathrm{poly}\log(1/\mu)/\epsilon^2$ for smooth kernels. Quadratic dependence on $\epsilon$, inherent to the sampling methods, is prohibitively expensive for small $\epsilon$. This issue is addressed by quasi-Monte Carlo methods in numerical analysis. The high-level idea in quasi-Monte Carlo methods is to replace random sampling with a discrepancy-based approach -- an idea recently applied to coresets for KDE by [Phillips and Tai]. The work of Phillips and Tai gives a space-efficient data structure with query complexity $\approx 1/(\epsilon \mu)$. This is polynomially better in $1/\epsilon$, but exponentially worse in $1/\mu$. We achieve the best of both: a data structure with $\approx \mathrm{poly}\log(1/\mu)/\epsilon$ query time for smooth kernel KDE. Our main insight is a new way to combine discrepancy theory with randomized space partitioning inspired by, but significantly more efficient than, that of the fast multipole methods. We hope that our techniques will find further applications to linear algebra for kernel matrices.
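
To make the problem setup concrete, the sketch below (a minimal illustration, not the paper's data structure) contrasts exact KDE evaluation with the plain Monte Carlo estimator whose $\approx 1/(\epsilon^2 \mu)$ sample complexity the abstract refers to; the Gaussian kernel, the bandwidth, and the sample-size formula are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    # Illustrative smooth kernel K(x, y); the paper's results apply to a
    # broad class of smooth kernels, not just the Gaussian.
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * bandwidth ** 2))

def kde_exact(P, q, kernel=gaussian_kernel):
    # mu = (1/|P|) * sum_{p in P} K(p, q); costs O(|P| * d) per query.
    return float(np.mean([kernel(p, q) for p in P]))

def kde_monte_carlo(P, q, eps, mu_lower_bound, kernel=gaussian_kernel, rng=None):
    # Plain random-sampling estimator: a (1 + eps)-approximation with constant
    # probability needs on the order of 1/(eps^2 * mu) samples, i.e. the
    # quadratic dependence on 1/eps that the paper's quasi-Monte Carlo
    # data structure improves to roughly 1/eps.
    rng = np.random.default_rng() if rng is None else rng
    m = int(np.ceil(1.0 / (eps ** 2 * mu_lower_bound)))
    idx = rng.integers(0, len(P), size=m)
    return float(np.mean([kernel(P[i], q) for i in idx]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.normal(size=(10_000, 5))   # synthetic dataset of n = 10,000 points in R^5
    q = rng.normal(size=5)             # a query point
    exact = kde_exact(P, q)
    approx = kde_monte_carlo(P, q, eps=0.1, mu_lower_bound=max(exact, 1e-3), rng=rng)
    print(f"exact mu = {exact:.4f}, sampled estimate = {approx:.4f}")
```

The paper's contribution, per the abstract, is to replace the independent random samples above with a discrepancy-based (quasi-Monte Carlo) selection combined with randomized space partitioning, reducing the $1/\epsilon^2$ factor to $1/\epsilon$ while keeping only polylogarithmic dependence on $1/\mu$.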

References (49)
  1. On the estimation of the gradient lines of a density and the consistency of the mean-shift algorithm. Journal of Machine Learning Research, 2015.
  2. Algorithms and hardness for linear algebra on geometric graphs. In Sandy Irani, editor, 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 541–552. IEEE, 2020.
  3. Faster kernel ridge regression using sketching and preconditioning. SIAM Journal on Matrix Analysis and Applications, 38(4):1116–1138, 2017.
  4. Sharper bounds for regularized data fitting. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, 2017.
  5. Beyond locality-sensitive hashing. In Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2014), pages 1018–1028, 2014. Available as arXiv:1306.1547.
  6. Oblivious sketching of high-degree polynomial kernels. In SODA (to appear), 2020.
  7. Oblivious sketching of high-degree polynomial kernels. In Proceedings of the 31st ACM-SIAM Symposium on Discrete Algorithms (SODA ’2020), 2020.
  8. Random Fourier features for kernel ridge regression: Approximation bounds and statistical guarantees. In Proceedings of the 34th International Conference on Machine Learning (ICML ’2017), 2017.
  9. Optimal hashing-based time–space trade-offs for approximate near neighbors. In Proceedings of the 28th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2017), 2017. Available as arXiv:1608.03580.
  10. Discrepancy minimization via a self-balancing walk. In Proceedings of the 53rd ACM Symposium on the Theory of Computing (STOC ’2021), 2021.
  11. Data-dependent hashing via non-linear spectral gaps. In Proceedings of the 50th ACM Symposium on the Theory of Computing (STOC ’2018), 2018.
  12. Subspace embeddings for the polynomial kernel. In Proceedings of Advances in Neural Information Processing Systems 25 (NIPS ’2014), 2014.
  13. Optimal data-dependent hashing for approximate near neighbors. In Proceedings of the 47th ACM Symposium on the Theory of Computing (STOC ’2015), pages 793–801, 2015. Available as arXiv:1501.01062.
  14. Fast attention requires bounded entries. CoRR, abs/2302.13214, 2023.
  15. Efficient density evaluation for smooth kernels. In Proceedings of the 59th Annual IEEE Symposium on Foundations of Computer Science (FOCS ’2018), 2018.
  16. Efficient density evaluation for smooth kernels. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 615–626. IEEE, 2018.
  17. R Beatson and Leslie Greengard. A short course on fast multipole methods, pages 1–37. Numerical Mathematics and Scientific Computation. Oxford University Press, 1997.
  18. J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324(4):446–449, 1986.
  19. Space and time efficient kernel density estimation in high dimensions. In Advances in Neural Information Processing Systems, 2019.
  20. New streaming algorithms for high-dimensional EMD and MST. In Proceedings of the 54th ACM Symposium on the Theory of Computing (STOC ’2022), 2022.
  21. Kernel density estimation through density constrained near neighbor search. In Sandy Irani, editor, 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 172–183. IEEE, 2020.
  22. Kernel density estimation through density constrained near neighbor search. In Proceedings of the 61st Annual IEEE Symposium on Foundations of Computer Science (FOCS ’2020), 2020.
  23. Hashing-based-estimators for kernel density in high dimensions. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 1032–1043. IEEE, 2017.
  24. Hashing-based-estimators for kernel density in high dimensions. In Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS ’2017), 2017.
  25. Multi-resolution hashing for fast pairwise summations. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2019.
  26. Local polynomial modelling and its applications: monographs on statistics and applied probability 66, volume 66. CRC Press, 1996.
  27. Scalable kernel density classification via threshold-based pruning. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 945–959. ACM, 2017.
  28. Multi-instance kernels. In ICML, volume 2, pages 179–186, 2002.
  29. ‘N-body’ problems in statistical learning. In Advances in Neural Information Processing Systems, pages 521–527, 2001.
  30. Nonparametric density estimation: Toward computational tractability. In Proceedings of the 2003 SIAM International Conference on Data Mining, pages 203–211. SIAM, 2003.
  31. Nonparametric ridge estimation. The Annals of Statistics, 42(4):1511–1545, 2014.
  32. Mean shift based clustering in high dimensions: a texture classification example. In Proceedings Ninth IEEE International Conference on Computer Vision, pages 456–463 vol.1, 2003.
  33. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Jeffrey Scott Vitter, editor, Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing, Dallas, Texas, USA, May 23-26, 1998, pages 604–613. ACM, 1998.
  34. Comparing distributions and shapes using the kernel distance. In Proceedings of the twenty-seventh annual symposium on Computational geometry, pages 47–56. ACM, 2011.
  35. Dual-tree fast Gauss transforms. In Advances in Neural Information Processing Systems, pages 747–754, 2006.
  36. Kernel mean embedding of distributions: A review and beyond. Foundations and Trends® in Machine Learning, 10(1-2):1–141, 2017.
  37. Recursive sampling for the Nyström method. In Proceedings of Advances in Neural Information Processing Systems (NeurIPS ’2017), 2017.
  38. Near-optimal coresets for kernel density estimates. Discrete and Computational Geometry, 63(4):867–887, 2020.
  39. Linear-time algorithms for pairwise statistical problems. In Advances in Neural Information Processing Systems, pages 1527–1535, 2009.
  40. Random features for large-scale kernel machines. In Proceedings of Advances in Neural Information Processing Systems 21 (NIPS ’2008), 2008.
  41. Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian processes for machine learning. Adaptive computation and machine learning. MIT Press, 2006.
  42. Generalized density clustering. The Annals of Statistics, pages 2678–2722, 2010.
  43. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press, 2001.
  44. Learning theory for distribution regression. The Journal of Machine Learning Research, 17(1):5272–5311, 2016.
  45. Kernel methods for pattern analysis. Cambridge university press, 2004.
  46. Generalized outlier detection with flexible kernel density estimates. In Proceedings of the 2014 SIAM International Conference on Data Mining, pages 542–550. SIAM, 2014.
  47. Martin J. Wainwright. High-dimensional statistics: a non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019.
  48. Holger Wendland. Scattered Data Approximation. Cambridge University Press, 2004.
  49. Improved fast Gauss transform and efficient kernel density estimation. In Proceedings of the Ninth IEEE International Conference on Computer Vision-Volume 2, page 464. IEEE Computer Society, 2003.
Authors (3)
  1. Moses Charikar (68 papers)
  2. Michael Kapralov (55 papers)
  3. Erik Waingarten (32 papers)
Citations (3)

