Dynamic Maintenance of Kernel Density Estimation Data Structure: From Practice to Theory (2208.03915v2)

Published 8 Aug 2022 in cs.LG and stat.ML

Abstract: Kernel density estimation (KDE) is a challenging task in machine learning. The problem is defined as follows: given a kernel function $f(x,y)$ and a set of points $\{x_1, x_2, \cdots, x_n\} \subset \mathbb{R}^d$, we would like to compute $\frac{1}{n}\sum_{i=1}^{n} f(x_i,y)$ for any query point $y \in \mathbb{R}^d$. Recently, there has been a growing trend of using data structures for efficient KDE; however, the proposed KDE data structures focus on the static setting, and their robustness under dynamically changing data distributions has not been addressed. In this work, we focus on the dynamic maintenance of KDE data structures with robustness to adversarial queries. In particular, we provide a theoretical framework for KDE data structures. In our framework, the KDE data structure requires only subquadratic space, supports dynamic updates of the dataset in sublinear time, and answers adaptive queries, chosen by a potential adversary, in sublinear time.
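For concreteness, the quantity the data structure must maintain is just the dataset-averaged kernel value. Below is a minimal sketch of the naive static baseline that the paper's framework improves on, assuming a Gaussian kernel and NumPy purely for illustration; the paper's actual construction (with subquadratic space and sublinear query and update time) is not shown here.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    # Illustrative kernel choice: f(x, y) = exp(-||x - y||^2 / (2 h^2)).
    # The paper's framework is stated for a general kernel f(x, y).
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * bandwidth ** 2))

def kde_query(points, y, kernel=gaussian_kernel):
    # Naive KDE value (1/n) * sum_i f(x_i, y): O(n d) time per query,
    # which is the linear-scan baseline that sublinear-time queries beat.
    return sum(kernel(x, y) for x in points) / len(points)

# Toy usage: n = 1000 points in R^8, one query point.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = rng.normal(size=8)
print(kde_query(X, y))
```

Under this naive scheme an insertion or deletion is trivial, but every query pays time linear in $n$; the paper's contribution is a structure that instead offers subquadratic space, sublinear-time updates, and sublinear-time queries that remain accurate even when the query points are chosen adaptively by an adversary.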
