
Algorithmic Complexity Attacks on Dynamic Learned Indexes (2403.12433v1)

Published 19 Mar 2024 in cs.DB and cs.CR

Abstract: Learned Index Structures (LIS) view a sorted index as a model that learns the data distribution, takes a data element key as input, and outputs the predicted position of the key. The original LIS handles only lookup operations and does not support updates, which makes it impractical for typical workloads. To address this limitation, recent studies have focused on designing efficient dynamic learned indexes. ALEX, the pioneering dynamic learned index structure, enables dynamism through a series of design choices, including adaptive key space partitioning, dynamic model retraining, and sophisticated engineering and policies that prioritize read/write performance. While these design choices improve average-case performance, the emphasis on flexibility and performance enlarges the attack surface by allowing adversarial behaviors that maximize ALEX's memory space and time complexity in worst-case scenarios. In this work, we present the first systematic investigation of algorithmic complexity attacks (ACAs) targeting the worst-case scenarios of ALEX. We introduce new ACAs that fall into two categories, space ACAs and time ACAs, which target memory space and time complexity, respectively. First, our space ACA on data nodes exploits ALEX's gapped array layout and uses Multiple-Choice Knapsack (MCK) to generate an optimal adversarial insertion plan that maximizes memory consumption at the data node level. Second, our space ACA on internal nodes exploits ALEX's catastrophic cost mitigation mechanism, causing an out-of-memory error with only a few hundred adversarial insertions. Third, our time ACA generates pathological insertions that widen the disparity between the actual key distribution and the linear models of data nodes, degrading runtime performance by up to 1,641X compared to ALEX operating under legitimate workloads.
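
To make the core idea in the abstract concrete, the following is a minimal sketch of a learned index over a plain sorted list: a linear model maps a key to a predicted position, and a bounded search corrects the prediction. It is not ALEX's implementation (no gapped arrays, no node splitting, no retraining policy), and the names `fit_linear`, `max_error`, and `lookup` are hypothetical, chosen only for illustration. The sketch also shows why a key distribution that departs from the linear model matters: the larger the gap between predicted and actual positions, the wider the window each lookup has to search.

```python
import bisect

def fit_linear(keys):
    """Least-squares fit of position ~ slope * key + intercept over sorted keys."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    return slope, mean_p - slope * mean_k

def max_error(keys, model):
    """Largest distance between a key's predicted and actual position."""
    slope, intercept = model
    return max(abs(round(slope * k + intercept) - i) for i, k in enumerate(keys))

def lookup(keys, model, err, key):
    """Predict a position, then binary-search only within +/- err of it."""
    slope, intercept = model
    guess = round(slope * key + intercept)
    lo = max(0, guess - err)
    hi = max(lo, min(len(keys), guess + err + 1))
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else None

# Evenly spaced keys: a single line fits them almost exactly.
uniform = list(range(0, 1000, 10))
uniform_model = fit_linear(uniform)
uniform_err = max_error(uniform, uniform_model)

# Illustrative (assumed) skewed workload: a second dense cluster far away
# drags the line off the data, so the search window grows.
skewed = uniform + [10**6 + j for j in range(100)]
skewed_model = fit_linear(skewed)
skewed_err = max_error(skewed, skewed_model)

print("max error, uniform keys:", uniform_err)
print("max error, skewed keys: ", skewed_err)
```

Lookups stay correct in both cases because `err` bounds the search window; only the amount of work per lookup changes. The paper's time ACA targets this kind of disparity in ALEX's data nodes, though by its own mechanism rather than the toy workload assumed here.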
