Algorithmic Complexity Attacks on Dynamic Learned Indexes (2403.12433v1)
Abstract: Learned Index Structures (LIS) view a sorted index as a model that learns the data distribution, takes a data element key as input, and outputs the predicted position of the key. The original LIS handles only lookup operations and does not support updates, which makes it impractical for typical workloads. To address this limitation, recent studies have focused on designing efficient dynamic learned indexes. ALEX, a pioneering dynamic learned index structure, enables dynamism through a series of design choices, including adaptive key space partitioning, dynamic model retraining, and sophisticated engineering and policies that prioritize read/write performance. While these design choices improve average-case performance, the emphasis on flexibility and performance enlarges the attack surface by allowing adversarial behaviors that maximize ALEX's memory space and time complexity in worst-case scenarios. In this work, we present the first systematic investigation of algorithmic complexity attacks (ACAs) targeting the worst-case scenarios of ALEX. We introduce new ACAs that fall into two categories, space ACAs and time ACAs, which target the memory space and time complexity, respectively. First, our space ACA on data nodes exploits ALEX's gapped array layout and uses Multiple-Choice Knapsack (MCK) to generate an optimal adversarial insertion plan for maximizing the memory consumption at the data node level. Second, our space ACA on internal nodes exploits ALEX's catastrophic cost mitigation mechanism, causing an out-of-memory error with only a few hundred adversarial insertions. Third, our time ACA generates pathological insertions to increase the disparity between the actual key distribution and the linear models of data nodes, degrading runtime performance by up to 1,641X compared to ALEX operating under legitimate workloads.
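To make the abstract's description of a learned index concrete, the following is a minimal sketch of a static, single-model index: a least-squares line maps a key to a predicted position, and a bounded local search corrects the prediction. It is not ALEX's implementation and not the attack from the paper; the class name `LinearLearnedIndex`, its methods, and the example key sets are hypothetical and chosen only for illustration. The sketch also hints at why a mismatch between the key distribution and a linear model matters: the worse the fit, the wider the window that must be searched.

```python
import bisect


class LinearLearnedIndex:
    """Illustrative static learned index (hypothetical, not ALEX)."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The largest prediction error bounds the local search window.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        # Model output, clamped to a valid array position.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search restricted to the error window around the guess.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < hi and self.keys[i] == key else None


# Evenly spaced keys fit a line well; cubic keys do not.
uniform = LinearLearnedIndex(list(range(0, 10000, 10)))
skewed = LinearLearnedIndex([i ** 3 for i in range(100)])
print("uniform search window:", uniform.max_err)
print("skewed search window:", skewed.max_err)
```

In this toy setting the skewed key set produces a larger `max_err`, so each lookup scans a wider range. Dynamic indexes such as ALEX use more elaborate layouts and search strategies, so this should be read only as intuition for the model-versus-distribution disparity that the abstract mentions.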