GTX: A Write-Optimized Latch-free Graph Data System with Transactional Support (2405.01418v1)
Abstract: This paper introduces GTX, a standalone main-memory, write-optimized graph system that specializes in structural and graph property updates while supporting concurrent reads and graph analytics under snapshot-isolation transactional concurrency. Recent graph libraries target efficient concurrent read and write support while guaranteeing transactional consistency. However, their performance suffers for updates with strong temporal locality over the same vertices and edges because of vertex-centric lock contention. GTX introduces a new delta-chain-centric concurrency-control protocol that eliminates traditional mutually exclusive latches. GTX resolves the conflicts caused by vertex-level locking and adapts to real-life workloads while maintaining sequential access to the graph's adjacency-list storage. This combination of features has been demonstrated to provide good performance in graph analytical queries. GTX's transactions support fast group commit, novel write-write conflict prevention, and lazy garbage collection. Extensive experimental and comparative studies show that, in addition to maintaining competitive concurrent read and analytical performance, GTX achieves higher throughput than state-of-the-art techniques when handling concurrent transaction+analytics workloads. For write-heavy transactional workloads, GTX performs up to 11x better than the best-performing state-of-the-art systems in transaction throughput. At the same time, GTX does not sacrifice the performance of read-heavy analytical workloads and remains competitive with state-of-the-art systems.
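To make the idea of latch-free updates to a delta chain more concrete, the following is a minimal sketch of a compare-and-swap prepend with a timestamp-filtered read. It is not GTX's protocol: the abstract does not give that level of detail, and write-write conflict prevention, group commit, and garbage collection are all omitted here. The names `Delta`, `DeltaChain`, `prepend`, and `count_visible` are hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical delta record: one edge update in a chain of updates.
struct Delta {
    std::uint64_t dst;     // destination vertex of the updated edge
    std::uint64_t txn_ts;  // timestamp of the writing transaction
    Delta* next;           // next-older delta in the same chain
};

// Hypothetical chain head. Writers publish deltas with compare-and-swap
// instead of taking a mutually exclusive latch.
struct DeltaChain {
    std::atomic<Delta*> head{nullptr};

    // Latch-free prepend: retry the CAS until `d` becomes the new head.
    // On failure, compare_exchange_weak reloads `old` with the current head.
    void prepend(Delta* d) {
        Delta* old = head.load(std::memory_order_acquire);
        do {
            d->next = old;
        } while (!head.compare_exchange_weak(old, d,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire));
    }

    // Snapshot-style read: count deltas written at or before `read_ts`.
    std::size_t count_visible(std::uint64_t read_ts) const {
        std::size_t n = 0;
        for (const Delta* d = head.load(std::memory_order_acquire); d != nullptr;
             d = d->next) {
            if (d->txn_ts <= read_ts) ++n;
        }
        return n;
    }
};
```

In this sketch, several writer threads could call `prepend` on the same chain without blocking one another, and a reader calling `count_visible` with an older timestamp simply skips newer deltas. Memory reclamation is deliberately left out; the caller owns every `Delta`.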