Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries (1910.09017v8)

Published 20 Oct 2019 in cs.DB and cs.DC

Abstract: Graph processing has become an important part of multiple areas of computer science, such as machine learning, computational sciences, medical applications, social network analysis, and many others. Numerous graphs such as web or social networks may contain up to trillions of edges. Often, these graphs are also dynamic (their structure changes over time) and have domain-specific rich data associated with vertices and edges. Graph database systems such as Neo4j enable storing, processing, and analyzing such large, evolving, and rich datasets. Due to the sheer size of such datasets, combined with the irregular nature of graph processing, these systems face unique design challenges. To facilitate the understanding of this emerging domain, we present the first survey and taxonomy of graph database systems. We focus on identifying and analyzing fundamental categories of these systems (e.g., triple stores, tuple stores, native graph database systems, or object-oriented systems), the associated graph models (e.g., RDF or Labeled Property Graph), data organization techniques (e.g., storing graph data in indexing structures or dividing data into records), and different aspects of data distribution and query execution (e.g., support for sharding and ACID). 51 graph database systems are presented and compared, including Neo4j, OrientDB, or Virtuoso. We outline graph database queries and relationships with associated domains (NoSQL stores, graph streaming, and dynamic graph algorithms). Finally, we describe research and engineering challenges to outline the future of graph databases.

Summary

The paper introduces a comprehensive taxonomy of graph databases, categorizing data models, storage architectures, and query languages.
It evaluates 51 systems, comparing aspects like query execution, transaction support, and data distribution in OLTP and OLAP environments.
The study discusses future challenges and integration opportunities with AI, machine learning, and hardware accelerators for enhanced scalability.

Analysis and Taxonomy of Graph Databases: A Comprehensive Overview

The paper "Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries" presents a rigorous survey and taxonomy of graph database systems. The authors categorize graph databases to provide a systematic understanding of various aspects such as data organization, system designs, and the diversity of graph queries. This work is aimed at experienced researchers and developers working in the domains of machine learning, social network analysis, computational sciences, and other fields where graph processing plays a fundamental role.

Graph databases (GDBs), such as Neo4j and OrientDB, have emerged as crucial technologies for storing, processing, and analyzing large, evolving, and rich graph datasets. Two properties make this hard: the datasets are very large, and graph computations are irregular. The difficulty is compounded by the need for low latency and high throughput in graph queries, which can either be localized to a small subgraph or span the entire graph. The paper addresses these challenges by examining five areas in turn: general design, data models and organization, data distribution, transactions, and queries.

Key Contributions

Taxonomy of Graph Databases: The paper introduces the first comprehensive taxonomy of graph databases, focusing on identifying and analyzing critical dimensions that impact the design of these systems. This includes data models (e.g., RDF, Labeled Property Graph), storage backends (e.g., triple stores, tuple stores, native GDBs), and the support for various graph query executions.
Evaluation and Comparison: It provides a comparative analysis of 51 graph database systems, including Neo4j, OrientDB, Graphflow, and Microsoft's Graph Engine. This evaluation emphasizes aspects such as query execution, transaction support, data distribution, and storage backend.
Survey of Graph Queries and Workloads: The paper explores graph workloads, distinguishing between OLTP and OLAP queries and providing insights into graph queries beyond these simple classifications. It outlines local queries, neighborhood queries, graph traversal operations, and global graph analytics – critical tasks for real-world applications ranging from social networking to fraud detection.
Future Challenges: The authors discuss open research and engineering challenges, including the integration of graph databases with emerging technologies, improved scalability and performance, and better support for more complex graph queries and analytics.
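
To make the data models and query classes named above more concrete, here is a minimal Python sketch. It is not taken from the paper or from any surveyed system: the toy graph, the `to_triples` flattening, and the four query functions are all hypothetical names chosen for illustration. It assumes a tiny in-memory Labeled Property Graph, shows one simplified way to flatten it into subject-predicate-object triples in the spirit of RDF (real RDF uses IRIs, and edge properties would need extra machinery such as reification), and gives one example for each query scope: local, neighborhood, traversal, and global analytics.

```python
from collections import deque

# Labeled Property Graph (LPG): vertices and edges carry a label plus
# key-value properties. All names and values here are illustrative.
vertices = {
    "a": {"label": "Person", "props": {"name": "Alice"}},
    "b": {"label": "Person", "props": {"name": "Bob"}},
    "c": {"label": "Person", "props": {"name": "Carol"}},
    "d": {"label": "Person", "props": {"name": "Dave"}},
}
edges = [
    ("a", "KNOWS", "b", {"since": 2015}),
    ("b", "KNOWS", "c", {"since": 2018}),
    ("c", "KNOWS", "d", {"since": 2020}),
]


def to_triples(vertices, edges):
    """Flatten the LPG into (subject, predicate, object) triples.

    Simplification: edge properties are dropped, since plain triples
    cannot attach properties to an edge directly.
    """
    triples = []
    for vid, vertex in vertices.items():
        triples.append((vid, "type", vertex["label"]))
        for key, value in vertex["props"].items():
            triples.append((vid, key, value))
    for src, label, dst, _props in edges:
        triples.append((src, label, dst))
    return triples


# Adjacency index (outgoing edges only) used by the queries below.
adjacency = {vid: [] for vid in vertices}
for src, _label, dst, _props in edges:
    adjacency[src].append(dst)


def local_query(vid):
    """Local query: read the properties of a single vertex."""
    return vertices[vid]["props"]


def neighborhood_query(vid):
    """Neighborhood query: list the direct out-neighbors of a vertex."""
    return list(adjacency[vid])


def traversal(start):
    """Traversal: breadth-first search over outgoing edges from `start`."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
    return visited


def global_out_degrees():
    """Global analytics: touch every vertex to compute its out-degree."""
    return {vid: len(neighbors) for vid, neighbors in adjacency.items()}


if __name__ == "__main__":
    print(to_triples(vertices, edges))
    print(local_query("a"), neighborhood_query("b"))
    print(traversal("a"), global_out_degrees())
```

The first three queries touch only a small part of the graph, which is the typical OLTP pattern, while `global_out_degrees` scans every vertex, which is closer to an OLAP workload. A real system would back these operations with indexes and persistent storage rather than Python dictionaries.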

Implications and Future Developments

From a practical standpoint, the paper serves as a valuable resource for practitioners selecting a graph database system for specific application requirements. Its comparison criteria also give a basis for evaluating graph databases in terms of performance, scalability, and support for graph query languages such as SPARQL, Gremlin, and Cypher.

Theoretically, the taxonomy and detailed analysis provide a foundational framework for researchers to explore optimizations in graph database designs. This could involve advanced data structures, parallelization techniques, and distributed processing paradigms to handle the growing complexity and size of graph datasets efficiently.

One promising area for future exploration is the intersection of graph databases with AI and machine learning workflows, where efficient graph processing can significantly enhance data integration, pattern recognition, and decision-making processes. Additionally, with the rise of hardware accelerators such as GPUs and TPUs, exploring graph database designs that leverage such technologies could unlock new levels of performance and capability.

In summary, the comprehensive survey and taxonomy presented in this paper are instrumental for both understanding existing graph database systems and inspiring innovative approaches to meet the future demands of sophisticated graph data processing in various scientific and industrial applications.
