Foundations of Modern Query Languages for Graph Databases (1610.06264v3)

Published 20 Oct 2016 in cs.DB

Abstract: We survey foundational features underlying modern graph query languages. We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected by directed, labelled edges; and property graphs, where nodes and edges can further have attributes. Next we discuss the two most fundamental graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed, and how they can be combined with graph patterns. We also discuss several semantics under which queries using the previous features can be evaluated, what effects the selection of features and semantics has on complexity, and offer examples of such features in three modern languages that are used to query graphs: SPARQL, Cypher and Gremlin. We conclude by discussing the importance of formalisation for graph query languages; a summary of what is known about SPARQL, Cypher and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area.

Citations (367)

Summary

The paper presents a comprehensive survey of key graph query language concepts, emphasizing the roles of edge-labelled and property graphs.
It details graph patterns and navigational queries, comparing evaluation semantics and highlighting complexity results such as the NP-completeness of evaluating basic graph patterns (bgps) with projection.
The study contrasts SPARQL, Cypher, and Gremlin, underscoring their implementation differences and suggesting future research directions to enhance query standardization.

Overview of Modern Query Languages for Graph Databases

The paper "Foundations of Modern Query Languages for Graph Databases" by Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan Reutter, and Domagoj Vrgoč surveys the features that underlie contemporary query languages for graph databases. It is organised around two data models, edge-labelled graphs and property graphs, on which most practical graph query languages are built.

Core Features and Graph Models

The paper discusses two prominent graph data models: edge-labelled graphs, in which nodes are connected by directed, labelled edges and which are the usual setting for theoretical work; and property graphs, common in practice, in which nodes and edges can additionally carry attributes. The distinction matters because the choice of model affects which language features are natural to express and how queries over the data are formulated.
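The difference between the two models can be sketched in a few lines of Python. This is a minimal illustration, not the paper's formal definitions: the class name `PropertyGraph`, its field names, and the movie data are assumptions chosen for the example.

```python
from dataclasses import dataclass

# Edge-labelled graph: simply a set of (source, label, target) triples.
edge_labelled = {
    ("clint", "acts_in", "unforgiven"),
    ("clint", "directs", "unforgiven"),
}


@dataclass
class PropertyGraph:
    """Hypothetical property-graph container: ids plus labels and attributes."""
    node_labels: dict  # node id -> label
    node_props: dict   # node id -> {property: value}
    edges: dict        # edge id -> (source, label, target)
    edge_props: dict   # edge id -> {property: value}

    def to_edge_labelled(self):
        """Forget edge ids and all attributes, keeping only labelled edges.

        Parallel edges with the same source, label and target collapse,
        because the result is a set.
        """
        return set(self.edges.values())


pg = PropertyGraph(
    node_labels={"clint": "Person", "unforgiven": "Movie"},
    node_props={"clint": {"name": "Clint Eastwood"}, "unforgiven": {"year": 1992}},
    edges={
        "e1": ("clint", "acts_in", "unforgiven"),
        "e2": ("clint", "directs", "unforgiven"),
    },
    edge_props={"e1": {"role": "William Munny"}, "e2": {}},
)
```

Dropping identifiers and attributes from the property graph recovers the edge-labelled graph, which shows in what sense the property graph model carries strictly more information per node and edge.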

Graph Patterns and Path Expressions

The survey examines two fundamental querying functionalities, graph patterns and navigational queries, which form the conceptual core of languages like SPARQL, Cypher, and Gremlin. A basic graph pattern (bgp) is itself a small graph in which some terms are replaced by variables; evaluating it means matching it against the data graph. Bgps are then extended to complex graph patterns (cgps) with operators analogous to those of relational databases, such as projection, union, and optional (left outer) joins.
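As an illustration of bgp matching followed by projection, here is a minimal Python sketch over an edge-labelled graph stored as a set of triples. The `?`-prefix convention for variables, the function names `match_bgp` and `project`, and the sample data are assumptions made for this example; the matcher is a naive nested-loop join, not an optimised evaluator.

```python
def is_var(term):
    """Variables are written with a leading '?' (a convention assumed here)."""
    return term.startswith("?")


def match_bgp(pattern, graph):
    """Return every homomorphism-based match of the bgp as a dict.

    `pattern` is a list of (subject, label, object) triple patterns and
    `graph` is a set of (source, label, target) triples.
    """
    solutions = [{}]
    for triple_pattern in pattern:
        extended = []
        for mu in solutions:
            for edge in graph:
                candidate = dict(mu)
                ok = True
                for term, value in zip(triple_pattern, edge):
                    if is_var(term):
                        # Bind the variable, or check an existing binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions


def project(solutions, variables):
    """Keep only the chosen variables (set semantics: duplicates removed)."""
    return {tuple(mu[v] for v in variables) for mu in solutions}


graph = {
    ("clint", "acts_in", "unforgiven"),
    ("clint", "directs", "unforgiven"),
    ("gene", "acts_in", "unforgiven"),
}
# People who both act in and direct the same movie.
pattern = [("?x", "acts_in", "?m"), ("?x", "directs", "?m")]
matches = match_bgp(pattern, graph)
people = project(matches, ["?x"])
```

Here `matches` holds the single binding of `?x` to `clint` and `?m` to `unforgiven`, and `people` keeps only the `?x` column. Union of two patterns would simply combine their solution lists.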

The paper also addresses the semantics of graph patterns, contrasting homomorphism-based, isomorphism-based, and simulation-based evaluation, each with its own trade-off between complexity and expressivity. Under homomorphism-based semantics several variables may map to the same node, whereas isomorphism-based variants forbid repeated nodes or edges in a match. The choice affects how practical query evaluation is; in particular, the paper notes that evaluating bgps with projection is NP-complete.
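The contrast between homomorphism-based and no-repeated-node (isomorphism-based) matching can be made concrete with a brute-force evaluator. This is a sketch under simplifying assumptions: edge labels in the pattern are constants, variables start with `?`, and the function name `evaluate` is hypothetical. The exhaustive loop over assignments is exponential in the number of variables, which gives some intuition for why evaluation is hard in general.

```python
from itertools import product


def evaluate(pattern, graph, injective=False):
    """Brute-force bgp evaluation: try every assignment of nodes to variables.

    With injective=True, distinct node terms of the pattern must map to
    distinct nodes of the graph (no-repeated-node semantics).
    """
    nodes = sorted({n for (s, _, o) in graph for n in (s, o)})
    node_terms = sorted({t for (s, _, o) in pattern for t in (s, o)})
    variables = [t for t in node_terms if t.startswith("?")]
    results = []
    for values in product(nodes, repeat=len(variables)):
        mu = dict(zip(variables, values))
        if injective:
            # Constants map to themselves; variables map through mu.
            images = [mu.get(t, t) for t in node_terms]
            if len(set(images)) < len(images):
                continue
        if all((mu.get(s, s), p, mu.get(o, o)) in graph for (s, p, o) in pattern):
            results.append(mu)
    return results


graph = {("a", "knows", "b"), ("b", "knows", "a"), ("b", "knows", "c")}
# A chain of two 'knows' edges.
pattern = [("?x", "knows", "?y"), ("?y", "knows", "?z")]

hom = evaluate(pattern, graph)                  # ?x and ?z may coincide
iso = evaluate(pattern, graph, injective=True)  # all three must differ
```

On this graph the homomorphism-based evaluation also returns the matches that walk back along the a/b cycle (for example `?x` and `?z` both bound to `a`), while the no-repeated-node evaluation keeps only the match a, b, c.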

Navigational Queries and Their Semantics

The authors then turn to navigational queries, in particular regular path queries (RPQs), which match paths whose sequence of edge labels belongs to a regular language, and their extensions: two-way regular path queries (2RPQs), which can also traverse edges backwards, and nested regular expressions (NREs), which can test for the existence of a branch along a path. These forms support queries that need paths of arbitrary length, such as transitive closure, which a fixed-size graph pattern cannot express.
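One standard way to evaluate an RPQ from a fixed start node is a breadth-first search over pairs of (graph node, automaton state). The Python sketch below assumes the regular expression has already been compiled to an NFA given as a dictionary; that encoding, the `^label` notation for inverse edges, and the function name `eval_rpq` are assumptions of this example. It returns end nodes only, that is, it checks for the existence of some matching path.

```python
from collections import deque


def eval_rpq(graph, nfa, start_state, final_states, source):
    """Nodes reachable from `source` by a path whose labels the NFA accepts.

    `nfa` maps (state, label) to a set of successor states.
    """
    # Index edges by node; add inverse edges "^label" so 2RPQs also work.
    out = {}
    for (s, label, t) in graph:
        out.setdefault(s, []).append((label, t))
        out.setdefault(t, []).append(("^" + label, s))

    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in final_states:
            answers.add(node)
        for label, nxt in out.get(node, []):
            for nstate in nfa.get((state, label), ()):
                if (nxt, nstate) not in seen:
                    seen.add((nxt, nstate))
                    queue.append((nxt, nstate))
    return answers


graph = {("a", "knows", "b"), ("b", "knows", "c"), ("c", "likes", "d")}

# NFA for knows+ : one or more 'knows' edges (transitive closure).
knows_plus = {(0, "knows"): {1}, (1, "knows"): {1}}
reach = eval_rpq(graph, knows_plus, 0, {1}, "a")

# NFA for knows . ^knows : one edge forwards, then one edge backwards (a 2RPQ).
there_and_back = {(0, "knows"): {1}, (1, "^knows"): {2}}
same_target = eval_rpq(graph, there_and_back, 0, {2}, "a")
```

Because each (node, state) pair is visited at most once, the search terminates even on cyclic graphs and runs in time polynomial in the sizes of the graph and the automaton.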

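The paper also analyses the semantics under which path queries can be evaluated, ranging from arbitrary paths to shortest paths, and the possible forms of output, from a boolean answer to paths or whole graphs. The choice of semantics determines both what a query can express and how costly it is to evaluate: for example, a cyclic graph can contain infinitely many arbitrary paths between two nodes, but only finitely many shortest paths or paths without repeated nodes.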
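The effect of the path semantics on the output can be seen on a tiny cyclic graph. The following Python sketch enumerates paths under no-repeated-node semantics and under shortest-path semantics; the function names and the graph are assumptions for illustration, edge labels are ignored, and the enumeration is exponential in the worst case.

```python
def simple_paths(graph, source, target):
    """All paths from source to target that never repeat a node."""
    out = {}
    for (s, _, t) in graph:
        out.setdefault(s, set()).add(t)

    paths = []

    def extend(path):
        if path[-1] == target:
            paths.append(path)
            return
        for nxt in sorted(out.get(path[-1], ())):
            if nxt not in path:
                extend(path + [nxt])

    extend([source])
    return paths


def shortest_paths(graph, source, target):
    """Shortest paths only; a shortest path never repeats a node,
    so it suffices to filter the simple paths."""
    paths = simple_paths(graph, source, target)
    best = min(map(len, paths), default=0)
    return [p for p in paths if len(p) == best]


# The edge c -> a closes a cycle, so under arbitrary-path semantics
# there are infinitely many paths from a to c.
graph = {("a", "e", "b"), ("b", "e", "c"), ("a", "e", "c"), ("c", "e", "a")}

simple = simple_paths(graph, "a", "c")
shortest = shortest_paths(graph, "a", "c")
```

Here `simple` contains the two paths a, b, c and a, c, while `shortest` contains only a, c.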

Comparative Analysis of SPARQL, Cypher, and Gremlin

The survey then examines how these foundational features are realised in SPARQL, Cypher, and Gremlin. It contrasts the declarative nature of SPARQL with the more imperative, traversal-oriented style of Gremlin; Cypher is presented as sharing aspects of both, with its own pattern syntax and matching semantics. On complexity, the survey notes that SPARQL has been studied rigorously, whereas the computational properties of Cypher and Gremlin remain under-explored and are a candidate area for further research.

Implications and Future Directions

The authors stress that practical graph query languages benefit from a solid theoretical foundation. They point to open questions such as efficient path enumeration under bag semantics and the integration of graph analytics with conventional querying. They also call for standardising elements of graph query languages to improve interoperability, and for future work that pins down the precise complexity and expressive power of these languages.

Overall, the paper covers both the theoretical and the practical side of graph query languages and serves as a starting point for further work, relating current understanding to ongoing developments in graph database technology.
