Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan Reutter, and Domagoj Vrgoč. 2017. Foundations of Modern Query Languages for Graph Databases. <i>ACM Comput. Surv.</i> 50, 5, Article 68 (September 2018), 40 pages. DOI:https://doi.org/10.1145/3104031
Abstract
We survey foundational features underlying modern graph query languages.
We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected by directed, labelled edges, and property graphs, where nodes and edges can further have attributes.
Next we discuss the two most fundamental graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter, we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed and how they can be combined with graph patterns.
We also discuss several semantics under which queries using the previous features can be evaluated, what effects the selection of features and semantics has on complexity, and offer examples of such features in three modern languages that are used to query graphs: SPARQL, Cypher, and Gremlin.
We conclude by discussing the importance of formalisation for graph query languages; a summary of what is known about SPARQL, Cypher, and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area.
1 INTRODUCTION
Although graphs can still be (and sometimes still are) stored in relational databases, the choice to use a graph database for certain domains has significant benefits in terms of querying, where the emphasis shifts from joining various tables to specifying graph patterns and navigational patterns between nodes that may span arbitrary-length paths.
By organising our survey at the level of query features, rather than languages, we provide a foundational introduction to the area, which helps to understand, and even define, individual query languages as the composition of features. We consider two high-level categories of query features: graph patterns and path expressions.
Specific novel aspects of this survey include a new formalisation of the property graph model; discussion of how this model can be understood through the lens of existing theory; comparisons of practical aspects of the SPARQL, Gremlin, and Cypher query languages and the semantics they adopt; and examples of how the design of such languages influences the complexity of query evaluation.
2 GRAPH DATA MODELS
Graphs can be used to encode data whereby nodes represent objects in a domain of interest, and edges represent relationships between these objects.
Edge-labelled graphs. ..., where we additionally assign labels to edges that indicate the different types of relationships in the domain being described.
Definition 2.1 (Edge-labelled graph). An edge-labelled graph G is a pair (V,E), where:
(1) V is a finite set of vertices (or nodes).
(2) E is a finite set of edges; formally, E ⊆V ×Lab ×V where Lab is a set of labels.
Edge-labelled graphs are widely adopted in practice where, for example, they form the basis of the Resource Description Framework (RDF) standard used for encoding machine-readable content on the Web (Klyne et al. 2014). An RDF graph is simply a set of triples analogous to the edges in a graph database, but with some further detailing: in the case of RDF, the set V can be partitioned into disjoint sets of IRIs, literals, and blank nodes, and the set Lab is a subset of IRIs (not necessarily disjoint from V )
With this principle of using nodes to represent n-ary relations (where n >2), it becomes feasible to encode increasingly more complex information in an edge-labelled graph, such as, for example, to encode that the same character can be portrayed by different actors, and so forth.
Property Graphs. In edge-labelled graphs, we use labels to indicate the type of edge, where multiple edges may have the same type. In a similar way, we could consider labelling nodes.
... having node labels as part of the model can offer a more direct abstraction that is easier for users to query and understand. In the same way, it is often cumbersome to add information about the edges to an edge-labelled graph.
... Thus, for scenarios where various new types of meta-information may regularly need to be added to edges or nodes, the most general and widely adopted alternative is to use an extension of an edge-labelled graph called a property graph.
In property graphs, both edges and nodes can be labelled. Each edge and node is additionally associated with a unique identifier that can be used as a “hook” to associate additional meta-information—in the form of a set of property-value pairs called attributes—directly to that edge or node. Again, while it would be possible to instead encode attributes and labels as additional edges, in practice, such features allow one to directly annotate the graph without modifying its overall structure.
Definition 2.3 (Property graph). A property graph G is a tuple (V,E,ρ,λ,σ), where:
(1) V is a finite set of vertices (or nodes).
(2) E is a finite set of edges such that V and E have no elements in common.
(3) ρ : E → (V ×V )is a total function. Intuitively, ρ(e)= (v1,v2) indicates that e is a directed edge from node v1 to node v2 in G.
(4) λ : (V ∪ E)→Lab is a total function with Lab a set of labels. Intuitively, if v ∈V (respectively, e ∈ E) and ρ(v)= (respectively, ρ(e)=) , then is the label of node v (respectively, edge e) in G.
(5) σ : (V ∪ E)×Prop →Val is a partial function with Prop a finite set of properties and Val a set of values. Intuitively, if v ∈ V (respectively, e ∈ E), p ∈ Prop and σ (v,p)=s (respectively, σ (e,p)=s), then s is the value of property p for node v (respectively, edge e) in the property graph G
In our definition of a property graph, each node and edge is associated with a single label and at most one value for each attribute property. In some applications, it may be useful to have multiple values in these positions. We could thus consider a variant of property graphs, which we call multi-valued property graphs, to allow multiple labels and multi-valued attribute properties within the property graph model: in Definition 2.3, the mapping λ would then return a set of labels and σ would return a set of values. In practice, engines may have custom policies
Comentários
Postar um comentário
Sinta-se a vontade para comentar. Críticas construtivas são sempre bem vindas.