Pular para o conteúdo principal

Foundations of Modern Query Languages for Graph Databases - Leitura de Artigo I

Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan Reutter, and Domagoj Vrgoč. 2017. Foundations of Modern Query Languages for Graph Databases. <i>ACM Comput. Surv.</i> 50, 5, Article 68 (September 2018), 40 pages. DOI:https://doi.org/10.1145/3104031

Abstract

We survey foundational features underlying modern graph query languages. 

We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected by directed, labelled edges, and property graphs, where nodes and edges can further have attributes. 

Next we discuss the two most fundamental graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter, we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed and how they can be combined with graph patterns. 

We also discuss several semantics under which queries using the previous features can be evaluated, what effects the selection of features and semantics has on complexity, and offer examples of such features in three modern languages that are used to query graphs: SPARQL, Cypher, and Gremlin. 

We conclude by discussing the importance of formalisation for graph query languages; a summary of what is known about SPARQL, Cypher, and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area.

1 INTRODUCTION

Although graphs can still be (and sometimes still are) stored in relational databases, the choice to use a graph database for certain domains has significant benefits in terms of querying, where the emphasis shifts from joining various tables to specifying graph patterns and navigational patterns between nodes that may span arbitrary-length paths. 

By organising our survey at the level of query features, rather than languages, we provide a foundational introduction to the area, which helps to understand, and even define, individual query languages as the composition of features. We consider two high-level categories of query features: graph patterns and path expressions.   

Specific novel aspects of this survey include a new formalisation of the property graph model; discussion of how this model can be understood through the lens of existing theory; comparisons of practical aspects of the SPARQL, Gremlin, and Cypher query languages and the semantics they adopt; and examples of how the design of such languages influences the complexity of query evaluation.

  2 GRAPH DATA MODELS

Graphs can be used to encode data whereby nodes represent objects in a domain of interest, and edges represent relationships between these objects. 

Edge-labelled graphs. ..., where we additionally assign labels to edges that indicate the different types of relationships in the domain being described. 

Definition 2.1 (Edge-labelled graph). An edge-labelled graph G is a pair (V,E), where:
(1) V is a finite set of vertices (or nodes).
(2) E is a finite set of edges; formally, E V ×Lab ×V where Lab is a set of labels.

Edge-labelled graphs are widely adopted in practice where, for example, they form the basis of the Resource Description Framework (RDF) standard used for encoding machine-readable content on the Web (Klyne et al. 2014). An RDF graph is simply a set of triples analogous to the edges in a graph database, but with some further detailing: in the case of RDF, the set V can be partitioned into disjoint sets of IRIs, literals, and blank nodes, and the set Lab is a subset of IRIs (not necessarily disjoint from V )  

With this principle of using nodes to represent n-ary relations (where n >2), it becomes feasible to encode increasingly more complex information in an edge-labelled graph, such as, for example, to encode that the same character can be portrayed by different actors, and so forth.

Property Graphs. In edge-labelled graphs, we use labels to indicate the type of edge, where multiple edges may have the same type. In a similar way, we could consider labelling nodes.

... having node labels as part of the model can offer a more direct abstraction that is easier for users to query and understand. In the same way, it is often cumbersome to add information about the edges to an edge-labelled graph.  

... Thus, for scenarios where various new types of meta-information may regularly need to be added to edges or nodes, the most general and widely adopted alternative is to use an extension of an edge-labelled graph called a property graph

In property graphs, both edges and nodes can be labelled. Each edge and node is additionally associated with a unique identifier that can be used as a “hook” to associate additional meta-information—in the form of a set of property-value pairs called attributes—directly to that edge or node. Again, while it would be possible to instead encode attributes and labels as additional edges, in practice, such features allow one to directly annotate the graph without modifying its overall structure.

Definition 2.3 (Property graph). A property graph G is a tuple (V,E,ρ,λ,σ), where:
(1) V is a finite set of vertices (or nodes).
(2) E is a finite set of edges such that V and E have no elements in common.
(3) ρ : E (V ×V )is a total function. Intuitively, ρ(e)= (v1,v2) indicates that e is a directed edge from node v1 to node v2 in G.
(4) λ : (V E)Lab is a total function with Lab a set of labels. Intuitively, if v V (respectively, e E) and ρ(v)= (respectively, ρ(e)=) , then  is the label of node v (respectively, edge e) in G.
(5) σ : (V E)×Prop Val is a partial function with Prop a finite set of properties and Val a set of values. Intuitively, if v V (respectively, e E), p Prop and σ (v,p)=s (respectively, σ (e,p)=s), then s is the value of property p for node v (respectively, edge e) in the property graph G 

In our definition of a property graph, each node and edge is associated with a single label and at most one value for each attribute property. In some applications, it may be useful to have multiple values in these positions. We could thus consider a variant of property graphs, which we call multi-valued property graphs, to allow multiple labels and multi-valued attribute properties within the property graph model: in Definition 2.3, the mapping λ would then return a set of labels and σ would return a set of values. In practice, engines may have custom policies   


Comentários

Postagens mais visitadas deste blog

Connected Papers: Uma abordagem alternativa para revisão da literatura

Durante um projeto de pesquisa podemos encontrar um artigo que nos identificamos em termos de problema de pesquisa e também de solução. Então surge a vontade de saber como essa área de pesquisa se desenvolveu até chegar a esse ponto ou quais desdobramentos ocorreram a partir dessa solução proposta para identificar o estado da arte nesse tema. Podemos seguir duas abordagens:  realizar uma revisão sistemática usando palavras chaves que melhor caracterizam o tema em bibliotecas digitais de referência para encontrar artigos relacionados ou realizar snowballing ancorado nesse artigo que identificamos previamente, explorando os artigos citados (backward) ou os artigos que o citam (forward)  Mas a ferramenta Connected Papers propõe uma abordagem alternativa para essa busca. O problema inicial é dado um artigo de interesse, precisamos encontrar outros artigos relacionados de "certa forma". Find different methods and approaches to the same subject Track down the state of the art rese...

Knowledge Graph Embedding with Triple Context - Leitura de Abstract

  Jun Shi, Huan Gao, Guilin Qi, and Zhangquan Zhou. 2017. Knowledge Graph Embedding with Triple Context. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM '17). Association for Computing Machinery, New York, NY, USA, 2299–2302. https://doi.org/10.1145/3132847.3133119 ABSTRACT Knowledge graph embedding, which aims to represent entities and relations in vector spaces, has shown outstanding performance on a few knowledge graph completion tasks. Most existing methods are based on the assumption that a knowledge graph is a set of separate triples, ignoring rich graph features, i.e., structural information in the graph. In this paper, we take advantages of structures in knowledge graphs, especially local structures around a triple, which we refer to as triple context. We then propose a Triple-Context-based knowledge Embedding model (TCE). For each triple, two kinds of structure information are considered as its context in the graph; one is the out...

KnOD 2021

Beyond Facts: Online Discourse and Knowledge Graphs A preface to the proceedings of the 1st International Workshop on Knowledge Graphs for Online Discourse Analysis (KnOD 2021, co-located with TheWebConf’21) https://ceur-ws.org/Vol-2877/preface.pdf https://knod2021.wordpress.com/   ABSTRACT Expressing opinions and interacting with others on the Web has led to the production of an abundance of online discourse data, such as claims and viewpoints on controversial topics, their sources and contexts . This data constitutes a valuable source of insights for studies into misinformation spread, bias reinforcement, echo chambers or political agenda setting. While knowledge graphs promise to provide the key to a Web of structured information, they are mainly focused on facts without keeping track of the diversity, connection or temporal evolution of online discourse data. As opposed to facts, claims are inherently more complex. Their interpretation strongly depends on the context and a vari...