Reconciliation of RDF⋆ and Property Graphs
Olaf Hartig
University of Waterloo
http://olafhartig.de
November 14, 2014
Abstract
Both the notion of Property Graphs (PG) and the Resource Description Framework (RDF) are commonly used models for representing graph-shaped data. While there exist some system-specific solutions to convert data from one model to the other, these solutions are not entirely compatible with one another and none of them appears to be based on a formal foundation.
In fact, for the PG model, there does not even exist a commonly agreed-upon formal definition. The aim of this document is to reconcile both models formally.
To this end, the document proposes a formalization of the PG model and introduces well-defined transformations between PGs and RDF. As a result, the document provides a basis for the following two innovations:
On one hand, by implementing the RDF-to-PG transformations defined in this document, PG-based systems can enable their users to load RDF data and make it accessible in a compatible, system-independent manner using, e.g., the graph traversal language Gremlin or the declarative graph query language Cypher.
On the other hand, the PG-to-RDF transformation in this document enables RDF data management systems to support compatible, system-independent queries over the content of Property Graphs by using the standard RDF query language SPARQL.
Additionally, this document represents a foundation for systematic research on relationships between the two models and between their query languages.
1 Introduction
The primary goal of this reconciliation is to enable a user who is familiar with either of these data models to access data, represented in the other model, based on a well-defined, system-independent view of this data given in the model most familiar to the user.
2 Informal Overview of the Data Models
In contrast to Property Graphs, RDF is a standardized data model.
A shortcoming that RDF has been widely criticized for is the lack of an approach to represent statement-level metadata that is as intuitive and user-friendly as edge properties in Property Graphs.
While RDF provides a notion of reification to support this use case, this approach is awkward to use, the resulting metadata is cumbersome to query, and it may blow up the dataset size significantly. However, a recently proposed extension of RDF addresses this shortcoming by making triples about triples a first-class citizen in the data model ... (RDF*, also written RDF-Star).
It is important to emphasize that RDF⋆ is simply a syntactic extension of RDF that makes dealing with statement-level metadata more intuitive. In fact, there exists a well-defined transformation of RDF⋆ data back to standard RDF data ... using standard reification (rdf:Statement, rdf:subject, rdf:predicate, rdf:object).
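To make the reification remark concrete, here is a minimal sketch of how an embedded triple could be unfolded into standard RDF. It is only one plausible reading, not the paper's formal definition: triples are modeled as Python tuples, a tuple in subject or object position stands for an embedded triple, and the function name `unfold` and the `_:stmtN` blank node labels are my own assumptions. Whether the embedded triple should also be asserted on its own is left open here.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def unfold(triples):
    """Replace each embedded triple by a blank node described with the
    standard reification vocabulary (hypothetical helper, not from the paper)."""
    fresh = count()
    out = set()
    cache = {}  # embedded triple -> blank node label

    def term(t):
        # A nested tuple is an embedded triple; anything else is an ordinary term.
        if not isinstance(t, tuple):
            return t
        if t not in cache:
            b = f"_:stmt{next(fresh)}"
            cache[t] = b
            s, p, o = t
            out.add((b, RDF + "type", RDF + "Statement"))
            out.add((b, RDF + "subject", term(s)))
            out.add((b, RDF + "predicate", p))
            out.add((b, RDF + "object", term(o)))
        return cache[t]

    for s, p, o in triples:
        out.add((term(s), p, term(o)))
    return out


# One metadata triple about the embedded triple (alice knows bob).
graph = {(("ex:alice", "ex:knows", "ex:bob"), "ex:certainty", '"0.8"')}
standard = unfold(graph)
```

The single RDF⋆ triple turns into five standard triples, which illustrates the size blow-up mentioned above.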
3 Informal Overview of the Proposal
3.1 First Transformation: RDF⋆ to RDF-like Property Graphs
The first transformation presents an intuitive (perhaps the most natural) way of converting RDF⋆ data to Property Graphs; namely, this transformation represents any ordinary RDF triple as an edge in the resulting Property Graph; the two vertices incident to such an edge have properties that describe the subject and the object of the corresponding RDF triple; and metadata triples are represented as edge properties.
Vertex types: IRI, literal, and blank node (?). Each vertex property is named after the vertex type and takes its value from the corresponding term in the triple. Each triple becomes an edge; for nested triples, the predicate and the object of the enclosing (metadata) triple become a property of that edge. The problem arises when that object is not a literal but another IRI node of the RDF-Star graph.
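A rough sketch of this first transformation, under my own assumptions: literals are wrapped in a hypothetical `Lit` class, plain strings are IRIs (or blank nodes when they start with `_:`), and the property names `kind`, `iri`, `bnode`, and `literal` are illustrative rather than taken from the paper. The sketch raises an error in the problematic case noted above, where the object of a metadata triple is not a literal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """Hypothetical wrapper marking a term as an RDF literal."""
    value: str


def kind(term):
    if isinstance(term, Lit):
        return "literal"
    return "bnode" if term.startswith("_:") else "iri"


def to_rdf_like_pg(triples):
    vertices = {}  # term -> vertex properties
    edges = {}     # (s, p, o) -> edge record

    def add_edge(s, p, o):
        for t in (s, o):
            value = t.value if isinstance(t, Lit) else t
            vertices.setdefault(t, {"kind": kind(t), kind(t): value})
        return edges.setdefault(
            (s, p, o), {"src": s, "tgt": o, "label": p, "props": {}}
        )

    for s, p, o in triples:
        if isinstance(s, tuple):  # metadata triple about the triple s
            if not isinstance(o, Lit):
                raise ValueError("metadata object must be a literal")
            add_edge(*s)["props"][p] = o.value
        else:                     # ordinary triple: one edge
            add_edge(s, p, o)
    return vertices, edges


graph = [
    ("ex:alice", "ex:knows", "ex:bob"),
    (("ex:alice", "ex:knows", "ex:bob"), "ex:since", Lit("2014")),
    ("ex:alice", "ex:name", Lit("Alice")),
]
vertices, edges = to_rdf_like_pg(graph)
```

Note that the literal "Alice" gets its own vertex here, which is exactly what the second transformation avoids.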
... the transformation also enables users to benefit from features of Cypher that are not available in SPARQL. For instance, Cypher allows for path expressions that are more powerful than the property path feature provided by SPARQL 1.1. Another example is querying statement-level metadata by accessing the corresponding edge properties using Cypher.
... any RDF-based system (independent of whether it supports the RDF⋆ extension or not) may support Cypher queries on top of a virtual Property Graph view of the data (where the view is defined by the given transformation). However, for RDF⋆-enabled systems, if the primary use case for supporting Cypher is more user-friendly queries over statement-level metadata ...
The conversion can be performed at query time; there is no need to convert the stored data to another format.
3.2 Second Transformation: RDF⋆ to Simple Property Graphs
The transformation distinguishes attribute triples, that is, ordinary (non-metadata) triples whose object is a literal, and relationship triples, that is, ordinary triples whose object is an IRI or a blank node. Then, the transformation represents every relationship triple as an edge; every attribute triple is converted into a property of the vertex for the subject of that triple (instead of also converting it into a separate edge as the lossless transformation does). Hence, vertices in the resulting Property Graph represent IRIs and blank nodes only (whereas the lossless transformation produces vertices that may represent literals).
Metadata triples about relationship triples are converted into edge properties, whereas metadata triples about attribute triples cannot be converted by the transformation because the Property Graph model does not support metadata about (vertex) properties. Consequently, the second transformation is more limited in the RDF⋆ data that it can handle than the aforementioned lossless transformation.
The triple types are distinguished according to the content of the triple's object.
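The distinction between attribute triples and relationship triples can be sketched as follows. This is my own simplified illustration: `Lit` is a hypothetical literal wrapper, plain strings stand for IRIs and blank nodes, and a second value for the same attribute key simply overwrites the first, which the actual transformation may handle differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """Hypothetical wrapper marking a term as an RDF literal."""
    value: str


def to_simple_pg(triples):
    vertices = {}  # IRI or blank node -> vertex properties
    edges = {}     # (s, p, o) -> edge properties

    def edge(s, p, o):
        vertices.setdefault(s, {})
        vertices.setdefault(o, {})
        return edges.setdefault((s, p, o), {})

    for s, p, o in triples:
        if isinstance(s, tuple):
            # Metadata triple: only allowed about relationship triples.
            if isinstance(s[2], Lit):
                raise ValueError("no metadata about attribute triples")
            if not isinstance(o, Lit):
                raise ValueError("metadata object must be a literal")
            edge(*s)[p] = o.value
        elif isinstance(o, Lit):
            # Attribute triple: becomes a property of the subject vertex.
            vertices.setdefault(s, {})[p] = o.value
        else:
            # Relationship triple: becomes an edge.
            edge(s, p, o)
    return vertices, edges


graph = [
    ("ex:alice", "ex:name", Lit("Alice")),
    ("ex:alice", "ex:knows", "ex:bob"),
    (("ex:alice", "ex:knows", "ex:bob"), "ex:since", Lit("2014")),
]
vertices, edges = to_simple_pg(graph)
```

Unlike the first transformation, no vertex is created for the literal "Alice".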
3.3 Third Transformation: Property Graphs to RDF⋆
The third transformation converts Property Graphs to RDF⋆ data (which then may be transformed to standard RDF data [HT14]). The idea of the transformation is simple: Every edge (including its label) in a given Property Graph is represented as an ordinary RDF triple in the resulting RDF⋆ data; the same holds for every vertex property. Any edge property is represented as a metadata triple whose subject is the triple representing the corresponding edge. The transformation gives users the freedom to choose patterns for generating IRIs that denote edge labels and property keys, respectively. These IRIs become the predicates of triples in the resulting RDF⋆ data.
The transformation cannot be used for a Property Graph that contains distinct edges with the same start node, the same end node, and the same label.
RDF-Star does not support multigraphs.
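Here is a small sketch of the third transformation as I understand it from the description above. The input layout (a dict of vertex properties and a list of `(src, label, tgt, props)` edges), the `example.org` IRI patterns, and the choice to map vertex identifiers to blank nodes are all assumptions of mine; property values are kept as plain Python values instead of typed literals.

```python
def pg_to_rdf_star(vertices, edges,
                   label_pattern="http://example.org/rel/{}",
                   key_pattern="http://example.org/prop/{}"):
    """Hypothetical helper: vertices is {id: {key: value}},
    edges is a list of (src, label, tgt, {key: value})."""
    def node(v):
        return f"_:{v}"  # assumption: vertices become blank nodes

    triples = set()
    seen = set()

    # Every vertex property becomes an ordinary triple.
    for v, props in vertices.items():
        for key, value in props.items():
            triples.add((node(v), key_pattern.format(key), value))

    for src, label, tgt, props in edges:
        edge_triple = (node(src), label_pattern.format(label), node(tgt))
        if edge_triple in seen:
            # Two edges would collapse into the same triple.
            raise ValueError("Property Graph is not edge-unique")
        seen.add(edge_triple)
        triples.add(edge_triple)
        # Every edge property becomes a metadata triple about the edge triple.
        for key, value in props.items():
            triples.add((edge_triple, key_pattern.format(key), value))
    return triples


vertices = {"v1": {"name": "Alice"}, "v2": {"name": "Bob"}}
edges = [("v1", "knows", "v2", {"since": 2014})]
triples = pg_to_rdf_star(vertices, edges)
```

Passing the two patterns as parameters mirrors the freedom to choose how IRIs for edge labels and property keys are generated.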
... a Property Graph system may provide virtual RDF⋆ views of its Property Graphs that can be queried using SPARQL
Neo4j has this option.
4 Formalization of the Data Models
4.1 RDF⋆
It does not discuss asserted versus non-asserted triples.
4.2 Property Graphs
Definition 2. Let P be the (infinite) set of all possible properties; that is, each property is a pair p = <k, v> where k ∈ dom(S) and v ∈ ⋃_{D ∈ 𝒟} dom(D); i.e., P = dom(S) × ⋃_{D ∈ 𝒟} dom(D). A Property Graph is a tuple G = <V, E, src, tgt, lbl, 𝒫> such that
• <V, E, src, tgt, lbl> is an edge-labeled directed multigraph with
– a set of vertices V,
– a set of edges E,
– a function src : E → V that associates each edge with its source vertex,
– a function tgt : E → V that associates each edge with its target vertex, and
– a function lbl : E → dom(S) that associates each edge with its label; and
• 𝒫 is a function 𝒫 : (V ∪ E) → 2^P.
(See the original paper for the exact notation.)
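To get a feel for Definition 2, the tuple can be mirrored almost one-to-one in a small data structure. This is only an illustrative sketch: the class and method names are mine, vertex and edge identifiers are assumed to be distinct strings, and properties are stored as sets of (key, value) pairs to match the codomain 2^P.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    vertices: set = field(default_factory=set)  # V
    edges: set = field(default_factory=set)     # E
    src: dict = field(default_factory=dict)     # src : E -> V
    tgt: dict = field(default_factory=dict)     # tgt : E -> V
    lbl: dict = field(default_factory=dict)     # lbl : E -> label string
    props: dict = field(default_factory=dict)   # (V ∪ E) -> set of (k, v) pairs

    def add_vertex(self, v, *properties):
        self.vertices.add(v)
        self.props.setdefault(v, set()).update(properties)

    def add_edge(self, e, source, target, label, *properties):
        if source not in self.vertices or target not in self.vertices:
            raise ValueError("src and tgt must map into V")
        self.edges.add(e)
        self.src[e], self.tgt[e], self.lbl[e] = source, target, label
        self.props.setdefault(e, set()).update(properties)


g = PropertyGraph()
g.add_vertex("v1", ("name", "Alice"))
g.add_vertex("v2", ("name", "Bob"))
g.add_edge("e1", "v1", "v2", "knows", ("since", 2014))
# A parallel edge with the same label is allowed: the graph is a multigraph.
g.add_edge("e2", "v1", "v2", "knows", ("since", 2020))
```

Because edges have their own identity (`e1`, `e2`), two edges with the same endpoints and label remain distinguishable, which is not the case once each edge is represented as a triple.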
4.3 Property Graph Convertibility of RDF⋆ graphs
Before going into the details of the transformations it is important to note that the RDF⋆ data model is more expressive than the Property Graphs model. For instance, RDF⋆ allows for an arbitrarily deep nesting of metadata triples, whereas a Property Graph cannot contain additional metadata about a property of a vertex or an edge (i.e., a property in a Property Graph cannot be annotated with properties itself). As a consequence, any transformation from RDF⋆ graphs to Property Graphs that adapts the natural approach of representing metadata triples as edge properties is possible only for specific RDF⋆ graphs.
That is, convertible RDF-Star data has no triples nested more than one level deep, has nested triples only in the subject position, and the object of any triple that contains a nested triple must be a literal (not an IRI) that is convertible to some supported datatype.
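The conditions in the note above can be checked mechanically. The sketch below encodes my summary of those conditions, not the paper's formal definition of convertibility; `Lit` is a hypothetical literal wrapper, and the datatype check is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """Hypothetical wrapper marking a term as an RDF literal."""
    value: str


def is_pg_convertible(triples):
    for s, p, o in triples:
        if isinstance(o, tuple):
            return False  # embedded triple in object position
        if isinstance(s, tuple):
            if any(isinstance(t, tuple) for t in s):
                return False  # nesting deeper than one level
            if not isinstance(o, Lit):
                return False  # metadata value must be a literal
    return True


knows = ("ex:alice", "ex:knows", "ex:bob")
ok = [knows, (knows, "ex:since", Lit("2014"))]
too_deep = [((knows, "ex:since", Lit("2014")), "ex:source", Lit("web"))]
iri_value = [(knows, "ex:source", "ex:carol")]
```

Here `ok` passes the check, while `too_deep` and `iri_value` each violate one of the conditions.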
5 Transforming RDF⋆ Graphs to RDF-like Property Graphs
While the given transformation is lossless (i.e., resulting Property Graphs contain all information present in the original RDF⋆ graph), some use cases may have the stronger requirement that the original RDF⋆ graph can be reconstructed exactly from its RDF-like Property Graph representation.
Hence, for these use cases the transformation must be invertible. To ensure an invertible transformation, any RDF⋆ graph to be transformed must not contain redundant RDF⋆ triples; that is, RDF⋆ triples that are embedded in metadata triples in the RDF⋆ graph and, additionally, appear directly (as a separate element) in the RDF⋆ graph.
This is a minimality guarantee.
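A quick way to find the redundant triples described above, assuming triples are Python tuples and an embedded triple is a tuple in subject or object position (the function name is my own):

```python
def redundant_triples(graph):
    """Return the triples that are both embedded in some metadata triple
    and asserted directly as elements of the graph."""
    graph = set(graph)
    embedded = set()

    def collect(triple):
        for term in (triple[0], triple[2]):
            if isinstance(term, tuple):
                embedded.add(term)
                collect(term)  # also look inside deeper nesting

    for triple in graph:
        collect(triple)
    return graph & embedded


knows = ("ex:alice", "ex:knows", "ex:bob")
with_redundancy = {knows, (knows, "ex:since", '"2014"')}
without_redundancy = {(knows, "ex:since", '"2014"')}
```

An RDF⋆ graph for which this function returns an empty set satisfies the condition required for the invertible variant of the transformation.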
6 Transforming RDF⋆ Graphs to Simple Property Graphs
Metadata triples about relationship triples are converted into edge properties, and metadata triples about attribute triples cannot be converted by the transformation because the Property Graph model does not support metadata about (vertex) properties.
7 Transforming Property Graphs to RDF⋆ Graphs
While representing each edge (including its label) as a single triple is perhaps the most intuitive approach to transform such edges, this approach has the following shortcoming: If there are two (or more) distinct edges that connect the same vertices and have the same label (but may have different properties), the approach would represent both edges by a single triple. As a result, this triple would not represent any one of the edges unambiguously when embedded in a metadata triple for a property of the edge. To avoid this problem, the transformation is restricted to Property Graphs that do not contain distinct edges with the same source vertex, the same target vertex, and the same label. Hereafter, these Property Graphs are called edge-unique.
Edge-unique is essentially the same as not being a multigraph, at least with respect to edges that share the same label.
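The edge-uniqueness condition is easy to test. In this sketch edges are given as `(src, label, tgt, props)` tuples, which is my own layout and not something prescribed by the paper:

```python
from collections import Counter


def is_edge_unique(edges):
    """True if no two edges share source vertex, label, and target vertex."""
    counts = Counter((src, label, tgt) for src, label, tgt, _props in edges)
    return all(n == 1 for n in counts.values())


# Two "knows" edges between the same vertices: not edge-unique.
parallel_same_label = [
    ("v1", "knows", "v2", {"since": 2014}),
    ("v1", "knows", "v2", {"since": 2020}),
]
# Parallel edges with different labels are still fine.
parallel_other_label = [
    ("v1", "knows", "v2", {}),
    ("v1", "likes", "v2", {}),
]
```

The second example shows that the restriction only rules out parallel edges with identical labels, since edges with different labels map to different triples.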