
Reconciliation of RDF⋆ and Property Graphs - Paper Reading

Reconciliation of RDF⋆ and Property Graphs
Olaf Hartig
University of Waterloo
http://olafhartig.de
November 14, 2014

Abstract
Both the notion of Property Graphs (PG) and the Resource Description Framework (RDF) are commonly used models for representing graph-shaped data. While there exist some system-specific solutions to convert data from one model to the other, these solutions are not entirely compatible with one another and none of them appears to be based on a formal foundation.

In fact, for the PG model, there does not even exist a commonly agreed-upon formal definition. The aim of this document is to reconcile both models formally. 

To this end, the document proposes a formalization of the PG model and introduces well-defined transformations between PGs and RDF. As a result, the document provides a basis for the following two innovations: 

On one hand, by implementing the RDF-to-PG transformations defined in this document, PG-based systems can enable their users to load RDF data and make it accessible in a compatible, system-independent manner using, e.g., the graph traversal language Gremlin or the declarative graph query language Cypher.

On the other hand, the PG-to-RDF transformation in this document enables RDF data management systems to support compatible, system-independent queries over the content of Property Graphs by using the standard RDF query language SPARQL. 

Additionally, this document represents a foundation for systematic research on relationships between the two models and between their query languages.

1 Introduction

The primary goal of this reconciliation is to enable a user who is familiar with either of these data models to access data, represented in the other model, based on a well-defined, system-independent view of this data given in the model most familiar to the user.

2 Informal Overview of the Data Models

In contrast to Property Graphs, RDF is a standardized data model.

A shortcoming that RDF has been widely criticized for is the lack of an approach to represent statement-level metadata that is as intuitive and user-friendly as edge properties in Property Graphs.

While RDF provides a notion of reification to support this use case, this approach is awkward to use, the resulting metadata is cumbersome to query, and it may blow up the dataset size significantly. However, a recently proposed extension of RDF addresses this shortcoming by making triples about triples a first class citizen in the data model ... This extension is RDF⋆ (also written RDF* or RDF-star).

It is important to emphasize that RDF⋆ is simply a syntactic extension of RDF that makes dealing with statement-level metadata more intuitive. In fact, there exists a well-defined transformation of RDF⋆ data back to standard RDF data ... using standard reification (rdf:Statement, rdf:subject, rdf:predicate, rdf:object).
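
To make the reification idea concrete, here is a minimal Python sketch of how an RDF⋆ metadata triple could be unfolded into plain RDF triples. It is not the transformation defined in [HT14]; it only assumes the standard reification vocabulary. The function name `unfold`, the `_:stmtN` blank node labels, and the representation of terms as plain strings (with an embedded triple as a nested tuple) are all illustrative assumptions. The sketch also does not assert the embedded triple itself; whether it should be asserted is left open here.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def unfold(rdf_star_triples):
    """Rewrite RDF-star triples into plain triples using rdf:Statement reification.

    Terms are plain strings; an embedded triple is a nested 3-tuple.
    """
    fresh = count()
    statement_node = {}  # embedded triple -> blank node standing for it
    plain = set()

    def term(t):
        if not isinstance(t, tuple):
            return t  # ordinary IRI, blank node or literal
        if t not in statement_node:
            node = f"_:stmt{next(fresh)}"  # hypothetical blank node label
            statement_node[t] = node
            s, p, o = t
            plain.add((node, RDF + "type", RDF + "Statement"))
            plain.add((node, RDF + "subject", term(s)))
            plain.add((node, RDF + "predicate", p))
            plain.add((node, RDF + "object", term(o)))
        return statement_node[t]

    for s, p, o in rdf_star_triples:
        plain.add((term(s), p, term(o)))
    return plain


example = [(("ex:alice", "ex:knows", "ex:bob"), "ex:since", '"2014"')]
for triple in sorted(unfold(example)):
    print(triple)
```

One metadata triple turns into five plain triples here, which illustrates why reification is said to blow up the dataset size.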

3 Informal Overview of the Proposal

3.1 First Transformation: RDF⋆ to RDF-like Property Graphs

The first transformation presents an intuitive (perhaps the most natural) way of converting RDF⋆ data to Property Graphs; namely, this transformation represents any ordinary RDF triple as an edge in the resulting Property Graph; the two vertices incident to such an edge have properties that describe the subject and the object of the corresponding RDF triple; and metadata triples are represented as edge properties.

Vertex types: IRI, literal, and blank node (?). Each vertex property is named after the vertex type, and its value is the corresponding term from the triple. Every triple becomes an edge; for nested triples, the predicate and the object of the metadata triple become a property of that edge. The problem arises when the object is not a literal but another IRI node of the RDF-star graph.
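
The first transformation can be sketched in Python as follows. This is only an illustration of the informal description, not the paper's formal definition: the `Term` class, the vertex and edge ids, and the `kind` property key are assumptions made for the example. The sketch raises an error when a metadata triple has a non-literal object, since such an object has no obvious representation as an edge property value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    kind: str   # "IRI", "literal" or "bnode"
    value: str


def to_rdf_like_pg(triples):
    """Ordinary triples become edges; metadata triples become edge properties."""
    vertex_of = {}     # RDF term -> vertex id
    vertex_props = {}  # vertex id -> {key: value}
    edge_of = {}       # ordinary triple -> edge id
    edges = {}         # edge id -> (source vertex, label, target vertex)
    edge_props = {}    # edge id -> {key: value}

    def vertex(term):
        if term not in vertex_of:
            vid = f"v{len(vertex_of)}"
            vertex_of[term] = vid
            # The property name follows the vertex type (assumed layout).
            vertex_props[vid] = {"kind": term.kind, term.kind: term.value}
        return vertex_of[term]

    def edge(triple):
        if triple not in edge_of:
            s, p, o = triple
            eid = f"e{len(edge_of)}"
            edge_of[triple] = eid
            edges[eid] = (vertex(s), p.value, vertex(o))
            edge_props[eid] = {}
        return edge_of[triple]

    for s, p, o in triples:
        if isinstance(s, tuple):  # metadata triple about the embedded triple s
            if o.kind != "literal":
                raise ValueError("metadata object must be a literal")
            edge_props[edge(s)][p.value] = o.value
        else:
            edge((s, p, o))
    return edges, vertex_props, edge_props


alice = Term("IRI", "http://example.org/alice")
bob = Term("IRI", "http://example.org/bob")
knows = Term("IRI", "http://example.org/knows")
certainty = Term("IRI", "http://example.org/certainty")
data = [(alice, knows, bob), ((alice, knows, bob), certainty, Term("literal", "0.8"))]
edges, vertex_props, edge_props = to_rdf_like_pg(data)
print(edges, edge_props)
```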

... the transformation also enables users to benefit from features of Cypher that are not available in SPARQL. For instance, Cypher allows for path expressions that are more powerful than the property path feature provided by SPARQL 1.1. Another example is querying statement-level metadata by accessing the corresponding edge properties using Cypher.

... any RDF-based system (independent of whether it supports the RDF⋆ extension or not) may support Cypher queries on top of a virtual Property Graph view of the data (where the view is defined by the given transformation). However, for RDF⋆-enabled systems, if the primary use case for supporting Cypher is more user-friendly queries over statement-level metadata ...

The conversion can be performed at query time, so the data does not need to be converted to the other format.

3.2 Second Transformation: RDF⋆ to Simple Property Graphs

The transformation distinguishes attribute triples, that is, ordinary (non-metadata) triples whose object is a literal, and relationship triples, that is, ordinary triples whose object is an IRI or a blank node. Then, the transformation represents every relationship triple as an edge; every attribute triple is converted into a property of the vertex for the subject of that triple (instead of also converting it into a separate edge as the lossless transformation does). Hence, vertices in the resulting Property Graph represent IRIs and blank nodes only (whereas the lossless transformation produces vertices that may represent literals). 

Metadata triples about relationship triples are converted into edge properties, whereas metadata triples about attribute triples cannot be converted by the transformation because the Property Graph model does not support metadata about (vertex) properties. Consequently, the second transformation is more limited in the RDF⋆ data that it can handle than the aforementioned lossless transformation.

The triple types are distinguished by the kind of term in the object position of the triple.
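
A minimal Python sketch of this second transformation, assuming a hypothetical `Term` class and using the RDF terms themselves as vertex identifiers, could look like this. Properties are kept as sets of (key, value) pairs so that a subject with several values for the same predicate loses nothing. The function rejects metadata about attribute triples, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    kind: str   # "IRI", "literal" or "bnode"
    value: str


def to_simple_pg(triples):
    """Attribute triples become vertex properties; relationship triples become edges."""
    vertex_props = {}  # vertex (IRI or blank node) -> set of (key, value) pairs
    edge_props = {}    # relationship triple -> set of (key, value) pairs

    def add_edge(triple):
        s, _, o = triple
        vertex_props.setdefault(s, set())
        vertex_props.setdefault(o, set())
        return edge_props.setdefault(triple, set())

    for s, p, o in triples:
        if isinstance(s, tuple):  # metadata triple
            if s[2].kind == "literal" or o.kind != "literal":
                raise ValueError("metadata triple is not convertible")
            add_edge(s).add((p.value, o.value))
        elif o.kind == "literal":  # attribute triple
            vertex_props.setdefault(s, set()).add((p.value, o.value))
        else:  # relationship triple
            add_edge((s, p, o))
    return vertex_props, edge_props


alice = Term("IRI", "ex:alice")
bob = Term("IRI", "ex:bob")
knows = Term("IRI", "ex:knows")
name = Term("IRI", "ex:name")
since = Term("IRI", "ex:since")
graph = [
    (alice, name, Term("literal", "Alice")),
    (alice, knows, bob),
    ((alice, knows, bob), since, Term("literal", "2014")),
]
vertex_props, edge_props = to_simple_pg(graph)
print(vertex_props[alice], edge_props[(alice, knows, bob)])
```

Unlike the first transformation, no vertex is ever created for a literal.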

3.3 Third Transformation: Property Graphs to RDF⋆

The third transformation converts Property Graphs to RDF⋆ data (which then may be transformed to standard RDF data [HT14]). The idea of the transformation is simple: Every edge (including its label) in a given Property Graph is represented as an ordinary RDF triple in the resulting RDF⋆ data; the same holds for every vertex property. Any edge property is represented as a metadata triple whose subject is the triple representing the corresponding edge. The transformation gives users the freedom to choose patterns for generating IRIs that denote edge labels and property keys, respectively. These IRIs become the predicates of triples in the resulting RDF⋆ data.

The transformation cannot be used for a Property Graph that contains distinct edges with the same start node, the same end node, and the same label.

RDF-star does not support multigraphs: parallel edges with the same label would collapse into a single triple.
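
The third transformation can be sketched in Python as below. The input layout (a dict of vertex properties and a list of edges), the two IRI prefixes, and the choice to represent each vertex as a blank node are assumptions made for this illustration, not details confirmed by the excerpts above. Terms are plain strings and an embedded triple is a nested tuple.

```python
def pg_to_rdf_star(vertex_props, edges, label_prefix, key_prefix):
    """Convert a Property Graph into a set of RDF-star triples.

    vertex_props: {vertex id: {key: value}}
    edges: list of (source id, label, target id, {key: value})
    label_prefix / key_prefix: user-chosen IRI patterns for labels and keys
    """
    def bnode(v):
        return f"_:{v}"  # assumed: one blank node per vertex

    triples = set()
    for v, props in vertex_props.items():
        for key, value in props.items():
            triples.add((bnode(v), f"<{key_prefix}{key}>", f'"{value}"'))

    seen = set()
    for src, label, tgt, props in edges:
        edge_triple = (bnode(src), f"<{label_prefix}{label}>", bnode(tgt))
        if edge_triple in seen:
            # Two distinct edges would be merged into one triple.
            raise ValueError("Property Graph is not edge-unique")
        seen.add(edge_triple)
        triples.add(edge_triple)
        for key, value in props.items():
            # Edge property -> metadata triple about the edge triple.
            triples.add((edge_triple, f"<{key_prefix}{key}>", f'"{value}"'))
    return triples


result = pg_to_rdf_star(
    {"v1": {"name": "Alice"}, "v2": {}},
    [("v1", "knows", "v2", {"since": 2014})],
    label_prefix="http://example.org/label/",
    key_prefix="http://example.org/key/",
)
for t in result:
    print(t)
```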

... a Property Graph system may provide virtual RDF⋆ views of its Property Graphs that can be queried using SPARQL

Neo4j has this option.

4 Formalization of the Data Models

4.1 RDF⋆

The paper does not discuss asserted versus non-asserted triples.

4.2 Property Graphs

Definition 2. Let P be the (infinite) set of all possible properties; that is, a property is a pair p = ⟨k, v⟩ where k ∈ dom(S) and v ∈ ⋃_{D∈𝒟} dom(D); i.e., P = dom(S) × ⋃_{D∈𝒟} dom(D). A Property Graph is a tuple G = ⟨V, E, src, tgt, lbl, 𝒫⟩ such that
• ⟨V, E, src, tgt, lbl⟩ is an edge-labeled directed multigraph with
– a set of vertices V,
– a set of edges E,
– a function src : E → V that associates each edge with its source vertex,
– a function tgt : E → V that associates each edge with its target vertex, and
– a function lbl : E → dom(S) that associates each edge with its label; and
• 𝒫 is a function 𝒫 : (V ∪ E) → 2^P.

(See the original paper for the exact notation.)
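
Definition 2 maps quite directly onto a small data structure. The Python sketch below is one possible encoding, assuming that property keys and edge labels are strings; the class name, the field names, and the `validate` method are illustrative and not part of the paper.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    vertices: set                               # V
    edges: set                                  # E
    src: dict                                   # src : E -> V
    tgt: dict                                   # tgt : E -> V
    lbl: dict                                   # lbl : E -> strings
    props: dict = field(default_factory=dict)   # (V ∪ E) -> set of (key, value) pairs

    def validate(self):
        """Check that src, tgt and lbl are total on E and props is well-formed."""
        for e in self.edges:
            if self.src.get(e) not in self.vertices:
                raise ValueError(f"edge {e!r} has no valid source vertex")
            if self.tgt.get(e) not in self.vertices:
                raise ValueError(f"edge {e!r} has no valid target vertex")
            if not isinstance(self.lbl.get(e), str):
                raise ValueError(f"edge {e!r} has no string label")
        for element, properties in self.props.items():
            if element not in self.vertices | self.edges:
                raise ValueError(f"{element!r} is neither a vertex nor an edge")
            if not all(isinstance(key, str) for key, _ in properties):
                raise ValueError("property keys must be strings")
        return True


# Two parallel "knows" edges: allowed, because the model is a multigraph.
g = PropertyGraph(
    vertices={"v1", "v2"},
    edges={"e1", "e2"},
    src={"e1": "v1", "e2": "v1"},
    tgt={"e1": "v2", "e2": "v2"},
    lbl={"e1": "knows", "e2": "knows"},
    props={"v1": {("name", "Alice")}, "e1": {("since", 2014)}},
)
print(g.validate())
```

Because edges are identified by their own ids rather than by their endpoints, nothing prevents two edges from sharing source, target, and label.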

4.3 Property Graph Convertibility of RDF⋆ graphs

Before going into the details of the transformations it is important to note that the RDF⋆ data model is more expressive than the Property Graphs model. For instance, RDF⋆ allows for an arbitrarily deep nesting of metadata triples, whereas a Property Graph cannot contain additional metadata about a property of a vertex or an edge (i.e., a property in a Property Graph cannot be annotated with properties itself). As a consequence, any transformation from RDF⋆ graphs to Property Graphs that adapts the natural approach of representing metadata triples as edge properties is possible only for specific RDF⋆ graphs.

Convertible RDF-star data therefore has no triples nested more than one level deep, has nested triples only in the subject position, and requires the object of any triple that contains a nested triple to be a literal (not an IRI) that can be converted to a supported datatype.
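
These conditions can be checked mechanically. The following Python sketch encodes the conditions as summarized in these notes rather than the paper's formal definition; the `Term` class and the function name are assumptions, and the datatype convertibility of the literal is not checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    kind: str   # "IRI", "literal" or "bnode"
    value: str


def is_pg_convertible(triples):
    """Return True if every triple satisfies the three conditions above."""
    for s, p, o in triples:
        if isinstance(o, tuple):
            return False  # nested triple in the object position
        if isinstance(s, tuple):
            if any(isinstance(t, tuple) for t in s):
                return False  # nesting deeper than one level
            if o.kind != "literal":
                return False  # metadata value must be a literal
    return True


alice = Term("IRI", "ex:alice")
bob = Term("IRI", "ex:bob")
knows = Term("IRI", "ex:knows")
source = Term("IRI", "ex:source")
inner = (alice, knows, bob)

print(is_pg_convertible([inner, (inner, source, Term("literal", "survey"))]))
print(is_pg_convertible([(inner, source, Term("IRI", "ex:survey"))]))
```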

5 Transforming RDF⋆ Graphs to RDF-like Property Graphs

While the given transformation is lossless (i.e., resulting Property Graphs contain all information present in the original RDF⋆ graph), some use cases may have the stronger requirement that the original RDF⋆ graph can be reconstructed exactly from its RDF-like Property Graph representation.

Hence, for these use cases the transformation must be invertible. To ensure an invertible transformation, any RDF⋆ graph to be transformed must not contain redundant RDF⋆ triples; that is, RDF⋆ triples that are embedded in metadata triples in the RDF⋆ graph and, additionally, appear directly (as a separate element) in the RDF⋆ graph.

This amounts to a minimality guarantee on the input graph.
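
Detecting redundant triples is straightforward. A small Python sketch, assuming terms are plain strings and an embedded triple is a nested tuple (the function name is hypothetical):

```python
def redundant_triples(graph):
    """Return the triples that are both embedded in a metadata triple
    and asserted directly as elements of the graph."""
    graph = set(graph)
    embedded = {
        term
        for triple in graph
        for term in (triple[0], triple[2])
        if isinstance(term, tuple)
    }
    return embedded & graph


inner = ("ex:alice", "ex:knows", "ex:bob")
minimal = {(inner, "ex:since", '"2014"')}
redundant = minimal | {inner}

print(redundant_triples(minimal))
print(redundant_triples(redundant))
```

An RDF⋆ graph for which this function returns an empty set satisfies the precondition quoted above.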

6 Transforming RDF⋆ Graphs to Simple Property Graphs

Metadata triples about relationship triples are converted into edge properties, and metadata triples about attribute triples cannot be converted by the transformation because the Property Graph model does not support metadata about (vertex) properties.

7 Transforming Property Graphs to RDF⋆ Graphs

While representing each edge (including its label) as a single triple is perhaps the most intuitive approach to transform such edges, this approach has the following shortcoming: If there are two (or more) distinct edges that connect the same vertices and have the same label (but may have different properties), the approach would represent both edges by a single triple. As a result, this triple would not represent any one of the edges unambiguously when embedded in a metadata triple for a property of the edge. To avoid this problem, the transformation is restricted to Property Graphs that do not contain distinct edges with the same source vertex, the same target vertex, and the same label. Hereafter, these Property Graphs are called edge-unique.

Edge-unique amounts to not being a multigraph once edge labels are taken into account: parallel edges are allowed only if their labels differ.
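
A check for edge-uniqueness can be sketched in a few lines of Python. The input format (a dict from edge id to a (source, label, target) tuple) and the function names are assumptions for this example.

```python
from collections import Counter


def duplicate_edge_keys(edges):
    """Return every (source, label, target) combination shared by several edges.

    edges: {edge id: (source vertex, label, target vertex)}
    """
    counts = Counter(edges.values())
    return {key for key, n in counts.items() if n > 1}


def is_edge_unique(edges):
    return not duplicate_edge_keys(edges)


edges = {
    "e1": ("v1", "knows", "v2"),
    "e2": ("v1", "knows", "v2"),    # same endpoints and label as e1
    "e3": ("v1", "worksWith", "v2"),  # parallel edge, but different label
}
print(is_edge_unique(edges))
print(duplicate_edge_keys(edges))
```

Here e1 and e2 would both map to the same triple, so a metadata triple for a property of e1 could not be distinguished from one for e2.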

