UniKG: A Unified Interoperable Knowledge Graph Database System

UniKG: A Unified Interoperable Knowledge Graph Database System - Paper Reading

B. Liu, X. Wang, P. Liu, S. Li, Q. Fu and Y. Chai, "UniKG: A Unified Interoperable Knowledge Graph Database System," 2021 IEEE 37th International Conference on Data Engineering (ICDE), 2021, pp. 2681-2684, doi: 10.1109/ICDE51399.2021.00303.

Abstract: Knowledge graph currently has two main data models: RDF graph and property graph. The query language on RDF graph is SPARQL, while the query language on property graph is mainly Cypher. Different data models and query languages hinder the wider application of knowledge graphs.

In this demonstration, we propose a unified interoperable knowledge graph database system, UniKG.

(1) Based on the relational model, a unified storage scheme is utilized to efficiently store RDF graphs and property graphs, and support the query requirements of knowledge graphs.

(2) Using the characteristic-set-based method, the storage problem of untyped entities is addressed in UniKG.

(3) UniKG realizes the interoperability of SPARQL and Cypher, and enables them to interchangeably operate on the same knowledge graph.

(4) With a unified Web interface, users are allowed to query with two different languages over the same knowledge graph and visualize query results and explanations.

URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9458632&isnumber=9458600

* The purpose is the same as that of 1G INTEROPERABILITY, but item 3 is what would be missing from the Amazon Neptune 1G model *

There is another paper about this same graph database, published at a VLDB workshop:

Liu, Baozhu et al. “Towards a Unified Knowledge Graph Data Management System.” SEA-Data@VLDB (2021).

To this end, we propose a unified KG data management system, which consists of three components, i.e., storage manager, query processing coordinator, and Web interface, making multiple KGs manageable in a unified database management system. The queries will be translated into unified semantics denoted by relational algebra using the query processing coordinator. In storage manager, RDF graphs and property graphs will be shredded into relations with the specific approaches.

To verify the effectiveness and efficiency of the proposed system, extensive experiments were conducted on several data sets. The experimental results show that our system outperforms gStore [1] and Neo4j [2], which are two state-of-the-art KG database systems.

* Which KG did they use in the tests? Which queries did they use? *

From this paper we have:

I. INTRODUCTION

Different storage schemes for different data models have been proposed in existing KG database management systems.
The storage schemes of RDF graphs can be divided into three categories: (1) the storage schemes that directly utilize the characteristics of RDF triples, such as Triple Table, (2) the approaches that exploit the type characteristics of RDF graph data, such as DB2RDF, and (3) the methods that are based on the semantic information of RDF graphs, such as Characteristic Sets.

* A table with 3 columns: s, p, o *

* RDB2RDF is the W3C effort for converting relational data to RDF (not to be confused with DB2RDF, the relational storage scheme for RDF that the paper cites). One table per vertex type with its attributes, and another per relationship type *

* Check what Characteristic Sets are and how they can be used to cluster vertices with no defined type *

* It does not mention RDF-star, so there are no properties on edges or on attributes *

For the storage of property graphs, native storage schemes are generally used instead of relational ones.

With characteristic-set-based clustering method, a unified storage scheme is implemented in UniKG, which supports the efficient storage of RDF graphs and property graphs.

... we have developed UniKG to realize semantic alignment of SPARQL and Cypher, so that users can use these two different languages on the same KG in an interoperable way.

* These languages differ in expressiveness; Cypher has more options for path queries, for example. How did they handle the differences? *

In storage manager, RDF graphs will be shred into relations with the triple division approach, while property graphs will be applied a label-to-type alignment method.

* RDF: one table per vertex type (rdf:type) with its attributes, and another per relationship type *

* LPG: one table per vertex type (label) with its attributes, and another per relationship type with its attributes *

II. UNIFIED STORAGE SCHEME

The entities and edges are clustered and stored in separate relation tables according to their types.
Due to the type-clustered storage scheme, the type-based queries, which include rdf:type, can be dramatically accelerated. More specifically, the scope of the queries is narrowed to a certain relation table, instead of scanning all relation tables.

* Improves certain kinds of queries ... BGP lookup. In which scenarios are these the most frequent? *

Furthermore, in order to support untyped queries, all vertices (edges) are simultaneously stored in the vertices (edges) table to facilitate global scans.

Vertex tables consist of two columns: the first column stores the IDs of the entities, while the second column stores the properties corresponding to the entities ....

* How does search by property values perform? How do they handle properties with different data types? *

Meanwhile, edge tables consist of four columns, recording the IDs of the edges, the source vertices of the edges, the target vertices of the edges, and the properties corresponding to the edges, respectively.

* How does search by property values perform? How do they handle properties with different data types? *
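
The two-column vertex tables and four-column edge tables described above can be pictured with a small relational sketch. This is only an illustration under assumptions: the table names, the use of SQLite, and the JSON-serialized property column are hypothetical, since the paper does not give the actual DDL and UniKG itself is built on AgensGraph.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table names: one global table plus one table per type.
conn.executescript("""
CREATE TABLE vertices      (id TEXT PRIMARY KEY, properties TEXT);
CREATE TABLE vertex_person (id TEXT PRIMARY KEY, properties TEXT);
CREATE TABLE edges         (id TEXT PRIMARY KEY, source TEXT, target TEXT, properties TEXT);
CREATE TABLE edge_knows    (id TEXT PRIMARY KEY, source TEXT, target TEXT, properties TEXT);
""")

def add_vertex(vid, vtype, props):
    """Store a vertex in the global table and in its type table."""
    row = (vid, json.dumps(props))
    conn.execute("INSERT INTO vertices VALUES (?, ?)", row)
    conn.execute(f"INSERT INTO vertex_{vtype} VALUES (?, ?)", row)

def add_edge(eid, etype, source, target, props):
    """Store an edge in the global table and in its type table."""
    row = (eid, source, target, json.dumps(props))
    conn.execute("INSERT INTO edges VALUES (?, ?, ?, ?)", row)
    conn.execute(f"INSERT INTO edge_{etype} VALUES (?, ?, ?, ?)", row)

add_vertex("v1", "person", {"name": "Ann"})
add_vertex("v2", "person", {"name": "Bob"})
add_edge("e1", "knows", "v1", "v2", {"since": 2020})

# A typed query touches only the type table; an untyped one scans the global table.
typed_rows = conn.execute("SELECT id FROM vertex_person ORDER BY id").fetchall()
all_edges = conn.execute("SELECT source, target FROM edges").fetchall()
```

Keeping every row in both a type table and a global table trades extra storage for the ability to answer typed queries from a single small table while still supporting global scans.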

For the storage of RDF graphs, all triples are divided into three categories (i.e., triple division), each of which represents (1) the types of the entities, (2) the attributes of the entities, and (3) the relationship between entities. These triples can be transformed into (1) different types of vertex tables, (2) the attributes of the vertices, and (3) different types of edge tables, respectively.

* It is the reverse of RDB2RDF *
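
The triple division step can be sketched as a simple classifier. The rule used here to tell attributes from relationships (a literal object means an attribute, anything else means a relationship) is an assumption made for illustration; the paper does not spell out the exact criterion, and `divide_triples` is a hypothetical name.

```python
RDF_TYPE = "rdf:type"

def divide_triples(triples):
    """Split (s, p, o, o_is_literal) tuples into the three categories."""
    type_triples, attribute_triples, relationship_triples = [], [], []
    for s, p, o, o_is_literal in triples:
        if p == RDF_TYPE:
            type_triples.append((s, o))              # (1) picks the vertex table
        elif o_is_literal:
            attribute_triples.append((s, p, o))      # (2) goes to the property column
        else:
            relationship_triples.append((s, p, o))   # (3) becomes an edge table row
    return type_triples, attribute_triples, relationship_triples

triples = [
    ("ex:ann", "rdf:type", "ex:Person", False),
    ("ex:ann", "ex:name", "Ann", True),
    ("ex:ann", "ex:knows", "ex:bob", False),
]
types, attrs, rels = divide_triples(triples)
```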

For the storage of property graphs, the labels of the vertices or edges are transformed into different types of relations, i.e., label-to-type alignment, while the attributes are stored in the property columns. If an entity belongs to multiple types, the entity will be stored more than once in different type relations.

* LPG follows the same approach, except that it also handles edge properties *

* Keep in mind that the LPG implementation in TinkerPop also allows properties on vertex attributes *
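
A minimal sketch of label-to-type alignment for vertices, assuming each vertex arrives as an (id, labels, properties) tuple. The function name and the input shape are hypothetical. It shows the consequence stated in the paper: a vertex with several labels is stored once per label.

```python
from collections import defaultdict

def align_labels_to_types(vertices):
    """Map each label to a relation holding (id, properties) rows."""
    relations = defaultdict(list)
    for vertex_id, labels, properties in vertices:
        for label in labels:
            # A multi-label vertex is duplicated into every matching relation.
            relations[label].append((vertex_id, properties))
    return dict(relations)

vertices = [
    ("v1", ["Person", "Author"], {"name": "Ann"}),
    ("v2", ["Person"], {"name": "Bob"}),
]
relations = align_labels_to_types(vertices)
```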

The basic storage scheme described above strictly depends on the types of entities. However, in real-world KGs not all entities necessarily have specified types. For untyped entities, in UniKG, we leverage a clustering method based on characteristic sets [6] to assign them types, where entities are allocated into clusters according to their characteristic sets. The distance between two clusters can be calculated based on their predicate difference, which is equal to the number of occurrences of predicates that are not shared by these two clusters. One typed cluster and one untyped cluster with the shortest distance are recursively merged until there are no more clusters that satisfy the conditions. Adopting the above method, a “type” that has the smallest difference can be assigned to untyped entities, so that the untyped entities can be stored in the same relation table with typed entities.

* How to handle untyped entities / vertices without generating an exponential number of tables when not all vertices have a defined type *
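
To make the clustering idea concrete, here is a rough sketch under simplifying assumptions: the characteristic set of an entity is taken to be the set of predicates it appears with as subject, the predicate difference is approximated by the size of the symmetric difference between two predicate sets, and merging is done greedily until no untyped entity remains. The paper's exact distance and stopping conditions may differ, and all function names are hypothetical.

```python
def characteristic_set(entity, triples):
    """Predicates that the entity emits as a subject."""
    return frozenset(p for s, p, _ in triples if s == entity)

def predicate_difference(a, b):
    """Number of predicates that the two sets do not share."""
    return len(a ^ b)

def assign_types(typed, untyped):
    """typed: {type_name: predicates}; untyped: {entity: predicates}.
    Greedily merges the closest (typed, untyped) pair until none is left."""
    typed = {t: set(ps) for t, ps in typed.items()}
    remaining = dict(untyped)
    assignment = {}
    while remaining and typed:
        type_name, entity = min(
            ((t, e) for t in typed for e in remaining),
            key=lambda pair: (
                predicate_difference(typed[pair[0]], remaining[pair[1]]),
                pair,  # tie-break on names to keep the result deterministic
            ),
        )
        assignment[entity] = type_name
        typed[type_name] |= remaining.pop(entity)  # merged cluster grows
    return assignment

triples = [
    ("ex:carol", "ex:name", "Carol"),
    ("ex:carol", "ex:knows", "ex:ann"),
    ("ex:acme", "ex:name", "Acme"),
    ("ex:acme", "ex:foundedIn", "1999"),
]
typed = {
    "Person": {"ex:name", "ex:knows", "ex:birthDate"},
    "Company": {"ex:name", "ex:foundedIn"},
}
untyped = {e: characteristic_set(e, triples) for e in ("ex:carol", "ex:acme")}
assignment = assign_types(typed, untyped)
```

Because every untyped entity ends up in an existing typed table, the number of tables stays bounded by the number of declared types, which is the concern raised in the note above.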

III. UNIFIED QUERY PROCESSING

In UniKG, we focus on the basic graph pattern matching queries (BGP), text search queries, graph analysis queries, and regular path queries in SPARQL and Cypher. In fact, in the UniKG, SPARQL and Cypher are regarded as syntactic views of unified semantics of KG queries. In other words, the query semantics of SPARQL and Cypher will be aligned to obtain the unified semantic tree, represented by relational algebra.

* Operators from the relational "world" ... And path queries, how are they handled? There is no example or description. *

During query processing, (1) information indicating the types of vertices or edges is transformed into RENAME operator ρ, (2) information indicating the attributes of vertices or edges is transformed into SELECTION operators σ, and (3) information indicating the relationships between the vertices is transformed into JOIN operators ⋈, respectively. Finally, a PROJECTION operator π is applied to handle the output results.

* These are the paper's translation rules, stated at a high level. Are these the only operators? Are they enough to cover all queries? *
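
The three rules plus the final projection can be illustrated with a toy translator that prints a relational algebra expression for a small pattern. This is a sketch, not the paper's translator: the input format, the function name, and the omission of join conditions are all assumptions. The pattern is assumed to have already been parsed from either language, which is the point of treating SPARQL and Cypher as syntactic views of the same plan.

```python
# Both queries below would, under this sketch, yield the same pattern:
#   Cypher: MATCH (a:Person {name:'Ann'})-[:knows]->(b:Person) RETURN b
#   SPARQL: SELECT ?b WHERE { ?a rdf:type :Person ; :name "Ann" ; :knows ?b .
#                             ?b rdf:type :Person }

def to_relational_algebra(nodes, edges, returns):
    """nodes: {var: (type, {attr: value})}; edges: [(src, edge_type, dst)]."""
    terms = {}
    for var, (type_name, attrs) in nodes.items():
        term = f"ρ_{var}({type_name})"                    # rule (1): type -> RENAME
        for attr, value in attrs.items():
            term = f"σ_{var}.{attr}={value!r}({term})"    # rule (2): attribute -> SELECTION
        terms[var] = term
    plans = []
    for i, (src, edge_type, dst) in enumerate(edges):
        edge = f"ρ_e{i}({edge_type})"
        plans.append(f"({terms[src]} ⋈ {edge} ⋈ {terms[dst]})")  # rule (3): JOIN
    plans = plans or list(terms.values())                 # pattern without edges
    return f"π_{','.join(returns)}({' ⋈ '.join(plans)})"  # final PROJECTION

nodes = {"a": ("Person", {"name": "Ann"}), "b": ("Person", {})}
edges = [("a", "knows", "b")]
plan = to_relational_algebra(nodes, edges, ["b"])
```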

IV. RELATED SYSTEMS AND NOVELTY

RDF Graph Database System. ... most systems only support BGP (Basic Graph Pattern) matching query and various aggregation functions, e.g., gStore [2], Virtuoso [9], and RDF-3X [10]. Text search or graph analysis queries are not available in the systems aforementioned, which are all supported in UniKG.

* Several systems support path queries, such as AllegroGraph following the SPARQL 1.1 standard. Stardog, for example, has an extension that makes this kind of query return a graph rather than only the source and target nodes. *

Property Graph Database System. ... Most property graph database systems are built upon native graph storage schemes, while relational model is used in UniKG to store KGs, which will provide support for transaction management and scalability. It is also noteworthy that text search or graph analysis queries are not available in these property graph database systems.

* What kind of analyses? Neo4j and TigerGraph offer additional functionality for Shortest Path, PageRank, etc ... *

V. DEMONSTRATIONS

We have implemented UniKG based on AgensGraph, which is a property graph database system. ... In order to import data sets into UniKG, we first preprocess the data with the characteristic-set-based clustering method to handle untyped entities and shred entities / edges into relations.

* This processing happens at load time. How much overhead does it add to loading? *

Scenario 1. Query against KGs. Users can issue SPARQL or Cypher queries in the specific input boxes, ... Then, these queries will be translated into relational algebra using the procedure described in Sect. III, which will be efficiently executed by the UniKG query engine. Finally, the query results can be demonstrated in the form of JSON, table, or force-layout graph, ...

Multiple query features of SPARQL and Cypher are supported in UniKG, including BGP matching query (i.e., subgraph isomorphism), text search query (i.e., keyword query), graph analytical query (e.g., finding the shortest path between two vertices and calculating PageRank value among vertices), and regular path query (i.e., the path between two vertices conforming to a regular expression).
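
As an illustration of the graph analytical queries mentioned above, the following sketch finds a shortest path (by number of hops) with a breadth-first search over (source, target) rows such as those of an edge table. It is a generic textbook algorithm, not the way UniKG is known to implement this feature; the row format is an assumption.

```python
from collections import deque

def shortest_path(edge_rows, start, goal):
    """Return the list of vertex IDs on a fewest-hops path, or None."""
    adjacency = {}
    for source, target in edge_rows:
        adjacency.setdefault(source, []).append(target)
    parents = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

edge_rows = [("v1", "v2"), ("v2", "v3"), ("v1", "v3"), ("v3", "v4")]
path = shortest_path(edge_rows, "v1", "v4")
```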

Scenario 2. Explain Queries of SPARQL and Cypher.

As mentioned in Sect. III, the SPARQL and Cypher query can be translated into a unified abstract query plan. The queries in different syntax forms with the same intent are executed in UniKG with the same plan (without considering the execution orders).
In addition, UniKG provides explain input boxes for both SPARQL and Cypher. The unified query plan after explaining can be demonstrated in the form of JSON, table or graph, ...

* Explain shows the query converted into the abstract language *

Scenario 3. Translate Query Languages. In fact, SPARQL and Cypher can be considered as different syntactic views of the same query plan. Using the translation function between SPARQL and Cypher, a query in one language can be transformed into the equivalent query in the other language, which provides users multiple options to explore KGs.

* The mapping between the two languages is shown in the interface *

* I found neither the software nor the comparison benchmark on GitHub *

Pesquisa de Doutorado da Veronica