
UniKG: A Unified Interoperable Knowledge Graph Database System - Paper Reading

B. Liu, X. Wang, P. Liu, S. Li, Q. Fu and Y. Chai, "UniKG: A Unified Interoperable Knowledge Graph Database System," 2021 IEEE 37th International Conference on Data Engineering (ICDE), 2021, pp. 2681-2684, doi: 10.1109/ICDE51399.2021.00303.
 
Abstract: Knowledge graph currently has two main data models: RDF graph and property graph. The query language on RDF graph is SPARQL, while the query language on property graph is mainly Cypher. Different data models and query languages hinder the wider application of knowledge graphs. 
 
In this demonstration, we propose a unified interoperable knowledge graph database system, UniKG. 
(1) Based on the relational model, a unified storage scheme is utilized to efficiently store RDF graphs and property graphs, and support the query requirements of knowledge graphs. 
(2) Using the characteristic-set-based method, the storage problem of untyped entities is addressed in UniKG. 
(3) UniKG realizes the interoperability of SPARQL and Cypher, and enables them to interchangeably operate on the same knowledge graph. 
(4) With a unified Web interface, users are allowed to query with two different languages over the same knowledge graph and visualize query results and explanations.
URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9458632&isnumber=9458600 
 
* The purpose is the same as 1G (INTEROPERABILITY), but item 3 is what would be missing from Amazon Neptune's 1G model *
 
There is another paper about this same graph DB at a VLDB workshop:
Liu, Baozhu et al. “Towards a Unified Knowledge Graph Data Management System.” SEA-Data@VLDB (2021).

To this end, we propose a unified KG data management system, which consists of three components, i.e., storage manager, query processing coordinator, and Web interface, making multiple KGs manageable in a unified database management system. The queries will be translated into unified semantics denoted by relational algebra using the query processing coordinator. In the storage manager, RDF graphs and property graphs will be shredded into relations with specific approaches.

 

To verify the effectiveness and efficiency of the proposed system, extensive experiments were conducted on several data sets. The experimental results show that our system outperforms gStore [1] and Neo4j [2], which are two state-of-the-art KG database systems.

* Which KG did they use in the tests? Which queries did they use? * 

 

From this paper we have: 
 
I. INTRODUCTION
 
Different storage schemes for different data models have been proposed in existing KG database management systems.
The storage schemes of RDF graphs can be divided into three categories: (1) the storage schemes that directly utilize the characteristics of RDF triples, such as Triple Table, (2) the approaches that exploit the type characteristics of RDF graph data, such as DB2RDF, and (3) the methods that are based on the semantic information of RDF graphs, such as Characteristic Sets.
 
* A single table with 3 columns: s, p, o *
* RDB2RDF is a W3C standard for converting relational data to RDF. One table per vertex type with its attributes and another one per relationship type *
* Check what Characteristic Sets are and how they can be used to cluster vertices without a defined type *
* It does not mention RDF-star, so there are no properties on edges or on attributes * 
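To make the first category (Triple Table) concrete, here is a minimal sketch using Python's built-in `sqlite3`. The table name `triples` and the `ex:` data are made up for illustration. It shows why the layout is simple but join-heavy: every triple pattern in a BGP becomes one more self-join on the same s, p, o table.

```python
import sqlite3

# Triple Table: every RDF triple is one row (s, p, o).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "rdf:type", "ex:Person"),
        ("ex:alice", "ex:name", '"Alice"'),
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "rdf:type", "ex:Person"),
        ("ex:bob", "ex:name", '"Bob"'),
    ],
)

# BGP { ?x rdf:type ex:Person . ?x ex:knows ?y . ?y ex:name ?n }
# -> one self-join of the triple table per triple pattern.
rows = conn.execute(
    """
    SELECT t1.s AS x, t3.o AS n
    FROM triples t1
    JOIN triples t2 ON t2.s = t1.s
    JOIN triples t3 ON t3.s = t2.o
    WHERE t1.p = 'rdf:type' AND t1.o = 'ex:Person'
      AND t2.p = 'ex:knows'
      AND t3.p = 'ex:name'
    """
).fetchall()
print(rows)  # [('ex:alice', '"Bob"')]
```

The type-based and characteristic-set-based layouts in categories (2) and (3) exist largely to cut down these self-joins by grouping the triples of similar subjects into wider tables.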

For the storage of property graphs, native storage schemes are generally used instead of relational ones. 
 
With characteristic-set-based clustering method, a unified storage scheme is implemented in UniKG, which supports the efficient storage of RDF graphs and property graphs.
 
... we have developed UniKG to realize semantic alignment of SPARQL and Cypher, so that users can use these two different languages on the same KG in an interoperable way.
 
* These languages have different expressiveness; Cypher has more options for path queries, for example. How were the differences handled? * 

 

In the storage manager, RDF graphs will be shredded into relations with the triple division approach, while a label-to-type alignment method will be applied to property graphs. 

* RDF: one table per vertex type (rdf:type) with its attributes and another one per relationship type *
* LPG: one table per vertex type (label) with its attributes and another one per relationship type with its attributes *
 
II. UNIFIED STORAGE SCHEME
 
The entities and edges are clustered and stored in separate relation tables according to their types.
Due to the type-clustered storage scheme, the type-based queries, which include rdf:type, can be dramatically accelerated.
More specifically, the scope of a query is narrowed to a certain relation table, instead of scanning all relation tables.
 
* Improves certain types of queries ... BGP lookup. In which scenarios are these the most frequent? * 
 
Furthermore, in order to support untyped queries, all vertices (edges) are simultaneously stored in the vertices (edges) table to facilitate global scans.
 
Vertex tables consist of two columns: the first column stores the IDs of the entities, while the second column stores the properties corresponding to the entities ....
 
* How does search perform when filtering on property values? How do they handle properties with different data types? *
 
Meanwhile, edge tables consist of four columns, recording the IDs of the edges, the source vertices of the edges, the target vertices of the edges, and the properties corresponding to the edges, respectively.
 
* How does search perform when filtering on property values? How do they handle properties with different data types? *
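The layout described above can be sketched with `sqlite3`: type-clustered tables plus global `vertices`/`edges` tables, with two-column vertex tables and four-column edge tables. This is an illustration under assumptions, not UniKG's actual schema. The table names (`v_person`, `e_lives_in`, ...), the `src`/`dst` column names and the choice to store properties as JSON text are all hypothetical.

```python
import json
import sqlite3

# Hypothetical, fixed set of types; each one gets its own table.
VERTEX_TYPES = ("person", "city")
EDGE_TYPES = ("lives_in",)

def create_schema(conn):
    # Global tables, used for untyped queries (global scans).
    conn.execute("CREATE TABLE vertices (id INTEGER PRIMARY KEY, properties TEXT)")
    conn.execute("CREATE TABLE edges (id INTEGER PRIMARY KEY, src INTEGER, dst INTEGER, properties TEXT)")
    # Two columns per vertex-type table, four columns per edge-type table.
    for t in VERTEX_TYPES:
        conn.execute(f"CREATE TABLE v_{t} (id INTEGER PRIMARY KEY, properties TEXT)")
    for t in EDGE_TYPES:
        conn.execute(f"CREATE TABLE e_{t} (id INTEGER PRIMARY KEY, src INTEGER, dst INTEGER, properties TEXT)")

def add_vertex(conn, vtype, vid, props):
    if vtype not in VERTEX_TYPES:  # table names only come from the whitelist
        raise ValueError(f"unknown vertex type: {vtype}")
    row = (vid, json.dumps(props))
    conn.execute(f"INSERT INTO v_{vtype} VALUES (?, ?)", row)
    conn.execute("INSERT INTO vertices VALUES (?, ?)", row)  # also stored globally

def add_edge(conn, etype, eid, src, dst, props):
    if etype not in EDGE_TYPES:
        raise ValueError(f"unknown edge type: {etype}")
    row = (eid, src, dst, json.dumps(props))
    conn.execute(f"INSERT INTO e_{etype} VALUES (?, ?, ?, ?)", row)
    conn.execute("INSERT INTO edges VALUES (?, ?, ?, ?)", row)

conn = sqlite3.connect(":memory:")
create_schema(conn)
add_vertex(conn, "person", 1, {"name": "Alice"})
add_vertex(conn, "person", 2, {"name": "Bob"})
add_vertex(conn, "city", 3, {"name": "Paris"})
add_edge(conn, "lives_in", 10, 1, 3, {"since": 2019})

# Typed query: only v_person is scanned.
people = conn.execute("SELECT id FROM v_person ORDER BY id").fetchall()
# Untyped query: falls back to the global vertices table.
everything = conn.execute("SELECT id FROM vertices ORDER BY id").fetchall()
# A relationship becomes a join between vertex tables and an edge table.
residents = conn.execute(
    """
    SELECT p.id, c.id FROM v_person p
    JOIN e_lives_in e ON e.src = p.id
    JOIN v_city c ON c.id = e.dst
    """
).fetchall()
print(people, everything, residents)
```

The duplication into the global tables trades storage for the ability to answer untyped queries without a UNION over every type table.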
 
For the storage of RDF graphs, all triples are divided into three categories (i.e., triple division), each of which represents (1) the types of the entities, (2) the attributes of the entities, and (3) the relationship between entities. These triples can be transformed into (1) different types of vertex tables, (2) the attributes of the vertices, and (3) different types of edge tables, respectively.
 
* It is the reverse of RDB2RDF *
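Here is a minimal sketch of triple division in Python. It assumes that literals are written as quoted strings and that each entity has at most one value per attribute. Both are simplifications, not details from the paper, and the function name `triple_division` is hypothetical. Untyped entities land in no vertex table here, which is the gap the paper's characteristic-set clustering addresses.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

def is_literal(term: str) -> bool:
    # Assumption: literals are quoted strings; IRIs/prefixed names are not.
    return term.startswith('"')

def triple_division(triples):
    """Split triples into type, attribute and relationship triples, then build tables."""
    types = defaultdict(set)   # (1) entity -> its rdf:type values
    attrs = defaultdict(dict)  # (2) entity -> {predicate: literal value}
    edges = defaultdict(list)  # (3) predicate -> [(subject, object)]
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        elif is_literal(o):
            attrs[s][p] = o.strip('"')  # simplification: one value per predicate
        else:
            edges[p].append((s, o))
    # One vertex table per type, holding each entity's attributes.
    vertex_tables = defaultdict(dict)
    for entity, entity_types in types.items():
        for t in entity_types:
            vertex_tables[t][entity] = attrs.get(entity, {})
    return dict(vertex_tables), dict(edges)

vt, et = triple_division([
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "ex:Person"),
])
print(vt)  # {'ex:Person': {'ex:alice': {'ex:name': 'Alice'}, 'ex:bob': {}}}
print(et)  # {'ex:knows': [('ex:alice', 'ex:bob')]}
```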
 
For the storage of property graphs, the labels of the vertices or edges are transformed into different types of relations, i.e., label-to-type alignment, while the attributes are stored in the property columns. If an entity belongs to multiple types, the entity will be stored more than once in different type relations. 
 
* LPG follows the same approach, except that it also handles edge properties *
* Remember that the LPG implementation in TinkerPop also allows properties on vertex attributes (meta-properties) *
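Label-to-type alignment can be sketched the same way. In this sketch each label becomes a relation, edge properties are kept, and a vertex with several labels is stored once per label, as the quoted text says. The `Vertex`/`Edge` classes and `label_to_type` are hypothetical names, and the global tables are omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Vertex:
    id: int
    labels: list[str]
    props: dict = field(default_factory=dict)

@dataclass
class Edge:
    id: int
    label: str
    src: int
    dst: int
    props: dict = field(default_factory=dict)

def label_to_type(vertices, edges):
    """Map each label to its own relation; a multi-label vertex gets one row per label."""
    vertex_tables = defaultdict(list)  # label -> rows (id, props)
    edge_tables = defaultdict(list)    # label -> rows (id, src, dst, props)
    for v in vertices:
        for label in v.labels:
            vertex_tables[label].append((v.id, v.props))
    for e in edges:
        edge_tables[e.label].append((e.id, e.src, e.dst, e.props))  # edge properties kept
    return dict(vertex_tables), dict(edge_tables)

vt, et = label_to_type(
    [Vertex(1, ["Person", "Employee"], {"name": "Alice"}), Vertex(2, ["Company"], {"name": "ACME"})],
    [Edge(10, "WORKS_AT", 1, 2, {"since": 2020})],
)
print(sorted(vt))  # ['Company', 'Employee', 'Person']
```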
 
The basic storage scheme described above strictly depends on the types of entities; however, in real-world KGs not all entities necessarily have specified types. For untyped entities, in UniKG, we leverage a clustering method based on characteristic sets [6] to assign them types, where entities are allocated into clusters according to their characteristic sets. The distance between two clusters can be calculated based on their predicate difference, which is equal to the number of occurrences of predicates that are not shared by these two clusters. The typed cluster and the untyped cluster with the shortest distance are recursively merged until no more clusters satisfy the conditions. With this method, the “type” with the smallest difference can be assigned to untyped entities, so that the untyped entities can be stored in the same relation table as typed entities.
 
* How to handle untyped entities/vertices without generating an exponential number of tables when not all vertices have a defined type? * 
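The sketch below illustrates the clustering idea. It rests on several assumptions the paper does not spell out:

- each typed cluster is represented by the union of its entities' predicates;
- untyped entities are grouped by identical characteristic set (the set of predicates a subject uses);
- the predicate difference is taken as the size of the symmetric difference of the predicate sets, ignoring occurrence counts;
- the unspecified stopping condition is modeled as a hypothetical `max_distance` threshold.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

def characteristic_sets(triples):
    """Characteristic set of a subject = the predicates it uses (rdf:type kept apart)."""
    cs, types = defaultdict(set), {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s] = o  # assumption: at most one type per entity
        else:
            cs[s].add(p)
    return cs, types

def predicate_difference(a, b):
    """Number of predicates present in exactly one of the two clusters."""
    return len(a ^ b)

def assign_types(triples, max_distance=2):
    cs, types = characteristic_sets(triples)
    typed = defaultdict(set)    # type -> union of predicates of its entities
    members = defaultdict(set)  # type -> entities assigned to it
    untyped = defaultdict(set)  # frozenset of predicates -> untyped entities with that CS
    for entity, preds in cs.items():
        if entity in types:
            typed[types[entity]] |= preds
            members[types[entity]].add(entity)
        else:
            untyped[frozenset(preds)].add(entity)
    # Repeatedly merge the closest (typed, untyped) pair of clusters.
    while typed and untyped:
        dist, t, key = min(
            (predicate_difference(preds, k), name, k)
            for name, preds in typed.items()
            for k in untyped
        )
        if dist > max_distance:  # hypothetical stopping condition
            break
        members[t] |= untyped.pop(key)
        typed[t] |= key  # the typed cluster absorbs the untyped predicates
    return {t: sorted(es) for t, es in members.items()}, dict(untyped)

triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:alice", "ex:age", '"30"'),
    ("ex:paris", "rdf:type", "ex:City"),
    ("ex:paris", "ex:population", '"2M"'),
    ("ex:paris", "ex:country", "ex:france"),
    ("ex:carol", "ex:name", '"Carol"'),  # untyped, same predicates as ex:alice
    ("ex:carol", "ex:age", '"25"'),
    ("ex:x", "ex:color", '"red"'),       # untyped, far from every typed cluster
]
assigned, leftover = assign_types(triples, max_distance=2)
print(assigned)  # {'ex:Person': ['ex:alice', 'ex:carol'], 'ex:City': ['ex:paris']}
```

Entities that stay in `leftover` would still need some fallback, such as the global vertices table. That is one way to avoid creating a new table per distinct characteristic set.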
 
III. UNIFIED QUERY PROCESSING
 
In UniKG, we focus on the basic graph pattern matching queries (BGP), text search queries, graph analysis queries, and regular path queries in SPARQL and Cypher. In fact, in UniKG, SPARQL and Cypher are regarded as syntactic views of unified semantics of KG queries. In other words, the query semantics of SPARQL and Cypher will be aligned to obtain the unified semantic tree, represented by relational algebra.

 

* Operators from the relational "world" ... And what about path queries, how are they handled? There is neither an example nor a description. *
 
During query processing, (1) information indicating the types of vertices or edges is transformed into RENAME operators ρ, (2) information indicating the attributes of vertices or edges is transformed into SELECTION operators σ, and (3) information indicating the relationships between the vertices is transformed into JOIN operators ⋈, respectively. Finally, a PROJECTION operator π is applied to handle the output results.
 
* These are the paper's conversion rules, stated at a high level. Are these the only operators? Are they enough to cover all queries? *
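To see what the four rules produce, here is a toy translator from a BGP into a relational-algebra string. It is a sketch under assumptions, not UniKG's planner:

- subjects are always variables;
- each relationship pattern gets its own edge-table alias `e0`, `e1`, ...;
- variables without a type pattern read from a global `vertices` table;
- the joins are written as a single multiway ⋈ with equi-join conditions.

In principle, the Cypher query `MATCH (x:Person {name: "Alice"})-[:knows]->(y) RETURN x, y` should align to the same tree as the SPARQL BGP used below.

```python
RDF_TYPE = "rdf:type"

def is_var(term: str) -> bool:
    return term.startswith("?")

def bgp_to_algebra(patterns, projection):
    """Translate a BGP into a relational-algebra string using the ρ/σ/⋈/π rules."""
    relations = {}   # alias -> renamed relation
    variables = []   # vertex variables in order of appearance
    selections = []  # attribute filters
    join_conds = []  # equi-join conditions linking vertices and edges
    edge_count = 0
    for s, p, o in patterns:
        x = s[1:]
        if x not in variables:
            variables.append(x)
        if p == RDF_TYPE:
            relations[x] = f"ρ_{x}({o})"  # (1) type information -> RENAME
        elif not is_var(o):
            selections.append(f"{x}.{p} = {o}")  # (2) attribute -> SELECTION
        else:
            y = o[1:]
            if y not in variables:
                variables.append(y)
            e = f"e{edge_count}"
            edge_count += 1
            relations[e] = f"ρ_{e}({p})"  # edge table of predicate p
            join_conds += [f"{x}.id = {e}.src", f"{e}.dst = {y}.id"]  # (3) relationship -> JOIN
    # Untyped variables fall back to the global vertices table.
    for v in variables:
        relations.setdefault(v, f"ρ_{v}(vertices)")
    if join_conds:
        expr = f"⋈_{{{' ∧ '.join(join_conds)}}}({', '.join(relations.values())})"
    else:
        expr = " × ".join(relations.values())
    if selections:
        expr = f"σ_{{{' ∧ '.join(selections)}}}({expr})"
    return f"π_{{{', '.join(v[1:] for v in projection)}}}({expr})"  # (4) output -> PROJECTION

bgp = [
    ("?x", "rdf:type", "ex:Person"),
    ("?x", "ex:name", '"Alice"'),
    ("?x", "ex:knows", "?y"),
]
print(bgp_to_algebra(bgp, ["?x", "?y"]))
```

The selection is left above the join for readability. A real optimizer would push σ down onto `ρ_x(ex:Person)` and choose a join order, which is the "execution order" the paper says it does not consider when comparing plans.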
 
IV. RELATED SYSTEMS AND NOVELTY
 
RDF Graph Database System. ...  most systems only support BGP (Basic Graph Pattern) matching query and various aggregation functions, e.g., gStore [2], Virtuoso [9], and RDF-3X [10]. Text search or graph analysis queries are not available in the systems aforementioned, which are all supported in UniKG.
 
* Several systems support path queries, such as AllegroGraph with the SPARQL 1.1 standard. Stardog, for example, has an extension so that this kind of query returns a graph and not only the source and target nodes. * 
 
Property Graph Database System. ... Most property graph database systems are built upon native graph storage schemes, while relational model is used in UniKG to store KGs, which will provide support for transaction management and scalability. It is also noteworthy that text search or graph analysis queries are not available in these property graph database systems. 
 
* What kind of analyses? Neo4j and TigerGraph offer additional functionality for Shortest Path, PageRank, etc ... * 
 
V. DEMONSTRATIONS
 
We have implemented UniKG based on AgensGraph, which is a property graph database system.  ...
In order to import data sets into UniKG, we first preprocess the data with the characteristic-set-based clustering method to handle untyped entities and shred entities / edges into relations. 
 
* This processing happens at load time. How much overhead does it add to loading? *
 
Scenario 1. Query against KGs. Users can issue SPARQL or Cypher queries in the specific input boxes, ... Then, these queries will be translated into relational algebra using the procedure described in Sect. III, which will be efficiently executed by the UniKG query engine. Finally, the query results can be displayed in the form of JSON, table, or force-layout graph, ... 

Multiple query features of SPARQL and Cypher are supported in UniKG, including BGP matching query (i.e., subgraph isomorphism), text search query (i.e., keyword query), graph analytical query (e.g., finding the shortest path between two vertices and calculating PageRank value among vertices), and regular path query (i.e., the path between two vertices conforming to a regular expression).
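The paper does not say how regular path queries are evaluated. For reference, a common textbook technique is a BFS over the product of the graph and an automaton for the regular expression. The sketch below assumes the expression has already been compiled into a DFA, built by hand here for `ex:knows+`. The function `rpq` and the data are hypothetical, and this is not claimed to be UniKG's method.

```python
from collections import deque

def rpq(edges, dfa, accepting, source, start_state=0):
    """Nodes reachable from `source` along a path whose edge labels the DFA accepts."""
    adjacency = {}
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))
    seen = {(source, start_state)}
    queue = deque([(source, start_state)])
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, dst in adjacency.get(node, []):
            nxt = dfa.get(state, {}).get(label)  # DFA transition on this edge label
            if nxt is not None and (dst, nxt) not in seen:
                seen.add((dst, nxt))
                queue.append((dst, nxt))
    return answers

# Hand-built DFA for ex:knows+ (one or more ex:knows edges).
KNOWS_PLUS = {0: {"ex:knows": 1}, 1: {"ex:knows": 1}}

edges = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:c", "ex:worksAt", "ex:d"),
]
print(sorted(rpq(edges, KNOWS_PLUS, accepting={1}, source="ex:a")))  # ['ex:b', 'ex:c']
```

Each (node, state) pair is visited at most once, so the search terminates on cyclic graphs. Because it is a BFS, the first time a node is reached in an accepting state also gives a shortest matching path length.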
 
Scenario 2. Explain Queries of SPARQL and Cypher.
 
As mentioned in Sect. III, SPARQL and Cypher queries can be translated into a unified abstract query plan. Queries in different syntax forms with the same intent are executed in UniKG with the same plan (without considering the execution order).
In addition, UniKG provides explain input boxes for both SPARQL and Cypher. The unified query plan after explaining can be demonstrated in the form of JSON, table or graph, ...
 
* Explain shows the query converted into the abstract representation * 
 
Scenario 3. Translate Query Languages. In fact, SPARQL and Cypher can be considered as different syntactic views of the same query plan. Using the translation function between SPARQL and Cypher, a query in one language can be transformed into the equivalent query in the other language, which provides users multiple options to explore KGs.
 
* The mapping from one language to the other is shown in the interface *
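Treating SPARQL and Cypher as "syntactic views" means the same BGP can be rendered in either syntax. Here is a minimal sketch of the SPARQL-to-Cypher direction for simple BGPs. It assumes subjects are variables and maps `rdf:type` to a node label, literal-valued predicates to a property map and variable-valued predicates to a directed relationship. The prefix-stripping rule in `local_name` is hypothetical, and this is not UniKG's translator.

```python
RDF_TYPE = "rdf:type"

def local_name(term: str) -> str:
    # Hypothetical naming rule: strip the prefix, e.g. "ex:Person" -> "Person".
    return term.split(":", 1)[-1]

def bgp_to_cypher(patterns, projection):
    """Render a BGP of (?var, predicate, term) patterns as a Cypher MATCH ... RETURN."""
    labels, props, rels, order = {}, {}, [], []
    for s, p, o in patterns:
        x = s[1:]
        if x not in order:
            order.append(x)
        if p == RDF_TYPE:
            labels[x] = local_name(o)  # type -> node label
        elif not o.startswith("?"):
            props.setdefault(x, []).append(f"{local_name(p)}: {o}")  # attribute -> property map
        else:
            y = o[1:]
            if y not in order:
                order.append(y)
            rels.append((x, local_name(p), y))  # relationship -> edge pattern
    declared = set()

    def node(v):
        # Put the label and properties only on the first occurrence of a variable.
        if v in declared:
            return f"({v})"
        declared.add(v)
        label = f":{labels[v]}" if v in labels else ""
        body = f" {{{', '.join(props[v])}}}" if v in props else ""
        return f"({v}{label}{body})"

    parts = [f"{node(x)}-[:{r}]->{node(y)}" for x, r, y in rels]
    parts += [node(v) for v in order if v not in declared]
    return f"MATCH {', '.join(parts)} RETURN {', '.join(v[1:] for v in projection)}"

bgp = [
    ("?x", "rdf:type", "ex:Person"),
    ("?x", "ex:name", '"Alice"'),
    ("?x", "ex:knows", "?y"),
]
print(bgp_to_cypher(bgp, ["?x", "?y"]))
# MATCH (x:Person {name: "Alice"})-[:knows]->(y) RETURN x, y
```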
 
* I found neither the software nor the comparison benchmark on GitHub * 
