Pular para o conteúdo principal

Querying knowledge graphs with extended property paths - Leitura de Artigo

Fionda, V., Pirrò, G., & Consens, M.P. (2019). Querying knowledge graphs with extended property paths. Semantic Web, 10, 1127-1168.

Abstract. The increasing number of Knowledge Graphs (KGs) available today calls for powerful query languages that can strike a balance between expressiveness and complexity of query evaluation, and that can be easily integrated into existing query processing infrastructures.

We present Extended Property Paths (EPPs), a significant enhancement of Property Paths (PPs), the navigational core included in the SPARQL query language. We introduce the EPPs syntax, which allows to capture in a succinct way a larger class of navigational queries than PPs and other navigational extensions of SPARQL, and provide formal semantics.

1. Introduction

While an early version of SPARQL did not provide explicit navigational capabilities that are crucial for querying graph-like data, the most recent version (SPARQL 1.1) incorporates Property Paths (PPs). The main goal of PPs is to allow the writing of navigational queries in a more succinct way and support basic transitive closure computations. However, it has been widely recognized that PPs offer very limited expressiveness; notably, PPs lack any form of tests within a path, a feature that can be very useful when dealing with graph data.  

In this paper we introduce Extended Property Paths (EPPs), a comprehensive language including a set of navigational features to extend the current navigational core of SPARQL.

1.1. EPPs by Example

Example 1. (Path Difference). Find pairs of cities located in the same country but not in the same region.

 ?x ((:country/ˆ:country)∼(:region/ˆ:region)) ?y

The symbol ^ denotes backward navigation from the object to the subject of a triple. 

Path difference ∼ enables to discard from the set of cities in the same country (i.e., :country/ˆ:country) those that are in the same region (i.e., :region/ˆ:region).

Example 2. (Path Conjunction). Find pairs of cities located in the same country and in the same region

?x ((:country/ˆ:country)&(:region/ˆ:region)) ?y  

In this case, path conjunction & enables to keep from the set of nodes satisfying the first subexpression those that also satisfy the second one. 

Example 3. (Tests). Find pairs of cities governed by the same political party founded before 2010

?x (:leaderParty&&TP(_o, :formationYear&&T(_o < 2010))/ˆ:leaderParty) ?y

TP denotes a test for the existence of a path whose parameters specify the position in the triple from which the test starts (_o denotes the object of the last traversed triple), and a path (in this case :formationYear&& T(_o < 2010)). The path is composed by logical AND (&&) of two tests.

Example 4. (Path Conjunction, Difference and Tests). Find pairs of cities located in the same country but not in the same region. Such cities must be governed by the same political party, which has been founded before 2010.

?x ((:country/ˆ:country)∼(:region/ˆ:region)) & (:leaderParty&&TP(_o, :formationYear&&T(_o < 2010)) /ˆ:leaderParty ?y

The advantage of using EPPs to write non-recursive navigational queries instead of writing them
directly into SPARQL is that the same request can be expressed more succinctly and without the need to deal with intermediate variables.

*  Novas operações para identificar caminhos de tamanho fixo que atualmente não são suportadas diretamente em property paths com o uso de expressões regulares mas que podem ser traduzidas para SPARQL. Permitem referenciar os nós intermediários dos caminhos com _o. *

SELECT ?x ?y WHERE {
{?x :country ?v1.?y :country ?v1.}
MINUS{?x :region ?v2.?y :region ?v2. }
?x :leaderParty ?v3. ?y :leaderParty ?v3.
FILTER EXISTS{?v3 :formationYear ?v4.
FILTER(?v4 < 2010)} }

SPARQL translation of the EPP expression in Example 4

Example 6. (Arbitrary Path Length with Tests). Find cities reachable from :Carrara connected via a path of arbitrary length composed by edges labels :twinned and considering only those cities reachable by a chain of intermediate cities having :population greater than 10000. The EPP expression capturing this request is: 

:Carrara (:twinned && TP(_o, :population&&T(_o>10000)) ) ?y

The expression involves arbitrary length paths plus tests.

The EPP expression in Example 6 cannot be translated into a basic SPARQL query because it makes use of the closure operator ∗ (requiring the evaluation of (:twinned &&TP (_o, :population&& T(_o>10000))) an a-priori unknown number of times). To give semantics to this kind of EPP expression we introduce the evaluation function EALP, which extends the function ALP defined for PPs in the SPARQL standard [10]. EPPs also support path repetitions (handled via EALP), that are a concise way of expressing the union of concatenations of an expression between a min and max number of times.

* Operações com caminhos de tamanho variável que não podem ser traduzidas diretamente para SPARQL *

Example 7. (Path Repetitions). If restricting the number of repetitions between 1 and 2, the expression
in Example 6 can be written as follows:

:Carrara (:twinned &&TP(_o, :population&&T(_o>10000)) ){{1, 2}}?y

Example 8. (EPPs within SPARQL). Find pairs of cities (A,B) and their populations such that: (i) A and
B are in the same country, but not in the same region; (ii) there exists some transportation from A to B.

SELECT ?cityA ?cityB ?popA ?popB WHERE {
?cityA :population ?popA.
?cityB :population ?popB.
{ /* BEGIN EPPs pattern */
?cityA ((:country/^country) ~(:region/^:region)) &:transportation ?cityB.
} /* END EPPs pattern */
}

The previous example does not take the KG RDFS schema into account. When considering transportation services without specifying the exact type of service, one would be able to actually discover the connection between :Rome and :Florence. This can be achieved by performing sub-property inference according to the RDFS entailment regime. One crucial aspect of EPPs is that they can capture the main RDFS inference types by encoding each inference rule in a prototypical EPP expression (see Section 5.2), with the advantage that the resulting expressions can be translated into SPARQL and evaluated on existing processors (via ALP).

Example 9. (EPPs and Reasoning). The EPPs in Example 8 can be automatically rewritten into an EPP
supporting RDFS reasoning as follows:

SELECT ?cityA ?cityB ?popA ?popB WHERE {
?cityA :population ?popA.
?cityB :population ?popB.
{ /* BEGIN EPPs pattern */
?cityA ((:country/^country) ~(:region/^:region)) &(TP(_p,(rdfs:sp*/rdfs:sp) && T(_o=:transportation)) || T(_p=:transportation))))) ?cityB.
}} /* END EPPs pattern */

* Permitem referenciar os predicados dos caminhos com _p. *

The translation to SPARQL this query is reported in Fig. 4. When this query is evaluated on the graph
in Fig. 1 it produces (?cityA→:Rome, ?cityB→:Florence, ?popA→2874034, ?popB→380226).

*** Poderia incluir uma referência a tripla com _t para suportar o RDF-Star e SPARQL-Star ***

SELECT ?since ?t
WHERE {
    ns:p3 ns:FRIENDS_OF+ ns:p2 AS PATH ?p
    FOR ALL ?p.?t (?t ns:date_of_start ?since. FILTER (?since > 2016)) .
}

?x (:friends_of&&TP(_t, :date_of_start&&T(_o >2016)))+ ?y 

1.2. Contributions and Organization

– We introduce two languages EPPs and iEPPs to query KGs. They have the same syntax but different semantics; one based on multisets (Section 3.2) and complying with SPARQL, and the other based on sets (Section 7.1).

– We provide a translation from non-recursive EPPs into SPARQL queries (Section 4). The benefit of our translation is twofold; on one hand, it allows to evaluate nEPPs (a larger class of queries than non-recursive PPs) using existing SPARQL processors; on the other hand, the usage of our translation paves the way toward readily incorporating EPPs in the current SPARQL standard.

– Building upon our translation, we also show how a SPARQL query can be rewritten into another SPARQL query that incorporates reasoning capabilities and can be evaluated on existing SPARQL processors (Section 5).

– We implement the nEPPs to SPARQL translation as an extension of the Jena library and an iEPPs query processor. Both are available on-line.

– We perform an extensive experimental evaluation on a variety of real data sets (Section 8).

2. Preliminaries
2.1. Background on SPARQL

Let V be a countably infinite set of variables, such that V ∩ (I ∪ L) = ∅

2.2. SPARQL Patterns
2.3. Semantics of SPARQL graph patterns

2.4. SPARQL Property Paths

Property paths (PPs) have been incorporated into the SPARQL standard with two main motivations; first, to provide explicit graph navigational capabilities (thus allowing the writing of SPARQL navigational queries in a more succinct way); second, to introduce the transitive closure operator ∗ previously not available in SPARQL.

Definition 11. (Property Path Pattern). A property path pattern (or PP pattern for short) is a tuple 〈α,elt,β〉 with α ∈ (I ∪ L ∪ V), β ∈ (I ∪ L ∪ V), and elt is a property path expression (PP expression) that is defined by the following grammar (where u,u1,...,un ∈ I):


The SPARQL standard distinguishes between two types of property path expressions: connectivity patterns (or recursive PPs) that include closure (*), and syntactic short forms or non-recursive PPs (nPPs) that do not include it. As for the evaluation of PPs, the W3C specification informally mentions the fact that nPPs can be evaluated via a translation into equivalent SPARQL basic expressions (see [10], Section 9.3). Property path patterns can be combined with graph patterns inside SPARQL patterns (using PP expressions in the middle position of a pattern).

2.5. Property Path Semantics
3. Extended Property Paths

3.1. Extended Property Paths Syntax

EPPs extend PPs and NRE-like languages with path conjunction/difference, repetitions and more types of tests. The importance of the new features considered by EPPs is witnessed by the fact that some of them (e.g., conjunction) are present in standards like XPath 2.0 [23].

Definition 12. (Extended Property Path Pattern). An extended property path pattern (or EPP pattern for short) is a tuple EP = 〈α,epp,β〉with α ∈ ( I ∪ L ∪ V), β ∈ (I ∪ L ∪ V), and epp an extended property path expression (EPP expression)

3.2. Extended Property Paths Semantics

We refer to non-recursive EPPs (nEPPs) as those expressions that do not include closure operators (i.e., * and +) and set-semantics repetitions ({l,h}).

Usage of EPPs in Practice. The overall goal of our proposal is to use EPP expressions in the predicate position of a property pattern (Definition 11) in lieu of PP expressions. This requires to “update" the SPARQL parser to support the nEPPs syntax. The aim of the Jena extension we implemented8 was to integrate nEPPs into an already existing (and popular) library. Clearly, while nEPPs expressions can be evaluated on current SPARQL processors, the evaluation of full EPPs expressions requires to also “update" query processors by replacing the ALP procedure with EALP.

4. Translation of nEPPs into SPARQL

4.1. Translation Algorithm: an overview

We now provide an overview of the translation algorithm At . The algorithm takes as input a nEPP pattern P =〈α,epp,β〉and produces a semantically equivalent SPARQL query Qe. The algorithm involves three main steps: (i) building of the operational tree; (ii) propagation of variables and terms along the nodes of the operational tree; (iii) application of the translation rules.

Discussion about the Translation

Conciseness. EPPs enable to write navigational queries in a more succinct way as compared to SPARQL queries using triple patterns and/or union of graph patterns. Given a nEPPs expression containing a number of fragments (e.g., concatenation, union, predicates) it is interesting to note that its corresponding translation in SPARQL is always more verbose.

5. Query-Based Reasoning on Existing SPARQL Processors

5.1. Capturing the Entailment Regime

In this paper we focus our attention on the ρdf fragment [19, 25]. This fragment considers a subset of RDFS vocabulary consisting of the following elements: rdfs:domain, rdfs:range, rdfs:type, rdfs:subClassOf, rdfs:subPropertyOf that we denote with dom, range, type, sc, and sp, respectively.

Definition 21. (SPARQL and query-based reasoning). Given a SPARQL pattern P and an RDF graph G, the evaluation of P over G under the ρdf semantics is denoted by ||P|| ρdf G , while ||P|| cl (G) denotes the evaluation of P over the closure of G

Lemma 22. (ρdf and SPARQL). Given a triple pattern (α,p,β) with α,β ∈ I ∪V and p ∈ I, then for every graph G we have that ||(α,p,β)|| ρdf  G = ||((α,Φ(p),β))|| G = ||(α,p,β)|| cl(G).

5.2. Query-based Reasoning on Existing Processors

The idea behind our approach, follows from the observation that closure operators appearing in Table 8 only involve single predicates i.e., sc+, sc*, sp+, sp*. Such types of expressions are property paths that (taken alone) can be evaluated via the ALP procedure defined in the W3C standard, and implemented in existing processors. Therefore, we need to rewrite the EPPs in Table 8 into SPARQL queries where recursive property paths with single predicates are used.

Lemma 23. Given a triple pattern P=〈α,p,β〉 with α,β ∈ I ∪ V and p ∈I, ||〈α,p,β〉|| cl(G) = || Atρ〈α,Φ(p),β〉|| G

The above result tells us that an algorithm to perform query-based reasoning works in three steps: (i) apply the translation function Φ(·); (ii) apply the translation Atρ(·) over the result of step (i); (iii) evaluate the SPARQL query resulting from (ii) on existing SPARQL processors.

6. Expressiveness Analysis

6.1. Expressive Power of Extended Property Paths vs. Property Paths

6.2. Expressiveness for Query-Based Reasoning

7. iEPPs: a SPARQL-independent Language

The aim of this section is to study EPPs as an independent language. The advantage of defining EPPs as a navigational language independent from SPARQL stems from the fact that the SPARQL-based semantics and translation discussed in Section 3.2 and Section 4 only apply to KGs based on RDF while the proposed language can be used to query arbitrary KGs.

7.2. Evaluation Algorithm

The aim of this section is to study whether the semantics in Fig. 15 can be implemented in an efficient way. In what follows we show an efficient evaluation algorithm, that has been implemented in a custom query evaluator, and discuss its complexity.

8. Experimental Evaluation

8.1. Translation Running Time

Our primary objective is to make practical the immediate adoption of EPPs as a query language for KGs. This objective is fulfilled by using our translation from nEPPs to SPARQL as front end to any existing SPARQL processor. To investigate the performance of the translation algorithm presented in Section 4, we show that our nEPPs to SPARQL translation performs comparably to the existing translations routinely performed by SPARQL processors.

While this suggests that the cost of our approach could be up to twice the cost of a direct nEPPs to algebra translation, keep in mind that we are comparing initial phases of query processing and these are typically much faster than subsequent phases.

8.2. Running time of iEPPs vs. Jena ARQ

We now compare the running time of the custom EPP processor implementing the iEPPs evaluation algorithm discussed in Section 7.2 against the translation-based approach described in Section 4 using Jena ARQ as underlying SPARQL query processor. This experiment gives insights about the pros and cons of evaluating EPPs into existing SPARQL processors as compared to the usage of the iEPPs custom query processor.

We created 4 groups Qi,i ∈ {1,...,4} of nEPP expressions each with 3 queries for a total of 12 queries (Q1-Q12) that are reported in Appendix B. The first group makes use of concatenation (‘/’) and path alternatives (‘|’); the second group also includes nesting (‘TP’); the third group includes path difference
(‘~’) and concatenation (‘/’); finally, the fourth group leverages path conjunction (‘&’) and concatenation (‘/’). These groups of queries allow to investigate the trade-off between expressiveness and running time.

The huge advantage of using nEPPs is that navigational queries can be written in a succinct way. Anecdotally, while the nEPPs asking for mutual friends (simple entailment) at distance 3 contains ∼200 characters, the SPARQL query (obtained from the translation) contains ∼700 characters; moreover, writing navigational queries directly in SPARQL requires to deal with a large number of variables that need to be consistently joined. We want to point out that the translated SPARQL queries have been automatically generated. It may be the case that manually written equivalent queries can be shorter.

8.3. Running Time of Query-Based Reasoning

We now move to a larger scale evaluation of the query-based reasoning approach described in Section 5. The goal is to compare the running time of queries with and without reasoning support. Even in this case we considered running time since it offers a reasonable summary of the overall performance of a query processing system. We also investigate the number of results returned.

9. Related Work

The idea of graph query languages is to use (variants of) regular expressions to express (in a compact way) navigational patterns (e.g., [13, 32–34]). Angles and Gutierrez [35], and Wood [36] provide surveys on the topic while Barceló provides a detailed overview of research in the field [37] while Angles et al. [7] describe a recent proposal. Our goal with EPPs is to extend the navigational core of SPARQL (i.e., PPs) and make the extension readily available for existing SPARQL processors.

9.1. SPARQL Navigational Extensions

Proposals to extend SPARQL with navigational features have been around for some time. Notable examples are PSPARQL [21] and nSPARQL [16] that tackled this problem even before the standardization of property paths (PPs) as SPARQL navigational core.
...
Since our main goal is to extend the navigational core of SPARQL we focus on the comparison between EPPs and other SPARQL navigational extensions. We compare EPPs with PPs, cpSPARQL [21], recSPARQL [15], RDFPath [38], nSPARQL-NREs [16],and star-free Nested Regular Expressions (sfNREs) that extends NREs with negation [39].

A crucial difference between EPPs and related research is that we tackle the problem of extending the SPARQL language in the least intrusive way. We show that there exists a precise fragment of SPARQL that is expressive enough to capture non recursive EPPs (nEPPs), that is, EPPs that do not use closure operators (i.e., * and +). Therefore, following the same line of the SPARQL standard where non-recursive PPs are translated into SPARQL queries, we devised a translation from (concise) nEPPs into (more verbose) SPARQL queries.

9.2. Other Navigational Languages

Besides SPARQL navigational extensions there exist other graph languages like GraphQL [41] the Facebook query language. However, this language departs from the SPARQL standard and it is not clear how reasoning is supported.

[7] R. Angles, M. Arenas, G.H. Fletcher, C. Gutierrez, T. Lindaaker, M. Paradies, S. Plantikow, J. Sequeda, O. van Rest, H. Voigt et al., G-CORE: A Core for Future Graph Query Languages, in: SIGMOD, 2018.
[36] P.T. Wood, Query Languages for Graph Databases., SIGMOD Rec. (2012).
[37] P. Barceló, Querying Graph Databases, in: Proc. of PODS, 2013.
[41] O. Hartig and J. Pérez, Semantics and Complexity of GraphQL, in: Proceedings of the 2018 World Wide Web Conference on World Wide Web, International World Wide Web Conferences Steering Committee, 2018, pp. 1155–1164


Comentários

Postagens mais visitadas deste blog

Aula 12: WordNet | Introdução à Linguagem de Programação Python *** com NLTK

 Fonte -> https://youtu.be/0OCq31jQ9E4 A WordNet do Brasil -> http://www.nilc.icmc.usp.br/wordnetbr/ NLTK  synsets = dada uma palavra acha todos os significados, pode informar a língua e a classe gramatical da palavra (substantivo, verbo, advérbio) from nltk.corpus import wordnet as wn wordnet.synset(xxxxxx).definition() = descrição do significado É possível extrair hipernimia, hiponimia, antonimos e os lemas (diferentes palavras/expressões com o mesmo significado) formando uma REDE LEXICAL. Com isso é possível calcular a distância entre 2 synset dentro do grafo.  Veja trecho de código abaixo: texto = 'útil' print('NOUN:', wordnet.synsets(texto, lang='por', pos=wordnet.NOUN)) texto = 'útil' print('ADJ:', wordnet.synsets(texto, lang='por', pos=wordnet.ADJ)) print(wordnet.synset('handy.s.01').definition()) texto = 'computador' for synset in wn.synsets(texto, lang='por', pos=wn.NOUN):     print('DEF:',s...

truth makers AND truth bearers - Palestra Giancarlo no SBBD

Dando uma googada https://iep.utm.edu/truth/ There are two commonly accepted constraints on truth and falsehood:     Every proposition is true or false.         [Law of the Excluded Middle.]     No proposition is both true and false.         [Law of Non-contradiction.] What is the difference between a truth-maker and a truth bearer? Truth-bearers are either true or false; truth-makers are not since, not being representations, they cannot be said to be true, nor can they be said to be false . That's a second difference. Truth-bearers are 'bipolar,' either true or false; truth-makers are 'unipolar': all of them obtain. What are considered truth bearers?   A variety of truth bearers are considered – statements, beliefs, claims, assumptions, hypotheses, propositions, sentences, and utterances . When I speak of a fact . . . I mean the kind of thing that makes a proposition true or false. (Russe...

DGL-KE : Deep Graph Library (DGL)

Fonte: https://towardsdatascience.com/introduction-to-knowledge-graph-embedding-with-dgl-ke-77ace6fb60ef Amazon recently launched DGL-KE, a software package that simplifies this process with simple command-line scripts. With DGL-KE , users can generate embeddings for very large graphs 2–5x faster than competing techniques. DGL-KE provides users the flexibility to select models used to generate embeddings and optimize performance by configuring hardware, data sampling parameters, and the loss function. To use this package effectively, however, it is important to understand how embeddings work and the optimizations available to compute them. This two-part blog series is designed to provide this information and get you ready to start taking advantage of DGL-KE . Finally, another class of graphs that is especially important for knowledge graphs are multigraphs . These are graphs that can have multiple (directed) edges between the same pair of nodes and can also contain loops. The...