Fionda, V., Pirrò, G., & Consens, M.P. (2019). Querying knowledge graphs with extended property paths. Semantic Web, 10, 1127-1168.
Abstract. The increasing number of Knowledge Graphs (KGs) available today calls for powerful query languages that can strike a balance between expressiveness and complexity of query evaluation, and that can be easily integrated into existing query processing infrastructures.
We present Extended Property Paths (EPPs), a significant enhancement of Property Paths (PPs), the navigational core included in the SPARQL query language. We introduce the EPPs syntax, which allows to capture in a succinct way a larger class of navigational queries than PPs and other navigational extensions of SPARQL, and provide formal semantics.
1. Introduction
While an early version of SPARQL did not provide explicit navigational capabilities that are crucial for querying graph-like data, the most recent version (SPARQL 1.1) incorporates Property Paths (PPs). The main goal of PPs is to allow the writing of navigational queries in a more succinct way and support basic transitive closure computations. However, it has been widely recognized that PPs offer very limited expressiveness; notably, PPs lack any form of tests within a path, a feature that can be very useful when dealing with graph data.
In this paper we introduce Extended Property Paths (EPPs), a comprehensive language including a set of navigational features to extend the current navigational core of SPARQL.
1.1. EPPs by Example
Example 1. (Path Difference). Find pairs of cities located in the same country but not in the same region.
?x ((:country/ˆ:country)∼(:region/ˆ:region)) ?y
The symbol ^ denotes backward navigation from the object to the subject of a triple.
Path difference ∼ enables to discard from the set of cities in the same country (i.e., :country/ˆ:country) those that are in the same region (i.e., :region/ˆ:region).
Example 2. (Path Conjunction). Find pairs of cities located in the same country and in the same region
?x ((:country/ˆ:country)&(:region/ˆ:region)) ?y
In this case, path conjunction & enables to keep from the set of nodes satisfying the first subexpression those that also satisfy the second one.
Example 3. (Tests). Find pairs of cities governed by the same political party founded before 2010
?x (:leaderParty&&TP(_o, :formationYear&&T(_o < 2010))/ˆ:leaderParty) ?y
TP denotes a test for the existence of a path whose parameters specify the position in the triple from which the test starts (_o denotes the object of the last traversed triple), and a path (in this case :formationYear&& T(_o < 2010)). The path is composed by logical AND (&&) of two tests.
Example 4. (Path Conjunction, Difference and Tests). Find pairs of cities located in the same country but not in the same region. Such cities must be governed by the same political party, which has been founded before 2010.
?x ((:country/ˆ:country)∼(:region/ˆ:region)) & (:leaderParty&&TP(_o, :formationYear&&T(_o < 2010)) /ˆ:leaderParty ?y
The advantage of using EPPs to write non-recursive navigational queries instead of writing them
directly into SPARQL is that the same request can be expressed more succinctly and without the need to deal with intermediate variables.
* Novas operações para identificar caminhos de tamanho fixo que atualmente não são suportadas diretamente em property paths com o uso de expressões regulares mas que podem ser traduzidas para SPARQL. Permitem referenciar os nós intermediários dos caminhos com _o. *
SELECT ?x ?y WHERE {
{?x :country ?v1.?y :country ?v1.}
MINUS{?x :region ?v2.?y :region ?v2. }
?x :leaderParty ?v3. ?y :leaderParty ?v3.
FILTER EXISTS{?v3 :formationYear ?v4.
FILTER(?v4 < 2010)} }
Example 6. (Arbitrary Path Length with Tests). Find cities reachable from :Carrara connected via a path of arbitrary length composed by edges labels :twinned and considering only those cities reachable by a chain of intermediate cities having :population greater than 10000. The EPP expression capturing this request is:
:Carrara (:twinned && TP(_o, :population&&T(_o>10000)) )∗ ?y
The expression involves arbitrary length paths plus tests.
The EPP expression in Example 6 cannot be translated into a basic SPARQL query because it makes use of the closure operator ∗ (requiring the evaluation of (:twinned &&TP (_o, :population&& T(_o>10000))) an a-priori unknown number of times). To give semantics to this kind of EPP expression we introduce the evaluation function EALP, which extends the function ALP defined for PPs in the SPARQL standard [10]. EPPs also support path repetitions (handled via EALP), that are a concise way of expressing the union of concatenations of an expression between a min and max number of times.
* Operações com caminhos de tamanho variável que não podem ser traduzidas diretamente para SPARQL *
Example 7. (Path Repetitions). If restricting the number of repetitions between 1 and 2, the expression
in Example 6 can be written as follows:
:Carrara (:twinned &&TP(_o, :population&&T(_o>10000)) ){{1, 2}}?y
Example 8. (EPPs within SPARQL). Find pairs of cities (A,B) and their populations such that: (i) A and
B are in the same country, but not in the same region; (ii) there exists some transportation from A to B.
SELECT ?cityA ?cityB ?popA ?popB WHERE {
?cityA :population ?popA.
?cityB :population ?popB.
{ /* BEGIN EPPs pattern */
?cityA ((:country/^country) ~(:region/^:region)) &:transportation ?cityB.
} /* END EPPs pattern */
}
The previous example does not take the KG RDFS schema into account. When considering transportation services without specifying the exact type of service, one would be able to actually discover the connection between :Rome and :Florence. This can be achieved by performing sub-property inference according to the RDFS entailment regime. One crucial aspect of EPPs is that they can capture the main RDFS inference types by encoding each inference rule in a prototypical EPP expression (see Section 5.2), with the advantage that the resulting expressions can be translated into SPARQL and evaluated on existing processors (via ALP).
Example 9. (EPPs and Reasoning). The EPPs in Example 8 can be automatically rewritten into an EPP
supporting RDFS reasoning as follows:
SELECT ?cityA ?cityB ?popA ?popB WHERE {
?cityA :population ?popA.
?cityB :population ?popB.
{ /* BEGIN EPPs pattern */
?cityA ((:country/^country) ~(:region/^:region)) &(TP(_p,(rdfs:sp*/rdfs:sp) && T(_o=:transportation)) || T(_p=:transportation))))) ?cityB.
}} /* END EPPs pattern */
* Permitem referenciar os predicados dos caminhos com _p. *
The translation to SPARQL this query is reported in Fig. 4. When this query is evaluated on the graph
in Fig. 1 it produces (?cityA→:Rome, ?cityB→:Florence, ?popA→2874034, ?popB→380226).
*** Poderia incluir uma referência a tripla com _t para suportar o RDF-Star e SPARQL-Star ***
SELECT ?since ?t
WHERE {
ns:p3 ns:FRIENDS_OF+ ns:p2 AS PATH ?p
FOR ALL ?p.?t (?t ns:date_of_start ?since. FILTER (?since > 2016)) .
}
?x (:friends_of&&TP(_t, :date_of_start&&T(_o >2016)))+ ?y
1.2. Contributions and Organization
– We introduce two languages EPPs and iEPPs to query KGs. They have the same syntax but different semantics; one based on multisets (Section 3.2) and complying with SPARQL, and the other based on sets (Section 7.1).
– We provide a translation from non-recursive EPPs into SPARQL queries (Section 4). The benefit of our translation is twofold; on one hand, it allows to evaluate nEPPs (a larger class of queries than non-recursive PPs) using existing SPARQL processors; on the other hand, the usage of our translation paves the way toward readily incorporating EPPs in the current SPARQL standard.
– Building upon our translation, we also show how a SPARQL query can be rewritten into another SPARQL query that incorporates reasoning capabilities and can be evaluated on existing SPARQL processors (Section 5).
– We implement the nEPPs to SPARQL translation as an extension of the Jena library and an iEPPs query processor. Both are available on-line.
– We perform an extensive experimental evaluation on a variety of real data sets (Section 8).
2. Preliminaries
2.1. Background on SPARQL
Let V be a countably infinite set of variables, such that V ∩ (I ∪ L) = ∅
2.2. SPARQL Patterns
2.3. Semantics of SPARQL graph patterns
2.4. SPARQL Property Paths
Property paths (PPs) have been incorporated into the SPARQL standard with two main motivations; first, to provide explicit graph navigational capabilities (thus allowing the writing of SPARQL navigational queries in a more succinct way); second, to introduce the transitive closure operator ∗ previously not available in SPARQL.
Definition 11. (Property Path Pattern). A property path pattern (or PP pattern for short) is a tuple 〈α,elt,β〉 with α ∈ (I ∪ L ∪ V), β ∈ (I ∪ L ∪ V), and elt is a property path expression (PP expression) that is defined by the following grammar (where u,u1,...,un ∈ I):
The SPARQL standard distinguishes between two types of property path expressions: connectivity patterns (or recursive PPs) that include closure (*), and syntactic short forms or non-recursive PPs (nPPs) that do not include it. As for the evaluation of PPs, the W3C specification informally mentions the fact that nPPs can be evaluated via a translation into equivalent SPARQL basic expressions (see [10], Section 9.3). Property path patterns can be combined with graph patterns inside SPARQL patterns (using PP expressions in the middle position of a pattern).
3. Extended Property Paths
3.1. Extended Property Paths Syntax
EPPs extend PPs and NRE-like languages with path conjunction/difference, repetitions and more types of tests. The importance of the new features considered by EPPs is witnessed by the fact that some of them (e.g., conjunction) are present in standards like XPath 2.0 [23].
Definition 12. (Extended Property Path Pattern). An extended property path pattern (or EPP pattern for short) is a tuple EP = 〈α,epp,β〉with α ∈ ( I ∪ L ∪ V), β ∈ (I ∪ L ∪ V), and epp an extended property path expression (EPP expression)
3.2. Extended Property Paths Semantics
We refer to non-recursive EPPs (nEPPs) as those expressions that do not include closure operators (i.e., * and +) and set-semantics repetitions ({l,h}).
Usage of EPPs in Practice. The overall goal of our proposal is to use EPP expressions in the predicate position of a property pattern (Definition 11) in lieu of PP expressions. This requires to “update" the SPARQL parser to support the nEPPs syntax. The aim of the Jena extension we implemented8 was to integrate nEPPs into an already existing (and popular) library. Clearly, while nEPPs expressions can be evaluated on current SPARQL processors, the evaluation of full EPPs expressions requires to also “update" query processors by replacing the ALP procedure with EALP.
4. Translation of nEPPs into SPARQL
4.1. Translation Algorithm: an overview
We now provide an overview of the translation algorithm At . The algorithm takes as input a nEPP pattern P =〈α,epp,β〉and produces a semantically equivalent SPARQL query Qe. The algorithm involves three main steps: (i) building of the operational tree; (ii) propagation of variables and terms along the nodes of the operational tree; (iii) application of the translation rules.
Discussion about the Translation
Conciseness. EPPs enable to write navigational queries in a more succinct way as compared to SPARQL queries using triple patterns and/or union of graph patterns. Given a nEPPs expression containing a number of fragments (e.g., concatenation, union, predicates) it is interesting to note that its corresponding translation in SPARQL is always more verbose.
5. Query-Based Reasoning on Existing SPARQL Processors
5.1. Capturing the Entailment Regime
In this paper we focus our attention on the ρdf fragment [19, 25]. This fragment considers a subset of RDFS vocabulary consisting of the following elements: rdfs:domain, rdfs:range, rdfs:type, rdfs:subClassOf, rdfs:subPropertyOf that we denote with dom, range, type, sc, and sp, respectively.
Definition 21. (SPARQL and query-based reasoning). Given a SPARQL pattern P and an RDF graph G, the evaluation of P over G under the ρdf semantics is denoted by ||P|| ρdf G , while ||P|| cl (G) denotes the evaluation of P over the closure of G
Lemma 22. (ρdf and SPARQL). Given a triple pattern (α,p,β) with α,β ∈ I ∪V and p ∈ I, then for every graph G we have that ||(α,p,β)|| ρdf G = ||((α,Φ(p),β))|| G = ||(α,p,β)|| cl(G).
5.2. Query-based Reasoning on Existing Processors
The idea behind our approach, follows from the observation that closure operators appearing in Table 8 only involve single predicates i.e., sc+, sc*, sp+, sp*. Such types of expressions are property paths that (taken alone) can be evaluated via the ALP procedure defined in the W3C standard, and implemented in existing processors. Therefore, we need to rewrite the EPPs in Table 8 into SPARQL queries where recursive property paths with single predicates are used.
Lemma 23. Given a triple pattern P=〈α,p,β〉 with α,β ∈ I ∪ V and p ∈I, ||〈α,p,β〉|| cl(G) = || Atρ〈α,Φ(p),β〉|| G
The above result tells us that an algorithm to perform query-based reasoning works in three steps: (i) apply the translation function Φ(·); (ii) apply the translation Atρ(·) over the result of step (i); (iii) evaluate the SPARQL query resulting from (ii) on existing SPARQL processors.
6. Expressiveness Analysis
6.1. Expressive Power of Extended Property Paths vs. Property Paths
6.2. Expressiveness for Query-Based Reasoning
7. iEPPs: a SPARQL-independent Language
The aim of this section is to study EPPs as an independent language. The advantage of defining EPPs as a navigational language independent from SPARQL stems from the fact that the SPARQL-based semantics and translation discussed in Section 3.2 and Section 4 only apply to KGs based on RDF while the proposed language can be used to query arbitrary KGs.
7.2. Evaluation Algorithm
The aim of this section is to study whether the semantics in Fig. 15 can be implemented in an efficient way. In what follows we show an efficient evaluation algorithm, that has been implemented in a custom query evaluator, and discuss its complexity.
8. Experimental Evaluation
8.1. Translation Running Time
Our primary objective is to make practical the immediate adoption of EPPs as a query language for KGs. This objective is fulfilled by using our translation from nEPPs to SPARQL as front end to any existing SPARQL processor. To investigate the performance of the translation algorithm presented in Section 4, we show that our nEPPs to SPARQL translation performs comparably to the existing translations routinely performed by SPARQL processors.
While this suggests that the cost of our approach could be up to twice the cost of a direct nEPPs to algebra translation, keep in mind that we are comparing initial phases of query processing and these are typically much faster than subsequent phases.
8.2. Running time of iEPPs vs. Jena ARQ
We now compare the running time of the custom EPP processor implementing the iEPPs evaluation algorithm discussed in Section 7.2 against the translation-based approach described in Section 4 using Jena ARQ as underlying SPARQL query processor. This experiment gives insights about the pros and cons of evaluating EPPs into existing SPARQL processors as compared to the usage of the iEPPs custom query processor.
We created 4 groups Qi,i ∈ {1,...,4} of nEPP expressions each with 3 queries for a total of 12 queries (Q1-Q12) that are reported in Appendix B. The first group makes use of concatenation (‘/’) and path alternatives (‘|’); the second group also includes nesting (‘TP’); the third group includes path difference
(‘~’) and concatenation (‘/’); finally, the fourth group leverages path conjunction (‘&’) and concatenation (‘/’). These groups of queries allow to investigate the trade-off between expressiveness and running time.
The huge advantage of using nEPPs is that navigational queries can be written in a succinct way. Anecdotally, while the nEPPs asking for mutual friends (simple entailment) at distance 3 contains ∼200 characters, the SPARQL query (obtained from the translation) contains ∼700 characters; moreover, writing navigational queries directly in SPARQL requires to deal with a large number of variables that need to be consistently joined. We want to point out that the translated SPARQL queries have been automatically generated. It may be the case that manually written equivalent queries can be shorter.
8.3. Running Time of Query-Based Reasoning
We now move to a larger scale evaluation of the query-based reasoning approach described in Section 5. The goal is to compare the running time of queries with and without reasoning support. Even in this case we considered running time since it offers a reasonable summary of the overall performance of a query processing system. We also investigate the number of results returned.
9. Related Work
The idea of graph query languages is to use (variants of) regular expressions to express (in a compact way) navigational patterns (e.g., [13, 32–34]). Angles and Gutierrez [35], and Wood [36] provide surveys on the topic while Barceló provides a detailed overview of research in the field [37] while Angles et al. [7] describe a recent proposal. Our goal with EPPs is to extend the navigational core of SPARQL (i.e., PPs) and make the extension readily available for existing SPARQL processors.
9.1. SPARQL Navigational Extensions
Proposals to extend SPARQL with navigational features have been around for some time. Notable examples are PSPARQL [21] and nSPARQL [16] that tackled this problem even before the standardization of property paths (PPs) as SPARQL navigational core.
...
Since our main goal is to extend the navigational core of SPARQL we focus on the comparison between EPPs and other SPARQL navigational extensions. We compare EPPs with PPs, cpSPARQL [21], recSPARQL [15], RDFPath [38], nSPARQL-NREs [16],and star-free Nested Regular Expressions (sfNREs) that extends NREs with negation [39].
A crucial difference between EPPs and related research is that we tackle the problem of extending the SPARQL language in the least intrusive way. We show that there exists a precise fragment of SPARQL that is expressive enough to capture non recursive EPPs (nEPPs), that is, EPPs that do not use closure operators (i.e., * and +). Therefore, following the same line of the SPARQL standard where non-recursive PPs are translated into SPARQL queries, we devised a translation from (concise) nEPPs into (more verbose) SPARQL queries.
9.2. Other Navigational Languages
Besides SPARQL navigational extensions there exist other graph languages like GraphQL [41] the Facebook query language. However, this language departs from the SPARQL standard and it is not clear how reasoning is supported.
[7] R. Angles, M. Arenas, G.H. Fletcher, C. Gutierrez, T. Lindaaker, M. Paradies, S. Plantikow, J. Sequeda, O. van Rest, H. Voigt et al., G-CORE: A Core for Future Graph Query Languages, in: SIGMOD, 2018.
[36] P.T. Wood, Query Languages for Graph Databases., SIGMOD Rec. (2012).
[37] P. Barceló, Querying Graph Databases, in: Proc. of PODS, 2013.
[41] O. Hartig and J. Pérez, Semantics and Complexity of GraphQL, in: Proceedings of the 2018 World Wide Web Conference on World Wide Web, International World Wide Web Conferences Steering Committee, 2018, pp. 1155–1164
Comentários
Postar um comentário
Sinta-se a vontade para comentar. Críticas construtivas são sempre bem vindas.