Angles, R., Aranda, C.B., Hogan, A., Rojas, C., Vrgoč, D. (2022). WDBench: A Wikidata Graph Query Benchmark. In: The Semantic Web – ISWC 2022, Lecture Notes in Computer Science, vol. 13489. Springer, Cham. https://doi.org/10.1007/978-3-031-19433-7_41
Abstract We propose WDBench: a query benchmark for knowledge graphs based on Wikidata, featuring real-world queries extracted from the public query logs of the Wikidata SPARQL endpoint. ... WDBench thus focuses on four main query features that are common to SPARQL and graph databases: (i) basic graph patterns, (ii) optional graph patterns, (iii) path patterns, and (iv) navigational graph patterns. ... Using this benchmark, we present and compare performance results for evaluating queries using Blazegraph, Jena/Fuseki, Virtuoso and Neo4j.
[4 query groups, CGP only OPTIONAL]
1 Introduction
With many options available, it can be difficult to choose a suitable engine to support queries over a given knowledge graph, which calls for graph query benchmarks that reflect real-world workloads. For example, the Wikidata community is currently seeking an alternative to replace Blazegraph, whose development team has moved on to other projects.
[Blazegraph scalability problems and lack of maintenance]
In this paper, we rather follow a feature-based approach: we generate a real-world benchmark by extracting a large and diverse set of queries from the query log of an open knowledge graph, but only for selected core features that are common to both SPARQL engines and graph databases [5]. Within these features, we define high-level subclasses in order to gain more detailed insights into the performance of different engines. The specific benchmark we propose here, which we call WDBench, is based on the Wikidata knowledge graph [44] and query logs [30].
[Real queries extracted from the WDQS log]
2 Related Work
- Synthetic SPARQL-oriented benchmarks
- Synthetic graph database-oriented benchmarks
The Linked Data Benchmark Council’s Social Network Benchmark LDBC-SNB [19] is a benchmark that provides a common synthetic dataset for two different query workloads.
- Real-world RDF-oriented benchmarks
Comparison & novelty
We refer to Saleem et al. [37] for a detailed comparison of the benchmarks discussed here. In terms of novelty, WDBench uses real-world data and queries; to the best of our knowledge, only DBPSB [31] and FEASIBLE [36] share this characteristic.
Unlike these two benchmarks, WDBench
(1) is based on Wikidata rather than DBpedia;
(2) uses a larger graph (1.257 billion triples/edges vs. 232 million triples/edges);
(3) contains path patterns that can match arbitrary length paths, which are a key feature of graph queries;
(4) is offered in both SPARQL and Cypher variants;
(5) does not apply templates or clustering, but rather contains a larger and more diverse query set that includes thousands of queries.
3 WDBench: Graph and Queries
3.1 WDBench Graph
To balance these criteria, we base WDBench on the Wikidata truthy dump [41] for three reasons:
(1) it is more concise and thus faster to load: some engines can take over a week to load the complete version of Wikidata;
(2) it is sufficient to address the majority of queries in the chosen log: 86.8% of the queries in this log use only truthy properties;
(3) it avoids issues relating to how Wikidata-specific qualifiers should be reified in different databases: this topic diverges from our goal of a general query benchmark for knowledge graphs and is addressed elsewhere [23,24].
[Qualifiers were not used in the queries]
The dataset is available for download online at [3], and the scripts used to prune a truthy dump can be found online at [4].
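Since only the pruning scripts at [4] are referenced, the following is a minimal sketch of the idea under one assumption: that pruning a truthy dump means keeping only the N-Triples whose predicate is a Wikidata direct (wdt:) property. The actual scripts may apply additional rules.

```python
# Minimal sketch (not the authors' script at [4]): keep only triples whose
# predicate is a Wikidata direct ("truthy", wdt:) property.
WDT_PREFIX = "<http://www.wikidata.org/prop/direct/"

def prune_truthy(in_path: str, out_path: str) -> None:
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            parts = line.split(maxsplit=2)  # N-Triples: subject predicate object .
            if len(parts) == 3 and parts[1].startswith(WDT_PREFIX):
                dst.write(line)

# Example (file names are placeholders):
# prune_truthy("latest-truthy.nt", "wdbench.nt")
```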
3.2 WDBench Queries
The first choice we made was to concentrate exclusively on queries that timed out on the Wikidata endpoint (code 500 queries in the log files [30,47]). While endpoint timeouts can be caused by many factors (including temporary server load), we wish to focus on challenging queries, and this subset largely filters out the multitude of trivial queries in the log. Focusing on the code 500 queries also reduces the set to 122,980 queries.
[Prioritize queries that caused timeouts]
The next reduction was based on the operators used by the queries. Considering that we aim to compare not only RDF/SPARQL engines, but also other graph databases, we decided to focus on four types of graph patterns at the core of popular graph query languages [5]: (i) basic graph patterns; (ii) optional graph patterns; (iii) path patterns; and (iv) navigational graph patterns (using paths).
...
We thus pruned queries that use any operator other than basic graph patterns, optionals and property paths (see the sketch below).
[Do not use non-standard SPARQL operators]
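As an illustration of this operator-based pruning, here is a sketch using rdflib's SPARQL algebra: parse each logged query, collect the algebra operators it uses, and keep it only if every operator falls within an allow-list. The allow-list below is a hypothetical approximation of "basic graph patterns, optionals and property paths"; the authors' actual filter may differ.

```python
from rdflib.plugins.sparql.parser import parseQuery
from rdflib.plugins.sparql.algebra import translateQuery
from rdflib.plugins.sparql.parserutils import CompValue

# Hypothetical allow-list of rdflib algebra operators roughly corresponding to
# BGPs, OPTIONAL (LeftJoin) and property paths; adjust to the selection policy.
ALLOWED = {"SelectQuery", "Project", "ToMultiSet", "Join", "BGP",
           "LeftJoin", "TrueFilter", "Distinct"}

def operators(node, found=None):
    """Recursively collect the names of all algebra operators in a query tree."""
    found = set() if found is None else found
    if isinstance(node, CompValue):
        found.add(node.name)
        for child in node.values():
            operators(child, found)
    elif isinstance(node, (list, tuple)):
        for child in node:
            operators(child, found)
    return found

def keep_query(sparql: str) -> bool:
    """True if the query parses and uses only the allowed operators."""
    try:
        algebra = translateQuery(parseQuery(sparql)).algebra
    except Exception:
        return False  # discard queries that do not parse
    return operators(algebra) <= ALLOWED
```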
3.3 Conversion to Cypher
This requires mapping the Wikidata graph to a property graph, which is straightforwardly achieved given that we only include binary (truthy) relations: each triple is simply represented as an edge in the property graph. The more complex part involves converting the queries to Cypher [20], Neo4j's query language.
[RDF->LPG, SPARQL->Cypher]
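To make the mapping concrete, here is a small sketch of the triple-pattern-to-Cypher idea. The property-graph schema assumed below (entities as nodes with an `id` property, each truthy statement as a relationship typed by its property id) is an illustrative assumption; the WDBench conversion scripts may encode Wikidata differently.

```python
def triple_to_cypher(subj: str, pred: str, obj: str) -> str:
    """Render one triple pattern, e.g. ('?x', 'P31', 'Q5'), as a Cypher MATCH clause.
    Assumes entities are nodes with an `id` property and properties are
    relationship types (a hypothetical schema, not necessarily WDBench's)."""
    def node(term: str) -> str:
        if term.startswith("?"):              # SPARQL variable -> Cypher variable
            return f"({term[1:]})"
        return f"(:Entity {{id: '{term}'}})"  # constant Wikidata entity
    return f"MATCH {node(subj)}-[:{pred}]->{node(obj)}"

# triple_to_cypher("?x", "P31", "Q5")
#   -> "MATCH (x)-[:P31]->(:Entity {id: 'Q5'})"
```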
4 Running WDBench
How we ran the queries. To simulate a realistic database load, we do not split queries into cold/hot run segments. Rather, we run them in succession, one after another, after a cold start of each system (and after clearing the OS cache). This reflects the fact that query performance can vary significantly based on the state of the system buffer, the state of the hard drive, or the state of the OS's virtual memory.
[They did not restart between queries; DB and OS caches are reused, as is natural in a real environment]
Handling timeouts. We defined a timeout of 1 minute per query for each system. This is a common limit on public SPARQL endpoints, so we replicate it in the benchmark.
[Queries can still time out]
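A minimal sketch of this run protocol, assuming a local SPARQL endpoint at a placeholder URL: queries are sent one after another, each with a 60-second limit, and only the status and wall-clock time are recorded. This is not the authors' harness, just an illustration of the setup described above.

```python
import time
import requests

ENDPOINT = "http://localhost:9999/sparql"  # placeholder endpoint URL
TIMEOUT_S = 60                             # 1-minute limit per query

def run_benchmark(queries):
    """Run queries sequentially, recording (index, status, elapsed seconds)."""
    results = []
    for i, q in enumerate(queries):
        start = time.monotonic()
        try:
            resp = requests.get(
                ENDPOINT,
                params={"query": q},
                headers={"Accept": "application/sparql-results+json"},
                timeout=TIMEOUT_S,  # read timeout approximates the per-query limit
            )
            status = "ok" if resp.ok else f"http {resp.status_code}"
        except requests.Timeout:
            status = "timeout"
        except requests.RequestException:
            status = "error"
        results.append((i, status, time.monotonic() - start))
    return results
```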
5 Experimental Results
5.1 Basic graph patterns
We begin by examining the performance of each query engine for basic graph patterns (BGPs), considering both the Single and Multiple subsets (BGPs with a single triple pattern vs. multiple triple patterns). We can observe that, as far as Single is concerned, Virtuoso is the most stable engine, returning neither timeouts nor errors, closely followed by Blazegraph.
In terms of performance, Blazegraph is the clear winner in the Single query set, followed thereafter by Virtuoso. Both Jena and Neo4j are lagging in terms of performance, with averages 4–5 times higher than the other two engines. Neo4j’s median is also above the third quartile for both Blazegraph and Virtuoso.
[Virtuoso was the most stable]
5.2 Optional graph patterns
Blazegraph is the clear winner here, both in stability, with only 28 timeouts, and in speed, with its median being below the first quartile of the next best competitor, Jena. Jena also outperforms Virtuoso by a wide margin, and Neo4j trails further behind. Considering only well-designed OPTIONAL patterns (where every variable shared between an OPTIONAL part and the rest of the query also appears in the required part it is paired with), the performance of Virtuoso improves drastically. Blazegraph wins in terms of runtimes, but Virtuoso surpasses the other engines in stability, timing out on only 5 of 390 queries. Non-well-designed optionals seem to be a major issue for Virtuoso, which times out in 64 of 108 cases, and its performance drops significantly.
[Blazegraph was the best here]
5.3 Path patterns
Considering all property paths in the Paths query set, we can see that Virtuoso is the clear winner, both in stability and performance. Both Jena and Blazegraph trail some distance behind, and Neo4j is an order of magnitude slower in the median case. As with Single, we can observe that the form of the query (almost identical for all queries in the set) matters little; rather, the distribution of the data dictates query performance.
[Virtuoso again]
5.4 Navigational graph patterns
When comparing recursive and non-recursive navigational graph patterns, we see different effects on different systems. Blazegraph is slightly slower for non-recursive queries, Jena is notably faster for non-recursive queries, Virtuoso is notably slower for non-recursive queries, and finally Neo4j is considerably slower for non-recursive queries.
[Here it was Jena]
6 Conclusions
Results: We observed that Blazegraph and Virtuoso were the best-performing query engines for WDBench, followed by Jena, with Neo4j generally offering the slowest runtimes. Comparing Blazegraph and Virtuoso, the former is slightly faster than the latter for basic graph patterns, considerably faster for optional graph patterns (particularly for non-well-designed patterns), considerably slower for path patterns (except in the median case of non-recursive queries), and faster in the median case but slower in the average case for navigational graph patterns.
[Blazegraph > Virtuoso > Jena > Neo4j]
Limitations and future directions: WDBench currently focuses on core features of graph queries, whereas languages such as SPARQL and Cypher include a wide range of other features that are frequently used in practice.
3. [Dataset] https://figshare.com/s/50b7544ad6b1f51de060.
4. [Queries] https://github.com/MillenniumDB/WDBench.
“Truthy statements represent statements that have the best non-deprecated rank for a given property. Namely, if there is a preferred statement for a property P, then only preferred statements for P will be considered truthy. Otherwise, all normal-rank statements for P are considered truthy.”
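The rank-based rule quoted above is simple enough to sketch directly; the snippet below illustrates that rule with a hypothetical statement representation, and is not code from the benchmark.

```python
from collections import defaultdict

def truthy(statements):
    """Select truthy statements from (property_id, value, rank) tuples,
    where rank is 'preferred', 'normal' or 'deprecated'."""
    by_prop = defaultdict(list)
    for prop, value, rank in statements:
        if rank != "deprecated":          # deprecated statements are never truthy
            by_prop[prop].append((value, rank))
    result = []
    for prop, vals in by_prop.items():
        preferred = [v for v, r in vals if r == "preferred"]
        # preferred statements win; otherwise all normal-rank statements are truthy
        chosen = preferred if preferred else [v for v, r in vals if r == "normal"]
        result.extend((prop, v) for v in chosen)
    return result
```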