Knowledge Graph OLAP: A Multidimensional Model and Query Operations for Contextualized Knowledge Graphs - Paper Reading
Video -> https://youtu.be/d8BNdEaL9bU
Git -> https://jku-win-dke.github.io/KG-OLAP/appendix/
Harald Sack, Christoph G. Schuetz, Loris Bozzato, Bernd Neumayr, Michael Schrefl, and Luciano Serafini. 2021. Knowledge Graph OLAP. Semant. web 12, 4 (2021), 649–683. https://doi.org/10.3233/SW-200419
KR 2021 - Knowledge Graph OLAP: A Multidimensional Model and Query Operations for Contextualized Knowledge Graphs
Abstract:
A knowledge graph (KG) represents real-world entities and their relationships.
The represented knowledge is often context-dependent, leading to the construction of contextualized KGs. The multidimensional and hierarchical nature of context invites comparison with the multidimensional OLAP cube model from data analysis.
Traditional systems for online analytical processing (OLAP) employ cube models to represent numeric values for further analysis using dedicated query operations.
In this paper, along with an adaptation of the OLAP cube model for KGs, we introduce an adaptation of the traditional OLAP query operations for the purpose of performing analysis over KGs. In particular, we decompose the roll-up operation from traditional OLAP into a merge and an abstraction operation. The merge operation corresponds to the selection of knowledge from different contexts, whereas abstraction replaces entities with more general entities. The result of such a query is a more abstract, high-level view of the knowledge: a management summary.
1. Introduction
The majority of a KG’s contents are facts/instances or assertional knowledge (ABox), although KGs may also include terminological/ontological knowledge (TBox) representing “the vocabulary used in the knowledge graph” in order to allow for “ontological reasoning and query answering” over the facts.
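To make the ABox/TBox split concrete, the following minimal Python sketch stores a toy KG as (subject, predicate, object) string tuples and applies one RDFS-style subclass rule until nothing new is inferred. All names (`rwy16`, `Runway`, `AirportFacility`, `materialize_types`) are hypothetical, and a single subclass rule only hints at the ontological reasoning a DL reasoner would perform.

```python
# Hypothetical toy KG: terminological (TBox) and assertional (ABox) triples.
TBOX = {
    ("Runway", "subClassOf", "AirportFacility"),
    ("AirportFacility", "subClassOf", "Infrastructure"),
}
ABOX = {
    ("rwy16", "type", "Runway"),
    ("rwy16", "locatedAt", "airport1"),
}


def materialize_types(tbox, abox):
    """Apply (x type C), (C subClassOf D) => (x type D) until a fixpoint is reached."""
    facts = set(abox)
    subclass = {(c, d) for (c, p, d) in tbox if p == "subClassOf"}
    changed = True
    while changed:
        changed = False
        for (x, p, c) in list(facts):  # iterate over a snapshot while adding facts
            if p != "type":
                continue
            for (sub, sup) in subclass:
                if sub == c and (x, "type", sup) not in facts:
                    facts.add((x, "type", sup))
                    changed = True
    return facts


inferred = materialize_types(TBOX, ABOX)
```

The TBox never changes here; only the ABox grows with the inferred type assertions (rwy16 becomes an AirportFacility and an Infrastructure).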
To support their successful management, KGs are increasingly subject to contextualization, i.e., the enrichment of facts with context metadata such as time and location.
Frameworks such as the Contextualized Knowledge Repository (CKR) serve to organize knowledge within hierarchically ordered contexts along multiple contextual dimensions, e.g., spatial and temporal.
[CKR used named graphs and reification]
Similar to an OLAP cube, context dimensions span a multidimensional space in which each cell represents a context comprising facts of a KG.
Based on the CKR framework, KG-OLAP extends the idea of Graph OLAP to the management of contextualized KGs. Unlike Graph OLAP, which deals with more structured graphs focused on the relationships between simple entities, KG-OLAP handles more complex, semi-structured KGs whose assertional and terminological components must both be adequately taken into account.
2. Use Case: Air Traffic Management
In air traffic management (ATM), situational awareness refers to a “person’s knowledge of particular task-related events and phenomena” [26], i.e., knowledge about the world relevant for ATM, which must be accurately represented and conveyed to the various stakeholders.
[Domain-specific knowledge]
3. Multidimensional Model
In this section, we introduce the KG-OLAP cube model for the management of contextualized KGs. We first introduce the model informally before providing a formal definition. We define the model as a specialization of the Contextualized Knowledge Repository (CKR) framework.
3.1. KG-OLAP Cube Model
KG-OLAP adapts the multidimensional modeling paradigm from data warehousing in order to organize multidimensional KGs. Hence, the KG-OLAP cube is the central modeling element. Following the basic structure of the CKR framework, the KG-OLAP cube consists of two distinct layers: an upper and a lower layer. The upper layer describes the structure and properties of a cube’s cells; the lower layer specifies cell contents. The two layers employ distinct and possibly disjoint languages.
The dimensions are hierarchically organized into levels. The definition of a cube’s dimensions and their hierarchical organization into levels, i.e., the cube’s multidimensional structure, is referred to as the KG-OLAP cube schema.
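As a reading aid, the two layers and the cube schema can be sketched as plain Python data structures. This is an illustration under assumptions, not the paper's formal model: the class and field names (`CubeSchema`, `Cell`, `member_level`, `parent`) and the air-traffic members are hypothetical, and knowledge modules are simplified to sets of triples.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CubeSchema:
    """Upper layer, multidimensional structure (hypothetical representation)."""
    # dimension -> its levels, ordered from most specific to the top level "all"
    levels: dict[str, list[str]]
    # dimension -> {dimension member: level of that member}
    member_level: dict[str, dict[str, str]]
    # dimension -> {dimension member: direct parent member}
    parent: dict[str, dict[str, str]]


@dataclass
class Cell:
    """A context: one member per dimension plus its knowledge module (lower layer)."""
    coordinates: dict[str, str]
    module: set[tuple[str, str, str]] = field(default_factory=set)

    def granularity(self, schema: CubeSchema) -> dict[str, str]:
        # The level of each coordinate, e.g. {"location": "airport", "time": "day"}
        return {dim: schema.member_level[dim][m] for dim, m in self.coordinates.items()}


schema = CubeSchema(
    levels={"location": ["airport", "country", "all"], "time": ["day", "month", "all"]},
    member_level={
        "location": {"VIE": "airport", "Austria": "country", "all": "all"},
        "time": {"2021-06-01": "day", "2021-06": "month", "all": "all"},
    },
    parent={
        "location": {"VIE": "Austria", "Austria": "all"},
        "time": {"2021-06-01": "2021-06", "2021-06": "all"},
    },
)
cell = Cell({"location": "VIE", "time": "2021-06-01"}, {("rwy16", "status", "closed")})
```

The schema (levels, members, hierarchy) is shared by all cells, while each `Cell` only records its coordinates and its own knowledge module.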
3.2. Formalization
In the following, we adapt and extend the definitions of the CKR framework, building on the CKR definition in a generic description logic (DL) language, in order to fit the needs of KG-OLAP and its query operations (see Section 4).
3.2.1. Basic Definitions
We first define the basic notions of a KG-OLAP cube before relating the KG-OLAP cube definitions to the CKR framework. The multidimensional structure is expressed using a cube vocabulary $\Omega$, which is a DL signature. $\Omega$ is composed of the mutually disjoint sets $\mathrm{NR}_{\Omega}$ of atomic roles, $\mathrm{NC}_{\Omega}$ of atomic concepts, and $\mathrm{NI}_{\Omega}$ of individual names. The vocabulary further specifies a set $F \subseteq \mathrm{NI}_{\Omega}$ of cell names, a set $D \subseteq \mathrm{NR}_{\Omega}$ of dimensions, a set $L \subseteq \mathrm{NI}_{\Omega}$ of levels, a set $I \subseteq \mathrm{NI}_{\Omega}$ of dimension members, and, for every dimension $E \in D$, a set $D_E \subseteq I$ of dimension members of $E$. The cube language $\mathcal{L}_{\Omega}$ for expressing a KG-OLAP cube’s multidimensional structure is thus a DL language over the cube vocabulary $\Omega$.
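The set-inclusion and disjointness conditions of the cube vocabulary can be expressed as a small checker. The sketch below uses a hypothetical helper (`vocabulary_errors` is not from the paper) that treats every vocabulary component as a Python set of names and reports each violated condition.

```python
def vocabulary_errors(roles, concepts, individuals,
                      cells, dimensions, levels, members, dim_members):
    """Return the violated conditions of a cube vocabulary (empty list = well-formed).

    roles, concepts, individuals: NR, NC and NI of Omega
    cells, dimensions, levels, members: F, D, L and I
    dim_members: {dimension E: D_E}
    """
    errors = []
    if roles & concepts or roles & individuals or concepts & individuals:
        errors.append("NR, NC and NI must be mutually disjoint")
    for name, subset, superset, sup_name in [
        ("F", cells, individuals, "NI"),
        ("D", dimensions, roles, "NR"),
        ("L", levels, individuals, "NI"),
        ("I", members, individuals, "NI"),
    ]:
        if not subset <= superset:
            errors.append(f"{name} must be a subset of {sup_name}")
    for dim, dm in dim_members.items():
        if dim not in dimensions:
            errors.append(f"{dim} is not a dimension")
        if not dm <= members:
            errors.append(f"D_{dim} must be a subset of I")
    return errors
```

For example, declaring a dimension member as a dimension (a role) would be reported twice: once because it is not a role, and once because the dimension whose members are listed is no longer declared.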
For every dimension $A \in D$, we define the role $\prec_A$ of dimensional ordering for $A$ as a strict partial order relation over the dimension members $D_A$, i.e., an irreflexive, transitive, and antisymmetric role over pairs $\langle d, d' \rangle \in D_A \times D_A$. In the following, we also employ the non-strict dimensional ordering $\preceq_A$ over $D_A$. In general, we assume that each dimension is ordered in a simple hierarchy (or tree). Thus, if we denote with $\dot{\prec}_A$ the direct successor relation in the dimensional ordering, we require that $d \mathrel{\dot{\prec}_A} e_1$ and $d \mathrel{\dot{\prec}_A} e_2$ imply $e_1 = e_2$, i.e., $\dot{\prec}_A$ is functional, and we assume that, for every $D_A$, there is a maximum, i.e., an "all" level with a single "all" member.
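The ordering conditions can be checked mechanically. In this minimal sketch (an assumption, not the paper's implementation), a dimension's direct successor relation $\dot{\prec}_A$ is stored as a Python dict from member to parent, which makes it functional by construction; the strict ordering $\prec_A$ is its transitive closure, and the non-strict ordering $\preceq_A$ adds the reflexive pairs. Member names are hypothetical.

```python
def strict_order(parent):
    """Derive the strict ordering as (d, e) pairs, d strictly below e.

    `parent` maps each member to its direct successor; as a dict it is
    functional by construction (at most one direct successor per member).
    """
    order = set()
    for d in parent:
        seen, e = {d}, parent[d]
        while e is not None:
            if e in seen:
                raise ValueError(f"cycle through {e!r}: the order would not be irreflexive")
            order.add((d, e))
            seen.add(e)
            e = parent.get(e)  # the top member has no parent, which ends the walk
    return order


def non_strict_order(members, order):
    """The non-strict ordering: the strict order plus the reflexive pairs."""
    return order | {(d, d) for d in members}


def has_top(members, order, top="all"):
    """Check that every member lies below the single maximum member `top`."""
    return all(m == top or (m, top) in order for m in members)


parent = {"VIE": "Austria", "SZG": "Austria", "Austria": "all"}
members = {"VIE", "SZG", "Austria", "all"}
order = strict_order(parent)
```

Antisymmetry needs no separate check: it follows from irreflexivity and transitivity, and the cycle test guards irreflexivity.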
We further formally define for every dimension $A \in D$ its set $L_A \subseteq L$ of levels. We define the role $\prec^{L}_{A}$ as a strict order relation over $L_A$ and a role $\mathit{lev}$ associating dimension members in $D_A$ to levels in $L_A$.
[I can't understand this part, or formalizations in general]
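To see how levels relate to members, the sketch below assigns each member a level through a `LEV` dict standing in for the role lev, and rolls a member up to a target level by following its direct successors. It assumes the levels of a dimension form a simple chain; the consistency check (a member's parent sits at a strictly more general level) is an assumed sanity condition rather than one quoted from the paper, and all names are hypothetical.

```python
LEVELS = ["airport", "country", "all"]  # level order, most specific first (assumed chain)
LEV = {"VIE": "airport", "SZG": "airport", "Austria": "country", "all": "all"}
PARENT = {"VIE": "Austria", "SZG": "Austria", "Austria": "all"}


def levels_consistent(parent, lev, levels):
    """Assumed check: every direct successor lies at a strictly more general level."""
    return all(levels.index(lev[d]) < levels.index(lev[e]) for d, e in parent.items())


def roll_up(member, target_level, parent=PARENT, lev=LEV, levels=LEVELS):
    """Return the member itself or its ancestor at `target_level`."""
    if levels.index(lev[member]) > levels.index(target_level):
        raise ValueError(f"{target_level!r} is more specific than the level of {member!r}")
    while lev[member] != target_level:
        if member not in parent:
            raise ValueError(f"{member!r} has no ancestor at level {target_level!r}")
        member = parent[member]
    return member
```

Rolling members up to a common level in this way is the basic step behind the merge operation described in Section 4.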
4. Query Operations
In this section, we introduce a set of query operations for working with KG-OLAP cubes. We distinguish between contextual and graph operations. Contextual operations alter the multidimensional structure of a cube. Graph operations modify the RDF graph in the knowledge modules of the cells. Formally, the operations are defined as transformations of KG-OLAP cubes.
4.1. Contextual Operations
4.1.1. Slice and Dice
The slice-and-dice operation restricts a cube to the set of cells with a specific subset of dimension attribute values, i.e., it selects a subcube of an input KG-OLAP cube for subsequent manipulation.
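A minimal slice-and-dice sketch, assuming a cube is a dict from coordinate tuples to knowledge modules (sets of triples) and that a cell is kept when, in every selected dimension, its member lies at or below the selected member in the hierarchy. The hierarchies and names are hypothetical, and the paper's formal definition may differ in detail.

```python
# Hypothetical hierarchies: dimension -> {member: direct parent member}.
PARENT = {
    "location": {"VIE": "Austria", "SZG": "Austria", "MUC": "Germany",
                 "Austria": "all", "Germany": "all"},
    "time": {"2021-06-01": "2021-06", "2021-06-02": "2021-06", "2021-06": "all"},
}


def at_or_below(member, target, parent):
    """Non-strict dimensional ordering: True if member is target or one of its descendants."""
    while member is not None:
        if member == target:
            return True
        member = parent.get(member)
    return False


def slice_and_dice(cube, selection):
    """Keep cells whose coordinate lies at or below the selected member in every selected dimension.

    cube: {((dim, member), ...): set of (s, p, o) triples}
    selection: {dim: member}, e.g. {"location": "Austria"}
    """
    return {
        coords: module
        for coords, module in cube.items()
        if all(at_or_below(dict(coords)[dim], member, PARENT[dim])
               for dim, member in selection.items())
    }


cube = {
    (("location", "VIE"), ("time", "2021-06-01")): {("rwy16", "status", "closed")},
    (("location", "MUC"), ("time", "2021-06-02")): {("rwy08", "status", "open")},
    (("location", "all"), ("time", "all")): {("Runway", "subClassOf", "AirportFacility")},
}
```

The general cell at ("all", "all") is dropped here. Because knowledge stated in more general cells also applies to more specific ones, a faithful implementation may need to keep or propagate such cells; this sketch ignores that.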
4.1.2. Merge
The merge (or contextual roll-up) operation changes the granularity of a cube and its dimensions. Given an argument granularity specified as a vector $\mathbf{l}$ of dimension levels, the merge operation combines the contents of the knowledge modules at granularities more specific than the given one into cells at the given granularity.
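The merge can be sketched as grouping cells by the ancestors of their coordinates at the target levels and combining the modules of each group. The sketch assumes every input cell lies at or below the target granularity and offers two combination methods: union, which the notes mention in the context of reification, and intersection, included here as an illustrative alternative. Hierarchies, member names, and the `merge` signature are hypothetical.

```python
PARENT = {
    "location": {"VIE": "Austria", "SZG": "Austria", "Austria": "all"},
    "time": {"2021-06-01": "2021-06", "2021-06-02": "2021-06", "2021-06": "all"},
}
LEV = {
    "location": {"VIE": "airport", "SZG": "airport", "Austria": "country", "all": "all"},
    "time": {"2021-06-01": "day", "2021-06-02": "day", "2021-06": "month", "all": "all"},
}


def roll_up(dim, member, level):
    # Assumes `level` is at or above the member's level; raises KeyError otherwise.
    while LEV[dim][member] != level:
        member = PARENT[dim][member]
    return member


def merge(cube, granularity, method="union"):
    """Merge cells into cells at `granularity` ({dimension: level}).

    cube: {((dim, member), ...): set of (s, p, o) triples}
    method: "union" keeps every triple of the merged modules,
            "intersection" keeps only the triples shared by all of them.
    """
    if method not in ("union", "intersection"):
        raise ValueError(f"unknown merge method {method!r}")
    merged = {}
    for coords, module in cube.items():
        target = tuple((dim, roll_up(dim, m, granularity[dim])) for dim, m in coords)
        if target not in merged:
            merged[target] = set(module)  # copy, so the input cube stays unchanged
        elif method == "union":
            merged[target] |= module
        else:
            merged[target] &= module
    return merged


cube = {
    (("location", "VIE"), ("time", "2021-06-01")): {("wx1", "type", "LowVisibility"),
                                                     ("rwy16", "status", "closed")},
    (("location", "SZG"), ("time", "2021-06-02")): {("wx1", "type", "LowVisibility")},
}
```

Merging both airport/day cells to country/month yields a single Austria/June cell: the union keeps both triples, while the intersection keeps only the low-visibility fact shared by the two modules.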
4.2. Graph Operations
Graph operations – abstraction, pivoting, and reification – alter the structure of the RDF graphs inside the knowledge modules of a cell.
Abstraction replaces sets of entities with individual and more abstract entities.
Pivoting moves metaknowledge (contextual information) inside the modules.
Reification allows relations to be represented as individuals.
4.2.1. Abstraction
Abstraction serves as an umbrella term for a class of graph operations that, broadly speaking, replace entities in an RDF graph with more abstract entities. This abstraction is based on various types of ontological information, e.g., class membership and grouping properties.
We also refer to abstraction as ontological roll-up.
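One simple form of abstraction, replacing individuals by the value of a grouping property such as their class, can be sketched as follows. It assumes the grouping property is functional for the selected individuals; the function name and sample triples are hypothetical, and since the paper describes abstraction as a family of operations, this is only one illustrative instance.

```python
def abstract_by_property(module, grouping_property, targets):
    """Replace each individual in `targets` by the value of its `grouping_property`.

    Assumes the grouping property has a single value per target individual.
    """
    mapping = {s: o for (s, p, o) in module if p == grouping_property and s in targets}
    result = set()
    for (s, p, o) in module:
        if p == grouping_property and s in mapping:
            continue  # drop the grouping triple itself
        result.add((mapping.get(s, s), p, mapping.get(o, o)))
    return result


module = {
    ("rwy16", "status", "closed"),
    ("rwy34", "status", "closed"),
    ("rwy16", "type", "Runway"),
    ("rwy34", "type", "Runway"),
}
summary = abstract_by_property(module, "type", {"rwy16", "rwy34"})
```

The grouping triples are dropped; keeping them would produce self-referential triples such as ("Runway", "type", "Runway"). Because the module is a set, the two runway facts collapse into a single ("Runway", "status", "closed"), which is the summarizing effect of abstraction.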
4.2.2. Pivoting
The pivoting operation attaches dimensional properties (dimension attribute values) of a cell to a specified set of individuals inside the cell’s object knowledge. Pivoting allows for the preservation of contextual knowledge in case of a merge operation.
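A minimal pivoting sketch, assuming the individuals to pivot are selected by class membership and that each dimension d is attached through a property named `has_<d>`; both the selection criterion and the property naming scheme are hypothetical.

```python
def instances_of(module, cls, type_property="type"):
    """Individuals asserted to be of class `cls` in the module."""
    return {s for (s, p, o) in module if p == type_property and o == cls}


def pivot(module, coordinates, cls, prefix="has_"):
    """Attach the cell's dimension attribute values to every instance of `cls`."""
    out = set(module)  # leave the input module unchanged
    for ind in instances_of(module, cls):
        for dim, member in coordinates.items():
            out.add((ind, prefix + dim, member))
    return out


module = {("rwy16", "type", "Runway"), ("rwy16", "status", "closed")}
pivoted = pivot(module, {"location": "VIE", "time": "2021-06-01"}, "Runway")
```

After a subsequent merge, rwy16 still carries its original location and date as ordinary triples, which is how pivoting preserves contextual knowledge.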
4.2.3. Reification
The reification operation takes “triples” in the object knowledge of a cell and creates individuals that represent such triples. Reification allows for the preservation of duplicates in case of a union merge, which facilitates subsequent counting of occurrences in the course of the analysis. Furthermore, in combination with pivoting, the reification operation allows for attaching contextual information to context-dependent knowledge, preserving information about the context of a triple in case of a union merge.
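The duplicate-preservation argument can be demonstrated with a small sketch. Each triple becomes a statement individual linked to its parts through properties modeled on RDF's rdf:subject, rdf:predicate, and rdf:object; the statement names embed a cell identifier so that equal triples from different cells stay distinct. The naming scheme and helper names are assumptions, not the paper's vocabulary.

```python
def reify(module, cell_id):
    """Replace each triple with a statement individual linked to its parts.

    Statement names embed `cell_id`, so equal triples from different cells
    become distinct individuals.
    """
    out = set()
    for i, (s, p, o) in enumerate(sorted(module)):
        stmt = f"{cell_id}#stmt{i}"
        out |= {(stmt, "subject", s), (stmt, "predicate", p), (stmt, "object", o)}
    return out


def count_statements(module, triple):
    """Count the statement individuals in `module` that represent `triple`."""
    parts = {}
    for stmt, role, value in module:
        parts.setdefault(stmt, {})[role] = value
    return sum(
        1 for d in parts.values()
        if (d.get("subject"), d.get("predicate"), d.get("object")) == triple
    )


m1 = {("rwy16", "status", "closed")}  # knowledge module of one cell
m2 = {("rwy16", "status", "closed")}  # the same fact in another cell
merged = reify(m1, "cellVIE") | reify(m2, "cellSZG")
```

A plain union of the two modules keeps one copy of the triple, whereas the union of the reified modules keeps two statements, so occurrences can still be counted.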
5. Proof-of-Concept Implementation
In this section, we sketch the foundations of a proof-of-concept implementation of a KG-OLAP system using off-the-shelf quad stores.
[Named graphs are supported by triplestores]
5.1. Architecture, Model, and Operations
A mapping of the formal language to an actual RDF representation allows for the storage of KG-OLAP cubes in off-the-shelf quad stores, with SPARQL realizations of the query operations. Context-aware rules serve to materialize roll-up relationships for levels and cells, as well as to infer and propagate knowledge.
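As a rough illustration of the quad-store mapping, assume each cell's knowledge module is stored in its own named graph. A union merge then amounts to copying the triples of the source graphs into a target graph, which the hypothetical helper below expresses as a SPARQL 1.1 Update string. The graph IRIs are made up, and the paper's actual SPARQL realizations and context-aware rules are not reproduced here.

```python
def union_merge_update(source_graphs, target_graph):
    """Build a SPARQL 1.1 Update copying the triples of several named graphs into one.

    source_graphs: IRIs of the named graphs holding the cells to merge
    target_graph: IRI of the named graph for the merged cell
    """
    values = " ".join(f"<{g}>" for g in source_graphs)
    # Doubled braces are literal braces in the f-strings.
    return (
        f"INSERT {{ GRAPH <{target_graph}> {{ ?s ?p ?o }} }}\n"
        f"WHERE {{ VALUES ?g {{ {values} }} GRAPH ?g {{ ?s ?p ?o }} }}"
    )


query = union_merge_update(
    ["http://example.org/cell/vie-2021-06-01", "http://example.org/cell/szg-2021-06-02"],
    "http://example.org/cell/austria-2021-06",
)
print(query)
```

Since a named graph is a set of triples, this copy also collapses duplicates, which is why reification is needed before a union merge when occurrences must be counted.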
6. Related Work
Semantic technologies have been used for a variety of tasks in the context of OLAP. Related to KG-OLAP are techniques for data analysis over RDF data. The RDF Data Cube Vocabulary (QB) [46] and its extension, QB4OLAP [47], provide an RDF representation format for publishing traditional OLAP cubes with numeric measures on the semantic web, often with SPARQL-based operators that emulate traditional OLAP queries ...
Other work has suggested “lenses” over RDF data for the purpose of RDF data analysis, i.e., analytical schemas which can be used for OLAP queries on RDF data. Similarly, superimposed multidimensional schemas define a mapping between a multidimensional model and a KG in order to allow for the formulation of OLAP queries.
Fusion cubes supplement traditional OLAP cubes with external data in RDF format, particularly linked open data where typically the data are not owned by the analyst. Fusion cubes are traditional OLAP cubes with numeric measures that can be populated dynamically with statistical data from RDF sources.
...
Closely related to KG-OLAP is Graph OLAP (also known as InfoNetOLAP) [17, 18], which, through its informational and topological OLAP queries, provides rich query facilities suitable for graph analysis. In Graph OLAP, graphs are associated with dimensional attributes, which yields a graph cube. The edges of the graphs themselves are weighted; the weights represent the measures to be analyzed. Typical applications of Graph OLAP include the analysis of co-author and similar social graphs from different time periods, geographic locations, and so on.
....
Unlike KG-OLAP, existing work on graph and KG summarization largely ignores contextuality in KGs. In fact, existing work on KG summarization is orthogonal to the KG-OLAP approach. Consequently, future work may adapt summarization algorithms to serve as graph operators in KG-OLAP.
[Context is barely explored by other approaches]