
Knowledge Graphs: Research Directions - Paper Reading

Hogan, A. (2020). Knowledge Graphs: Research Directions. In: Manna, M., Pieris, A. (eds) Reasoning Web. Declarative Artificial Intelligence. Reasoning Web 2020. Lecture Notes in Computer Science, vol 12258. Springer, Cham. https://doi.org/10.1007/978-3-030-60067-9_8

Abstract. We discuss six high-level concepts relating to knowledge graphs: data models, queries, ontologies, rules, embeddings and graph neural networks.

1 Introduction

However, underlying all such perspectives is the foundational idea of representing knowledge using a graph abstraction, with nodes representing entities of interest in a given domain, and edges representing relations between those entities. ...
Knowledge can consist of simple assertions, such as Charon orbits Pluto, which can be represented as directed labelled edges in a graph. Knowledge may also consist of quantified assertions, such as all stellar planets orbit stars, which require a more expressive formalism to capture, such as an ontology or rule language.

[Different types of assertions: asserting something about an instance is different from asserting about a class or a subclass]
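As a minimal illustration of the first kind of assertion, a plain-Python sketch (using only the examples from the text) that represents directed labelled edges as (subject, predicate, object) tuples; the quantified assertion is deliberately left out, since it needs the ontology or rule machinery discussed later:

```python
# A directed edge-labelled (del) graph as a set of (subject, predicate, object) triples.
G = {
    ("Charon", "orbits", "Pluto"),
    ("Pluto", "orbits", "Sun"),
}

# A simple assertion is just one more edge; a quantified assertion such as
# "all stellar planets orbit stars" cannot be written as a single edge and
# needs an ontology or rule language (see Sections 4 and 5).
print(("Charon", "orbits", "Pluto") in G)  # True
```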

A reader familiar with related concepts may then question: what is the difference between a graph database and a knowledge graph, or an ontology and a knowledge graph? Per the previous definition, both graph databases and ontologies can be considered knowledge graphs. It is then natural to ask: what, then, is new about knowledge graphs? What distinguishes research on knowledge graphs from research on graph databases, or ontologies, etc.?

[He does not answer the question; he highlights that this vagueness is interesting so that each area can develop the topic according to its own perspectives ... I found that odd!!!!]

Anecdotally, many of the most influential papers on knowledge graphs have emerged from the machine learning community, particularly relating to two main techniques: knowledge graph embeddings and graph neural networks (which we will discuss later). These works are not particularly related to graph databases (they do not consider regular expressions on paths, graph pattern matching, etc.) nor are they particularly related to ontologies (they do not consider interpretations, entailments, etc.), but by exploiting knowledge in the context of large-scale graph-structured data, they are related to knowledge graphs.

[Even the definition of hyper-relational I only found in ML papers]

Subsequently, one can find relations between these ostensibly orthogonal techniques; for example, while graph neural networks are used to inductively classify nodes in the graph based on numerical methods, ontologies can be used to deductively classify nodes in the graph based on symbolic methods, which raises natural questions about how they might compare.

[GNNs for inductive reasoning and ontologies for deductive reasoning; they are complementary and can be compared]

2 Data Model

Knowledge graphs assume a graph-structured data model. The high-level benefits of modelling data as graphs are as follows:
– Graphs offer a more intuitive abstraction of certain domains than alternative data models;
– When compared with (normalised) relational schemas, graphs offer a simpler and more flexible way to represent diverse, incomplete, evolving data, which can be defined independently of a particular schema;
– Graphs from different sources can be initially combined by taking the union of the constituent elements (nodes, edges, etc.) (see the sketch below);
– Unlike other semi-structured data models based on trees, graphs allow for representing and working with cyclic relations and do not assume a hierarchy;
– Graphs offer a more intuitive representation for querying and working with paths that represent indirect relations between entities.

[Advantages of modelling as a graph ... relationships are as important as the entities and their attributes]
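A quick sketch of the third point above (combining graphs by union), reusing the triple-set representation from Section 1; the two "sources" are invented for illustration:

```python
# Two graphs from different (hypothetical) sources, as sets of triples.
source_a = {
    ("Pluto", "type", "Dwarf Planet"),
    ("Charon", "orbits", "Pluto"),
}
source_b = {
    ("Pluto", "orbits", "Sun"),
    ("Pluto", "type", "Dwarf Planet"),   # overlapping triple
}

# Initial integration is just set union; duplicate triples collapse automatically.
combined = source_a | source_b
print(len(combined))  # 3
```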

Topic 2.1: Representing complex data

Still, there are open questions regarding how units can be represented, or how levels of precision should be handled. Although proposals have been made to represent units and precision in graphs, knowledge graphs will often adopt an ad hoc representation. There are also open questions regarding how to perform querying and reasoning over statistical data, with automatic translation of units and handling of precision.

[The language proposal adds @xx to the object and the datatype proposal adds ^^xsd:yy]
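To make the note concrete, a small rdflib sketch (the ex: namespace and the mass triple are placeholders) showing a language-tagged literal (@pt) and a datatyped literal (^^xsd:double); note that the unit itself is still not captured, which is exactly the open issue raised above:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS, XSD

EX = Namespace("http://example.org/")   # hypothetical namespace
g = Graph()

pluto = EX.Pluto
# Language-tagged label: "Plutão"@pt
g.add((pluto, RDFS.label, Literal("Plutão", lang="pt")))
# Datatyped value: "1.303e22"^^xsd:double -- but the unit (kg) is *not*
# captured by the datatype, which is the open question in the text.
g.add((pluto, EX.mass, Literal("1.303e22", datatype=XSD.double)))

print(g.serialize(format="turtle"))
```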

A more general question arises: how should graph models be extended (if at all) in order to be able to concisely, intuitively and comprehensively model diverse, real-world data? Should we allow only simple terms and use complex graph structures to model such data, or should we support complex terms – e.g., allowing tuples, arrays, tables, trees, edges, graphs, etc., as nodes – to simplify the graph structure? How can we explore such trade-offs and evaluate different proposals? How can we apply querying, reasoning, machine learning, etc., on these different representations?

[A graph as a node is a hypernode; values as a list or one edge per value (and what if the list is ordered, like the authors of a paper?)]
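A sketch of the two modelling options from the note, for the ordered case of paper authors (names, predicates and blank-node identifiers are invented): one edge per value keeps the graph simple but loses the order, while an RDF-collection-style linked list keeps the order at the cost of extra structure.

```python
# Option 1: one edge per value -- simple structure, but the author order is lost.
flat = {
    ("paper1", "author", "Alice"),
    ("paper1", "author", "Bob"),
}

# Option 2: an ordered, RDF-collection-style linked list using blank nodes
# (_:b1, _:b2) with first/rest edges -- order preserved, graph structure more complex.
ordered = {
    ("paper1", "authorList", "_:b1"),
    ("_:b1", "first", "Alice"),
    ("_:b1", "rest", "_:b2"),
    ("_:b2", "first", "Bob"),
    ("_:b2", "rest", "nil"),
}
```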

Topic 2.2: Semantics of property graphs

While the semantics of directed edge-labelled (del) graphs have long been explored, the semantics of property graphs are less well understood. In reference to Figure 2, for example, we might ask if the child relation between Proxima Centauri and Proxima b holds. Intuitively we might assume that it does, but we should keep in mind that we may have property–value pairs on the edge that state deprecated = true, or probability = 0.1, for example. An open research question then relates to defining the semantics of property graphs, taking into consideration the semantics of labels and property–value pairs on nodes and edges. (We refer the interested reader to Attributed Description Logics [39], which make progress in this direction.)

[LOOK at this reference; this subject has come up before: the properties of an edge interfere with the interpretation of the edge]
[What is the semantics of the KG's qualifiers? Some are for n-ary relations, because they complete the statement, and others are for context]
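A minimal sketch of the issue, with a property-graph edge written as a Python dict (the qualifier names follow the text's examples); whether the plain child statement "holds" depends entirely on how the attached property–value pairs are interpreted:

```python
# One property-graph edge with property-value pairs attached to it.
edge = {
    "source": "Proxima Centauri",
    "label": "child",
    "target": "Proxima b",
    "properties": {"deprecated": True, "probability": 0.1},
}

def naive_holds(e):
    """Naive reading: the edge holds iff it exists -- the qualifiers are ignored."""
    return True

def cautious_holds(e):
    """A stricter reading that lets edge properties veto the plain statement."""
    props = e["properties"]
    return not props.get("deprecated", False) and props.get("probability", 1.0) >= 0.5

print(naive_holds(edge), cautious_holds(edge))  # True False
```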


3 Queries

[BGP, CGP, RPQ, NGP: basic graph patterns, complex graph patterns, regular path queries, navigational graph patterns]

Topic 3.2: Native graph querying

Graph patterns return tables of solutions rather than graphs. An interesting research topic to explore is then to consider query languages based on composable operators that transform and return native graph objects – i.e., nodes, edges, paths, graphs, etc. – rather than tables.

[SPARQL does not return a graph (CONSTRUCT works for BGPs but not for RPQs); Cypher returns path graphs]
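A small rdflib sketch of the limitation mentioned in the note (namespaces and the orbitsTransitively label are invented): CONSTRUCT does return a graph, and a property path such as orbits+ can be used in the WHERE clause, but the matched paths are flattened into endpoint pairs rather than returned as native path objects:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Charon, EX.orbits, EX.Pluto))
g.add((EX.Pluto, EX.orbits, EX.Sun))

# The path expression (orbits+) can only appear in the WHERE clause: we get one
# "reachable" edge per solution, not the matched path itself as a graph object.
q = """
PREFIX ex: <http://example.org/>
CONSTRUCT { ?x ex:orbitsTransitively ?y }
WHERE     { ?x ex:orbits+ ?y }
"""
result = g.query(q)
for triple in result.graph:
    print(triple)
```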

4 Ontologies

Looking more closely at the graph of Figure 1, we may be able to deduce additional knowledge beyond what is explicitly stated in the graph. ... In order to be able to draw such conclusions automatically from a graph, we require a formal representation of the semantics of the terms used in the graph. While there are a number of alternatives that we might consider for this task, we initially focus on ontologies, which have long been studied in the context of defining semantics over graphs.

[The semantics, i.e., how a KG in RDF should be interpreted, can be specified by the ontology]

Ontologies centre around three main types of elements: individuals, concepts (aka classes) and roles (aka properties). They allow for making formal, well-defined claims about such elements, which in turn opens up the possibility to perform automated reasoning. In terms of the semantics of ontologies, there are a number of well-defined formalisms we can potentially use, but perhaps the most established is taken from Description Logics (DL) ...

Definition 9 (DL ontology). A DL ontology O is defined as a tuple (A, T, R), where A is the A-Box: a set of assertional axioms; T is the T-Box: a set of concept (aka class/terminological) axioms; and R is the R-Box: a set of role (aka property/relation) axioms.

[Object properties (R-Box) hold between entity instances (two A-Box items that are related to some T-Box type), and data properties hold between entities (an A-Box item related to a T-Box type) and literals. What would we have to connect A-Box items that are relationships to other items, such as entities and literals?]
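A very rough sketch of Definition 9, just to fix the (A, T, R) structure in code; the axioms are written as plain strings (the role axiom is invented), whereas a real DL reasoner would require a proper syntax such as OWL:

```python
from collections import namedtuple

DLOntology = namedtuple("DLOntology", ["abox", "tbox", "rbox"])

O = DLOntology(
    abox={"DwarfPlanet(Pluto)", "orbits(Charon, Pluto)"},    # assertional axioms
    tbox={"DwarfPlanet ⊑ CelestialBody"},                    # concept (class) axioms
    rbox={"orbits ⊑ gravitationallyBoundTo"},                # role (property) axioms, invented
)
print(O.tbox)
```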

....

Topic 4.1: OBDA on graphs

The area of Ontology-Based Data Access (OBDA) focuses on finding and implementing DL fragments for which query answering can be conducted over an ontology by means of a query rewriting strategy. Given an ontology and a graph query, this strategy involves extending the query such that, when it is evaluated over the base data, the solutions include those also given by entailments with respect to the ontology.

So what happens if the input query is not simply a graph pattern or a conjunctive query, but is rather a (complex) navigational graph pattern?

Many database systems, however, support features that go beyond first-order queries; for example, relational database engines support recursion, while graph database engines support path expressions. Such features of query engines have rarely been exploited as targets for query rewriting strategies; exceptions include works on supporting transitive closure [63] and recursive rules [73] using SQL recursion, and studies of other forms of rewritability, such as rewriting to Datalog and Monadic Disjunctive Datalog [22].

This leaves the question: given a graph database system capable of answering (complex) navigational graph patterns – as popular in practice – what are the limits to the types of OBDA that the system can support through query rewriting? What kinds of input queries can be supported with respect to which ontology languages?
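A toy sketch of the basic rewriting idea behind OBDA for a single class atom, under invented subclass axioms; actual OBDA systems rewrite whole (navigational) queries and handle much richer DL fragments, which is precisely the open question posed above:

```python
# Hypothetical subclass axioms (T-Box): subclass -> superclass.
subclass_of = {
    "Planet": "CelestialBody",
    "DwarfPlanet": "Planet",
}

def rewrite_class_atom(cls):
    """Rewrite a query atom (?x, type, cls) into the union of atoms over all
    subclasses entailed by the (transitively closed) subclass axioms."""
    classes = {cls}
    changed = True
    while changed:                      # transitive closure over subclass_of
        changed = False
        for sub, sup in subclass_of.items():
            if sup in classes and sub not in classes:
                classes.add(sub)
                changed = True
    return [("?x", "type", c) for c in sorted(classes)]

# Evaluating the rewritten union over the *base* data now also returns
# entailed answers, without materialising them first.
print(rewrite_class_atom("CelestialBody"))
```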

5 Rules

Another way to define the semantics of knowledge graphs is to use rules. Rules can be formalised in many ways – as Horn clauses, as Datalog, etc. – but in essence they consist of if-then statements, where if some condition holds, then some entailment holds. Here we define rules based on graph patterns.

[This is what Daniel asked me to exercise; his paper has rules with N3 Logic and RDF]

Definition 12 (Rule). A rule is a pair R := (B, H) such that B and H are graph patterns. The graph pattern B is called the body of the rule while H is called the head of the rule.

Given a graph G and a rule R = (B, H), we can then apply R over G by taking the union of μ(H) for all μ ∈ B(G). Typically we will assume that the set of variables used in H will be a subset of those used in B (V(H) ⊆ V(B)), which assures us that μ(H) will always result in a graph without variables. Given a set of rules, we can then apply each rule recursively, accumulating the results in the graph until a fixpoint is reached, thus enriching our graph with entailments.

[Rules to derive more knowledge; entailments = the implications in the IF-THEN rule]
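A minimal forward-chaining sketch of this "apply rules until fixpoint" procedure, over the same triple-set representation used earlier; the rule and the indirectlyOrbits label are invented for illustration:

```python
# A rule (B, H): body and head are graph patterns; strings starting with "?"
# are variables, and V(H) ⊆ V(B) as assumed in the text.
RULE = (
    [("?x", "orbits", "?y"), ("?y", "orbits", "?z")],   # body B
    [("?x", "indirectlyOrbits", "?z")],                 # head H (invented label)
)

def match(pattern, graph, binding=None):
    """All mappings μ from the variables of the pattern into graph terms."""
    binding = binding or {}
    if not pattern:
        return [binding]
    (s, p, o), rest = pattern[0], pattern[1:]
    out = []
    for (gs, gp, go) in graph:
        b = dict(binding)
        ok = True
        for term, val in ((s, gs), (p, gp), (o, go)):
            if term.startswith("?"):
                if b.get(term, val) != val:   # variable already bound to something else
                    ok = False
                    break
                b[term] = val
            elif term != val:                 # constant term must match exactly
                ok = False
                break
        if ok:
            out.extend(match(rest, graph, b))
    return out

def apply_rules_to_fixpoint(graph, rules):
    graph = set(graph)
    while True:
        new = set()
        for body, head in rules:
            for mu in match(body, graph):
                for (s, p, o) in head:
                    new.add(tuple(mu.get(t, t) for t in (s, p, o)))
        if new <= graph:              # fixpoint reached: nothing new entailed
            return graph
        graph |= new

G = {("Charon", "orbits", "Pluto"), ("Pluto", "orbits", "Sun")}
print(apply_rules_to_fixpoint(G, [RULE]))
```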

Topic 5.1: Existential/disjunctive rule mining

Given a diverse knowledge graph, such as Wikidata [67], manually defining all of the rules that may hold is a costly exercise. To help arrive at an initial set of rules in such cases, a number of rule mining techniques have been proposed that, given a graph, a certain support threshold, and a certain confidence threshold, will return rules that meet the defined thresholds; support is defined as the number of entailments given by the rule that are deemed correct, while confidence is the ratio of entailments given by the rule that are deemed correct.

[Identify patterns in the graph: IF {X ... nodes with certain relations, certain attributes} THEN {Y ... create a relation, the node's type is such-and-such, etc.}; this could serve as reverse engineering of the KG's model]

Given that knowledge graphs are incomplete, while we may assume that the triples given by the graph are correct, it is not clear how we might know which triples are incorrect. A common heuristic is to apply a Partial Completeness Assumption (PCA) [24], where a triple (s, p, o) is considered correct if it appears in G, and incorrect if it does not appear in G but there exists a triple (s, p, o′) that does appear in G; other triples are ignored.

[Assume the CWA for a part of the graph in order to be able to consider the triples correct]

A seminal rule mining system along these lines is AMIE [24], which incrementally builds rules using a sequence of steps called “refinements”, filtering rules that do not meet the specified thresholds. A number of later systems support additional features in rules, such as forms of negation [32]. However, to the best of our knowledge, mining existential or disjunctive rules remains open for knowledge graphs.
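A small sketch of how PCA-style confidence could be computed for one invented candidate rule over toy facts, following the support and confidence definitions quoted above (thresholds and refinements omitted):

```python
# Candidate rule: (?x, orbits, ?y) => (?x, child, ?y)   (invented example)
G = {
    ("Charon", "orbits", "Pluto"),
    ("Charon", "child", "Pluto"),
    ("Proxima b", "orbits", "Proxima Centauri"),
    # no "child" edge at all for Proxima b -> ignored under PCA
    ("Luna", "orbits", "Earth"),
    ("Luna", "child", "Mars"),   # a *different* child edge exists -> counted as incorrect
}

predictions = {(s, "child", o) for (s, p, o) in G if p == "orbits"}

support = sum(1 for t in predictions if t in G)   # predictions that are correct
# PCA: a prediction (s, p, o) not in G only counts as incorrect if some (s, p, o')
# is in G; otherwise it is ignored (treated as unknown rather than false).
counted = [
    t for t in predictions
    if t in G or any(p == t[1] and s == t[0] for (s, p, o) in G)
]
confidence = support / len(counted)
print(support, confidence)   # 1 0.5
```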

6 Context

All of the data represented in Figure 1 holds true with respect to a given context: a given scope of truth.

[The definition of context depends on a definition of TRUTH ... of a relative truth, since there would be no absolute truth]

As an example of temporal context, the Sun will cease being a G-type Star and will eventually become a White Dwarf in its later years. We may also have spatial context (e.g., to state that something holds true in a particular region of space, such as a solar eclipse affecting parts of Earth), provenance (e.g., to state that something holds true with respect to a given source, such as the mass of a given planet), and so forth.

[Temporal, Spatial, Provenance are contextual dimensions. What others?]

There are a variety of syntactic ways to embed contextual meta-data in a knowledge graph. Within del graphs, for example, the options include reification [18], n-ary relations [18], and singleton properties [52]. Context can also be represented syntactically using higher-arity models, such as property graphs [4], RDF* [30], or named graphs [18]. For further details on these options for representing context in graphs, we refer to the extended tutorial [35].

[Representing context is already a demand on these other models; I am not creating a new demand. If this brings efficiency problems, that is another research direction and not what I am addressing with my proposal. This reference is the KG survey]

Aside from syntactic conventions for representing context in graphs, an important issue is with respect to how the semantics of context should be defined. A general contextual framework for graphs based on annotations has been proposed by Zimmermann et al. [76], based on the notion of an annotation domain.

[LOOK at this reference too]

For example, we may define A to be the powerset of the numbers {1, ..., 365}, indicating different days of the year. We may then annotate triples such as (Luna, occults, Sun) with a set of days in a given year – say {13, 245, 301} – on which the given occultation will occur. Given another similar triple – say (Luna, occults, Jupiter), annotated with {46, 245, 323} – we can define ⊕ to be ∪ and use it to find the annotation for days when Luna occults the Sun or Jupiter ({13, 46, 245, 301, 323}); we can further define ⊗ to be ∩ and use it to find the annotation for days when Luna occults the Sun and Jupiter ({245}). In this scenario, 0 would then refer to the empty set, while 1 would refer to the set of all days of the year.

Custom annotation domains can be defined for custom forms of context, and can be used in the context of querying and rules for computing the annotations corresponding to particular results and entailments.

[This is important for my proposal!!!!]
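A direct transcription of this example as a tiny Python sketch of an annotation domain over days of the year, with ⊕ as union and ⊗ as intersection:

```python
# Annotation domain from the example: sets of days of the year,
# with ⊕ = union, ⊗ = intersection, 0 = ∅, 1 = {1..365}.
ZERO, ONE = frozenset(), frozenset(range(1, 366))
def plus(a, b):  return a | b     # ⊕: "or" of contexts
def times(a, b): return a & b     # ⊗: "and" of contexts

annotated = {
    ("Luna", "occults", "Sun"):     frozenset({13, 245, 301}),
    ("Luna", "occults", "Jupiter"): frozenset({46, 245, 323}),
}

sun = annotated[("Luna", "occults", "Sun")]
jupiter = annotated[("Luna", "occults", "Jupiter")]
print(sorted(plus(sun, jupiter)))    # days Luna occults the Sun OR Jupiter
print(sorted(times(sun, jupiter)))   # days Luna occults the Sun AND Jupiter -> [245]
```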

In the context of ontologies, a more declarative framework for handling context – called contextual knowledge repositories – was proposed by Homola and Serafini [37] based on Description Logics, with a related framework more recently proposed by Schuetz et al. [62] based on an OLAP-style abstraction. These frameworks can assign graphs into different hierarchical contexts, which can then be aggregated into higher-level contexts.

[I have already read 37; 62 is on the list ...]

Topic 6.1: Complex contexts

Context – in the sense of the scope of truth – is an almost arbitrarily complex subject. To take some examples, context can be recurrent, relative, and conceptual. Such forms of context are not – to the best of our knowledge – well-supported by current contextual frameworks. 

In terms of recurrent context, most examples stem from temporal context, where we may state something that recurrently holds true on a given date of the year, or on a given day of the week, which we may then wish, for example, to map to intervals on a non-recurrent temporal context.

In terms of relative context, the Wikidata [67] knowledge graph defines a “truthy” context, which includes information that is not deprecated or superseded by other information; for example, a population reading for a city would be considered truthy if there is no other more recent population reading available. However, the contextual notion of “most recent” requires a relative assessment of context dependent on the other data available. 

Regarding conceptual context, the triple (Pluto, type, Dwarf Planet) only holds true since 2006. But unlike temporal context, it is the conceptualisation of a Planet, rather than Pluto itself, that has changed. This is an example of concept drift [70], where the meaning of domain terms can change over time, which in turn relates to the area of ontology evolution [75]. 

How we conceptualise a domain can thus also be contextual. The notion of context in knowledge graphs can then be arbitrarily complex, where more complex notions of context remain poorly understood.

[This is important for my research]

7 Embeddings

In order to create more practical numeric representations of graphs for machine learning applications, knowledge graph embeddings aim to embed graphs within a low-dimensional, dense, numerical representation. There are many ways in which embeddings may be defined, but typically an embedding will map each node and each edge-label of a graph to an independent vector or matrix.

Definition 14 (Knowledge graph embedding). Given a del graph G, a knowledge graph embedding of G is a pair of mappings (ε, ρ) such that ε : NG → Rd and ρ : EG → Rd. Typically ε is known as an entity embedding, while ρ is known as a relation embedding. The knowledge graph embedding then consists of (typically low-dimensional) numeric representations of the node and edge-labels of the graph G, typically extracted such that the graph G can be (approximately) reconstructed from the embedding.

Definition 15 (Plausibility). A plausibility scoring function is a partial function φ : Rd × Rd × Rd → R[0,1]. Given a del graph G = (V, E, L), a triple (s, p, o) ∈ NG × EG × NG, and a knowledge graph embedding (ε, ρ) of G, the plausibility of (s, p, o) is given as φ(ε(s), ρ(p), ε(o)). Triples with scores closer to 1 are considered more plausible, while triples with scores closer to 0 are considered less plausible.

[Definition of KG embeddings and of graph completion, Link Prediction]

Once trained over G, knowledge graph embeddings directly allow for estimating the plausibility for triples of the form (s, p, o) ∈ NG × EG × NG (i.e., triples using some combination of terms found in G). This can be used for the purposes of link prediction ...
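To make Definitions 14 and 15 concrete, a small numpy sketch of a TransE-style plausibility function (one well-known scoring scheme among many, not the paper's definition of a specific model); the embeddings here are random rather than trained, and the sigmoid calibration is ad hoc:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                        # embedding dimension (arbitrary)
nodes = ["Charon", "Pluto", "Sun"]
edge_labels = ["orbits"]

# Untrained (random) entity and relation embeddings ε and ρ; a real system
# would learn these from G, e.g. by stochastic gradient descent.
eps = {n: rng.normal(size=d) for n in nodes}
rho = {p: rng.normal(size=d) for p in edge_labels}

def plausibility(s, p, o):
    """TransE-style score: ε(s) + ρ(p) should lie close to ε(o); the negative
    distance is squashed into (0, 1) so scores near 1 mean 'plausible'."""
    dist = np.linalg.norm(eps[s] + rho[p] - eps[o])
    return 1.0 / (1.0 + np.exp(dist - 1.0))   # ad hoc calibration for the sketch

print(plausibility("Charon", "orbits", "Pluto"))   # a link-prediction candidate score
```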

In terms of open questions, a natural topic to explore is the combination of knowledge graph embeddings with ontologies and notions of entailment.

Topic 7.1: Semantic knowledge graph embeddings

The plausibility scoring function assigns 1 to triples deemed likely to be true, and 0 to triples deemed likely to be false. This function is learnt based on the triples found in G. But in the case that G contains ontological definitions, it may entail triples that are not explicitly stated. In such cases, many knowledge graph embeddings only consider the structure, rather than the semantics, of G, and may thus assign entailed-but-not-stated triples a score closer to 0. Ideally, a semantic knowledge graph embedding would assign a plausibility of (close to) 1 for triples that are entailed by G.
...
Hence a relevant topic is to explore how knowledge graph embeddings can be formulated and trained in order to find a numerical representation not only of the structure of a graph, but also its semantics, including potentially complex forms of entailment.

8 Graph Neural Networks

Rather than encoding the structure of graphs numerically, an alternative to enable learning over graphs is to design machine learning architectures that operate directly over the structure of a graph. A natural starting point is to consider neural networks, which already form graphs where nodes are (artificial) neurons, and edges are (artificial) axons that pass signals between neurons. Unlike graphs used to represent data, however, traditional neural networks tend to have a much more regular structure, being organised into layers, where all the nodes of one layer are connected pairwise to all the nodes of the next layer. To enable learning over graphs of data, an alternative approach is to define the structure of the neural network in terms of the structure of the graph. In this case, the nodes of the graph are interpreted as neurons, while edges are interpreted as axons. Thus nodes can pass signals to each other through edges towards solving a given task.

We first define a vector-labelled del graph, which serves as input to a GNN:

Definition 16 (Vector-labelled graph). A vector-labelled del graph Gλ is a pair Gλ := (G, λ) where G is a del graph, and λ : NG ∪ G → Ra is a vector labelling function.
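A highly simplified numpy sketch of one message-passing step over a vector-labelled graph: edge labels are ignored, the aggregation is a plain sum over in-neighbours, and the weight matrices are random rather than learned, so this only illustrates the general mechanism described above:

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny vector-labelled graph: nodes with feature vectors (λ restricted to
# nodes; edge labels and edge features are ignored in this simplified sketch).
edges = [("Charon", "Pluto"), ("Pluto", "Sun")]                      # directed edges
feat = {n: rng.normal(size=4) for n in ("Charon", "Pluto", "Sun")}   # λ(v) ∈ R^4

W_self  = rng.normal(size=(4, 4))   # untrained weights; a real GNN learns these
W_neigh = rng.normal(size=(4, 4))

def gnn_layer(feat, edges):
    """One message-passing step: each node combines its own vector with the
    sum of its in-neighbours' vectors, then applies a ReLU non-linearity."""
    out = {}
    for v, h in feat.items():
        msg = sum((feat[u] for (u, w) in edges if w == v), np.zeros(4))
        out[v] = np.maximum(0.0, W_self @ h + W_neigh @ msg)
    return out

h1 = gnn_layer(feat, edges)
print(h1["Pluto"])   # Pluto's representation now reflects its neighbour Charon
```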

Topic 8.1

Both GNNs and ontologies can be used to classify nodes. While GNNs can be used to perform classification based on numerical methods and inductive learning, ontologies can be used to perform classification based on symbolic methods and deductive reasoning. An interesting research question is then to understand how these two paradigms might relate in terms of expressiveness. For example, can GNNs potentially learn to make similar classifications based on similar input data as ontologies?

In fact, some progress has been made recently on this question.

This rather surprising correspondence between GNNs and ALCQ constitutes a bridge between deductive and inductive semantics for knowledge graphs. An interesting line of research is then to investigate GNN-style architectures that permit classification with respect to other DLs [10].

9 Conclusions

For discussion on these and other topics, we refer to the extended tutorial [35].

[11] Bellomarini, L., Sallinger, E., Gottlob, G.: The Vadalog System: Datalog-based Reasoning for Knowledge Graphs. Proceedings of the VLDB Endowment 11(9), 975–987 (2018)

[17] Carral, D., Dragoste, I., González, L., Jacobs, C.J.H., Krötzsch, M., Urbani, J.: VLog: A Rule Engine for Knowledge Graphs. In: [25], pp. 19–35


[35] Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., de Melo, G., Gutierrez, C., Gayo, J.E.L., Kirrane, S., Neumaier, S., Polleres, A., Navigli, R., Ngomo, A.N., Rashid, S.M., Rula, A., Schmelzeisen, L., Sequeda, J.F., Staab, S., Zimmermann, A.: Knowledge graphs. CoRR abs/2003.02320 (2020), https://arxiv.org/abs/2003.02320

[37] Homola, M., Serafini, L.: Contextualized Knowledge Repositories for the Semantic Web. Journal of Web Semantics 12, 64–87 (2012)

[39] Krötzsch, M., Marx, M., Ozaki, A., Thost, V.: Attributed Description Logics: Reasoning on Knowledge Graphs. In: IJCAI 2018, pp. 5309–5313 (2018). https://doi.org/10.24963/ijcai.2018/743

[76] Zimmermann, A., Lopes, N., Polleres, A., Straccia, U.: A General Framework for Representing, Reasoning and Querying with Annotated Semantic Web Data. Journal of Web Semantics 12, 72–95 (2012)
