Pular para o conteúdo principal

Approximate Answering of Graph Queries - Leitura de Artigo

Arxiv - Pré print https://arxiv.org/abs/2308.06585

Abstract

Knowledge graphs (KGs) are inherently incomplete because of incomplete world knowledge and bias in what is the input to the KG. Additionally, world knowledge constantly expands and evolves, making existing facts deprecated or introducing new ones. However, we would still want to be able to answer queries as if the graph were complete. ...

 [Na abordagem do CKG partimos da premissa que os KG são incompletos e assim consideramos um Mundo Aberto Dual onde os KGs são compostos por alegações e não fatos] 

1.1. Introduction

Due to the incomplete nature of knowledge graphs, a query answering system might not return a complete set of answers because the graph does not contain all information needed to perfectly match the conditions in a query. To address this setting, approximate query answering methods have been proposed, which attempt to provide likely answers to a query by learning from patterns in the graph and generalizing to queries that rely on missing edges.

[As regras de Contexto permitem inferir contexto implícito que não está expresso no KG e o mapeamento permite deixar claro o que está faltando referente ao contexto em termos de qualificadores para arestas e também propriedades/relações para entidades ]

1.2. Definitions

[Não considera o modelo de multi camadas e nem o hiper-relacional para a definição de KG]

Furthermore, approximate query answering techniques are often evaluated using queries where there is only one variable of interest, which is called the target variable.

[A abordagem de All contextualized (possible) answers não tem esta limitação]

Finally, we can define what we mean by Approximate Query Answering, which we base on the definition in [3]. Given an incomplete Knowledge Graph G = (V, E, L, l) (part of the not observable complete KG ˆG) and a query Q, with one of its variables t chosen as the target variable of the query, the complete set of answers T to this query is the set of all values for the mappings of t.

The task of Approximate Query Answering is to rank all nodes in V such that the elements of T, are at the top of the ranking. Since the given KG is incomplete, we cannot solve this problem directly as a graph-matching problem, as usually done in databases.

[A abordagem de All contextualized (possible) answers não faz ranking de respostas, é baseada em consultas BGP e NGP. Mas se a resposta não existe, o máximo a ser feito é responder isto!]

1.3. Evaluating Approximate Query Answering Performance

Generally, datasets for approximate graph query answering comprise an incomplete knowledge graph and pairs of test queries with their respective set of answer entities. Since the methods are developed to answer queries over incomplete knowledge graphs, these queries have some answers which cannot be obtained by a simple graph traversal but require implicit knowledge graph completion or reasoning. These answers are often called the hard answers, in contrast with the easy answers for this query, which can be found using graph traversal.

[Parte da premissa que o KG original que é usado como dataset é COMPLETO e a separação de algumas arestas para treinamento o tornaria INCOMPLETO]


StarQE [3], for example, is made to answer queries with additional context information and uses patterns that include this.

TRABALHO RELACIONADO

Alivanistos D, Berrendorf M, Cochez M, et al. Query embedding on hyper-relational knowledge graphs.
In: 10th International Conference on Learning Representations, ICLR 2020, virtual, April 25-29, 2022.
OpenReview.net; 2022. Available from: https://openreview.net/forum?id=4rLw09TgRw9.

1.3.3. Metrics

During the evaluation, query answering is posed as a ranking task where scores are assigned to candidate entities (i.e., potential answers), with larger scores indicating preferred answers. Metrics are computed based on this ranking and the complete set of answer entities. It is worth noting that due to the construction of the datasets and the open-world nature of knowledge graphs, where missing links do not necessarily indicate that the fact is wrong, these complete answer sets may be subsets of the true answer set.

[OWA e as inferências sobre resposta. Rankear todos os vértices que existem no KG ... mas a resposta pode nem ser um vértice no próprio KG]

In general, there are two types of evaluation metrics: ...

[Sem Ranking (lista de respostas) e Com Ranking (lista de respostas ordenadas)]

1.4. Approaches with Pre-trained Graph Embeddings

Atomic Query Answering via Neural Link Predictors A neural link predictor is a differentiable model where atom arguments are first mapped into a k-dimensional embedding space and then used for producing a score for the atom. 

[Predição de link para gerar as respostas aproximadas. Seria possível inferir contexto implícito com link prediction também? Seria possível inferir contexto incompleto com link prediction? Qual seria a influência das informações de contexto para Link Prediction?]

1.5. Projection Approaches

1.5.1. GQE

Graph Query Embedding (GQE) [17] was introduced as one of the first methods for approximate query answering over KGs. It is focused on conjunctive queries that can be represented as a Directed Acyclic Graph (DAG), which contains nodes for entities known in the query (called anchor nodes), variables, and a target node. GQE operates by learning an embedding for each entity in the graph, which is randomly initialized before training with query-answer pairs. For a given query, it performs a sequence of operations that follow the structure of its corresponding DAG. The operations start from the embeddings of the entities at the anchor nodes, and progressively apply projection and intersection operators until reaching the target node. 

Once the operations reach the target node, the result is an embedding called the query embedding. All parameters in GQE (i.e. the entity embeddings, and all parameters in the operators) are optimized such that the query embedding has a high dot product with the embedding of entities that are answers to the query, and a low dot product with embeddings of entities selected at random. As such, GQE requires generating a training dataset of millions of query-answer pairs with sufficient coverage for different query structures.

1.5.2. Query2Box

1.6. Message Passing Approaches

1.6.1. MPQE
While GQE [17] demonstrated the effectiveness of using embeddings for complex query answering, the presence of specific operators for projections and intersections, and the algorithm for obtaining the query embedding results in a limited set of query structures that can be embedded and a need for a large set of query-answer pairs

1.6.2. StarQE
Approximate query answering methods have also been extended to a type of graph known as hyper-relational knowledge graphs. Hyper-relational KGs allow for qualifying or quantifying relations between nodes.

StarQE is a recently proposed method for query answering over hyper-relational KGs based on a message-passing approach [3]. It relies on StarE [28], a specialized graph neural network to encode hyper-relational graphs, to obtain an embedding of hyper-relational query in a similar vein as in MPQE.
The authors show that the narrowing down of the result set leads to a substantial increase in performance over different query patterns. In their work, the authors introduced a new query dataset called WD50K-QE which is generated by querying a hyper-relational subset of Wikidata using various specialized logical query patterns.

1.8. Conclusion, Outlook, and Challenges

A further issue with the current state of methods is that they only support a limited set of operators. Besides, even when a large set is supported, there would still be queries which cannot be represented with them. A prominent example are queries which have cycles in them. While some of the methods (e.g., the message passing based ones) could in principle compute an embedding for these, there is little theoretical basis supporting the idea, and it has not been evaluated. It is clear that there is still a large scope for exploration left in this aspect. Besides this, all current methods require the queries to have at least one anchor node, while in general, queries can also consist of only variables.

Furthermore, a common challenge amongst the approaches introduced in this chapter appears when trying to incorporate literal values in the queries. Examples can be time, strings, numerical values indicating pressure or temperature, etc. These literals require different handling compared to entities. One of the issues is that the set of literals is infinite, rather than taken from a finite set. This aspect is important both when using these literal values as part of the query, part of the values bound to variables, and when the answer of the query is a literal value.

[Literais ainda não são tratados nestes abrodagens então seriam somente para predição de links e queries para relações e não para propriedades]

[11] Bordes A, Usunier N, Garc´ıa-Dur´an A, et al. Translating embeddings for modeling multi-relational data. In: Burges CJC, Bottou L, Ghahramani Z, et al., editors. Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States; 2013. p. 2787–2795.

[26] Wu Z, Pan S, Chen F, et al. A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems. 2020;32(1):4–24.

[28] Galkin M, Trivedi P, Maheshwari G, et al. Message passing for hyper-relational knowledge graphs. In: Webber B, Cohn T, He Y, et al., editors. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020. Association for Computational Linguistics; 2020. p. 7346–7359.

Comentários

Postagens mais visitadas deste blog

Aprendizado de Máquina Relacional

 Extraído de -> https://www.lncc.br/~ziviani/papers/Texto-MC1-SBBD2019.pdf   Aprendizado de máquina relacional (AMR) destina-se à criação de modelos estatísticos para dados relacionais (seria o mesmo que dados conectados) , isto é, dados cuja a informação relacional é tão ou mais impor tante que a informação individual (atributos) de cada elemento.    Essa classe de aprendizado tem sido utilizada em diversas aplicações, por exemplo, na extração de informação de dados não estruturados [Zhang et al. 2016] e na modelagem de linguagem natural [Vu et al. 2018].   A adoção de técnicas de aprendizado de máquina relacional em tarefas de comple mentação de grafo de conhecimento se baseia na premissa de existência de regularidades semânticas presentes no mesmo . Modelos grafos probabilísticos  Baseada em regras / heurísticas que não podem garantir 100% de precisão no resultado da inferência mas os resultados podem ser explicados. Modelos de características de ...

Connected Papers: Uma abordagem alternativa para revisão da literatura

Durante um projeto de pesquisa podemos encontrar um artigo que nos identificamos em termos de problema de pesquisa e também de solução. Então surge a vontade de saber como essa área de pesquisa se desenvolveu até chegar a esse ponto ou quais desdobramentos ocorreram a partir dessa solução proposta para identificar o estado da arte nesse tema. Podemos seguir duas abordagens:  realizar uma revisão sistemática usando palavras chaves que melhor caracterizam o tema em bibliotecas digitais de referência para encontrar artigos relacionados ou realizar snowballing ancorado nesse artigo que identificamos previamente, explorando os artigos citados (backward) ou os artigos que o citam (forward)  Mas a ferramenta Connected Papers propõe uma abordagem alternativa para essa busca. O problema inicial é dado um artigo de interesse, precisamos encontrar outros artigos relacionados de "certa forma". Find different methods and approaches to the same subject Track down the state of the art rese...

Knowledge graphs: Introduction, history, and perspectives - Leitura de Artigo

Chaudhri, V. K., C. Baru, N. Chittar, X. L. Dong, M. Genesereth, J. Hendler, A. Kalyanpur, D. Lenat, J. Sequeda, D. Vrandečić, and K.Wang 2022. “ Knowledge graphs: Introduction, history, and perspectives. ” AI Magazine 43: 17–29. https://doi.org/10.1002/aaai.12033 Knowledge graphs (KGs) have emerged as a compelling abstraction for organizing the world’s structured knowledge and for integrating information extracted from multiple data sources. KNOWLEDGE GRAPH DEFINITION A KG is a directed labeled graph in which domain-specific meanings are associated with nodes and edges. [ Definição focado no COMO representar, diferente dos KBs ] There are multiple approaches for associating meanings with the nodes and edges. At the simplest level, the meanings could be stated as documentation strings expressed in a human understandable language such as English. At a computational level, the meanings can be expressed in a formal specification language such as first-order logic. An active area of curren...