Pular para o conteĂºdo principal

Approximate Answering of Graph Queries - Leitura de Artigo

Arxiv - Pré print https://arxiv.org/abs/2308.06585

Abstract

Knowledge graphs (KGs) are inherently incomplete because of incomplete world knowledge and bias in what is the input to the KG. Additionally, world knowledge constantly expands and evolves, making existing facts deprecated or introducing new ones. However, we would still want to be able to answer queries as if the graph were complete. ...

 [Na abordagem do CKG partimos da premissa que os KG sĂ£o incompletos e assim consideramos um Mundo Aberto Dual onde os KGs sĂ£o compostos por alegações e nĂ£o fatos] 

1.1. Introduction

Due to the incomplete nature of knowledge graphs, a query answering system might not return a complete set of answers because the graph does not contain all information needed to perfectly match the conditions in a query. To address this setting, approximate query answering methods have been proposed, which attempt to provide likely answers to a query by learning from patterns in the graph and generalizing to queries that rely on missing edges.

[As regras de Contexto permitem inferir contexto implĂ­cito que nĂ£o estĂ¡ expresso no KG e o mapeamento permite deixar claro o que estĂ¡ faltando referente ao contexto em termos de qualificadores para arestas e tambĂ©m propriedades/relações para entidades ]

1.2. Definitions

[NĂ£o considera o modelo de multi camadas e nem o hiper-relacional para a definiĂ§Ă£o de KG]

Furthermore, approximate query answering techniques are often evaluated using queries where there is only one variable of interest, which is called the target variable.

[A abordagem de All contextualized (possible) answers nĂ£o tem esta limitaĂ§Ă£o]

Finally, we can define what we mean by Approximate Query Answering, which we base on the definition in [3]. Given an incomplete Knowledge Graph G = (V, E, L, l) (part of the not observable complete KG ˆG) and a query Q, with one of its variables t chosen as the target variable of the query, the complete set of answers T to this query is the set of all values for the mappings of t.

The task of Approximate Query Answering is to rank all nodes in V such that the elements of T, are at the top of the ranking. Since the given KG is incomplete, we cannot solve this problem directly as a graph-matching problem, as usually done in databases.

[A abordagem de All contextualized (possible) answers nĂ£o faz ranking de respostas, Ă© baseada em consultas BGP e NGP. Mas se a resposta nĂ£o existe, o mĂ¡ximo a ser feito Ă© responder isto!]

1.3. Evaluating Approximate Query Answering Performance

Generally, datasets for approximate graph query answering comprise an incomplete knowledge graph and pairs of test queries with their respective set of answer entities. Since the methods are developed to answer queries over incomplete knowledge graphs, these queries have some answers which cannot be obtained by a simple graph traversal but require implicit knowledge graph completion or reasoning. These answers are often called the hard answers, in contrast with the easy answers for this query, which can be found using graph traversal.

[Parte da premissa que o KG original que Ă© usado como dataset Ă© COMPLETO e a separaĂ§Ă£o de algumas arestas para treinamento o tornaria INCOMPLETO]


StarQE [3], for example, is made to answer queries with additional context information and uses patterns that include this.

TRABALHO RELACIONADO

Alivanistos D, Berrendorf M, Cochez M, et al. Query embedding on hyper-relational knowledge graphs.
In: 10th International Conference on Learning Representations, ICLR 2020, virtual, April 25-29, 2022.
OpenReview.net; 2022. Available from: https://openreview.net/forum?id=4rLw09TgRw9.

1.3.3. Metrics

During the evaluation, query answering is posed as a ranking task where scores are assigned to candidate entities (i.e., potential answers), with larger scores indicating preferred answers. Metrics are computed based on this ranking and the complete set of answer entities. It is worth noting that due to the construction of the datasets and the open-world nature of knowledge graphs, where missing links do not necessarily indicate that the fact is wrong, these complete answer sets may be subsets of the true answer set.

[OWA e as inferĂªncias sobre resposta. Rankear todos os vĂ©rtices que existem no KG ... mas a resposta pode nem ser um vĂ©rtice no prĂ³prio KG]

In general, there are two types of evaluation metrics: ...

[Sem Ranking (lista de respostas) e Com Ranking (lista de respostas ordenadas)]

1.4. Approaches with Pre-trained Graph Embeddings

Atomic Query Answering via Neural Link Predictors A neural link predictor is a differentiable model where atom arguments are first mapped into a k-dimensional embedding space and then used for producing a score for the atom. 

[PrediĂ§Ă£o de link para gerar as respostas aproximadas. Seria possĂ­vel inferir contexto implĂ­cito com link prediction tambĂ©m? Seria possĂ­vel inferir contexto incompleto com link prediction? Qual seria a influĂªncia das informações de contexto para Link Prediction?]

1.5. Projection Approaches

1.5.1. GQE

Graph Query Embedding (GQE) [17] was introduced as one of the first methods for approximate query answering over KGs. It is focused on conjunctive queries that can be represented as a Directed Acyclic Graph (DAG), which contains nodes for entities known in the query (called anchor nodes), variables, and a target node. GQE operates by learning an embedding for each entity in the graph, which is randomly initialized before training with query-answer pairs. For a given query, it performs a sequence of operations that follow the structure of its corresponding DAG. The operations start from the embeddings of the entities at the anchor nodes, and progressively apply projection and intersection operators until reaching the target node. 

Once the operations reach the target node, the result is an embedding called the query embedding. All parameters in GQE (i.e. the entity embeddings, and all parameters in the operators) are optimized such that the query embedding has a high dot product with the embedding of entities that are answers to the query, and a low dot product with embeddings of entities selected at random. As such, GQE requires generating a training dataset of millions of query-answer pairs with sufficient coverage for different query structures.

1.5.2. Query2Box

1.6. Message Passing Approaches

1.6.1. MPQE
While GQE [17] demonstrated the effectiveness of using embeddings for complex query answering, the presence of specific operators for projections and intersections, and the algorithm for obtaining the query embedding results in a limited set of query structures that can be embedded and a need for a large set of query-answer pairs

1.6.2. StarQE
Approximate query answering methods have also been extended to a type of graph known as hyper-relational knowledge graphs. Hyper-relational KGs allow for qualifying or quantifying relations between nodes.

StarQE is a recently proposed method for query answering over hyper-relational KGs based on a message-passing approach [3]. It relies on StarE [28], a specialized graph neural network to encode hyper-relational graphs, to obtain an embedding of hyper-relational query in a similar vein as in MPQE.
The authors show that the narrowing down of the result set leads to a substantial increase in performance over different query patterns. In their work, the authors introduced a new query dataset called WD50K-QE which is generated by querying a hyper-relational subset of Wikidata using various specialized logical query patterns.

1.8. Conclusion, Outlook, and Challenges

A further issue with the current state of methods is that they only support a limited set of operators. Besides, even when a large set is supported, there would still be queries which cannot be represented with them. A prominent example are queries which have cycles in them. While some of the methods (e.g., the message passing based ones) could in principle compute an embedding for these, there is little theoretical basis supporting the idea, and it has not been evaluated. It is clear that there is still a large scope for exploration left in this aspect. Besides this, all current methods require the queries to have at least one anchor node, while in general, queries can also consist of only variables.

Furthermore, a common challenge amongst the approaches introduced in this chapter appears when trying to incorporate literal values in the queries. Examples can be time, strings, numerical values indicating pressure or temperature, etc. These literals require different handling compared to entities. One of the issues is that the set of literals is infinite, rather than taken from a finite set. This aspect is important both when using these literal values as part of the query, part of the values bound to variables, and when the answer of the query is a literal value.

[Literais ainda nĂ£o sĂ£o tratados nestes abrodagens entĂ£o seriam somente para prediĂ§Ă£o de links e queries para relações e nĂ£o para propriedades]

[11] Bordes A, Usunier N, Garc´Ä±a-Dur´an A, et al. Translating embeddings for modeling multi-relational data. In: Burges CJC, Bottou L, Ghahramani Z, et al., editors. Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States; 2013. p. 2787–2795.

[26] Wu Z, Pan S, Chen F, et al. A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems. 2020;32(1):4–24.

[28] Galkin M, Trivedi P, Maheshwari G, et al. Message passing for hyper-relational knowledge graphs. In: Webber B, Cohn T, He Y, et al., editors. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020. Association for Computational Linguistics; 2020. p. 7346–7359.

ComentĂ¡rios

Postagens mais visitadas deste blog

Connected Papers: Uma abordagem alternativa para revisĂ£o da literatura

Durante um projeto de pesquisa podemos encontrar um artigo que nos identificamos em termos de problema de pesquisa e tambĂ©m de soluĂ§Ă£o. EntĂ£o surge a vontade de saber como essa Ă¡rea de pesquisa se desenvolveu atĂ© chegar a esse ponto ou quais desdobramentos ocorreram a partir dessa soluĂ§Ă£o proposta para identificar o estado da arte nesse tema. Podemos seguir duas abordagens:  realizar uma revisĂ£o sistemĂ¡tica usando palavras chaves que melhor caracterizam o tema em bibliotecas digitais de referĂªncia para encontrar artigos relacionados ou realizar snowballing ancorado nesse artigo que identificamos previamente, explorando os artigos citados (backward) ou os artigos que o citam (forward)  Mas a ferramenta Connected Papers propõe uma abordagem alternativa para essa busca. O problema inicial Ă© dado um artigo de interesse, precisamos encontrar outros artigos relacionados de "certa forma". Find different methods and approaches to the same subject Track down the state of the art rese...

Knowledge Graph Embedding with Triple Context - Leitura de Abstract

  Jun Shi, Huan Gao, Guilin Qi, and Zhangquan Zhou. 2017. Knowledge Graph Embedding with Triple Context. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM '17). Association for Computing Machinery, New York, NY, USA, 2299–2302. https://doi.org/10.1145/3132847.3133119 ABSTRACT Knowledge graph embedding, which aims to represent entities and relations in vector spaces, has shown outstanding performance on a few knowledge graph completion tasks. Most existing methods are based on the assumption that a knowledge graph is a set of separate triples, ignoring rich graph features, i.e., structural information in the graph. In this paper, we take advantages of structures in knowledge graphs, especially local structures around a triple, which we refer to as triple context. We then propose a Triple-Context-based knowledge Embedding model (TCE). For each triple, two kinds of structure information are considered as its context in the graph; one is the out...

KnOD 2021

Beyond Facts: Online Discourse and Knowledge Graphs A preface to the proceedings of the 1st International Workshop on Knowledge Graphs for Online Discourse Analysis (KnOD 2021, co-located with TheWebConf’21) https://ceur-ws.org/Vol-2877/preface.pdf https://knod2021.wordpress.com/   ABSTRACT Expressing opinions and interacting with others on the Web has led to the production of an abundance of online discourse data, such as claims and viewpoints on controversial topics, their sources and contexts . This data constitutes a valuable source of insights for studies into misinformation spread, bias reinforcement, echo chambers or political agenda setting. While knowledge graphs promise to provide the key to a Web of structured information, they are mainly focused on facts without keeping track of the diversity, connection or temporal evolution of online discourse data. As opposed to facts, claims are inherently more complex. Their interpretation strongly depends on the context and a vari...