Arxiv - Pré print https://arxiv.org/abs/2308.06585
Abstract
Knowledge graphs (KGs) are inherently incomplete because of incomplete world knowledge and bias in what is the input to the KG. Additionally, world knowledge constantly expands and evolves, making existing facts deprecated or introducing new ones. However, we would still want to be able to answer queries as if the graph were complete. ...
[Na abordagem do CKG partimos da premissa que os KG sĂ£o incompletos e assim consideramos um Mundo Aberto Dual onde os KGs sĂ£o compostos por alegações e nĂ£o fatos]
1.1. Introduction
Due to the incomplete nature of knowledge graphs, a query answering system might not return a complete set of answers because the graph does not contain all information needed to perfectly match the conditions in a query. To address this setting, approximate query answering methods have been proposed, which attempt to provide likely answers to a query by learning from patterns in the graph and generalizing to queries that rely on missing edges.
[As regras de Contexto permitem inferir contexto implĂcito que nĂ£o estĂ¡ expresso no KG e o mapeamento permite deixar claro o que estĂ¡ faltando referente ao contexto em termos de qualificadores para arestas e tambĂ©m propriedades/relações para entidades ]
1.2. Definitions
[NĂ£o considera o modelo de multi camadas e nem o hiper-relacional para a definiĂ§Ă£o de KG]
Furthermore, approximate query answering techniques are often evaluated using queries where there is only one variable of interest, which is called the target variable.
[A abordagem de All contextualized (possible) answers nĂ£o tem esta limitaĂ§Ă£o]
Finally, we can define what we mean by Approximate Query Answering, which we base on the definition in [3]. Given an incomplete Knowledge Graph G = (V, E, L, l) (part of the not observable complete KG ˆG) and a query Q, with one of its variables t chosen as the target variable of the query, the complete set of answers T to this query is the set of all values for the mappings of t.
The task of Approximate Query Answering is to rank all nodes in V such that the elements of T, are at the top of the ranking. Since the given KG is incomplete, we cannot solve this problem directly as a graph-matching problem, as usually done in databases.
[A abordagem de All contextualized (possible) answers nĂ£o faz ranking de respostas, Ă© baseada em consultas BGP e NGP. Mas se a resposta nĂ£o existe, o mĂ¡ximo a ser feito Ă© responder isto!]
1.3. Evaluating Approximate Query Answering Performance
Generally, datasets for approximate graph query answering comprise an incomplete knowledge graph and pairs of test queries with their respective set of answer entities. Since the methods are developed to answer queries over incomplete knowledge graphs, these queries have some answers which cannot be obtained by a simple graph traversal but require implicit knowledge graph completion or reasoning. These answers are often called the hard answers, in contrast with the easy answers for this query, which can be found using graph traversal.
[Parte da premissa que o KG original que Ă© usado como dataset Ă© COMPLETO e a separaĂ§Ă£o de algumas arestas para treinamento o tornaria INCOMPLETO]
StarQE [3], for example, is made to answer queries with additional context information and uses patterns that include this.
TRABALHO RELACIONADO
Alivanistos D, Berrendorf M, Cochez M, et al. Query embedding on hyper-relational knowledge graphs.
In: 10th International Conference on Learning Representations, ICLR 2020, virtual, April 25-29, 2022.
OpenReview.net; 2022. Available from: https://openreview.net/forum?id=4rLw09TgRw9.
1.3.3. Metrics
During the evaluation, query answering is posed as a ranking task where scores are assigned to candidate entities (i.e., potential answers), with larger scores indicating preferred answers. Metrics are computed based on this ranking and the complete set of answer entities. It is worth noting that due to the construction of the datasets and the open-world nature of knowledge graphs, where missing links do not necessarily indicate that the fact is wrong, these complete answer sets may be subsets of the true answer set.
[OWA e as inferĂªncias sobre resposta. Rankear todos os vĂ©rtices que existem no KG ... mas a resposta pode nem ser um vĂ©rtice no prĂ³prio KG]
In general, there are two types of evaluation metrics: ...
[Sem Ranking (lista de respostas) e Com Ranking (lista de respostas ordenadas)]
1.4. Approaches with Pre-trained Graph Embeddings
Atomic Query Answering via Neural Link Predictors A neural link predictor is a differentiable model where atom arguments are first mapped into a k-dimensional embedding space and then used for producing a score for the atom.
[PrediĂ§Ă£o de link para gerar as respostas aproximadas. Seria possĂvel inferir contexto implĂcito com link prediction tambĂ©m? Seria possĂvel inferir contexto incompleto com link prediction? Qual seria a influĂªncia das informações de contexto para Link Prediction?]
1.5. Projection Approaches
1.5.1. GQE
Graph Query Embedding (GQE) [17] was introduced as one of the first methods for approximate query answering over KGs. It is focused on conjunctive queries that can be represented as a Directed Acyclic Graph (DAG), which contains nodes for entities known in the query (called anchor nodes), variables, and a target node. GQE operates by learning an embedding for each entity in the graph, which is randomly initialized before training with query-answer pairs. For a given query, it performs a sequence of operations that follow the structure of its corresponding DAG. The operations start from the embeddings of the entities at the anchor nodes, and progressively apply projection and intersection operators until reaching the target node.
Once the operations reach the target node, the result is an embedding called the query embedding. All parameters in GQE (i.e. the entity embeddings, and all parameters in the operators) are optimized such that the query embedding has a high dot product with the embedding of entities that are answers to the query, and a low dot product with embeddings of entities selected at random. As such, GQE requires generating a training dataset of millions of query-answer pairs with sufficient coverage for different query structures.
1.5.2. Query2Box
1.6. Message Passing Approaches
1.6.1. MPQE
While GQE [17] demonstrated the effectiveness of using embeddings for complex query answering, the presence of specific operators for projections and intersections, and the algorithm for obtaining the query embedding results in a limited set of query structures that can be embedded and a need for a large set of query-answer pairs
1.6.2. StarQE
Approximate query answering methods have also been extended to a type of graph known as hyper-relational knowledge graphs. Hyper-relational KGs allow for qualifying or quantifying relations between nodes.
StarQE is a recently proposed method for query answering over hyper-relational KGs based on a message-passing approach [3]. It relies on StarE [28], a specialized graph neural network to encode hyper-relational graphs, to obtain an embedding of hyper-relational query in a similar vein as in MPQE.
The authors show that the narrowing down of the result set leads to a substantial increase in performance over different query patterns. In their work, the authors introduced a new query dataset called WD50K-QE which is generated by querying a hyper-relational subset of Wikidata using various specialized logical query patterns.
1.8. Conclusion, Outlook, and Challenges
A further issue with the current state of methods is that they only support a limited set of operators. Besides, even when a large set is supported, there would still be queries which cannot be represented with them. A prominent example are queries which have cycles in them. While some of the methods (e.g., the message passing based ones) could in principle compute an embedding for these, there is little theoretical basis supporting the idea, and it has not been evaluated. It is clear that there is still a large scope for exploration left in this aspect. Besides this, all current methods require the queries to have at least one anchor node, while in general, queries can also consist of only variables.
Furthermore, a common challenge amongst the approaches introduced in this chapter appears when trying to incorporate literal values in the queries. Examples can be time, strings, numerical values indicating pressure or temperature, etc. These literals require different handling compared to entities. One of the issues is that the set of literals is infinite, rather than taken from a finite set. This aspect is important both when using these literal values as part of the query, part of the values bound to variables, and when the answer of the query is a literal value.
[Literais ainda nĂ£o sĂ£o tratados nestes abrodagens entĂ£o seriam somente para prediĂ§Ă£o de links e queries para relações e nĂ£o para propriedades]
[11] Bordes A, Usunier N, Garc´Ä±a-Dur´an A, et al. Translating embeddings for modeling multi-relational data. In: Burges CJC, Bottou L, Ghahramani Z, et al., editors. Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States; 2013. p. 2787–2795.
[26] Wu Z, Pan S, Chen F, et al. A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems. 2020;32(1):4–24.
[28] Galkin M, Trivedi P, Maheshwari G, et al. Message passing for hyper-relational knowledge graphs. In: Webber B, Cohn T, He Y, et al., editors. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020. Association for Computational Linguistics; 2020. p. 7346–7359.
ComentĂ¡rios
Postar um comentĂ¡rio
Sinta-se a vontade para comentar. CrĂticas construtivas sĂ£o sempre bem vindas.