Knowledge Graph Question Answering with Ambiguous Query - Paper reading notes

https://dl.acm.org/doi/pdf/10.1145/3543507.3583316

Lihui Liu, Yuzhong Chen, Mahashweta Das, Hao Yang, and Hanghang Tong. 2023. Knowledge Graph Question Answering with Ambiguous Query. In Proceedings of the ACM Web Conference 2023 (WWW '23). Association for Computing Machinery, New York, NY, USA, 2477–2486. https://doi.org/10.1145/3543507.3583316

Reading order: Abs -> 1 -> 6 -> 2 -> 5 -> 4 -> 3

ABSTRACT

In the vast majority of the existing works, the input queries are considered perfect and able to precisely express the user's query intention. However, in reality, input queries might be ambiguous and elusive, containing only a limited amount of information.

[Keyword queries? Queries as "complete" questions? Queries in a GQL-style graph query language?]
[In my research we consider that queries of any type are potentially incomplete with respect to (implicit) context, since the user / application itself may be unaware of the context that applies to the claims of interest. But the answers will be as contextualized as possible, through the mapping of explicit context and the rules for inferring implicit context]

In this paper, we propose PReFNet which focuses on answering ambiguous queries with pseudo relevance feedback on knowledge graphs. In order to leverage the hidden (pseudo) relevance information existing in the results that are initially returned from a given query, .... The inferred high quality queries will be returned to the users to help them search with ease.

[Would this be query rewriting or query expansion to suggest to the user? Rewriting with answer generation]

1 INTRODUCTION

A knowledge graph is a graph data structure which contains a multitude of triples denoting real world facts.

[Their definition of KG for their research. It does not consider Dual OWA; these are facts, not claims; context is not mentioned; triple-based]

Despite the great progress, most works focus on answering defectless queries on knowledge graphs. These queries are assumed to be perfect and to precisely express users' query intentions. However, this is not true most of the time in real cases for the following reasons. (1) First, the vocabulary of different users can vary dramatically. According to a prominent study on the human vocabulary problem [8], about 80-90% of the time two persons will give different representations when they are asked to name the same concept [21]. This means the input queries of different users could be very different from each other. (2) Second, some KGQA methods (e.g., [28] [4]) need to transform the natural language questions into graph queries, and then search for the results according to these query graphs. The transformation algorithm may generate queries with inaccurate graph structure. Last but not least, allowing users to input query graphs directly may introduce additional structural noise or inaccuracy due to their lack of full background knowledge of the underlying KG [21].

[The issue of terminological mismatch and the process of converting the information need / question into a graph query are out of the scope of our research ((1) and (2)). The user's possible unawareness of the context when formulating the query is addressed by the "Best" Possible Answer approach, since the claims in the answers are contextualized with explicit and implicit (inferred) context]

To address these issues, query ambiguity and vagueness need to be correctly resolved, which in turn requires new information in addition to the query itself. Relevance feedback (ReF for short) is one promising solution. The general idea behind relevance feedback is to take the results that are initially returned from a given query, to gather user feedback, and to use information about whether or not those results are relevant to form a new query. ... Finally, the newly inferred queries will be used to re-rank the original candidate answers.

[Rewriting, but there is no user interaction for the answer since the feedback is pseudo]
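
To make the pseudo feedback concrete, here is a minimal sketch of inferring likely query relations from the initially returned top-k answers, with no user in the loop. The toy triples, names, and voting heuristic are my own illustrative assumptions, not the paper's implementation:

from collections import Counter

# Toy KG: a set of (head, relation, tail) triples. All names are invented.
KG = {
    ("Christopher_Nolan", "directed", "Inception"),
    ("Christopher_Nolan", "directed", "Interstellar"),
    ("Christopher_Nolan", "born_in", "London"),
}

def infer_relations(anchor, top_k_answers, kg):
    """Treat the initially returned top-k answers as pseudo-relevant and
    vote for the relations that connect them to the anchor entity."""
    votes = Counter(r for h, r, t in kg if h == anchor and t in top_k_answers)
    return [rel for rel, _ in votes.most_common()]

# Suppose an ambiguous question about Christopher Nolan initially returns:
top_k = ["Inception", "Interstellar", "London"]
print(infer_relations("Christopher_Nolan", top_k, KG))
# -> ['directed', 'born_in']: 'directed' wins and can re-rank the candidates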

2 PROBLEM DEFINITION

Knowledge graph question answering aims to answer a question with the help of knowledge graphs. According to the study in [21], most users formulate queries using their own knowledge and vocabulary during the search process. They might not have a fairly good understanding of the underlying data schema and the knowledge graph structure. This means that the users’ true intentions behind the queries may be frequently misinterpreted or misrepresented.

[The semantics of converting an information need into a query]

In this paper, we focus on answering ambiguous one-hop questions over knowledge graphs. We assume the input ambiguous query Q contains a topic/anchor entity vq ∈ V and a sequence of words Q = (q1, q2, ..., qn). Ideally, each question can be mapped to a unique relation rq in the knowledge graph. The goal of question answering over knowledge graph is to identify a set of nodes aq ∈ V which can answer the ambiguous question. We assume that all the answer entities exist in the knowledge graph, each question only contains a single topic/anchor entity vq ∈ V, and vq is given.

[Assumptions: (1) one-hop query, (2) each query Q has a known topic or anchor entity vq (it does not need to be discovered/mapped), and (3) each question can be uniquely mapped to a relation rq of the KG]

Problem Definition. Answering Ambiguous Query:
Given: (1) A knowledge graph G, (2) an ambiguous one-hop natural language question;
Output: (1) The answer to the question, (2) the top-k most likely correct query relations of the input query.

[Given a KG H potentially incomplete with respect to context and a graph query potentially incomplete with respect to context, retrieve a set of contextualized claims that answer the query]
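
A small illustration of the problem's inputs and outputs on a toy triple store, assuming the ambiguous relation has already been resolved to rq; all entity and relation names are invented:

# Toy KG as a set of (head, relation, tail) triples.
KG = {
    ("Inception", "directed_by", "Christopher_Nolan"),
    ("Inception", "starred_by", "Leonardo_DiCaprio"),
}

def answer_one_hop(kg, anchor, relation):
    """Given the KG and a one-hop query (anchor vq, relation rq),
    output the answer set {t | (vq, rq, t) in KG}."""
    return {t for h, r, t in kg if h == anchor and r == relation}

print(answer_one_hop(KG, "Inception", "directed_by"))
# -> {'Christopher_Nolan'}; the full task also returns the top-k likely rq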

3 PROPOSED METHOD

3.1 Model Overview

Given the ambiguous query and its anchor entity, we give the following lemma to decompose the problem of question answering over knowledge graph (KGQA).

Lemma 1. (KGQA Decomposition) Given an ambiguous query Q and its anchor node vq, let Pr(T|Q, vq) denote the probability that query relation T is generated from Q and let Pr(a|Q, vq) denote the probability that candidate answer a found by T is the true answer; the answer probability then decomposes by marginalizing over the latent query relation: Pr(a|Q, vq) = Σ_T Pr(a|T, vq) · Pr(T|Q, vq).

... the main idea of question answering over knowledge graphs: the KGQA system first transforms the input natural language question Q into a high quality query relation T, then finds the answer according to T and the anchor node vq.
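
Assuming the standard marginalization over the latent query relation stated in the lemma, the two-stage idea can be sketched as below. The probability tables are made-up numbers for illustration, not values from the paper:

def answer_probability(a, relations, pr_T_given_Q, pr_a_given_T):
    # Pr(a | Q, vq) = sum over T of Pr(a | T, vq) * Pr(T | Q, vq)
    return sum(pr_a_given_T[(a, T)] * pr_T_given_Q[T] for T in relations)

relations = ["directed_by", "starred_by"]
pr_T_given_Q = {"directed_by": 0.7, "starred_by": 0.3}  # query inference step
pr_a_given_T = {                                        # answer given relation
    ("Christopher_Nolan", "directed_by"): 0.9,
    ("Christopher_Nolan", "starred_by"): 0.0,
}
print(answer_probability("Christopher_Nolan", relations, pr_T_given_Q, pr_a_given_T))
# -> 0.63 (up to float rounding)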

3.2 Query Inference: Posterior of True Query

When the input query is a one-hop query, this problem is equivalent to the link prediction problem in the knowledge graph.

[Reuse of link prediction with graph embeddings]
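
For instance, the one-hop query (vq, rq, ?) can be scored against every entity with an off-the-shelf embedding model. The sketch below uses a TransE-style distance purely as an illustration; the paper's actual embedding model may differ, and all names and the random embeddings are assumptions:

import numpy as np

rng = np.random.default_rng(0)
entities = ["Inception", "Christopher_Nolan", "Leonardo_DiCaprio"]
relations = ["directed_by"]
dim = 16
E = {e: rng.normal(size=dim) for e in entities}   # entity embeddings
R = {r: rng.normal(size=dim) for r in relations}  # relation embeddings

def transe_score(h, r, t):
    # Higher is better: negative distance ||h + r - t||.
    return -np.linalg.norm(E[h] + R[r] - E[t])

anchor, rel = "Inception", "directed_by"
ranked = sorted((e for e in entities if e != anchor),
                key=lambda t: transe_score(anchor, rel, t), reverse=True)
print(ranked)  # candidate answers, best-scored first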

3.3 Query Inference Training

3.4 Query Ranking: Likelihood of Ambiguous Query

3.5 Answer Re-ranking

4 EXPERIMENTS

4.1 Experimental Setting

WebQuestionsSP - Freebase.
SimpleQuestions - Freebase.
MetaQA - a domain-specific KG containing information about directors, movies, genres, and actors.

[Does not use WD]

In the experiment, we test the effectiveness of PReFNet on complete KG and incomplete KG with 50% and 20% missing edges respectively. All missing edges are randomly deleted.

[Incomplete KG]
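
A sketch of this protocol, assuming triples are dropped uniformly at random; the helper name and seed are illustrative:

import random

def drop_edges(triples, missing_ratio, seed=42):
    """Randomly delete `missing_ratio` of the triples, mirroring the
    incomplete-KG setup described above (ratios of 50% and 20%)."""
    rnd = random.Random(seed)
    keep = round(len(triples) * (1 - missing_ratio))
    return set(rnd.sample(sorted(triples), keep))

triples = {("a", "r1", "b"), ("a", "r2", "c"), ("b", "r1", "c"), ("c", "r2", "d")}
kg_half = drop_edges(triples, missing_ratio=0.5)  # 2 of the 4 triples survive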

We test the query ranking performance against 4 baselines

[Ranking of the generated queries]

We test the question answering performance against 3 baselines

[Comparison for the generated answers]

4.2 Performance of Query Ranking

Traditional KBQA methods usually transform the natural language query to a query graph, and then find the answer according to the query graph. However, because of the ambiguity in the input query, the generated query graph is usually inaccurate. The pseudo relevance feedback, on the other hand, can infer queries according to the top candidate answers.

[Transforming NL into GQL]

4.3 Performance of Question Answering

Among all the methods, EmbedKGQA with relation matching can achieve the highest accuracy. PReFNet further increases the accuracy by 1% on average.

[The gain is small; is the effort worth it?]

4.4 Efficiency

4.5 Ablation Study
A - Query Inference. In this subsection, we show the effectiveness of the query inference module. We first pretrain the module only on the background knowledge graph of each dataset, and then retrain the module on the question training dataset.

B - Query Ranking.
In this subsection, we show the effectiveness of the query ranking module. Some examples are shown in Table 7. As we can see, when the input question is ambiguous, it is very hard to correctly predict its true query intention.

C - Question Answering
More specifically, the relation matching process of EmbedKGQA finds the shortest path between the anchor node vq and candidate answer a, and orders the candidate answers according to their shortest paths.
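
A rough sketch of this distance-based ordering, assuming networkx is available and using a toy undirected graph with invented names:

import networkx as nx

G = nx.Graph()
G.add_edge("Inception", "Christopher_Nolan")
G.add_edge("Inception", "Leonardo_DiCaprio")
G.add_edge("Leonardo_DiCaprio", "Titanic")

def rank_by_path(graph, anchor, candidates):
    """Candidates closer to the anchor (shorter paths) rank higher."""
    return sorted(candidates,
                  key=lambda a: nx.shortest_path_length(graph, anchor, a))

print(rank_by_path(G, "Inception", ["Titanic", "Christopher_Nolan"]))
# -> ['Christopher_Nolan', 'Titanic'] (path lengths 1 and 2)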

5 RELATED WORK

5.1 Knowledge Graph Question Answering

Knowledge graphs have many applications [2, 3, 6, 7, 9, 12, 15, 19, 24–26]. Among them, Knowledge Graph Question Answering has been studied for a long time. When the input query is a natural language sentence, a general strategy to answer the question is to transform the question into a query graph, and search for the answer according to the query graph.
For example, in [28], Xi et al. propose a model which contains a candidate query graph ranking component and a true query graph generation component. By iteratively updating these two components, both components' performance can be improved. The query graph is finally generated by the second component and can be used to search the KG.
In [14], Liu et al. propose a multi-task model to tackle KGQA and KGC (knowledge graph completion) at the same time.
Other methods, e.g., [20], [16], directly learn an embedding from the natural language sentence and search for answers in the embedding space. When the input query is a graph query, [18] models different operations in the query graph as different neural networks and transforms the query process into an entity search problem in the embedding space. In principle, all of them can be used as the query system in our method. That is, the top-k answers of these methods can be treated as the pseudo relevance feedback of our method.

[Mapping to the GQL query: existing KGQA approaches can be followed. The same goes for the "Best" Possible Answer, though it would always be without context (?)]

5.2 Relevance Feedback

Relevance feedback is a widely studied topic in Information Retrieval. However, it has not been well studied for graph data. In [21], Su et al. use relevance feedback to infer additional information and use it to enrich the query. The original ranking function is re-tuned according to the results in relevance feedback. In [11], Matteo et al. concentrate on assisting the user by expanding the query according to the additional information in relevance feedback to provide a more informative (full) query that can retrieve more detailed and relevant answers. However, different from our work, which aims to infer the true intention of users, they expand the query graph at each round until they find the answer. In other words, the setting is different.

[That is why this work is not expansion; it is rewriting that retrieves a set of relevant answers]

5.3 Variational Inference

The goal of variational inference is to approximate a difficult-to-compute posterior density. In [29], Zhang et al. treat the topic entity in the input question as a latent variable and utilize a variational reasoning network to handle noise in questions and learn multi-hop reasoning simultaneously. In [17], Qu et al. propose a probabilistic model called RNNLogic which treats logic rules as latent variables, and simultaneously trains a rule generator as well as a reasoning predictor with logic rules. These logic rules are similar to the latent paths in our model. There are many other works using variational inference. Different from these works, we are the first to utilize variational inference in relevance feedback on graph data.

6 CONCLUSION

[The problem is related to ours because the query is incomplete, vague, ambiguous. But we only handle this aspect in the context, not in other parts of the query. Still, starting from the query (without context - one-hop) and the top-k answers generated by this method, it would be possible to add context handling to produce the best answer, identifying the explicit context from the KG and the implicit context inferred through the rules]

 
