Question Answering Over Knowledge Graphs: Question Understanding Via Template Decomposition - Leitura de Artigo
Zheng, Weiguo et al. “Question Answering Over Knowledge Graphs: Question Understanding Via Template Decomposition.” Proc. VLDB Endow. 11 (2018): 1373-1386.
Abstract
[Problema] The gap between unstructured natural language and structured data makes it challenging to build a system that supports using natural language to query large knowledge graphs. Many existing methods construct a structured query for the input question based on a syntactic parser. Once the input question is parsed incorrectly, a false structured query will be generated, which may result in false or incomplete answers. The problem gets worse especially for complex questions.
[Proposta de Solução] In this paper, we propose a novel systematic method to understand natural language questions by using a large number of binary templates rather than semantic parsers. As sufficient templates are critical in the procedure, we present a low-cost approach that can build a huge number of templates automatically. To reduce the search space, we carefully devise an index to facilitate the online template decomposition. Moreover, we design effective strategies to perform the two-level disambiguations (i.e., entity-level ambiguity and structure-level ambiguity) by considering the query semantics.
[Resultado] Extensive experiments over several benchmarks demonstrate that our proposed approach is effective as it significantly outperforms state-of-the-art methods in terms of both precision and recall
[Não é Keyword Search, é Q &A baseado em templates de perguntas]
1. INTRODUCTION
To generate a structured query, a widely used approach is parsing the input question into the syntactic dependency representation by employing some NLP tools... Based on the parsing result, a query skeleton is constructed. Once the question is parsed incorrectly, it will be difficult to produce the correct structured query.
[parse sintático de palavras chaves para mapear em elemento do KB/KG. Apresenta bons resultados para perguntas simples (BGP com somente um predicado?) e resultados fracos com perguntas complexas (BGP com mais de um predicado?) ]
1.1 Motivating Example
Recently, two template-based approaches have been proposed to deal with complex questions. The underlying intuition is that a complex question consists of multiple simple subquestions, where each simple subquestion involves only one relation that is mapped to a predicate in the knowledge graph.
[Perguntas complexas podem ser decompostas em perguntas simples. As respostas a essas perguntas podem formar um sub grafo conectado que responde a pergunta complexa]
[As perguntas são tipo: Quem, Quando, Onde, Qual, ...]
Specifically, we decompose the input question into a set of subquestions via templates, where these subquestions form a dependency graph. The template used in this paper consists of two parts: the question pattern and the SPARQL query pattern (as shown in the upper middle of Figure 2). We model the task of answering complex questions as a template selection and assembly problem: Given an input question and a set T of simple templates, we choose several templates from T to cover the input question and assemble them into a semantic dependency graph.
[Os templates são compostos pela pergunta e a query SPARQL que responde a pergunta]
1.2 Challenges and Contributions
... obtaining sufficient templates, especially complex templates (the templates contain multiple relations) is non-trivial....In this paper, we adopt simple templates (also called binary templates) and propose a novel approach to build templates by exploiting the knowledge graph and a large-scale text corpus.
[Geração dos templates com base no KG]
... By integrating the two procedures, we define a notion semantic dependency graph that is an intermediate representation between the input question and structured query, and design an effective strategy to decompose q in an iterative manner. Since the number of templates may be too large, to enhance the online user experience, we carefully devise an index, based on which the search space can be reduced significantly.
[Como selecionar os templates mais adequados para formar o grafo de dependência semântica]
To understand an input question, a huge challenge is to deal with the ambiguities generated during the question decomposition and the semantic dependency graph construction. ...We devise a systematic method that screens the decompositions by considering query semantics.
[Como lidar com a ambiguidade]
2. PRELIMINARY AND OVERVIEW
2.1 Problem Definition
[ KG em RDF]
[Perguntas simples são BGP com uma única tripla de resultado e perguntas complexas são BGP com mais de uma tripla]
Definition 3. (Binary template). A template t contains two parts of patterns: the natural language question pattern t.n and the SPARQL query pattern t.s, where the corresponding entities in t.n and t.s are replaced by their types (denoted by slots), and there are correspondences between the slots in t.n and t.s. A binary template refers to the template that contains just one fact.
[As perguntas possuem slots. Quem dirigiu o filme <X>? Qual filme foi dirigido por <Fulano>?]
Definition 4. (Semantic Dependency Graph). Given an input question q, its semantic dependency graph is a directed graph, denoted as SDG(q) = {Vq , Eq }, where each vertex v ∈ Vq represents an instantiated template, and there is an edge from v1 to v2 if the answer to v2 is v1’s value (for filling the slot in v1) or v1 and v2 share the common answer.
Problem Statement 1. Given a knowledge graph G and a text corpus D, we need to generate binary templates for G.
Problem Statement 2. Given a knowledge graph G, a set of templates T , and an input question q, the goal is to construct a semantic dependency graph for q.
2.2 Overview of Our Approach
2.2.1 Template Generation
[É feito offline usando um corpus de texto para encontrar padrões em linguagem natural que correspondam a triplas de G]
2.2.2 Question Decomposition
[Em tempo de consulta (online) é feita a decomposição da consulta q para selecionar os templates que correspondam aos trechos]
2.2.3 Structured Query Formulation
[Em tempo de consulta (online) é feita a construção do grafo de dependência semântica para a consulta q usando os templates selecionados]
3. AUTOMATIC TEMPLATE GENERATION
We propose a novel approach to build templates automatically that directly uses the knowledge graph and text corpus. It comprises three steps to complete template generation.
3.1 Natural Language Pattern Generation
The text corpus D is better to come from the same source as G. Otherwise, the generated templates may be bad since the entities in G may not be found in D, which will lead to a very limited number of templates ...
[O corpus de texto deve ter sido a fonte de geração do KG]
3.2 Relation Determination
After obtaining the natural language patterns, we should map them to the specific predicates (i.e., relations).
[Mapear com os predicados que irão gerar os BGPs]
3.3 Template Formulation
We need to transform the declarative statements into question form since the sentence patterns acquired from open-domain text corpus are likely to be declarative.
First, we determine the wh-word according to the entity type. If the target type is a descendent of “Person” in the ontology, “who” or “whose” is selected. If the target type is a descendent of “Location” in the ontology, “where” is chosen. If the target type is a time, “when” is used. In the remaining cases, “what” is used. In general, “which” can match any type. For the wh-word ‘who”, “whose”, “where”,
or “what”, we remove the corresponding type. For the wh-word “which”, we move the type behind “which”. Finally, the auxiliary verb “do”, “does”, or “can” is inserted into the sentence according to the subject of the sentence.
[Regra para identificar o tipo de pergunta de acordo com o tipo de objeto da tripla]
4. ONLINE QUESTION DECOMPOSITION
In the online phase, we need to build a structured query, i.e., SPARQL query, for the input question q. Moreover, the SPARQL query should comply with the schema and vocabulary that are used in G. In real scenario, q may be a simple question or a complex question.
[Identificar o tipo de pergunta para saber o algoritmo a ser aplicado]
In this paper, we shall exploit the widely used Jaccard similarity coefficient since it is highly related to several other similarity measures, e.g., string edit distance [50], Dice similarity [18], and Overlap similarity [41], as discussed in the work [41]. The template with the largest similarity score is selected, denoted by ti (lines 3-6).
[A similaridade entre a pergunta do template e a pergunta (decomposta) que é parte da pergunta original]
4.2 Answering Complex Questions
As shown in Definition 2, a complex question contains more than one fact, which can be classified into two categories: plain complex questions and “And-questions”. The “And-question” is a complex question that contains a conjunction, such as “and”, “or”, and “but”. The rest complex questions are called plain complex questions. ... a complex question can be decomposed into multiple simple questions. Hence, we need to retrieve a set S of templates from T to cover the semantics of q such that the templates in S form a semantic dependency graph.
<... pulei uma parte>
5 . STRUCTURED QUERY FORMULATION
To answer the questions, we need to build a semantic dependency graph SDG(q) for the input question q ...
5.1 Semantic Dependency Graph Construction
After acquiring decomposed subquestions and the corresponding templates, we can build the SDG(q) for q, which contains two steps, making connections (i.e., adding edges) between two templates and filling the slots in templates.
[Montar o grafo conectado e preencher os slots ]
5.2 Two-level Ambiguities
During the construction of semantic dependency graphs, a big challenge is to solve the ambiguities in the question decomposition. Generally, there are two kinds of ambiguities, i.e., entity-level ambiguity and structure-level ambiguity.
Rule-based Disambiguation. Entity-level ambiguity is a well-known problem, that is, a mention in the question may correspond to multiple distinct entities in the underlying knowledge graph.
Besides entity-level ambiguities, we identify structure-level ambiguities, which refers to the case that a question may be decomposed into different sets of subquestions.
Definition 7. (Weak Semantic Relevance). Weak semantic relevance means that the triples are not correlated, i.e., the generated SPARQL query is not a connected graph.
[Nem sempre é possível gerar um grafo conectado]
6. EXPERIMENTAL STUDY
6.1 Experimental Setup
6.1.1 Datasets
Knowledge Graphs. We use two well-known public knowledge graphs DBpedia [29] and Freebase [7] in the experiments.
Free text corpus. We use the free online encyclopedia Wikipedia as the text corpus, based on which we can build templates automatically.
QA datasets. We use the datasets QALD-5 [39], WebQuestions [5], and ComplexQuestions [1] in the experiments.
[Três elementos para realizar os experimentos: o KG + Corpus + Perguntas com Respostas]
6.2 Template Evaluation
To evaluate the quality of these templates quantitatively, we recruit five students to rate the templates ...
6.3 Effectiveness Evaluation
6.4 Efficiency Evaluation
7. RELATED WORK
Natural language question answering. ... They can be divided into the three categories.
Retrieval based methods. The methods integrate the techniques used in information extraction to answer natural language questions as it can provide the syntactic framework for pattern identification. The answer extraction module identifies all the matches for this pattern, based on which the final answers are extracted. ... Yao and Durme identify the subgraph, called topic graph, that consists of the entities contained in a question.
Semantic parsing and paraphrase based methods. They often adopt a domain independent meaning representation derived from the combinatory categorial grammar (CCG) ... Xu et al. propose an efficient pipeline framework to model a user’s query intention as a phrase level dependency DAG which is
then instantiated regarding a specific KB to construct the final structured query.
Recently, a general framework for learning paraphrases for question answering, called PARA4QA, is proposed, where paraphrase scoring and QA models are trained end-to-end on question-answer pairs ..
Template based methods. They transform the input question into a structured query by employing templates. Conducting the structured query leads to final answers. Unger et al. propose to rely on a parse of the question to produce a SPARQL template that directly mirrors the internal structure of the question .... String similarity search has been well studied there years .... Xiao et al. study the top-k set similarity join, that is, it returns the top-k pairs of records ranked by their similarities ...
For more discussions on question answering and string similarity search, please refer to two surveys [19] and [51].
[19] D. Diefenbach, V. Lopez, K. Singh, and P. Maret. Core techniques of question answering systems over knowledge bases: a survey. Knowledge Information Systems, (2):1–41, 2017.
[51] M. Yu, G. Li, D. Deng, and J. Feng. String similarity search and join: a survey. Frontiers of Computer Science, 10(3):399–417, 2016.
[Em relação a um KG hiper relacional essa abordagem não prevê consultas com contexto, explícito ou implícito. Por exemplo supondo a pergunta: Qual é a capital do Brasil? A resposta para o conjunto de fatos abaixo seria: Brasilia, Salvador e Rio de Janeiro. Mas o país só pode ter uma capital. Um pergunta desse tipo deveria considerar que se o contexto temporal não está explícitamente especificado, algum parâmetro deve ser considerado como default. Nesse caso o verbo ser no presente indica que se deseja obter a atual capital, Brasília. Se a pergunta fosse: Qual era a capital do Brasil na época da Revolução Industrial? a melhor resposta seria Rio de Janeiro e se a pergunta fosse: Qual a primeira capital do Brasil? a resposta seria Salvador.]
sujeito predicado objeto Data Início Data Fim
Brasilia capital do Brasil 1961
Salvador capital do Brasil 1549 1763
Rio de Janeiro capital do Brasil 1763 1961
Revolução Industrial instância de Período Histórico 1760 1820, 1840
Comentários
Postar um comentário
Sinta-se a vontade para comentar. Críticas construtivas são sempre bem vindas.