Pular para o conteúdo principal

Question Answering Over Knowledge Graphs: Question Understanding Via Template Decomposition - Leitura de Artigo

Zheng, Weiguo et al. “Question Answering Over Knowledge Graphs: Question Understanding Via Template Decomposition.” Proc. VLDB Endow. 11 (2018): 1373-1386.

Abstract

[Problema] The gap between unstructured natural language and structured data makes it challenging to build a system that supports using natural language to query large knowledge graphs. Many existing methods construct a structured query for the input question based on a syntactic parser. Once the input question is parsed incorrectly, a false structured query will be generated, which may result in false or incomplete answers. The problem gets worse especially for complex questions.

[Proposta de Solução] In this paper, we propose a novel systematic method to understand natural language questions by using a large number of binary templates rather than semantic parsers. As sufficient templates are critical in the procedure, we present a low-cost approach that can build a huge number of templates automatically. To reduce the search space, we carefully devise an index to facilitate the online template decomposition. Moreover, we design effective strategies to perform the two-level disambiguations (i.e., entity-level ambiguity and structure-level ambiguity) by considering the query semantics.

[Resultado] Extensive experiments over several benchmarks demonstrate that our proposed approach is effective as it significantly outperforms state-of-the-art methods in terms of both precision and recall

[Não é Keyword Search, é Q &A baseado em templates de perguntas]

1. INTRODUCTION

To generate a structured query, a widely used approach is parsing the input question into the syntactic dependency representation by employing some NLP tools... Based on the parsing result, a query skeleton is constructed. Once the question is parsed incorrectly, it will be difficult to produce the correct structured query. 

[parse sintático de palavras chaves para mapear em elemento do KB/KG. Apresenta bons resultados para perguntas simples (BGP com somente um predicado?) e resultados fracos com perguntas complexas (BGP com mais de um predicado?) ]

1.1 Motivating Example

Recently, two template-based approaches have been proposed to deal with complex questions. The underlying intuition is that a complex question consists of multiple simple subquestions, where each simple subquestion involves only one relation that is mapped to a predicate in the knowledge graph.   

[Perguntas complexas podem ser decompostas em perguntas simples. As respostas a essas perguntas podem formar um sub grafo conectado que responde a pergunta complexa]

 

[As perguntas são tipo: Quem, Quando, Onde, Qual, ...]

Specifically, we decompose the input question into a set of subquestions via templates, where these subquestions form a dependency graph. The template used in this paper consists of two parts: the question pattern and the SPARQL query pattern (as shown in the upper middle of Figure 2). We model the task of answering complex questions as a template selection and assembly problem: Given an input question and a set T of simple templates, we choose several templates from T to cover the input question and assemble them into a semantic dependency graph. 

[Os templates são compostos pela pergunta e a query SPARQL que responde a pergunta]

1.2 Challenges and Contributions

... obtaining sufficient templates, especially complex templates (the templates contain multiple relations) is non-trivial....In this paper, we adopt simple templates (also called binary templates) and propose a novel approach to build templates by exploiting the knowledge graph and a large-scale text corpus.

[Geração dos templates com base no KG]

... By integrating the two procedures, we define a notion semantic dependency graph that is an intermediate representation between the input question and structured query, and design an effective strategy to decompose q in an iterative manner. Since the number of templates may be too large, to enhance the online user experience, we carefully devise an index, based on which the search space can be reduced significantly. 

[Como selecionar os templates mais adequados para formar o grafo de dependência semântica]

To understand an input question, a huge challenge is to deal with the ambiguities generated during the question decomposition and the semantic dependency graph construction. ...We devise a systematic method that screens the decompositions by considering query semantics. 

[Como lidar com a ambiguidade]

2. PRELIMINARY AND OVERVIEW

2.1 Problem Definition

[ KG em RDF]

[Perguntas simples são BGP com uma única tripla de resultado e perguntas complexas são BGP com mais de uma tripla]

Definition 3. (Binary template). A template t contains two parts of patterns: the natural language question pattern t.n and the SPARQL query pattern t.s, where the corresponding entities in t.n and t.s are replaced by their types (denoted by slots), and there are correspondences between the slots in t.n and t.s. A binary template refers to the template that contains just one fact.

[As perguntas possuem slots. Quem dirigiu o filme <X>? Qual filme foi dirigido por <Fulano>?]

Definition 4. (Semantic Dependency Graph). Given an input question q, its semantic dependency graph is a directed graph, denoted as SDG(q) = {Vq , Eq }, where each vertex v Vq represents an instantiated template, and there is an edge from v1 to v2 if the answer to v2 is v1’s value (for filling the slot in v1) or v1 and v2 share the common answer.

Problem Statement 1. Given a knowledge graph G and a text corpus D, we need to generate binary templates for G.

Problem Statement 2. Given a knowledge graph G, a set of templates T , and an input question q, the goal is to construct a semantic dependency graph for q. 

2.2 Overview of Our Approach

  

2.2.1 Template Generation

[É feito offline usando um corpus de texto para encontrar padrões em linguagem natural que correspondam a triplas de G]

2.2.2 Question Decomposition

[Em tempo de consulta (online) é feita a decomposição da consulta q para selecionar os templates que correspondam aos trechos] 

2.2.3 Structured Query Formulation

[Em tempo de consulta (online) é feita a construção do grafo de dependência semântica para a consulta q usando os templates selecionados] 

3. AUTOMATIC TEMPLATE GENERATION

We propose a novel approach to build templates automatically that directly uses the knowledge graph and text corpus. It comprises three steps to complete template generation. 

3.1 Natural Language Pattern Generation

The text corpus D is better to come from the same source as G. Otherwise, the generated templates may be bad since the entities in G may not be found in D, which will lead to a very limited number of templates ...

[O corpus de texto deve ter sido a fonte de geração do KG]

  

3.2 Relation Determination

After obtaining the natural language patterns, we should map them to the specific predicates (i.e., relations).

[Mapear com os predicados que irão gerar os BGPs]

3.3 Template Formulation

We need to transform the declarative statements into question form since the sentence patterns acquired from open-domain text corpus are likely to be declarative.
First, we determine the wh-word according to the entity type. If the target type is a descendent of “Person” in the ontology, “who” or “whose” is selected. If the target type is a descendent of “Location” in the ontology, “where” is chosen. If the target type is a time, “when” is used. In the
remaining cases, “what” is used. In general, “which” can match any type. For the wh-word ‘who”, “whose”, “where”,
or “what”, we remove the corresponding type. For the wh-word “which”, we move the type behind “which”. Finally, the auxiliary verb “do”, “does”, or “can” is inserted into the sentence according to the subject of the sentence.

[Regra para identificar o tipo de pergunta de acordo com o tipo de objeto da tripla]

 4. ONLINE QUESTION DECOMPOSITION 

In the online phase, we need to build a structured query, i.e., SPARQL query, for the input question q. Moreover, the SPARQL query should comply with the schema and vocabulary that are used in G. In real scenario, q may be a simple question or a complex question. 

[Identificar o tipo de pergunta para saber o algoritmo a ser aplicado]

    

In this paper, we shall exploit the widely used Jaccard similarity coefficient since it is highly related to several other similarity measures, e.g., string edit distance [50], Dice similarity [18], and Overlap similarity [41], as discussed in the work [41]. The template with the largest similarity score is selected, denoted by ti (lines 3-6). 

[A similaridade entre a pergunta do template e a pergunta (decomposta) que é parte da pergunta original]

4.2 Answering Complex Questions

As shown in Definition 2, a complex question contains more than one fact, which can be classified into two categories: plain complex questions and “And-questions”. The “And-question” is a complex question that contains a conjunction, such as “and”, “or”, and “but”. The rest complex questions are called plain complex questions. ... a complex question can be decomposed into multiple simple questions. Hence, we need to retrieve a set S of templates from T to cover the semantics of q such that the templates in S form a semantic dependency graph.  

<... pulei uma parte>

5 . STRUCTURED QUERY FORMULATION 

To answer the questions, we need to build a semantic dependency graph SDG(q) for the input question q ...

5.1 Semantic Dependency Graph Construction

After acquiring decomposed subquestions and the corresponding templates, we can build the SDG(q) for q, which contains two steps, making connections (i.e., adding edges) between two templates and filling the slots in templates.

[Montar o grafo conectado e preencher os slots ]

5.2 Two-level Ambiguities

During the construction of semantic dependency graphs, a big challenge is to solve the ambiguities in the question decomposition. Generally, there are two kinds of ambiguities, i.e., entity-level ambiguity and structure-level ambiguity.
Rule-based Disambiguation. Entity-level ambiguity is a well-known problem, that is, a mention in the question may correspond to multiple distinct entities in the underlying knowledge graph. 

Besides entity-level ambiguities, we identify structure-level ambiguities, which refers to the case that a question may be decomposed into different sets of subquestions.

Definition 7. (Weak Semantic Relevance). Weak semantic relevance means that the triples are not correlated, i.e., the generated SPARQL query is not a connected graph.

[Nem sempre é possível gerar um grafo conectado]

6. EXPERIMENTAL STUDY

6.1 Experimental Setup
6.1.1 Datasets 

Knowledge Graphs. We use two well-known public knowledge graphs DBpedia [29] and Freebase [7] in the experiments.   
Free text corpus. We use the free online encyclopedia Wikipedia as the text corpus, based on which we can build templates automatically.
QA datasets. We use the datasets QALD-5 [39], WebQuestions [5], and ComplexQuestions [1] in the experiments.

[Três elementos para realizar os experimentos: o KG + Corpus + Perguntas com Respostas]

6.1.2 Metrics and Competitors

6.2 Template Evaluation
To evaluate the quality of these templates quantitatively, we recruit five students to rate the templates ...
6.3 Effectiveness Evaluation
6.4 Efficiency Evaluation

7. RELATED WORK 

Natural language question answering. ... They can be divided into the three categories.

Retrieval based methods. The methods integrate the techniques used in information extraction to answer natural language questions as it can provide the syntactic framework for pattern identification. The answer extraction module identifies all the matches for this pattern, based on which the final answers are extracted. ... Yao and Durme identify the subgraph, called topic graph, that consists of the entities contained in a question. 

Semantic parsing and paraphrase based methods. They often adopt a domain independent meaning representation derived from the combinatory categorial grammar (CCG) ... Xu et al. propose an efficient pipeline framework to model a user’s query intention as a phrase level dependency DAG which is
then instantiated regarding a specific KB to construct the final structured query.

Recently, a general framework for learning paraphrases for question answering, called PARA4QA, is proposed, where paraphrase scoring and QA models are trained end-to-end on question-answer pairs ..

Template based methods. They transform the input question into a structured query by employing templates. Conducting the structured query leads to final answers. Unger et al. propose to rely on a parse of the question to produce a SPARQL template that directly mirrors the internal structure of the question .... String similarity search has been well studied there years .... Xiao et al. study the top-k set similarity join, that is, it returns the top-k pairs of records ranked by their similarities ...
For more discussions on question answering and string similarity search, please refer to two surveys [19] and [51].

[19] D. Diefenbach, V. Lopez, K. Singh, and P. Maret. Core techniques of question answering systems over knowledge bases: a survey. Knowledge Information Systems, (2):1–41, 2017.
[51] M. Yu, G. Li, D. Deng, and J. Feng. String similarity search and join: a survey. Frontiers of Computer Science, 10(3):399–417, 2016. 
 

[Em relação a um KG hiper relacional essa abordagem não prevê consultas com contexto, explícito ou implícito. Por exemplo supondo a pergunta: Qual é a capital do Brasil? A resposta para o conjunto de fatos abaixo seria:   Brasilia, Salvador e Rio de Janeiro. Mas o país só pode ter uma capital. Um pergunta desse tipo deveria considerar que se o contexto temporal não está explícitamente especificado, algum parâmetro deve ser considerado como default. Nesse caso o verbo ser no presente indica que se deseja obter a atual capital, Brasília. Se a pergunta fosse: Qual era a capital do Brasil na época da Revolução Industrial? a melhor resposta seria Rio de Janeiro e se a pergunta fosse: Qual a primeira capital do Brasil? a resposta seria Salvador.] 

sujeito                            predicado        objeto                       Data Início       Data Fim
Brasilia                          capital do        Brasil                        1961
Salvador                        capital do        Brasil                        1549                1763
Rio de Janeiro               capital do        Brasil                        1763                1961

Revolução Industrial     instância de    Período Histórico     1760               1820, 1840

Comentários

Postagens mais visitadas deste blog

Connected Papers: Uma abordagem alternativa para revisão da literatura

Durante um projeto de pesquisa podemos encontrar um artigo que nos identificamos em termos de problema de pesquisa e também de solução. Então surge a vontade de saber como essa área de pesquisa se desenvolveu até chegar a esse ponto ou quais desdobramentos ocorreram a partir dessa solução proposta para identificar o estado da arte nesse tema. Podemos seguir duas abordagens:  realizar uma revisão sistemática usando palavras chaves que melhor caracterizam o tema em bibliotecas digitais de referência para encontrar artigos relacionados ou realizar snowballing ancorado nesse artigo que identificamos previamente, explorando os artigos citados (backward) ou os artigos que o citam (forward)  Mas a ferramenta Connected Papers propõe uma abordagem alternativa para essa busca. O problema inicial é dado um artigo de interesse, precisamos encontrar outros artigos relacionados de "certa forma". Find different methods and approaches to the same subject Track down the state of the art rese...

Knowledge Graph Embedding with Triple Context - Leitura de Abstract

  Jun Shi, Huan Gao, Guilin Qi, and Zhangquan Zhou. 2017. Knowledge Graph Embedding with Triple Context. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM '17). Association for Computing Machinery, New York, NY, USA, 2299–2302. https://doi.org/10.1145/3132847.3133119 ABSTRACT Knowledge graph embedding, which aims to represent entities and relations in vector spaces, has shown outstanding performance on a few knowledge graph completion tasks. Most existing methods are based on the assumption that a knowledge graph is a set of separate triples, ignoring rich graph features, i.e., structural information in the graph. In this paper, we take advantages of structures in knowledge graphs, especially local structures around a triple, which we refer to as triple context. We then propose a Triple-Context-based knowledge Embedding model (TCE). For each triple, two kinds of structure information are considered as its context in the graph; one is the out...

KnOD 2021

Beyond Facts: Online Discourse and Knowledge Graphs A preface to the proceedings of the 1st International Workshop on Knowledge Graphs for Online Discourse Analysis (KnOD 2021, co-located with TheWebConf’21) https://ceur-ws.org/Vol-2877/preface.pdf https://knod2021.wordpress.com/   ABSTRACT Expressing opinions and interacting with others on the Web has led to the production of an abundance of online discourse data, such as claims and viewpoints on controversial topics, their sources and contexts . This data constitutes a valuable source of insights for studies into misinformation spread, bias reinforcement, echo chambers or political agenda setting. While knowledge graphs promise to provide the key to a Web of structured information, they are mainly focused on facts without keeping track of the diversity, connection or temporal evolution of online discourse data. As opposed to facts, claims are inherently more complex. Their interpretation strongly depends on the context and a vari...