Sumit Neelam, Udit Sharma, Hima Karanam, Shajith Ikbal, Pavan Kapanipathi, Ibrahim Abdelaziz, Nandana Mihindukulasooriya, Young-Suk Lee, Santosh K. Srivastava, Cezar Pendus, Saswati Dana, Dinesh Garg, Achille Fokoue, G. P. Shrivatsa Bhargav, Dinesh Khandelwal, Srinivas Ravishankar, Sairam Gurajada, Maria Chang, Rosario Uceda-Sosa, Salim Roukos, Alexander G. Gray, Guilherme Lima, Ryan Riegel, Francois P. S. Luus, L. Venkata Subramaniam:
A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases. CoRR abs/2201.05793 (2022)
Abstract
Knowledge Base Question Answering (KBQA) tasks that involve complex reasoning are emerging as an important research direction. However, most existing KBQA datasets focus primarily on generic multi-hop reasoning over explicit facts, largely ignoring other reasoning types such as temporal, spatial, and taxonomic reasoning.
[Reasoning in the sense of inferring what the context of interest is]
In this paper, we present a benchmark dataset for temporal reasoning, TempQA-WD, to encourage research in extending the present approaches to target a more challenging set of complex reasoning tasks. Specifically, our benchmark is a temporal question answering dataset with the following advantages: (a) it is based on Wikidata, which is the most frequently curated, openly available knowledge base, (b) it includes intermediate SPARQL queries to facilitate the evaluation of semantic parsing based approaches for KBQA, and (c) it generalizes to multiple knowledge bases: Freebase and Wikidata.
[Another one based on WD]
The TempQA-WD dataset is available at https://github.com/IBM/tempqa-wd.
1 Introduction
The goal of KBQA systems is to answer natural language questions by retrieving and reasoning over facts in a Knowledge Base (KB).
Currently, there is a lack of approaches and datasets that address other types of complex reasoning, such as temporal and spatial reasoning. In this paper, we focus on a specific category of questions called temporal questions, where answering a question requires reasoning about points and intervals in time.
[It is not just "when" questions; it includes concomitance of facts and chaining of facts]
Our aim in this paper is to fill the above-mentioned gaps by adapting the TempQuestions dataset to Wikidata and by enhancing it with additional SPARQL query annotations. Having SPARQL queries for a temporal dataset is crucial to refresh ground-truth answers as the KB evolves. We choose Wikidata for this dataset because it is well structured, fast evolving, and the most up-to-date KB, making it a suitable candidate for temporal KBQA.
[In my case it has to be WD because it is a hypergraph, i.e., the facts are contextualized because the edges carry qualifiers]
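To make the refresh idea concrete, below is a minimal sketch of the kind of Wikidata SPARQL annotation involved, runnable against the public endpoint at https://query.wikidata.org/sparql. The identifiers (Q362 for World War II, P582 for end time) are standard Wikidata IDs, but the query is my own illustration, not one of the dataset's gold queries.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# "When did World War II end?" -- re-executing a gold query like this
# against the live endpoint refreshes the ground-truth answer as the KB evolves.
SELECT ?end WHERE {
  wd:Q362 wdt:P582 ?end .   # World War II (Q362), end time (P582)
}
```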
... help drive research towards the development of generalizable approaches, i.e., those that could easily be adapted to multiple KBs.
[Generalize to other contexts]
2 Related Work
Table 2: This table compares most of the KBQA datasets on the features relevant to this work (multi-hop × temporal context)
3 Dataset
TempQuestions (Jia et al., 2018a) was the first KBQA dataset intended to focus specifically on temporal reasoning.
We adapt TempQuestions to Wikidata to create a temporal QA dataset that has three desirable properties. First, in identifying answers in Wikidata, we create a generalizable benchmark that has parallel annotations on two KBs. Second, we take advantage of Wikidata’s evolving, up-to-date knowledge. Lastly, we enhance TempQuestions with SPARQL, entity, and relation annotations so that we may evaluate intermediate outputs of KBQA systems.
[Having the question and the answer is not enough; the SPARQL query is needed to evaluate whether the system generated an identical or similar query]
3.1 Wikidata
We chose Wikidata as our knowledge base because it contains many temporal facts encoded with an appropriate knowledge representation.
It supports reification of statements (triples) to add additional metadata through qualifiers such as start date, end date, point in time, location, etc. ... Such a representation, together with the availability of up-to-date information, makes Wikidata a good choice for building benchmark datasets that test different kinds of reasoning, including temporal reasoning.
[The qualifiers add context to the facts, but the reification happens when transferring to BlazeGraph, because plain RDF does not support qualifiers]
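A sketch of what this looks like on Wikidata's SPARQL endpoint, where each statement is reified as a statement node to which qualifiers attach. The example (Barack Obama's term as President of the United States) is illustrative, not taken from the dataset.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p:  <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>

SELECT ?start ?end WHERE {
  wd:Q76 p:P39 ?stmt .               # Barack Obama, "position held" statement node
  ?stmt ps:P39 wd:Q11696 .           # statement value: President of the United States
  ?stmt pq:P580 ?start .             # qualifier: start time
  OPTIONAL { ?stmt pq:P582 ?end . }  # qualifier: end time
}
```

The p:/ps:/pq: prefixes are Wikidata's built-in reification scheme: p: links the entity to the statement node, ps: gives the statement's main value, and pq: attaches qualifiers to it.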
3.2 Dataset Details
We took all the questions from the TempQuestions dataset (1,271 in total) and chose the subset for which we could find Wikidata answers. This subset of 839 questions constitutes our new dataset, TempQA-WD. We annotated this set with the corresponding Wikidata SPARQL queries and the derived answers.
Within this dataset, we also chose a smaller subset (of size 175) for more detailed annotations. ... The goal of these additional annotations is to encourage improved interpretability of temporal KBQA systems, i.e., to evaluate the accuracy of outputs expected at intermediate stages of the system.
[Would interpretability be the same as explainability here? Could the user be given access to the query in order to understand the underlying subgraph that produced the results?]
3.2.1 Question Complexity Categorization
In this dataset, we also labeled each question with a complexity category based on the temporal reasoning required to answer it.
[This is not graph-pattern complexity: BGP vs. CGP]
1) Simple: Questions that involve one temporal event and need no temporal reasoning to derive the answer. For example, questions involving simple retrieval of a temporal fact or simple retrieval of other answer types using a temporal fact.
[BGP lookup]
2) Medium: Questions that involve two temporal events and need temporal reasoning (such as overlap/before/after) over the time intervals of those events. We also include questions that involve a single temporal event but need additional non-temporal reasoning. (See the example query after this list.)
[CGP with reasoning]
3) Complex: Questions that involve two or more temporal events and need one temporal reasoning step plus an additional temporal or non-temporal reasoning step, such as "teenager", spatial, or class-hierarchy reasoning.
[CGP with reasoning over more than one aspect]
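As an illustration of the Medium category (referenced in item 2 above), the following sketch answers "Who was President of the United States during World War II?" via interval overlap. The identifiers are standard Wikidata IDs; the query itself is my illustration, not a gold query from the dataset.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p:   <http://www.wikidata.org/prop/>
PREFIX ps:  <http://www.wikidata.org/prop/statement/>
PREFIX pq:  <http://www.wikidata.org/prop/qualifier/>

# Two temporal events: a presidential term and World War II (Q362).
# Temporal reasoning: their time intervals must overlap.
SELECT DISTINCT ?president WHERE {
  wd:Q362 wdt:P580 ?warStart .           # WWII start time
  wd:Q362 wdt:P582 ?warEnd .             # WWII end time
  ?president p:P39 ?stmt .
  ?stmt ps:P39 wd:Q11696 .               # position held: President of the United States
  ?stmt pq:P580 ?termStart .
  OPTIONAL { ?stmt pq:P582 ?termEnd . }
  # overlap(term, war): the term starts before the war ends and,
  # if the term has ended, it ends after the war starts.
  FILTER(?termStart <= ?warEnd && (!BOUND(?termEnd) || ?termEnd >= ?warStart))
}
```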
In addition to the constants and logical connectives, we introduced some new temporal functions and instance variables to avoid function nesting.
interval, overlap, before, after, teenager, year; where interval gets the time interval associated with an event, and overlap, before, and after are used to compare temporal events ...
[Parsing rules]
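One plausible SPARQL realization of two of these functions, before and year; mapping the paper's λ-expression functions onto SPARQL operators this way is my reading, not an excerpt from the paper.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p:   <http://www.wikidata.org/prop/>
PREFIX ps:  <http://www.wikidata.org/prop/statement/>
PREFIX pq:  <http://www.wikidata.org/prop/qualifier/>

# "Which US presidents took office before World War II started, and in which year?"
# before(term, war) -> a FILTER comparing two time points
# year(termStart)   -> SPARQL's built-in YEAR() function
SELECT DISTINCT ?president ?year WHERE {
  wd:Q362 wdt:P580 ?warStart .      # World War II, start time
  ?president p:P39 ?stmt .
  ?stmt ps:P39 wd:Q11696 ;          # position held: President of the United States
        pq:P580 ?termStart .        # qualifier: start of term
  FILTER(?termStart < ?warStart)    # before(...)
  BIND(YEAR(?termStart) AS ?year)   # year(...)
}
```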
4 Evaluation
4.2 Metrics
We use GERBIL.
[same as in QALD-9 Plus]
We use standard performance metrics typically used for KBQA systems, namely macro precision, macro recall, and F1.
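For reference, these metrics as typically defined in QALD-style evaluation, with G_i the gold answer set and S_i the system answer set for question i; per-question scores are averaged over all N questions. This is the standard definition, not a formula quoted from the paper.

```latex
P_i = \frac{|G_i \cap S_i|}{|S_i|}, \qquad
R_i = \frac{|G_i \cap S_i|}{|G_i|}, \qquad
F_{1,i} = \frac{2\,P_i R_i}{P_i + R_i}

P_{\text{macro}} = \frac{1}{N}\sum_{i=1}^{N} P_i, \qquad
R_{\text{macro}} = \frac{1}{N}\sum_{i=1}^{N} R_i, \qquad
F_{1,\text{macro}} = \frac{1}{N}\sum_{i=1}^{N} F_{1,i}
```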
4.3 Results & Discussion
How can I use this in my research?
1) Evaluate the questions and understand this classification
2) Adapt it to other contextual dimensions such as spatial, provenance, and thematic
3) Further explore reasoning in the sense of subclass/instance hierarchy
New URLs
https://github.com/IBM/tempqa-wd
https://ibm.github.io/neuro-symbolic-ai/toolkit/tempqa-wd