https://knowledge-nlp.github.io/kdd2023/papers/Kumar5.pdf
https://github.com/isunitha98selvan/odqa-tail
ABSTRACT
Pretrained Large Language Models (LLMs) have gained significant attention for addressing open-domain Question Answering (QA). While they exhibit high accuracy in answering questions related to common knowledge, LLMs encounter difficulties in learning about uncommon long-tail knowledge (tail entities).
[Entities with little available information, not especially popular or common in the general public's interest]
1 INTRODUCTION
However, the impressive achievements of LLMs in QA tasks are primarily observed with regard to common concepts that frequently appear on the internet (referred to as "head entities"), which are
thus more likely to be learned effectively by LLMs during pretraining time. Conversely, when it comes to dealing with long-tail knowledge, which encompasses rarely occurring entities (referred to as "tail entities"), LLMs struggle to provide accurate answers and often exhibit hallucination issues [5]. Due to the predominant focus of most QA datasets on head entities [3, 6, 10], research investigating the performance of LLMs on long-tail knowledge has been limited.
[The long-tail concept and its impact on LLMs. KGs can cover both tail and head entities, and can also represent claims in recurring (default) contexts as well as in specific contexts]
In this study, we propose a novel approach to defining tail entities based on their degree information in Wikidata, as opposed to relying on Wikipedia as in [7]. By doing so, we generate QA datasets with distributions distinct from previous work [7], thus fostering diversity within tail-knowledge QA datasets. Within the context of Wikidata, the degrees of entities reflect their level of engagement with general knowledge. Hence, we leverage this degree information to define tail entities.
[A metric for defining tail entities]
Moreover, we investigate strategies to enhance the performance of pretrained LLMs by incorporating external resources, such as external documents or knowledge graphs, during inference time on our automatically-generated long-tail QA datasets.
[Integrating LLMs and KGs]
Introduction of novel tail knowledge QA datasets derived from the Wikidata knowledge graph
[Would this dataset contain examples of claims with context?]
2 RELATED WORK
Kandpal et al. [7] show that an LLM’s ability to answer a question is affected by how many times it has
seen relevant documents related to the question in its pre-training data. They show that LLMs struggle to reason accurately over rarer entities in the pre-training data.
In this work, instead of using the pre-training corpus, we define tail entities using Wikidata knowledge
graphs and construct a long-tail knowledge dataset that can be used to study the open-domain QA performance of LLMs.
3 AUTOMATIC GENERATION OF QA DATASETS FOR LONG-TAIL KNOWLEDGE
We define tail entities based on each entity's node degree (i.e., the number of triplets that have the target entity as the subject node) in the knowledge graph. We first sample tail entities based on their degree information and extract from Wikidata all triplets that have those tail entities as the subject entity (proper degree bounds for tail entities are discussed in the following section). Then we generate factoid questions by prompting LLMs with the triplets.
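The degree-based sampling described above can be sketched as follows, assuming triplets are plain (subject, property, object) tuples; the function names are ours, not the paper's.

```python
from collections import Counter

def node_degrees(triplets):
    # Degree of an entity = number of triplets in which it is the subject.
    deg = Counter()
    for s, _p, _o in triplets:
        deg[s] += 1
    return deg

def sample_tail_triplets(triplets, lo, hi):
    # Keep every triplet whose subject entity has a degree in [lo, hi].
    deg = node_degrees(triplets)
    return [t for t in triplets if lo <= deg[t[0]] <= hi]
```

Note that this degree counts only statements where the entity is the subject; appearances as an object do not contribute.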
3.2.1 Degree bounds for tail entities. There is no strictly formulated, widely accepted definition of tail entities. Degree bounds that clearly separate model performance are also hard to decide in advance. As a result, degree bounds for tail entities must be chosen somewhat arbitrarily. In our experiments, we classify entities with node degrees between 15 and 100 as coarse-tail entities and entities with node degrees below 3 as fine-tail entities, and compare LLM performance on both.
[Degree does not take into account qualifiers or references of the statements attached to the subject node]
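A minimal sketch of the two buckets, using the bounds stated above (entities with degrees between 3 and 14, or above 100, fall in neither set):

```python
def classify_entity(degree):
    # Bounds from the paper: fine-tail below 3, coarse-tail in [15, 100].
    if degree < 3:
        return "fine-tail"
    if 15 <= degree <= 100:
        return "coarse-tail"
    return None  # neither bucket; the entity is not sampled
```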
Ambiguous entities: Multiple entities can have the same surface forms.
[Differentiate by the entity's identity, which would not be the QNode, since that is an artificial key]
Ambiguous properties: In Wikidata, a large number of properties cannot be used to generate sensible questions. For instance, subclass of, instance of, or part of would generate questions that are too vague to answer even for humans.
[For spatial objects, part of can serve as Location/Locality Context]
3.2.3 Difficulty control. Questions generated from different properties can have different levels of difficulty.
[Number of possible answers]
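The excerpt does not spell out the difficulty-control rule; a plausible proxy, following the note above, is the number of valid answers per (subject, property) pair, since a question generated from a pair with many objects has many correct answers and is hard to grade.

```python
from collections import defaultdict

def answer_counts(triplets):
    # Distinct objects per (subject, property) pair: a rough proxy
    # for how many correct answers a generated question would have.
    objs = defaultdict(set)
    for s, p, o in triplets:
        objs[(s, p)].add(o)
    return {pair: len(vals) for pair, vals in objs.items()}
```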
3.2.4 LLM prompt for question generation. While the answer entity of a triplet is not part of the generated question, we find that the quality of generated questions improves when the complete triplet is provided in the prompt, instead of the first two elements (i.e., subject entity and property). For instance, given a triplet [david peel yates, conflict, world war ii], we get "What conflict was David Peel Yates involved in?" from GPT3 when using just the subject entity and property in prompt. On the contrary, when we use all subject, property, and object entities, the generated question becomes "What conflict did David Peel Yates serve in?".
[The LLM needs to know the answer to formulate the best question. How could a person in an exploratory process do this with little domain knowledge? Only through successive refinements based on what they learn from previous answers]
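The contrast between the two prompting modes can be sketched as below; the exact prompt wording used in the paper is not given in this excerpt, so the template is only illustrative.

```python
def question_prompt(subject, prop, obj=None):
    # With obj=None this reproduces the two-element prompt; passing the
    # object gives the LLM the full triplet, which the paper found
    # yields better-phrased questions.
    fact = [subject, prop] if obj is None else [subject, prop, obj]
    return ("Generate a factoid question from the triplet "
            f"{fact}, so that the object entity is the answer.")
```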
3.2.5 Granularity of questions. Given a question, there could be several correct answers with different granularity. Unless the question specifies the granularity of the answer (e.g., which country or which city), QA datasets and models could easily pick different granularity of answers. For instance, when asked Where was Lovelyz formed?, a model could answer South Korea while the QA dataset has Seoul (the capital of South Korea) as the correct answer and marks the predicted answer wrong.
4 EVALUATION WITH LLMS AND EXTERNAL RESOURCES
Wikidata: The Wikidata knowledge graph consists of 103,305,143 entities and 11,007 properties. We access Wikidata using the Sling tool [17] in a triplet format (subject, property, object).
[They did not use qualifiers or references]
Tail-entity datasets: We sample triplets from Wikidata to create Coarse-tail and Fine-tail datasets. Each dataset has 27,691 triplets and 422 unique properties after the difficulty control (details in Section 3.2.3). One question-answer pair consists of a GPT3-generated question, an answer (i.e., the object entity of the original triplet), and the associated aliases for the answer.
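Scoring a prediction against the answer plus its aliases can be sketched as below; the normalisation is our assumption, since the excerpt does not give the exact matching rule. An exact-match check of this style is also what makes the granularity problem of Section 3.2.5 bite: "South Korea" would not match a gold answer of "Seoul".

```python
def is_correct(prediction, answer, aliases=()):
    # Case- and whitespace-insensitive exact match against the gold
    # object entity and its Wikidata aliases (hypothetical normalisation).
    norm = lambda s: " ".join(s.lower().split())
    gold = {norm(answer)} | {norm(a) for a in aliases}
    return norm(prediction) in gold
```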
4.4 LLM prompting with DPR and knowledge graphs
Knowledge graphs (KGs) have been widely used to augment LLMs [19, 25]. In this section, we examine how external knowledge graphs can cooperate with another external resource, Wikipedia, to improve LLM performance on tail entities. We use Wikidata as our external knowledge graph after removing all triplets used for the QA generation.
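Combining the two resources at inference time might look like the sketch below: retrieved Wikipedia passages (e.g., from DPR) and Wikidata triplets about the question entity are simply prepended to the question. The template is our guess, not the paper's exact format.

```python
def augmented_prompt(question, kg_triplets, passages, max_passages=3):
    # Serialise KG facts and retrieved passages into a single LLM prompt.
    facts = "\n".join(f"({s}, {p}, {o})" for s, p, o in kg_triplets)
    context = "\n\n".join(passages[:max_passages])
    return (f"Passages:\n{context}\n\n"
            f"Facts:\n{facts}\n\n"
            f"Question: {question}\nAnswer:")
```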
5 CONCLUSION
Our work highlights the limitations of pre-trained LLMs in handling long-tail knowledge in open-domain Question Answering. To investigate this limitation, we first propose to automatically generate QA datasets specialized for tail entities, using degree information from the Wikidata knowledge graph. Our automatic QA generation approach aims to overcome the resource-intensive nature of manual dataset construction, allowing for the creation of diverse long-tail QA datasets.
[They did not use Wikidata qualifiers or references. Possible future work would be to consider context in the metric for selecting long-tail entities]