Yichong Xu, Chenguang Zhu, Ruochen Xu, Yang Liu, Michael Zeng, and Xuedong Huang. 2021. Fusing Context Into Knowledge Graph for Commonsense Question Answering. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1201–1207, Online. Association for Computational Linguistics.
Abstract
Commonsense reasoning requires a model to make presumptions about world events via language understanding. Many methods couple pre-trained language models with knowledge graphs in order to combine the merits in language modeling and entity-based relational learning. However, although a knowledge graph contains rich structural information, it lacks the context to provide a more precise understanding of the concepts and relations. This creates a gap when fusing knowledge graphs into language modeling, especially in the scenario of insufficient paired text-knowledge data.
[Context here means the description of the entities, obtained from sources external to the KG]
1 Introduction
While massive pre-trained models such as BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) are effective in language understanding, they lack modules to explicitly handle knowledge and commonsense. Also, text is much less efficient in representing commonsense compared with structured data.
[Common sense would be better represented by the relations in a KG]
For example, to understand that the painting Mona Lisa is in the Louvre, a language model requires multiple sentences containing this fact in order to implicitly encode the information, whereas an edge with relation “LocatedAt” between the two entity nodes “Mona Lisa” and “Louvre” can represent exactly the same information.
[A triple represents the relation. A language model needs many documents to learn that relation]
However, there is still a non-negligible gap between the performance of these models and humans. One reason is that although a knowledge graph can encode topological information between the concepts, it lacks rich context information. For instance, for the entity node “Mona Lisa”, the graph depicts its relations to multiple other entities. But given this neighborhood information, it is still hard to infer that it is a painting.
[Even with many explicit relations in the KG, it is still incomplete and may be missing important information, such as the entity's type] [Could this information be extracted from dictionaries of descriptions using Open Information Extraction approaches and added to the KG? Wouldn't it make more sense to enrich the KG instead of treating them as separate information sources?]
On the other hand, we can retrieve the precise definition of “Mona Lisa” from external sources, e.g. Wiktionary: A painting by Leonardo da Vinci, widely considered as the most famous painting in history.
[KGs lack descriptions of their entities that humans can interpret]
Thus, we propose the DEKCOR model, i.e. DEscriptive Knowledge for COmmonsense Reasoning.
Given a commonsense question and a choice, we first extract the contained concepts. Then, we extract the edge between the question concept and the choice concept in ConceptNet (Liu and Singh, 2004). If such an edge does not exist, we compute a relevance score for each triple (node-edge-node) containing the choice concept, and select the one with the highest score. Next, we retrieve the definition of these concepts from Wiktionary via multiple criteria of text matching. Finally, we feed the question, choice, selected triple and definitions into the language model Albert (Lan et al., 2019), and the relevance score is generated by the appended attention layer and softmax layer.
[Extracting the concepts from the question and the answer means mapping words to entities in the KG (?). The method checks whether an edge exists between the two concepts; if so, it picks that edge, since it is a direct relation. If not, it examines all edges of the answer concept and computes a score to choose which triple will be fed to the model. The question provides one concept (what if it has more than one?) and the answer provides the second, and each concept is paired with a natural-language description obtained from external sources. These items are used as input to a language model. Couldn't it be more than one triple, some carrying the context of the concepts/entities involved?]
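To make the overall flow concrete, below is a minimal sketch (in Python) of the per-choice selection loop. This is my own illustration, not the authors' code: score_choice is a hypothetical stand-in for the full retrieve-and-encode pipeline of Section 3, mocked here with toy numbers; only the score-each-choice-and-take-the-best structure comes from the paper.

def score_choice(question: str, choice: str) -> float:
    # placeholder for the DEKCOR pipeline: retrieve a triple and definitions
    # for this (question, choice) pair, encode with Albert, and read off the
    # relevance score; mocked here with fixed toy values
    toy_scores = {"louvre": 0.9, "garage": 0.1, "kitchen": 0.2}
    return toy_scores.get(choice, 0.0)

def answer(question: str, choices: list[str]) -> str:
    # each choice is scored independently; the highest-scoring one wins
    return max(choices, key=lambda c: score_choice(question, c))

print(answer("Where is the Mona Lisa?", ["garage", "louvre", "kitchen"]))  # louvre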
2 Related work
3 Method
3.1 Knowledge Retrieval
Problem formulation. Given a commonsense question Q and several answer choices c_1, ..., c_n, the task is to select the correct answer.
In most cases, the question does not contain any mention of the answer. Therefore, external knowledge sources can be used to provide additional information. We adopt ConceptNet (Liu and Singh, 2004) as our knowledge graph G = (V, E).
[Could another KG be used? Would any other KG, augmented with descriptions from external sources, yield a similar result? Is it generalizable?]
For each question and answer, we get the corresponding concept in the knowledge graph, as provided by CommonsenseQA. Suppose the question entity is e_q ∈ V and the choice entity is e_c ∈ V. In order to conduct knowledge reasoning, we employ the KCR method (Knowledge Chosen by Relations). If there is a direct edge r from e_q to e_c in G, we choose the triple (e_q, r, e_c). Otherwise, we retrieve all N triples containing e_c, and each triple j is assigned a score s_j.
[Similar to https://github.com/jessionlin/csqa/blob/master/Model_details.md , but that approach searches for the shortest path, which may or may not be a direct edge.]
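A minimal sketch of the KCR selection step on a toy graph. Note that the scoring function is an assumption on my part: the excerpt only says each candidate triple j gets a score s_j, without giving the formula, so the sketch simply favors rarer relation types.

from collections import Counter

# toy ConceptNet-style triples: (head, relation, tail)
TRIPLES = [
    ("mona_lisa", "LocatedAt", "louvre"),
    ("louvre", "IsA", "museum"),
    ("painting", "AtLocation", "museum"),
    ("louvre", "AtLocation", "paris"),
]
REL_FREQ = Counter(r for _, r, _ in TRIPLES)

def select_triple(e_q: str, e_c: str):
    # 1) prefer a direct edge between the question and choice concepts
    for h, r, t in TRIPLES:
        if (h, t) in {(e_q, e_c), (e_c, e_q)}:
            return (h, r, t)
    # 2) otherwise score every triple containing e_c and keep the best one;
    #    inverse relation-type frequency is an assumed stand-in for s_j
    candidates = [tr for tr in TRIPLES if e_c in (tr[0], tr[2])]
    return max(candidates, key=lambda tr: 1.0 / REL_FREQ[tr[1]])

print(select_triple("mona_lisa", "louvre"))  # direct edge is returned as-is
print(select_triple("painting", "paris"))    # falls back to scored triples of e_c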
3.2 Contextual information
The retrieved entities and relations from the knowledge graph are described only by their surface form. Without additional context, it is hard for the language model to understand their exact meaning, especially for proper nouns. Therefore, we leverage large-scale online dictionaries to provide definitions as context.
[Here the context is descriptive information about the retrieved entities; the entity's type comes in as context: whether it is a painting, an art museum, etc.]
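The paper only says definitions are retrieved "via multiple criteria of text matching", so the fallback chain below (exact surface form, then lowercased, then underscore/whitespace-normalized) is my guess at what such matching could look like; the two-entry dictionary is purely illustrative.

WIKTIONARY = {
    "mona lisa": "A painting by Leonardo da Vinci, widely considered as "
                 "the most famous painting in history.",
    "louvre": "An art museum in Paris, France.",
}

def lookup_definition(surface_form: str) -> str:
    # try progressively looser normalizations of the entity's surface form
    normalized = " ".join(surface_form.lower().replace("_", " ").split())
    for key in (surface_form, surface_form.lower(), normalized):
        if key in WIKTIONARY:
            return WIKTIONARY[key]
    return surface_form  # no definition found: fall back to the bare name

print(lookup_definition("Mona_Lisa"))  # matches after normalization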
Finally, we feed the question, answer, descriptions and triple into the Albert (Lan et al., 2019) encoder in the following format: [CLS] Q c_i [SEP] e_q: d_q [SEP] e_c: d_c [SEP] triple.
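Assembling that input is plain concatenation; in a real implementation the [CLS]/[SEP] tokens would come from the Albert tokenizer rather than literal strings, so treat this as a sketch of the layout only.

def build_input(question: str, choice: str,
                e_q: str, d_q: str, e_c: str, d_c: str,
                triple: tuple) -> str:
    head, rel, tail = triple
    # layout: [CLS] Q c_i [SEP] e_q: d_q [SEP] e_c: d_c [SEP] triple
    return (f"[CLS] {question} {choice} "
            f"[SEP] {e_q}: {d_q} "
            f"[SEP] {e_c}: {d_c} "
            f"[SEP] {head} {rel} {tail}")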
4.1 Datasets
We evaluate our model on the benchmark dataset for commonsense reasoning: CommonsenseQA.
[CommonsenseQA is a multiple-choice question-answering benchmark that requires different types of commonsense knowledge to predict the correct answers. It contains 12,102 questions, each with one correct answer and four distractor answers.]
4.2 Baselines
We compare our models with state-of-the-art baselines on CommonsenseQA. All baselines employ pre-trained models including RoBERTa .... Some baselines employ additional modules to process knowledge information.
4.4 Results
This demonstrates the effectiveness of using knowledge descriptions to provide context.
Furthermore, we notice two trends based on the results. First, the underlying pre-trained language model is important in commonsense reasoning quality. In general, we observe this order of accuracy among these language models: BERT < RoBERTa < XLNet < Albert < T5. Second, the additional knowledge module is critical to provide external information for reasoning. For example, RoBERTa+KEDGN outperforms the vanilla RoBERTa by 1.9% in accuracy, and our model outperforms the vanilla Albert model by 6.8% in accuracy.
[Both the quality of the external source used for the descriptions and the language model used affect the accuracy of the result. Changing the KG or the description sources impacts the model and requires retraining.]
5 Conclusions
In this paper, we propose to fuse context information into knowledge graph for commonsense reasoning. As a knowledge graph often lacks description for the contained entities and edges, we leverage Wiktionary to provide definitive text for each question/choice entity. This description is combined with entity names and sent into a pretrained language model to produce predictions.