
Paper Reading - A Comprehensive Approach to Assess Trustworthiness and Completeness of Knowledge Graphs

 A Comprehensive Approach to Assess Trustworthiness and Completeness of Knowledge Graphs

 International Journal of Knowledge Engineering, Vol. 10, No. 1, 2024

 

 Abstract

Completeness and trustworthiness are two dimensions that are used to assess the quality of KGs. Estimation of the completeness and trustworthiness of a large-scale knowledge graph often requires humans to annotate samples from the graph.

Estimation of quality metrics that depend on human intervention.

Nowadays, to reduce the costs of the manual construction of knowledge graphs, many KGs have been constructed automatically from sources with varying degrees of trustworthiness. Therefore, possible noises and conflicts are inevitably introduced in the process of construction, which severely interferes with the quality of constructed KGs. 

The automatic construction process, used to increase completeness, can degrade trustworthiness: it introduces incorrect or conflicting "facts".

we propose a new approach to automatically evaluate and assess existing KGs in terms of completeness and trustworthiness.

How can these metrics be assessed automatically?

INTRODUCTION

There are several ways to define trustworthiness. For instance, the user’s acceptance of the information as right, genuine, real, and credible is defined by its trustworthiness; trustworthiness also refers to an entity’s or KG’s reputation, which is based on personal experience or third-party recommendations

Right vs. wrong is tied to an absolute truth.

To evaluate the quality of KGs, some papers explore several main evaluation dimensions of KG quality, such as accuracy, completeness, consistency, timeliness, trustworthiness, and availability [5]. Nevertheless, when we improve the accuracy, timeliness, and consistency of a KG, we also increase its trustworthiness.

Quality dimensions that can influence trustworthiness.

we evaluate the entire KG to come up with an accurate trust score, which has been shown in the results of our experiments. To the best of our knowledge, this work is among the first to propose a new approach to evaluate KGs and assign a certain trustworthiness factor score to compare these KGs in terms of their degrees of trustworthiness.

A metric that enables comparison when choosing which KG to use for a task.

Completeness can be subjective because it implies that the quantity of data is adequate for the user’s needs, which might vary considerably. In this context, completeness can be measured as the percentage of available data divided by the required data.

Completeness can vary by task, goal, or intended use.

Unlike other quality dimensions of a KG, the evaluation of KG completeness needs a reference or gold standard to compare results against

In this case it would be independent of the task.

PTrustE: J. Ma, C. Zhou, Y. Wang, Y. Guo, G. Hu, Y. Qiao, and Y. Wang, “PTrustE: A high-accuracy knowledge graph noise detection method based on path trustworthiness and triple embedding,” Knowledge-Based Systems, vol. 256, 109688, 2022.

CKRL suggests three different sorts of triple confidences based on local triple and global path information.

One metric per triple. In a multi-layer KG it would be possible to attach this metric to each triple, but this key/value pair does not correspond to a context.

CKRL:  R. Xie, Z. Liu, F. Lin, and L. Lin, “Does William Shakespeare really write Hamlet? Knowledge representation learning with confidence,” in Proc. the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018, April.

Therefore, in this research paper, we first identify the noisy triples using CKRL, then we assess the trustworthiness of KGs by calculating the percentage of correct triples in the KG. Furthermore, we evaluate the completeness of knowledge graphs by measuring the percentage of found triples divided by the queried triples.

Method for computing the two metrics.
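
The two ratios described in the quote can be written compactly as follows. The symbols T and C are notation introduced here for illustration only; they are not taken from the paper.

```latex
\[
T(\mathit{KG}) = \frac{\lvert \text{triples judged correct} \rvert}{\lvert \text{all triples in } \mathit{KG} \rvert},
\qquad
C(\mathit{KG}) = \frac{\lvert \text{queried triples found in } \mathit{KG} \rvert}{\lvert \text{queried triples} \rvert}
\]
```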

DESIGN OF EXPERIMENTAL STUDY

we use three standard datasets, namely, FB15K, WN18, and NELL995.

FB = Freebase

In other words, to construct a negative triple, they randomly alter one of the head or tail entities for a given positive triple in KG. It is required for the formation of negative triples that the new head or tail exists in the head or tail position with the same relation in the KG to make it harder and more confusing. ....

To be more precise, to add a noisy triple from an initial positive triple (h, r, t) in KG, either h or t was randomly switched to generate a negative triple. Following this idea, three noisy KGs based on the aforementioned datasets were acquired, with noisy triples making up 10%, 20%, and 40% of positive triples, respectively.

Noise was introduced into the datasets.
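
A minimal sketch of the noise-injection step quoted above, assuming triples are plain `(head, relation, tail)` tuples of strings. The function name `corrupt_triples`, the `seed` parameter, and the rule that a generated triple must not already exist in the KG are assumptions made for this illustration, not details confirmed by the paper.

```python
import random
from collections import defaultdict


def corrupt_triples(triples, noise_ratio, seed=0):
    """Generate noisy triples by swapping the head or tail of sampled positives.

    A replacement entity is drawn only from entities that already occur in the
    same position (head or tail) of the same relation, as the quoted text
    requires. Triples with no valid replacement are skipped, so the result can
    be shorter than len(triples) * noise_ratio.
    """
    rng = random.Random(seed)
    taken = set(triples)  # positives plus noise generated so far
    heads_by_rel = defaultdict(set)
    tails_by_rel = defaultdict(set)
    for h, r, t in triples:
        heads_by_rel[r].add(h)
        tails_by_rel[r].add(t)

    n_noise = int(len(triples) * noise_ratio)
    noisy = []
    for h, r, t in rng.sample(list(triples), n_noise):
        sides = ["head", "tail"]
        rng.shuffle(sides)  # pick which side to corrupt first, at random
        for side in sides:
            if side == "head":
                options = [(e, r, t) for e in sorted(heads_by_rel[r])]
            else:
                options = [(h, r, e) for e in sorted(tails_by_rel[r])]
            # Assumption: a corrupted triple must not already be in the KG.
            options = [c for c in options if c not in taken]
            if options:
                new_triple = rng.choice(options)
                noisy.append(new_triple)
                taken.add(new_triple)
                break
    return noisy


# Toy KG (hypothetical data) with a 50% noise ratio.
toy_kg = [("a", "r", "x"), ("b", "r", "y"), ("c", "r", "z"), ("a", "s", "y")]
noise = corrupt_triples(toy_kg, noise_ratio=0.5)
```

Calling the function with ratios 0.1, 0.2, and 0.4 would produce the three noise levels mentioned in the quote.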

To calculate the trust score for the entire KG, we calculate the number of trusted triples over the total number of triples, including noise.

If the trust value is greater than 0.5 the triple is treated as a fact; otherwise it is noise.
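
A minimal sketch of the KG-level trust score, assuming each triple already carries a confidence in [0, 1] (for example, one produced by a CKRL-style model). The function name `kg_trust_score` and the dictionary layout are hypothetical; the 0.5 cut-off is the one mentioned in the note above.

```python
def kg_trust_score(confidences, threshold=0.5):
    """Return the share of triples whose confidence exceeds the threshold.

    `confidences` maps each (head, relation, tail) triple, noise included,
    to a confidence value in [0, 1].
    """
    if not confidences:
        raise ValueError("the KG has no triples")
    trusted = sum(1 for c in confidences.values() if c > threshold)
    return trusted / len(confidences)


# Hypothetical confidences for a four-triple KG.
example = {
    ("a", "r", "x"): 0.9,
    ("b", "r", "y"): 0.8,
    ("c", "r", "x"): 0.2,
    ("a", "r", "z"): 0.5,  # not strictly above the threshold, so counted as noise
}
score = kg_trust_score(example)  # 2 trusted triples out of 4
```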

We removed 10, 20, and 40 percent of the triples to have three KGs with different levels of completeness. Then, we run random queries on these three KGs. If there is a matching triple for the query in the knowledge graph, we increase the completeness score. We query 40% of the triples to get a great estimate of the completeness of each KG. To calculate the completeness score for each KG, we divide the number of found triples by the total number of queried triples. The results show that the calculated completeness score mirrors the level of completeness of a knowledge graph.

Triples were removed and queries were run at random. The query results are compared against the gold standard obtained by querying the complete KG.

Could the queries correspond to the competency questions the KG is expected to answer?
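
A minimal sketch of the completeness estimate, assuming the queries are triples sampled from the complete (reference) KG and that a query "matches" when the exact triple is present in the reduced KG. The function name `completeness_score` and its parameters are hypothetical; the 40% query ratio comes from the quote.

```python
import random


def completeness_score(reference, kg, query_ratio=0.4, seed=0):
    """Estimate completeness as found triples / queried triples.

    `reference` is the complete KG used as the gold standard and `kg` is the
    KG under evaluation; both are collections of (head, relation, tail) tuples.
    """
    reference = list(reference)
    if not reference:
        raise ValueError("the reference KG has no triples")
    present = set(kg)
    rng = random.Random(seed)
    n_queries = max(1, int(len(reference) * query_ratio))
    queries = rng.sample(reference, n_queries)
    found = sum(1 for q in queries if q in present)
    return found / n_queries


# Hypothetical reference KG of 100 triples, with 20% of them removed.
full_kg = [(f"e{i}", "r", f"e{i + 1}") for i in range(100)]
reduced_kg = full_kg[20:]
estimate = completeness_score(full_kg, reduced_kg)
```

Replacing the random sample with a fixed list of task-specific queries would be one way to explore the competency-question idea raised above.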


TO READ


X. Wang, L. Chen, T. Ban, M. Usman, Y. Guan, S. Liu, and H. Chen, “Knowledge graph quality control: A survey,” Fundamental Research, 2021.

