
Capturing Concept Similarity with Knowledge Graphs - Article Reading

Filip Ilievski, Kartik Shenoy, Nicholas Klein, Hans Chalupsky, Pedro Szekely
Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA

Abstract—Robust estimation of concept similarity is crucial for a range of AI applications, like
deduplication, recommendation, and entity linking. Rich and diverse knowledge in large
knowledge graphs like Wikidata can be exploited for this purpose.

Introduction

... we need methods to automatically infer whether two arbitrary concepts are identical, dissimilar, or nearly identical [1].

** Concepts ... but it is also necessary to compute the similarity between entities of a KG **

The task of concept word similarity has been very popular.... Early work generally relies on taxonomy-based methods that leverage the distance between two words in a taxonomy hierarchy. 

** For example, by computing each word's distance to their nearest common ancestor **
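A minimal sketch of that taxonomy-based idea over a toy IsA hierarchy, using a Wu-Palmer-style depth ratio around the lowest common ancestor (the node names and the specific metric are my own illustration, not taken from the paper):

```python
# Toy taxonomy as a child -> parent map (illustrative, not from the paper).
PARENT = {
    "cat": "feline", "dog": "canine",
    "feline": "mammal", "canine": "mammal",
    "mammal": "animal", "animal": None,
}

def ancestors(node):
    """Return the path from a node up to the taxonomy root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Number of edges between a node and the root."""
    return len(ancestors(node)) - 1

def taxonomy_similarity(c1, c2):
    """Wu-Palmer-style score: a deeper lowest common ancestor -> higher similarity."""
    path1, path2 = ancestors(c1), ancestors(c2)
    lca = next(a for a in path1 if a in path2)  # lowest common ancestor
    return 2 * depth(lca) / (depth(c1) + depth(c2))

print(taxonomy_similarity("cat", "dog"))     # siblings under "mammal" -> ~0.33
print(taxonomy_similarity("cat", "feline"))  # direct parent -> 0.8
```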

More recently, pre-trained word embeddings have been shown to natively capture word similarity at scale. Word embeddings may benefit from retrofitting to lexical resources like WordNet. It is unclear
how to best estimate similarity of concepts described in KGs. 

** Word embeddings use corpus-based techniques and may better reflect the relations between words based on co-occurrence in the same context (and not similarity of meaning). WordNet here is seen as a taxonomy, a dictionary that is only terminological and not a KG **

** See Jonatas's material on retrofitting **

Besides language models and taxonomy-based metrics, we can leverage graph embeddings, like TransE  and ComplEx , which organize nodes in a geometric space according to their structural links to other nodes. Random walk methods, such as node2vec variants, leverage the generalizability of language modeling, applying it to graph nodes instead of words. Furthermore, the embeddings created by language models (LMs) or KGs can be retrofitted based on background knowledge, coming from the target graph or additional resources.
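A rough sketch of the random-walk idea behind DeepWalk/node2vec (not the paper's actual pipeline): treat truncated walks over the graph as "sentences" and train a skip-gram model on them. The toy adjacency list and hyperparameters below are illustrative only.

```python
import random
from gensim.models import Word2Vec  # skip-gram over node "sentences"

# Toy undirected adjacency list with a few Wikidata-style ids (illustrative only).
GRAPH = {
    "Q144": ["Q39201", "Q729"],   # hypothetical neighbours for "dog"
    "Q146": ["Q39201", "Q729"],   # hypothetical neighbours for "cat"
    "Q39201": ["Q144", "Q146"],
    "Q729": ["Q144", "Q146"],
}

def random_walk(start, length=10):
    """One truncated random walk, returned as a list of node ids."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(GRAPH[walk[-1]]))
    return walk

# Generate walks from every node, then embed nodes exactly like words.
walks = [random_walk(node) for node in GRAPH for _ in range(20)]
model = Word2Vec(walks, vector_size=32, window=3, min_count=1, sg=1, epochs=5)
print(model.wv.similarity("Q144", "Q146"))  # cosine similarity of node vectors
```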

** How can different methods be combined? **

Background 

Similarity is a central theoretical construct in psychology, facilitating the transfer of a situation to an original training context. Tversky posits that the literal similarity between two objects A and B is proportional to the intersection of their features and inversely proportional to the features that differ (A − B and B − A).

** Definition of similarity in this research **

In this paper, we consider the task of literal similarity between two concepts. Given two concept nodes, c1 and c2 in a KG G, a system is asked to provide a pairwise similarity score sim(c1, c2). We consider similarity to be symmetric, i.e., sim(c1, c2) = sim(c2, c1). Following common practice in the concept and word similarity tasks, we assume that the similarity of two concepts can be measured on a continuous numeric scale.

Natural Language Processing research has studied the extent to which two concepts are similar or related. Here, similarity corresponds to the notion of literal similarity in psycholinguistics, while relatedness is a broader notion indicating that two concepts tend to appear in the same topical context.

Framework for estimating similarity

The proposed framework is visually depicted in Figure 1. 

** There are two phases, one offline and one online, as in the search proposal **

We use graph embedding and text embedding models, as well as ontology-based metrics, as initial similarity estimators. We also concatenate the embeddings in order to combine their scores. 

We use retrofitting to further tune the individual embedding models, through distant supervision over millions of weighted pairs extracted automatically from large-scale knowledge graphs. For a given concept pair, the similarity scores generated by the retrofitted embedding models can be combined with the scores by the ontology-based models.
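A minimal sketch of that combination step, assuming graph and text embeddings for the two concepts are already computed and an ontology-based metric is available as a callable; the equal-weight average is my own placeholder, since the paper also evaluates other combination methods (including supervised ones).

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def combined_similarity(graph_emb, text_emb, ontology_score, c1, c2):
    """Concatenate embedding spaces, then blend with an ontology-based score.

    graph_emb / text_emb: dicts mapping node ids to vectors (assumed precomputed);
    ontology_score: a callable such as the Class or Jiang Conrath metric.
    """
    v1 = np.concatenate([graph_emb[c1], text_emb[c1]])
    v2 = np.concatenate([graph_emb[c2], text_emb[c2]])
    emb_score = cosine(v1, v2)
    # Simple unweighted average; a learned combiner could replace this.
    return 0.5 * emb_score + 0.5 * ontology_score(c1, c2)
```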


 

Similarity models - Offline

Graph embedding models - We experiment with four KG embedding models, which can be divided into two families: triple-based link prediction models (TransE [12] and ComplEx [13]) and random walk models (DeepWalk [40] and S-DeepWalk [15]). For all models, we compute the cosine similarity between their embeddings for c1 and c2.

** It does not yet use models that treat qualifiers as elements distinct from the triples/quads **

Language models - We use Transformer LMs to represent the textual information associated with a node in the graph. Similarity between two nodes is then measured through the cosine similarity between two LM embeddings. We experiment with four kinds of textual information: 1) labels, which consider only the English label; 2) labels+desc, which considers a concatenation between a node label and its description; 3) lexicalization, where we automatically generate a node description based on the properties: P31 (instance of), P279 (subclass of), P106 (occupation), P39 (position held), P1382 (partially coincident with), P373 (Commons category), P452 (industry); and 4) abstract, which is based on the first sentences from entity abstracts in the DBpedia KG, mapped to Wikidata through their sitelinks.

** Transformers for text ... it is not GPT-3 yet **
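A hedged sketch of the lexicalization variant: build a short sentence from a node's label, description and selected property values, then encode it with a sentence encoder. The template wording and the model name all-MiniLM-L6-v2 are my assumptions, not necessarily what the paper or KGTK use.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed, simplified node records (label, description, selected property values).
NODES = {
    "Q937": {"label": "Albert Einstein", "description": "German-born physicist",
             "P31": ["human"], "P106": ["physicist"]},
    "Q1035": {"label": "Charles Darwin", "description": "English naturalist",
              "P31": ["human"], "P106": ["naturalist", "biologist"]},
}

def lexicalize(node):
    """Turn a node into one sentence; the template is an illustrative choice."""
    parts = [f'{node["label"]}, {node["description"]}.']
    if node.get("P31"):
        parts.append(f'It is an instance of {", ".join(node["P31"])}.')
    if node.get("P106"):
        parts.append(f'Occupation: {", ".join(node["P106"])}.')
    return " ".join(parts)

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode([lexicalize(NODES["Q937"]), lexicalize(NODES["Q1035"])])
print(float(util.cos_sim(emb[0], emb[1])))  # similarity of the two concepts
```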

Ontology-aware models - We use three structure-aware metrics. Class similarity computes the set of common IsA parents for two nodes. .... Jiang Conrath is an information-theoretic node-based similarity measure that leverages the information content of the least common subsumer ... TopSim computes top-similar regions for each node by enumerating nearest neighbors based on the KG ontology and from embeddings. ...
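A small sketch of the information-content idea behind Jiang Conrath, with toy frequency counts and a simple conversion from JC distance to a similarity score (both are illustrative choices, not the paper's implementation):

```python
import math

# Toy frequency counts per concept, including counts inherited from descendants
# (illustrative numbers, not taken from the paper).
FREQ = {"animal": 100, "mammal": 60, "feline": 20, "canine": 25, "cat": 12, "dog": 15}
TOTAL = FREQ["animal"]

def information_content(concept):
    """IC(c) = -log p(c): rarer concepts carry more information."""
    return -math.log(FREQ[concept] / TOTAL)

def jiang_conrath_similarity(c1, c2, lcs):
    """Turn the JC distance IC(c1) + IC(c2) - 2 * IC(lcs) into a similarity score."""
    distance = (information_content(c1) + information_content(c2)
                - 2 * information_content(lcs))
    return 1.0 / (1.0 + distance)

# "mammal" is the least common subsumer of cat and dog in the toy taxonomy above.
print(jiang_conrath_similarity("cat", "dog", lcs="mammal"))
```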

Self-supervision knowledge - Offline

We tune the original embeddings through self-supervision over two KGs: Wikidata and ProBase.
We derive three datasets from Wikidata’s subclass-of (P279) ontology...

We define three weighting methods for the generated pairs from these two datasets: (1) constant weighting value of 1; (2) class similarity between the two nodes (using the class metric described in the last section); and (3) cosine similarity between the concatenated labels and descriptions of the two nodes. ... We focus our experiments on cosine similarity as a weighting function, because we observed empirically that it consistently performs better than or comparably to the other two weighting functions.

Retrofitting - Offline

We use the retrofitting technique ..., which iteratively updates node embeddings in order to bring them closer in accordance to their connections in an external dataset.
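A compact sketch of the Faruqui-style retrofitting update, assuming a dict of original embeddings and weighted (node, neighbor, weight) pairs such as those derived from P279 edges; the alpha value and fixed number of iterations are common defaults, not necessarily the paper's settings.

```python
import numpy as np

def retrofit(embeddings, weighted_edges, iterations=10, alpha=1.0):
    """Iteratively pull connected nodes together (Faruqui et al., 2015 style).

    embeddings: dict node -> original vector (kept fixed as the anchor);
    weighted_edges: list of (node, neighbor, weight) pairs, e.g. P279 pairs
    weighted by the cosine of their labels+descriptions.
    """
    new = {n: v.copy() for n, v in embeddings.items()}
    neighbors = {n: [] for n in embeddings}
    for u, v, w in weighted_edges:
        if u in neighbors and v in embeddings:
            neighbors[u].append((v, w))
        if v in neighbors and u in embeddings:
            neighbors[v].append((u, w))
    for _ in range(iterations):
        for node, nbrs in neighbors.items():
            if not nbrs:
                continue
            # Weighted average of the neighbors, anchored to the original vector.
            num = alpha * embeddings[node] + sum(w * new[v] for v, w in nbrs)
            den = alpha + sum(w for _, w in nbrs)
            new[node] = num / den
    return new

# Hypothetical usage: a single weighted pair pulls the two vectors toward each other.
emb = {"c1": np.array([1.0, 0.0]), "c2": np.array([0.0, 1.0])}
tuned = retrofit(emb, [("c1", "c2", 0.8)])
```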

Experimental setup

We experiment with three benchmarks: 1) WD-WordSim353 ... 2) WD-RG65 is a benchmark which is based on the DBpedia disambiguation ... 3) WD-MC30 is a benchmark which is based on the DBpedia disambiguation ...

We measure the impact of retrofitting with subsets from Wikidata and ProBase, scored based on language models.

We use the KGTK [48] toolkit to lexicalize a node, subset the graphs, and create various graph and language model-based embeddings. We use scikit-learn for supervised learning. We use KGTK’s similarity API to obtain scores for the metrics Class, Jiang Conrath, and TopSim.

** GitHub with the notebooks -> https://github.com/usc-isi-i2/wd-similarity **

Results

How well do different algorithms and combinations capture semantic similarity?

The Abstract-based method performs best among all language model variants, and overall. It outperforms the other LMs because DBpedia’s abstracts contain information that is more comprehensive and tailored to entity types than Wikidata labels, descriptions, or static property sets.

These methods (graph embedding methods) are consistently outperformed by the Lexicalization and Abstract methods, suggesting that the graph embeddings' wealth of information to consider is a double-edged sword: many properties are considered that may not be useful for determining similarity, adding distractions that can decrease performance. The Abstract method has an additional advantage over the graph embeddings in that it is less restricted in terms of the kind of information it can consider, whereas the graph embeddings focus solely on relations and cannot make use of numeric- or string-valued properties. The combination methods that we evaluated generally did not yield improved performance over the best individual method (Abstract).

** Text embedding (LM) methods were better than graph embeddings, and combining them brought no improvement **

What is the impact of retrofitting?

Retrofitting is overall beneficial for estimating similarity. On average across the three benchmarks, it improves the performance of nine out of the eleven methods.

The impact of retrofitting is lower on methods that consider richer information already, like Abstract and Lexicalized. This is because these methods already integrate taxonomic information, and retrofitting might bring concepts that are nearly identical or merely related too close in the embedding space.

These findings indicate that similarity between highly similar and dissimilar concepts is well-understood and captured by current methods, whereas the intermediate spectrum of near-identity and relatedness requires further study and focused evaluation.

** Retrofitting improved the methods, but was most useful for tuning graph and text methods that use little information. And within those, it improved results for pairs that are not at the most extreme points of the scale **

Conclusions

The experiments revealed that:

  1. pairing language models with contextualized information found in abstracts led to optimal performance.
  2. retrofitting with taxonomic information from Wikidata generally improved performance across methods, with the simpler methods benefiting more from retrofitting. 
  3. retrofitting with the ProBase KG yielded consistently negative results, indicating that the impact of retrofitting directly depends on the quality of the underlying data.
  4. analysis demonstrated that both vanilla models and retrofitted models perform best on identical and dissimilar pairs. 

Experiments on three benchmarks reveal that pairing language models with rich information performs best, whereas the impact of retrofitting is most positive on methods which originally do not consider comprehensive information. The performance of retrofitting depends on the source of knowledge and the edge weighting function. 

Future work should investigate contextual similarity between concepts, which would characterize partial identity and relatedness of concept pairs.


More on retrofitting in NLP

https://odsc.medium.com/the-promise-of-retrofitting-building-better-models-for-natural-language-processing-20783b19cdcb

To use common sense with deep learning, one must connect the curated, organized information about the world — like ConceptNet — with previously unseen, domain-specific data, such as a set of documents to analyze. The best way to do that is a family of algorithms called ‘retrofitting,’ which were first published by Manaal Faruqui in 2015. The goal of retrofitting is to combine structure information like a knowledge graph (ConceptNet or WordNet, for example) with an embedding of word vectors, similar to Word2Vec. By modifying the embedding so that related concepts in the knowledge graph are related in similar ways in the embedding, we’ve applied knowledge-based constraints after training the distributional word vectors. The thinking is that connected terms in the knowledge graph should have vectors that are closer together inside the embedding itself.

** Here WordNet is already treated as a KG. The idea is to compute corpus-based embeddings and then adjust them based on the structure of the KG **

https://krayush.medium.com/retrofitting-word-vectors-to-semantic-lexicons-3f85f4208f4f

Retrofitting Word Vectors to Semantic Lexicons
How to integrate information that comes from existing lexicons to word vectors? A method as published in “Retrofitting Word Vectors to Semantic Lexicons” in NAACL, 2015.

** Here WordNet is just the lexicon (a taxonomy, a dictionary) and not a KG **

A quick overview: the paper formulates a post-processing method incorporating the idea of belief propagation across a relational graph constructed from the lexicon at hand.
What do they propose? A setup in which a lexicon is represented as a graph with edges denoting the relation between two nodes (words). Each word then looks at its neighbors, collects information (their word embeddings) from them, and updates itself iteratively.
