
CoKE: Contextualized Knowledge Graph Embedding

 Quan Wang, Pingping Huang, Haifeng Wang, Songtai Dai, Wenbin Jiang, Jing Liu, Yajuan Lyu, Yong Zhu, Hua Wu: CoKE: Contextualized Knowledge Graph Embedding. CoRR abs/1911.02168 (2019)

Abstract

Knowledge graph embedding, which projects symbolic entities and relations into continuous vector spaces, is gaining increasing attention. Previous methods allow a single static embedding for each entity or relation, ignoring their intrinsic contextual nature, i.e., entities and relations may appear in different graph contexts, and accordingly, exhibit different properties.

[The context of an entity or relation depends on the graph where it appears, and this affects the generated embeddings]

This work presents Contextualized Knowledge Graph Embedding (CoKE), a novel paradigm that takes into account such contextual nature, and learns dynamic, flexible, and fully contextualized entity and relation embeddings. Two types of graph contexts are studied: edges and paths, both formulated as sequences of entities and relations. CoKE takes a sequence as input and uses a Transformer encoder to obtain contextualized representations. These representations are hence naturally adaptive to the input, capturing contextual meanings of entities and relations therein.

[Syntactic context]
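To make the input formulation concrete, here is a minimal sketch (assumed names, not the authors' code) of how an edge or a path becomes a token sequence, with the entity to be predicted replaced by a special mask token; a Transformer encoder then produces a contextualized vector for every position of the sequence:

```python
# Sketch only: sequence formulation of graph contexts, as described in
# the abstract. Function names and the mask token are assumptions.

MASK = "[MASK]"

def edge_as_sequence(s, r, o=None):
    """Edge (s, r, o) as a 3-token sequence; pass o=None to mask
    the object entity for prediction."""
    return [s, r, o if o is not None else MASK]

def path_as_sequence(s, relations, o=None):
    """Path (s, r1, ..., rk, o), intermediate entities omitted."""
    return [s, *relations, o if o is not None else MASK]

print(edge_as_sequence("BarackObama", "HasChild"))
# ['BarackObama', 'HasChild', '[MASK]']
print(path_as_sequence("BarackObama", ["HasChild", "LivesIn"]))
# ['BarackObama', 'HasChild', 'LivesIn', '[MASK]']
```

Because the encoder attends over the whole sequence, the vector produced for `BarackObama` differs between the two inputs — that is the "contextualized" part.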

Evaluation on a wide variety of public benchmarks verifies the superiority of CoKE in link prediction and path query answering. It performs consistently better than, or at least as well as, the current state of the art in almost every case, in particular offering an absolute improvement of 21.0% in H@10 on path query answering.

Our code is available at -> https://github.com/PaddlePaddle/Research/tree/master/KG/CoKE

Introduction

Current approaches typically learn for each entity or relation a single static representation, to describe its global meaning in a given KG. However, entities and relations rarely appear in isolation. Instead, they form rich, varied graph contexts such as edges, paths, or even subgraphs. We argue that entities and relations, when involved in different graph contexts, might exhibit different meanings, just like words do when they appear in different textual contexts (Peters et al., 2018).

[This is also syntactic in the sense of being structural, but the example separates the triples linked to a subject by topic: Political context vs. Family context. The approach itself does not make this distinction; perhaps the engineer could delimit the path/subgraph that defines the context, but there is no rule or guideline for that]

Two types of graph contexts are considered: edges and paths, both formalized as sequences of entities and relations. Given an input sequence, CoKE employs a stack of Transformer (Vaswani et al., 2017) blocks to encode the input and obtain contextualized representations for its components. The model is then trained by predicting a missing component in the sequence, based on these contextualized representations.

[How are the edges and paths selected? Random walks?]
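The training objective described above can be sketched as building masked instances from each triple — a hypothetical illustration, not the paper's code; here each edge yields two examples, one predicting the object and one predicting the subject:

```python
# Sketch (assumption): masked-entity training instances derived from
# triples, in the spirit of masked language modeling.

MASK = "[MASK]"

def training_instances(triples):
    """For each triple (s, r, o), yield (input_sequence, target):
    one instance masking the object, one masking the subject."""
    for s, r, o in triples:
        yield ([s, r, MASK], o)   # object (tail) prediction
        yield ([MASK, r, o], s)   # subject (head) prediction

kg = [("BarackObama", "HasChild", "SashaObama")]
for seq, target in training_instances(kg):
    print(seq, "->", target)
```

In the full model, the contextualized vector at the masked position is fed to a classifier over the entity vocabulary, and training minimizes the prediction loss for the missing entity.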

We summarize our contributions as follows: (1) We propose the notion of contextualized KG embedding, which differs from previous paradigms by modeling contextual nature of entities and relations in KGs. (2) We devise a new approach CoKE to learn fully contextualized KG embeddings. We show that CoKE can be naturally applied to a variety of tasks like link prediction and path query answering. (3) Extensive experiments demonstrate the superiority of CoKE. It achieves new state-of-the-art results on a number of public benchmarks.

[The selected edges could be those that carry qualifiers (key and/or key/value) of a given contextual dimension]

Related Work

Beyond triples, recent work tried to use more global graph structures like multi-hop paths (Lin et al., 2015a; Das et al., 2017) and k-degree neighborhoods (Feng et al., 2016; Schlichtkrull et al., 2017) to learn better embeddings. Although such approaches take into account rich graph contexts, they are not “contextualized”, still learning a static global representation for each entity/relation.

[This reinforces the earlier comment: each entity could have one embedding per contextual dimension, or per key/value of each dimension]

This work is inspired by recent advances in learning contextualized word representations (McCann et al., 2017; Peters et al., 2018; Devlin et al., 2019), by drawing connections of graph edges/paths to natural language phrases/sentences. Such connections have been studied extensively in graph embedding (Perozzi et al., 2014; Grover and Leskovec, 2016; Ristoski and Paulheim, 2016; Cochez et al., 2017). But most of these approaches obtain static embeddings via traditional word embedding techniques, and fail to capture the contextual nature of entities and relations.

Our Approach

Unlike previous methods that assign a single static representation to each entity/relation learned from the whole KG, CoKE models that representation as a function of each individual graph context, i.e., an edge or a path. Given a graph context as input, CoKE employs Transformer blocks to encode the input and obtain contextualized representations for entities and relations therein. The model is trained by predicting a missing entity in the input, based on these contextualized representations. 

[More than one embedding per entity and relation]

 

3.1 Problem Formulation

We are given a KG composed of subject-relation-object triples {(s, r, o)}. Each triple indicates a relation r ∈ R between two entities s, o ∈ E, e.g., (BarackObama, HasChild, SashaObama). Here, E is the entity vocabulary and R the relation set. These entities and relations form rich, varied graph contexts. Two types of graph contexts are considered here: edges and paths, both formalized as sequences composed of entities and relations.

Here we follow (Guu et al., 2015) and exclude intermediate entities from paths, which gives the paths a close relationship with Horn clauses and first-order logic rules (Lao and Cohen, 2010). We leave the investigation of other path forms for future work. Given edges and paths that reveal rich graph structures, the aim of CoKE is to learn entity and relation representations dynamically adaptive to each input graph context.

[Intermediate entities in the path are ignored]
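The conversion the annotation refers to can be sketched as follows — a simple illustration (not from the paper's code) of dropping intermediate entities from a path, keeping only the source entity, the relation sequence, and the target entity:

```python
# Sketch: reduce a full path (entity, relation, entity, relation, ...,
# entity) to the (source, [relations], target) form of Guu et al. (2015).

def strip_intermediates(path):
    """path alternates entities (even positions) and relations
    (odd positions); intermediate entities are discarded."""
    entities = path[0::2]
    relations = path[1::2]
    return entities[0], relations, entities[-1]

full_path = ["BarackObama", "HasChild", "SashaObama",
             "LivesIn", "WashingtonDC"]
print(strip_intermediates(full_path))
# ('BarackObama', ['HasChild', 'LivesIn'], 'WashingtonDC')
```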

4 Experiments

We demonstrate the effectiveness of CoKE in link prediction and path query answering. We further visualize CoKE embeddings to show how they can discern contextual usage of entities and relations.

Datasets We conduct experiments on four widely used benchmarks. FB15k and WN18 were introduced in (Bordes et al., 2013), with the former sampled from Freebase and the latter from WordNet.

[Path query answering checks whether a path connects entities A and B; more precisely, given a source entity and a sequence of relations, the task is to predict the target entity]
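The H@10 metric cited in the abstract (Hits at 10) is the fraction of test queries whose correct answer is ranked within the top 10 candidates. A minimal sketch, using hypothetical ranks rather than real results:

```python
# Sketch of the Hits@K evaluation metric; the rank list below is
# illustrative only, not data from the paper.

def hits_at_k(ranks, k=10):
    """Fraction of queries whose correct answer has rank <= k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 3, 12, 7, 58, 2]   # hypothetical ranks of correct answers
print(f"H@10 = {hits_at_k(ranks, 10):.3f}")
# H@10 = 0.667
```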

5 Conclusion

As future work, we would like to (1) Generalize CoKE to other types of graph contexts beyond edges and paths, e.g., subgraphs of arbitrary forms. (2) Apply CoKE to more downstream tasks, not only those within a given KG, but also those scaling to broader domains.

[The generalization could be done through a subgraph composed of the statements of the best answer, but it would still be for the graph-completion task]

 
