Quan Wang, Pingping Huang, Haifeng Wang, Songtai Dai, Wenbin Jiang, Jing Liu, Yajuan Lyu, Yong Zhu, Hua Wu: CoKE: Contextualized Knowledge Graph Embedding. CoRR abs/1911.02168 (2019)
Abstract
Knowledge graph embedding, which projects symbolic entities and relations into continuous vector spaces, is gaining increasing attention. Previous methods allow a single static embedding for each entity or relation, ignoring their intrinsic contextual nature, i.e., entities and relations may appear in different graph contexts, and accordingly, exhibit different properties.
[The context of an entity or relation depends on the graph where it appears, and this affects the generated embedding.]
This work presents Contextualized Knowledge Graph Embedding (CoKE), a novel paradigm that takes into account such contextual nature, and learns dynamic, flexible, and fully contextualized entity and relation embeddings. Two types of graph contexts are studied: edges and paths, both formulated as sequences of entities and relations. CoKE takes a sequence as input and uses a Transformer encoder to obtain contextualized representations. These representations are hence naturally adaptive to the input, capturing contextual meanings of entities and relations therein.
[Syntactic context]
Evaluation on a wide variety of public benchmarks verifies the superiority of CoKE in link prediction and path query answering. It performs consistently better than, or at least as well as, the current state of the art in almost every case, in particular offering an absolute improvement of 21.0% in H@10 on path query answering.
Our code is available at https://github.com/PaddlePaddle/Research/tree/master/KG/CoKE
Introduction
Current approaches typically learn for each entity or relation a single static representation, to describe its global meaning in a given KG. However, entities and relations rarely appear in isolation. Instead, they form rich, varied graph contexts such as edges, paths, or even subgraphs. We argue that entities and relations, when involved in different graph contexts, might exhibit different meanings, just like words do when they appear in different textual contexts (Peters et al., 2018).
[This is also syntactic in the sense of being structural, but the example separates the triples linked to a subject by topic: Political Context vs. Family Context. The approach itself does not make this distinction; perhaps the engineer could delimit the path/subgraph that defines the context, but there is no rule or guideline for this.]
Two types of graph contexts are considered: edges and paths, both formalized as sequences of entities and relations. Given an input sequence, CoKE employs a stack of Transformer (Vaswani et al., 2017) blocks to encode the input and obtain contextualized representations for its components. The model is then trained by predicting a missing component in the sequence, based on these contextualized representations.
[How are the edges and paths selected? Random walks?]
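The paper does not spell it out at this point, but both context types reduce to short token sequences with one masked slot. A minimal sketch of that formalization (the names and the [MASK] convention are my own illustration, borrowed from masked language modeling, not the paper's code):

```python
# Illustrative only: how an edge and a path can be formalized as token
# sequences with a masked position, in the spirit of CoKE's training.
MASK = "[MASK]"

# Edge (s, r, o): masking the object gives a link-prediction instance.
edge_input  = ["BarackObama", "HasChild", MASK]
edge_target = "SashaObama"

# Path (s, r1, ..., rk, o): masking the final entity gives a
# path-query-answering instance.
path_input  = ["BarackObama", "HasChild", "BornIn", MASK]
# path_target: the entity reached by following HasChild then BornIn from s.
```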
We summarize our contributions as follows: (1) We propose the notion of contextualized KG embedding, which differs from previous paradigms by modeling contextual nature of entities and relations in KGs. (2) We devise a new approach CoKE to learn fully contextualized KG embeddings. We show that CoKE can be naturally applied to a variety of tasks like link prediction and path query answering. (3) Extensive experiments demonstrate the superiority of CoKE. It achieves new state-of-the-art results on a number of public benchmarks.
[The selected edges could be those that carry qualifiers (key and/or key/value) of a given contextual dimension.]
Related Work
Beyond triples, recent work tried to use more global graph structures like multi-hop paths (Lin et al., 2015a; Das et al., 2017) and k-degree neighborhoods (Feng et al., 2016; Schlichtkrull et al., 2017) to learn better embeddings. Although such approaches take into account rich graph contexts, they are not “contextualized”, still learning a static global representation for each entity/relation.
[This reinforces the previous comment: each entity could have one embedding per contextual dimension, or per key/value of each dimension.]
This work is inspired by recent advances in learning contextualized word representations (McCann et al., 2017; Peters et al., 2018; Devlin et al., 2019), by drawing connections of graph edges/paths to natural language phrases/sentences. Such connections have been studied extensively in graph embedding (Perozzi et al., 2014; Grover and Leskovec, 2016; Ristoski and Paulheim, 2016; Cochez et al., 2017). But most of these approaches obtain static embeddings via traditional word embedding techniques, and fail to capture the contextual nature of entities and relations.
Our Approach
Unlike previous methods that assign a single static representation to each entity/relation learned from the whole KG, CoKE models that representation as a function of each individual graph context, i.e., an edge or a path. Given a graph context as input, CoKE employs Transformer blocks to encode the input and obtain contextualized representations for entities and relations therein. The model is trained by predicting a missing entity in the input, based on these contextualized representations.
[More than one embedding per entity and relation]
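To make the idea concrete, here is a minimal PyTorch sketch of this encode-and-predict scheme (my own illustrative code, not the official PaddlePaddle implementation; the class name, dimensions, and hyperparameters are assumptions):

```python
import torch
import torch.nn as nn

class CoKESketch(nn.Module):
    def __init__(self, vocab_size, max_len=7, dim=256, heads=4, layers=6):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)  # entities + relations + [MASK]
        self.pos_emb = nn.Embedding(max_len, dim)     # position in the input sequence
        block = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, layers)
        self.out = nn.Linear(dim, vocab_size)         # scores over the vocabulary

    def forward(self, seq):                           # seq: (batch, seq_len) token ids
        pos = torch.arange(seq.size(1), device=seq.device)
        h = self.tok_emb(seq) + self.pos_emb(pos)     # element + position embeddings
        h = self.encoder(h)                           # contextualized representations
        return self.out(h)                            # logits at every position

# Toy usage: in real training the object position would hold the [MASK] id
# and the loss would target the true object; random ids here just check shapes.
model = CoKESketch(vocab_size=1000)
batch = torch.randint(0, 1000, (8, 3))                # eight length-3 edge sequences
logits = model(batch)                                 # shape: (8, 3, 1000)
loss = nn.functional.cross_entropy(logits[:, 2, :], batch[:, 2])
```

Because the encoder attends over the whole sequence, the representation computed for BarackObama in one context differs from its representation in another, which is exactly the "more than one embedding per entity" point above.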
3.1 Problem Formulation
We are given a KG composed of subject-relation-object triples {(s, r, o)}. Each triple indicates a relation r ∈ R between two entities s, o ∈ E, e.g., (BarackObama, HasChild, SashaObama). Here, E is the entity vocabulary and R the relation set. These entities and relations form rich, varied graph contexts. Two types of graph contexts are considered here: edges and paths, both formalized as sequences composed of entities and relations.
Here we follow (Guu et al., 2015) and exclude intermediate entities from paths, by which the paths will get a close relationship with Horn clauses and first-order logic rules (Lao and Cohen, 2010). We leave the investigation of other path forms for future work. Given edges and paths that reveal rich graph structures, the aim of CoKE is to learn entity and relation representations dynamically adaptive to each input graph context.
[Intermediate entities in the path are ignored]
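For illustration: a raw walk alternates entities and relations, and dropping the intermediate entities leaves the (source, relation chain, target) form described above. A tiny hypothetical helper (not from the paper's code):

```python
# Hedged example of the path form from Guu et al. (2015): keep only the
# source entity, the chain of relations, and the target entity.
def to_path_query(walk):
    """walk: [e0, r1, e1, r2, e2, ..., rk, ek] -> [e0, r1, ..., rk, ek]."""
    entities, relations = walk[0::2], walk[1::2]
    return [entities[0], *relations, entities[-1]]

walk = ["BarackObama", "HasChild", "SashaObama", "BornIn", "Chicago"]
print(to_path_query(walk))  # ['BarackObama', 'HasChild', 'BornIn', 'Chicago']
```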
4 Experiments
We demonstrate the effectiveness of CoKE in link prediction and path query answering. We further visualize CoKE embeddings to show how they can discern contextual usage of entities and relations.
Datasets We conduct experiments on four widely used benchmarks. FB15k and WN18 were introduced in (Bordes et al., 2013), with the former sampled from Freebase and the latter from WordNet.
[Path query answering: given a source entity and a chain of relations, predict the target entity reached at the end of the path.]
5 Conclusion
As future work, we would like to (1) Generalize CoKE to other types of graph contexts beyond edges and paths, e.g., subgraphs of arbitrary forms. (2) Apply CoKE to more downstream tasks, not only those within a given KG, but also those scaling to broader domains.
[The generalization could be via a subgraph composed of the statements of the best answer, but it would still be for the graph completion task.]