
CoKE: Contextualized Knowledge Graph Embedding

 Quan Wang, Pingping Huang, Haifeng Wang, Songtai Dai, Wenbin Jiang, Jing Liu, Yajuan Lyu, Yong Zhu, Hua Wu: CoKE: Contextualized Knowledge Graph Embedding. CoRR abs/1911.02168 (2019)

Abstract

Knowledge graph embedding, which projects symbolic entities and relations into continuous vector spaces, is gaining increasing attention. Previous methods allow a single static embedding for each entity or relation, ignoring their intrinsic contextual nature, i.e., entities and relations may appear in different graph contexts, and accordingly, exhibit different properties.

[The context of an entity or relation depends on the graph in which it appears, and this affects the resulting embedding]

This work presents Contextualized Knowledge Graph Embedding (CoKE), a novel paradigm that takes into account such contextual nature, and learns dynamic, flexible, and fully contextualized entity and relation embeddings. Two types of graph contexts are studied: edges and paths, both formulated as sequences of entities and relations. CoKE takes a sequence as input and uses a Transformer encoder to obtain contextualized representations. These representations are hence naturally adaptive to the input, capturing contextual meanings of entities and relations therein.

[Syntactic context]

Evaluation on a wide variety of public benchmarks verifies the superiority of CoKE in link prediction and path query answering. It performs consistently better than, or at least equally well as current state-of-the-art in almost every case, in particular offering an absolute improvement of 21.0% in H@10 on path query answering.

Our code is available at https://github.com/PaddlePaddle/Research/tree/master/KG/CoKE

Introduction

Current approaches typically learn for each entity or relation a single static representation, to describe its global meaning in a given KG. However, entities and relations rarely appear in isolation. Instead, they form rich, varied graph contexts such as edges, paths, or even subgraphs. We argue that entities and relations, when involved in different graph contexts, might exhibit different meanings, just like words do when they appear in different textual contexts (Peters et al., 2018)

[This is also syntactic in the sense of being structural, but the example separates the triples linked to a subject by topic: Political context vs. Family context. The approach, however, does not differentiate between them; perhaps the engineer could delimit the path/subgraph that defines the context, but there is no rule or guideline for that]

Two types of graph contexts are considered: edges and paths, both formalized as sequences of entities and relations. Given an input sequence, CoKE employs a stack of Transformer (Vaswani et al., 2017) blocks to encode the input and obtain contextualized representations for its components. The model is then trained by predicting a missing component in the sequence, based on these contextualized representations.

[How are the edges and paths selected? Random walks?]
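
[To make the formulation above concrete, below is a minimal Python sketch, not the authors' code, of how an edge can be written as a sequence with one masked entity to be recovered; the mask token and the helper name are assumptions for illustration, and paths are handled analogously (see the sketch under Section 3.1 below).]

MASK = "[MASK]"

def edge_to_masked_sequence(s, r, o, mask_object=True):
    """Formalize an edge (s, r, o) as a sequence and mask one of its entities."""
    seq = [s, r, o]
    target_index = 2 if mask_object else 0   # mask the object or the subject entity
    target = seq[target_index]
    seq[target_index] = MASK
    return seq, target_index, target

# Example with the triple quoted in the paper:
print(edge_to_masked_sequence("BarackObama", "HasChild", "SashaObama"))
# (['BarackObama', 'HasChild', '[MASK]'], 2, 'SashaObama')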

We summarize our contributions as follows: (1) We propose the notion of contextualized KG embedding, which differs from previous paradigms by modeling contextual nature of entities and relations in KGs. (2) We devise a new approach CoKE to learn fully contextualized KG embeddings. We show that CoKE can be naturally applied to a variety of tasks like link prediction and path query answering. (3) Extensive experiments demonstrate the superiority of CoKE. It achieves new state-of-the-art results on a number of public benchmarks.

[The selected edges could be those carrying qualifiers (key and/or key/value) of a given contextual dimension]

Related Work

Beyond triples, recent work tried to use more global graph structures like multi-hop paths (Lin et al., 2015a; Das et al., 2017) and k-degree neighborhoods (Feng et al., 2016; Schlichtkrull et al., 2017) to learn better embeddings. Although such approaches take into account rich graph contexts, they are not “contextualized”, still learning a static global representation for each entity/relation.

[This reinforces the previous comment: each entity could have one embedding per contextual dimension, or per key/value of each dimension]

This work is inspired by recent advances in learning contextualized word representations (McCann et al., 2017; Peters et al., 2018; Devlin et al., 2019), by drawing connections of graph edges/paths to natural language phrases/sentences. Such connections have been studied extensively in graph embedding (Perozzi et al., 2014; Grover and Leskovec, 2016; Ristoski and Paulheim, 2016; Cochez et al., 2017). But most of these approaches obtain static embeddings via traditional word embedding techniques, and fail to capture the contextual nature of entities and relations.

Our Approach

Unlike previous methods that assign a single static representation to each entity/relation learned from the whole KG, CoKE models that representation as a function of each individual graph context, i.e., an edge or a path. Given a graph context as input, CoKE employs Transformer blocks to encode the input and obtain contextualized representations for entities and relations therein. The model is trained by predicting a missing entity in the input, based on these contextualized representations. 

[More than one embedding per entity and relation]
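
[A rough PyTorch sketch of the encoder described above, assuming a standard BERT-like setup; the official code uses PaddlePaddle, and the layer sizes, names, and prediction head here are illustrative rather than the paper's exact configuration. Each element of the input sequence, entity or relation, is embedded, passed through a stack of Transformer blocks, and the representation at the masked position is used to score every symbol in the vocabulary.]

import torch
import torch.nn as nn

class CoKESketch(nn.Module):
    def __init__(self, vocab_size, hidden_size=256, num_layers=6, num_heads=4, max_seq_len=8):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, hidden_size)
        self.position_embed = nn.Embedding(max_seq_len, hidden_size)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.output = nn.Linear(hidden_size, vocab_size)   # scores over the vocabulary

    def forward(self, token_ids, mask_positions):
        # token_ids: (batch, seq_len) ids of entities/relations, one of them masked
        # mask_positions: (batch,) index of the masked component in each sequence
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token_embed(token_ids) + self.position_embed(positions)
        h = self.encoder(x)                                 # contextualized representations
        masked_h = h[torch.arange(h.size(0)), mask_positions]
        return self.output(masked_h)                        # logits for the missing entity

# Training would map entity/relation names to integer ids and minimize a
# cross-entropy loss between these logits and the id of the masked entity.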

 

 3.1 Problem Formulation

We are given a KG composed of subject-relation-object triples {(s, r, o)}. Each triple indicates a relation r ∈ R between two entities s, o ∈ E, e.g., (BarackObama, HasChild, SashaObama). Here, E is the entity vocabulary and R the relation set. These entities and relations form rich, varied graph contexts. Two types of graph contexts are considered here: edges and paths, both formalized as sequences composed of entities and relations.

Here we follow (Guu et al., 2015) and exclude intermediate entities from paths, by which the paths will get a close relationship with Horn clauses and first-order logic rules (Lao and Cohen, 2010). We leave the investigation of other path forms for future work. Given edges and paths that reveal rich graph structures, the aim of CoKE is to learn entity and relation representations dynamically adaptive to each input graph context.

[They ignore the intermediate entities of the path]
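
[A small sketch of what excluding intermediate entities means, assuming paths are first sampled as walks over the graph: a walk alternates entities and relations, and only the two endpoint entities are kept, giving a path query of the form (s, r1, ..., rk, o). The walk below is a hypothetical example, not taken from the datasets.]

def walk_to_path_query(walk):
    """walk = [e0, r1, e1, r2, e2, ..., rk, ek] -> [e0, r1, r2, ..., rk, ek]"""
    relations = walk[1::2]                      # r1, ..., rk
    return [walk[0]] + relations + [walk[-1]]   # keep only the endpoint entities

walk = ["BarackObama", "HasChild", "SashaObama", "AttendsSchool", "SidwellFriends"]
print(walk_to_path_query(walk))
# ['BarackObama', 'HasChild', 'AttendsSchool', 'SidwellFriends']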

4 Experiments

We demonstrate the effectiveness of CoKE in link prediction and path query answering. We further visualize CoKE embeddings to show how they can discern contextual usage of entities and relations.

Datasets. We conduct experiments on four widely used benchmarks. FB15k and WN18 were introduced in (Bordes et al., 2013), with the former sampled from Freebase and the latter from WordNet.

[Path query answering is about validating whether or not a path exists between entities A and B]

5 Conclusion

As future work, we would like to (1) Generalize CoKE to other types of graph contexts beyond edges and paths, e.g., subgraphs of arbitrary forms. (2) Apply CoKE to more downstream tasks, not only those within a given KG, but also those scaling to broader domains.

[The generalization could be done through a subgraph composed of the statements of the best answer, but it would still be for the graph completion task]

 
