
CoKE: Contextualized Knowledge Graph Embedding

 Quan Wang, Pingping Huang, Haifeng Wang, Songtai Dai, Wenbin Jiang, Jing Liu, Yajuan Lyu, Yong Zhu, Hua Wu: CoKE: Contextualized Knowledge Graph Embedding. CoRR abs/1911.02168 (2019)

Abstract

Knowledge graph embedding, which projects symbolic entities and relations into continuous vector spaces, is gaining increasing attention. Previous methods allow a single static embedding for each entity or relation, ignoring their intrinsic contextual nature, i.e., entities and relations may appear in different graph contexts, and accordingly, exhibit different properties.

[The context of an entity or relation depends on the graph in which it appears, and this affects the embedding that is generated]

This work presents Contextualized Knowledge Graph Embedding (CoKE), a novel paradigm that takes into account such contextual nature, and learns dynamic, flexible, and fully contextualized entity and relation embeddings. Two types of graph contexts are studied: edges and paths, both formulated as sequences of entities and relations. CoKE takes a sequence as input and uses a Transformer encoder to obtain contextualized representations. These representations are hence naturally adaptive to the input, capturing contextual meanings of entities and relations therein.

[Syntactic context]

Evaluation on a wide variety of public benchmarks verifies the superiority of CoKE in link prediction and path query answering. It performs consistently better than, or at least equally well as current state-of-the-art in almost every case, in particular offering an absolute improvement of 21.0% in H@10 on path query answering.

Our code is available at -> https://github.com/PaddlePaddle/Research/tree/master/KG/CoKE
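The H@10 figure quoted above is Hits@10: the share of test queries whose correct answer is ranked among the top 10 candidates, so an absolute improvement of 21.0% means 21 percentage points. A minimal sketch of the computation, assuming 1-based ranks; the function names, scores and ranks below are made up for illustration and are not taken from the paper or its repository.

```python
def rank_of(scores, correct):
    """1-based rank of `correct` when candidates are sorted by descending score.

    Ties are resolved optimistically (only strictly higher scores count).
    """
    target = scores[correct]
    return 1 + sum(1 for s in scores.values() if s > target)


def hits_at_k(ranks, k=10):
    """Fraction of test queries whose correct answer is ranked in the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical model scores for the query (BarackObama, HasChild, ?).
scores = {"SashaObama": 0.9, "MichelleObama": 0.7, "Chicago": 0.1}
print(rank_of(scores, "SashaObama"))

# Hypothetical ranks of the correct answer over four test queries.
ranks = [1, 3, 12, 7]
print(hits_at_k(ranks, k=10))
```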

1 Introduction

Current approaches typically learn for each entity or relation a single static representation, to describe its global meaning in a given KG. However, entities and relations rarely appear in isolation. Instead, they form rich, varied graph contexts such as edges, paths, or even subgraphs. We argue that entities and relations, when involved in different graph contexts, might exhibit different meanings, just like words do when they appear in different textual contexts (Peters et al., 2018).

[It is also syntactic in the sense of being structural, but the example separates the triples attached to a subject by topic: political context vs. family context. The approach itself does not make this distinction. Perhaps the engineer can delimit the path/subgraph that defines the context, but there is no rule or guideline for doing so]

Two types of graph contexts are considered: edges and paths, both formalized as sequences of entities and relations. Given an input sequence, CoKE employs a stack of Transformer (Vaswani et al., 2017) blocks to encode the input and obtain contextualized representations for its components. The model is then trained by predicting a missing component in the sequence, based on these contextualized representations.

[How are the edges and paths selected? Random walks?]
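The training objective described above (hide one component of the sequence and predict it back) can be sketched as follows. This is only an illustration of how training instances might be built: the `[MASK]` token name and the choice to hide only the first and last positions (the two end entities) are assumptions of this sketch, not details confirmed by the excerpt.

```python
MASK = "[MASK]"  # assumed name for the placeholder token


def masked_instances(sequence):
    """Yield (masked_sequence, position, label), hiding one end entity at a time."""
    for pos in (0, len(sequence) - 1):
        masked = list(sequence)
        label = masked[pos]
        masked[pos] = MASK
        yield masked, pos, label


edge = ["BarackObama", "HasChild", "SashaObama"]
instances = list(masked_instances(edge))
for masked, pos, label in instances:
    print(masked, "->", label)
```

The same function works unchanged for a path sequence, since only the two ends are touched.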

We summarize our contributions as follows: (1) We propose the notion of contextualized KG embedding, which differs from previous paradigms by modeling contextual nature of entities and relations in KGs. (2) We devise a new approach CoKE to learn fully contextualized KG embeddings. We show that CoKE can be naturally applied to a variety of tasks like link prediction and path query answering. (3) Extensive experiments demonstrate the superiority of CoKE. It achieves new state-of-the-art results on a number of public benchmarks.

[The selected edges could be those that carry qualifiers (key and/or key/value) of a given contextual dimension]

2 Related Work

Beyond triples, recent work tried to use more global graph structures like multi-hop paths (Lin et al., 2015a; Das et al., 2017) and k-degree neighborhoods (Feng et al., 2016; Schlichtkrull et al., 2017) to learn better embeddings. Although such approaches take into account rich graph contexts, they are not “contextualized”, still learning a static global representation for each entity/relation.

[This reinforces the previous comment: each entity could have one embedding per contextual dimension, or per key/value of each dimension]

This work is inspired by recent advances in learning contextualized word representations (McCann et al., 2017; Peters et al., 2018; Devlin et al., 2019), by drawing connections of graph edges/paths to natural language phrases/sentences. Such connections have been studied extensively in graph embedding (Perozzi et al., 2014; Grover and Leskovec, 2016; Ristoski and Paulheim, 2016; Cochez et al., 2017). But most of these approaches obtain static embeddings via traditional word embedding techniques, and fail to capture the contextual nature of entities and relations.

3 Our Approach

Unlike previous methods that assign a single static representation to each entity/relation learned from the whole KG, CoKE models that representation as a function of each individual graph context, i.e., an edge or a path. Given a graph context as input, CoKE employs Transformer blocks to encode the input and obtain contextualized representations for entities and relations therein. The model is trained by predicting a missing entity in the input, based on these contextualized representations. 

[More than one embedding per entity and relation]
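To see why the same entity can end up with different vectors in different contexts, here is a toy single pass of self-attention in plain Python. It is deliberately not CoKE's architecture: there are no learned projections, no position embeddings, no stacked blocks, and the 2-d vectors, the `PresidentOf` relation and the `USA` entity are invented for the example.

```python
import math

# Hypothetical 2-d static embeddings; a real model would learn these.
STATIC = {
    "BarackObama": [1.0, 0.0],
    "HasChild": [0.0, 1.0],
    "SashaObama": [0.5, 0.5],
    "PresidentOf": [1.0, 1.0],
    "USA": [-0.5, 1.0],
}


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def contextualize(sequence):
    """One self-attention pass with identity projections (no learned weights)."""
    vecs = [STATIC[tok] for tok in sequence]
    dim = len(vecs[0])
    out = []
    for q in vecs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vecs]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(dim)])
    return out


# The same entity, read in a family context and in a political context.
family = contextualize(["BarackObama", "HasChild", "SashaObama"])[0]
politics = contextualize(["BarackObama", "PresidentOf", "USA"])[0]
print(family)
print(politics)
```

Both outputs start from the same static vector for `BarackObama`, but each one is a different mixture of its neighbours in the sequence, which is the sense in which the representation is a function of the input context.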

 

 3.1 Problem Formulation

We are given a KG composed of subject-relation-object triples {(s, r, o)}. Each triple indicates a relation r ∈ R between two entities s, o ∈ E, e.g., (BarackObama, HasChild, SashaObama). Here, E is the entity vocabulary and R the relation set. These entities and relations form rich, varied graph contexts. Two types of graph contexts are considered here: edges and paths, both formalized as sequences composed of entities and relations.
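As a concrete reading of this formulation, the sketch below builds E and R from a list of triples and turns an edge into a sequence of token ids. Using one shared id table for entities and relations is an assumption of this sketch, and the second triple is added only to make the example less trivial.

```python
triples = [
    ("BarackObama", "HasChild", "SashaObama"),
    ("BarackObama", "HasChild", "MaliaObama"),  # illustrative extra triple
]

# Entity vocabulary E and relation set R, sorted for reproducible ids.
entities = sorted({e for s, _, o in triples for e in (s, o)})
relations = sorted({r for _, r, _ in triples})

# One shared token-to-id table, so a sequence can mix entities and relations.
vocab = {tok: i for i, tok in enumerate(entities + relations)}


def edge_to_sequence(triple):
    """Map an edge (s, r, o) to the id sequence [id(s), id(r), id(o)]."""
    s, r, o = triple
    return [vocab[s], vocab[r], vocab[o]]


print(entities)
print(relations)
print(edge_to_sequence(triples[0]))
```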

Here we follow (Guu et al., 2015) and exclude intermediate entities from paths, by which the paths will get a close relationship with Horn clauses and first-order logic rules (Lao and Cohen, 2010). We leave the investigation of other path forms for future work. Given edges and paths that reveal rich graph structures, the aim of CoKE is to learn entity and relation representations dynamically adaptive to each input graph context.

[They ignore the intermediate entities of the path]
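A path with its intermediate entities removed becomes the sequence (s, r1, ..., rk, o). The sketch below enumerates every such path of a fixed length from a start entity. Exhaustive enumeration is an assumption made for brevity and says nothing about how the authors actually select or sample paths; the `LivesIn` triple is hypothetical.

```python
from collections import defaultdict

triples = [
    ("BarackObama", "HasChild", "SashaObama"),
    ("SashaObama", "LivesIn", "WashingtonDC"),  # hypothetical triple
]

# Adjacency list: subject -> [(relation, object), ...]
out_edges = defaultdict(list)
for s, r, o in triples:
    out_edges[s].append((r, o))


def paths_from(start, hops):
    """Return sequences [s, r1, ..., rk, o] with exactly `hops` relations.

    Intermediate entities are used for traversal but dropped from the output.
    """
    frontier = [([], start)]
    for _ in range(hops):
        frontier = [
            (rels + [r], nxt)
            for rels, node in frontier
            for r, nxt in out_edges.get(node, [])
        ]
    return [[start, *rels, end] for rels, end in frontier]


print(paths_from("BarackObama", 2))
```

Note that `SashaObama`, the entity in the middle of the 2-hop path, does not appear in the resulting sequence, and that a 1-hop path is simply an edge.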

4 Experiments

We demonstrate the effectiveness of CoKE in link prediction and path query answering. We further visualize CoKE embeddings to show how they can discern contextual usage of entities and relations.

Datasets We conduct experiments on four widely used benchmarks. FB15k and WN18 were introduced in (Bordes et al., 2013), with the former sampled from Freebase and the latter from WordNet.

[A path query is used to check whether or not a path exists between entities A and B]

5 Conclusion

As future work, we would like to (1) Generalize CoKE to other types of graph contexts beyond edges and paths, e.g., subgraphs of arbitrary forms. (2) Apply CoKE to more downstream tasks, not only those within a given KG, but also those scaling to broader domains.

[The generalization could be done through a subgraph composed of the statements in the best answer, but it would still serve the graph completion task]

 
