Pular para o conteúdo principal

Embedding Logical Queries on Knowledge Graphs - Leitura de Artigo

William L. Hamilton, Payal Bajaj, Marinka Zitnik, Dan Jurafsky, Jure Leskovec: Embedding Logical Queries on Knowledge Graphs. NeurIPS 2018: 2030-2041

Abstract

Learning low-dimensional embeddings of knowledge graphs is a powerful approach used to predict unobserved or missing edges between entities. However, an open challenge in this area is developing techniques that can go beyond simple edge prediction and handle more complex logical queries, which might involve multiple unobserved edges, entities, and variables.

[Link Prediction é a tarefa mais comum em GRL, é uma query do tipo <s, p, ?o> ou <s, ?p, o> ou <?s, p, o>, ou seja, Look up ou Existe <s, p, o> (ASK)]

For instance, given an incomplete biological knowledge graph, we might want to predict what drugs are likely to target proteins involved with both diseases X and Y? —a query that requires reasoning about all possible proteins that might interact with diseases X and Y.

[Query conjuntiva, BGP com join ... <?d, target, ?p> <?p, involvedIn, X> <?p, involvedIn, Y> nesse caso uma star-join em ?p]

Here we introduce a framework to efficiently make predictions about conjunctive logical queries—a flexible but tractable subset of first-order logic—on incomplete knowledge graphs. In our approach, we embed graph nodes in a low-dimensional space and represent logical operators as learned geometric operations (e.g., translation, rotation) in this embedding space.

[representa os nós como embeddings]

We demonstrate the utility of this framework in two application studies on real-world datasets with millions of relations: predicting logical relationships in a network of drug-gene-disease interactions and in a graph-based representation of social interactions derived from a popular web forum.

[Dois datasets, somente V e E, não tem P, Q, L?]

1 Introduction

... are conjunctive queries, which correspond to the subset of first-order logic using only the conjunction and existential quantification operators [1 ]. In terms of graph structure, conjunctive queries allow one to reason about the existence of subgraph relationships between sets of nodes, which makes conjunctive queries a natural focus for knowledge graph applications.

Valid answers to such a query correspond to subgraphs.

For instance, a naive approach to make predictions about conjunctive queries would be the following: First, one would run an edge prediction model on all possible pairs of nodes, and—after obtaining these edge likelihoods—one would enumerate and score all candidate subgraphs that might satisfy a query. However, this naive enumeration approach could require computation time that is exponential in the number of existentially quantified (i.e., bound) variables in the query.

[Fazer na força bruta é computacionalmente complexo]

2 Related Work

Logical reasoning and knowledge graphs. Recent years have seen significant progress in using machine learning to reason with relational data .... However, existing work in this area primarily focuses on using logical reasoning to improve edge prediction in knowledge graphs .... In contrast, we seek to directly make predictions about conjunctive logical queries. Another well-studied thread in this space involves leveraging knowledge graphs to improve natural language question answering (QA) ... we focus on queries that are in logical form

[CGQ e não usa NLP. Também não foco em Q&A mas foco em Exploratory Search com consultas especificadas em forma lógica como sub-grafos]

Probabilistic databases. ... The primary distinction between our work and probabilistic databases is the following: Whereas probabilistic databases take a database containing probabilistic facts and score queries, we seek to predict unobserved logical relationships in a knowledge graph. Concretely, a distinguishing challenge in our setting is that while we are given a set of known edge relationships (i.e., facts), all missing edge relationships could possibly be true.

[DBs são CWA enquanto KB/KG são OWA]

3 Background and Preliminaries

We consider knowledge graphs (or heterogeneous networks) G = (V, E) that consists of nodes v from V and directed edges e from E of various types.

[Não tem qualificadores e nem literais]

3.1 Conjunctive graph queries

In this work we seek to make predictions about conjunctive graph queries (Figure 1). Specifically, the queries q from Q(G) that we consider can be written as:

[Potencial uso de referência para especificar as queries que são feitas para exploração]

In general, we define the dependency graph of a query q as the graph with edges Eq = {e1, ..., en} formed between the anchor nodes v1, ..., vk and the variable nodes V?, V1, ..., Vm (Figure 1). For a query to be valid, its dependency graph must be a directed acyclic graph (DAG), with the anchor nodes as the source nodes of the DAG and the query target as the unique sink node. The DAG structure ensures that there are no contradictions or redundancies. Note that there is an important distinction between the query DAG, which contains variables, and a subgraph structure in the knowledge graph that satisfies this query, i.e., a concrete assignment of the query variables (see Figure 1). For instance, it is possible for a query DAG to be satisfied by a subgraph that contains cycles, e.g., by having two bound variables evaluate to the same node.

[A consulta contém variáveis, o sub-grafo de resultado tem o BIND dessas variáveis em nós e arestas. Mas se o KG é incompleto esse BIND pode não ocorrer com simples GQL. A melhor resposta por enquanto só está usando GQL e prefere não responder se a GQL falhar]

4 Proposed Approach

The key idea behind our approach is that we learn how to embed any conjunctive graph query into a low-dimensional space. This is achieved by representing logical query operations as geometric operators that are jointly optimized on a low-dimensional embedding space along with a set of node embeddings

[Operações de projeção e interseção (join)]

Query inference using P and I. Given the geometric projection operator P (Equation 4) and the geometric intersection operator I (Equation 5) we can use Algorithm 1 to efficiently generate an embedding q that corresponds to any DAG-structured conjunctive query q on the network. To generate a query embedding, we start by projecting the anchor node embeddings according to their outgoing edges; then if a node has more than one incoming edge in the query DAG, we use the intersection operation to aggregate the incoming information, and we repeat this process as necessary until we reach the target variable of the query.

[Começa com as variáveis ou constantes de partida ou chegada e não os intermediários]

In the end, Algorithm 1 generates an embedding q of a query in O(d2E) operations, where d is the embedding dimension and E is the number of edges in the query DAG. Using the generated embedding q we can predict nodes that are likely to satisfy this query by doing a nearest neighbor search in the embedding space. Moreover, since the set of nodes is known in advance, this nearest neighbor search can be made highly efficient (i.e., sublinear in |V|) using locality sensitive hashing, at a small approximation cost [21].

[O grafo seria incompleto de arestas mas não de nós]



4.1 Theoretical analysis

Formally, we can show that in an ideal setting Algorithm 1 can exactly answer any conjunctive query on a network. This provides an equivalence between conjunctive queries on a network and sequences of geometric projection and intersection operations in an embedding space

[Prova por Teorema]

4.2 Node embeddings

In principle any efficient differentiable algorithm that generates node embeddings can be used as the base of our query embeddings.

[QQ abordagem para node embeddings pq o diferencial está no algoritmo e nas operações]

5 Experiments

We run experiments on the biological interaction (Bio) and Reddit datasets (Figure 2). Code and data is available at https://github.com/williamleif/graphqembed.

5.2 Dataset of train and test queries

To test our approach, we sample sets of train/test queries from a knowledge graph, i.e., pairs (q, ?v), where q is a query and ?v is a node that satisfies this query. In our sampling scheme, we sample a fixed number of example queries for each possible query DAG stru... but we ensure that the test query examples are not directly answerable in the training graph.

[Seleção randomica de DAGs mas que não seriam diretamente respondidos pelo grafo de treinamento com GQL]

[Em vermelho o nó de resposta da query, em cinza os nós de partida que são variáveis ou constantes e os nós intermediários que são variáveis]

6 Conclusion

We proposed a framework to embed conjunctive graph queries, demonstrating how to map a practical subset of logic to efficient geometric operations in an embedding space. Our experiments showed that our approach can make accurate predictions on real-world data with millions of relations. Of course, there are limitations of our framework: for instance, it cannot handle logical negation or disjunction, and we also do not consider features on edge.

[Features nas arestas seriam os qualificadores ou é a predição de arestas podendo usar variáveis nos relacionamentos?]

@inproceedings{DBLP:conf/nips/HamiltonBZJL18,
  author    = {William L. Hamilton and
               Payal Bajaj and
               Marinka Zitnik and
               Dan Jurafsky and
               Jure Leskovec},
  editor    = {Samy Bengio and
               Hanna M. Wallach and
               Hugo Larochelle and
               Kristen Grauman and
               Nicol{\`{o}} Cesa{-}Bianchi and
               Roman Garnett},
  title     = {Embedding Logical Queries on Knowledge Graphs},
  booktitle = {Advances in Neural Information Processing Systems 31: Annual Conference
               on Neural Information Processing Systems 2018, NeurIPS 2018, December
               3-8, 2018, Montr{\'{e}}al, Canada},
  pages     = {2030--2041},
  year      = {2018},
  url       = {https://proceedings.neurips.cc/paper/2018/hash/ef50c335cca9f340bde656363ebd02fd-Abstract.html},
  timestamp = {Mon, 16 May 2022 15:41:51 +0200},
  biburl    = {https://dblp.org/rec/conf/nips/HamiltonBZJL18.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Comentários

Postagens mais visitadas deste blog

Connected Papers: Uma abordagem alternativa para revisão da literatura

Durante um projeto de pesquisa podemos encontrar um artigo que nos identificamos em termos de problema de pesquisa e também de solução. Então surge a vontade de saber como essa área de pesquisa se desenvolveu até chegar a esse ponto ou quais desdobramentos ocorreram a partir dessa solução proposta para identificar o estado da arte nesse tema. Podemos seguir duas abordagens:  realizar uma revisão sistemática usando palavras chaves que melhor caracterizam o tema em bibliotecas digitais de referência para encontrar artigos relacionados ou realizar snowballing ancorado nesse artigo que identificamos previamente, explorando os artigos citados (backward) ou os artigos que o citam (forward)  Mas a ferramenta Connected Papers propõe uma abordagem alternativa para essa busca. O problema inicial é dado um artigo de interesse, precisamos encontrar outros artigos relacionados de "certa forma". Find different methods and approaches to the same subject Track down the state of the art rese...

Knowledge Graphs as a source of trust for LLM-powered enterprise question answering - Leitura de Artigo

J. Sequeda, D. Allemang and B. Jacob, Knowledge Graphs as a source of trust for LLM-powered enterprise question answering, Web Semantics: Science, Services and Agents on the World Wide Web (2025), doi: https://doi.org/10.1016/j.websem.2024.100858. 1. Introduction These question answering systems that enable to chat with your structured data hold tremendous potential for transforming the way self service and data-driven decision making is executed within enterprises. Self service and data-driven decision making in organizations today is largly made through Business Intelligence (BI) and analytics reporting. Data teams gather the original data, integrate the data, build a SQL data warehouse (i.e. star schemas), and create BI dashboards and reports that are then used by business users and analysts to answer specific questions (i.e. metrics, KPIs) and make decisions. The bottleneck of this approach is that business users are only able to answer questions given the views of existing dashboa...