Graph-Query Suggestions for Knowledge Graph Exploration

Graph-Query Suggestions for Knowledge Graph Exploration - Leitura de Artigo

Matteo Lissandrini, Davide Mottin, Themis Palpanas, and Yannis Velegrakis. 2020. Graph-Query Suggestions for Knowledge Graph Exploration. Proceedings of The Web Conference 2020. Association for Computing Machinery, New York, NY, USA, 2549–2555. https://doi.org/10.1145/3366423.3380005

ABSTRACT

We consider the task of exploratory search through graph queries on knowledge graphs.

[Busca Exploratória: interativo, consulta de entrada (exemplo de resultado ou formulada pelo usuário), resposta como subgrafo]

We propose to assist the user by expanding the query with intuitive suggestions to provide a more informative (full) query that can retrieve more detailed and relevant answers. To achieve this result, we propose a model that can bridge graph search paradigms with well-established techniques for information-retrieval. Our approach does not require any additional knowledge from the user and builds on principled language modelling approaches.

[Usuário não precisa saber uma GQL]

We empirically show the effectiveness and efficiency of our approach on a large knowledge graph and how our suggestions are able to help build more complete and informative queries.

[Eficiência e Eficácia]

1 INTRODUCTION

One useful exploratory query paradigm in knowledge graphs is exemplar queries. As opposed to traditional query answering, in which the query is a set of specifications for the elements of interest, an exemplar query is an example from the many elements of interest. As such, the answers are not the structures that comply exactly to the query-graph, but elements that have a similar structure to the one in the user input. For instance, in the query ⟨A. Kleiner, supervised, A. Einstein⟩, one answer can be ⟨D. Sciama, supervised, S. Hawking⟩.

[Query by example, a entrada não é uma query e sim exemplos de resultados, pode usar embeddings para obter os subgrafos semelhantes]

We are the first to propose an interactive graph-query expansion for graph exemplar queries on knowledge graphs. Interactive query expansion refers to the problem of retrieving a set of suggestions and the user selecting one of them. .... In our case, the user provides a partial graph exemplar query and the system responds with the k most relevant expansions to complement such query.

[Expansão de consulta com os K mais semelhantes de modo interativo. Não estou seguindo a abordagem interativo mas sim iterativo (ao invés do sistema perguntar, o usuário reformula a consulta)]

Recent work has proposed the contextualization of knowledge graph facts [43] as a supervised learning problem which requires complex training and manually labelled data and does not offer a model for query expansion compatible with the classical IR models.

[O que se entende por contextualização em [43]? ]

In this work, we present a novel approach to suggest query expansions for graph-queries. Graph-query expansions are edges that can be added to the current query. Our model expands IR methods based on pseudo-relevance feedback and language-models by including structural information described by the query and answer graphs through neighbour edge-labels.

[A recuperação da melhor resposta também acrescenta arestas a consulta original a medida que o contexto das afirmações é obrigatoriamente parte da resposta ]

Graph-query expansions are then ranked and proposed to the user. Our approach does not pose restrictions on the structure of the knowledge graph nor requires labeled data (e.g., query logs) as in previous methods. The simplicity of the model offers much higher flexibility and efficiency to run in massive graphs like Freebase.

[Poderia ser aplicado em hiper relacional? Na Wikidata?]

2 RELATED WORK

Note that oftentimes in exploratory search, the user is not able to express their information needs in clear terms. Hence we cannot rely on the user to provide a precise description of the graph structures they are interested in.

[Dificuldade de expressar a necessidade de informação em uma consulta]

Query Expansion and Suggestion. Query expansion improves document search by including additional terms in the user’s query either automatically (implicitly), or interactively.

[De modo automático pode obter resultados que não atendam, de modo interativo depende de feedback do usuário para aumentar a precisão]

Exploratory search in knowledge graphs. Exploratory methods overcome the rigidity of declarative languages in graphs, such as SPARQL. Entity search allows for automatic completion of a set of seed entities (persons, organizations, places). Such an expanded list of entities can complement an ambiguous user query or provide explanations of the original seed entities. However, these methods do not interact with the users towards their intended answers, nor do they provide any query construction mechanism.

[Métodos para completar lista de entidades]

Exemplar queries allows users to specify a representative of the results of an unknown query; the algorithm then discovers and returns the other results. Similarly, Graph Query by Example (GQBE) [18] extends the idea to multiple exemplar entity tuples. Example-based queries, albeit expressive, do not interact with the user and require the input of a full example (a subgraph or a tuple)

[O exemplo precisa ser pelo menos uma tupla completa, não partem de um nó ou um rótulo de aresta]

Graph navigation systems. User interfaces for query formulation either (i) assume the user can navigate through the entire graph structure (often ranking suggestions in decreasing frequency order), but do not restrict the potentially large number of expansions, or (ii) rely on heuristic approaches based on query logs.

[Interfaces de navegação em grafos, a usabilidade e a destreza do usuário podem fazer diferença]

Similarly, faceted search allows smart filtering of large result sets along different attributes. Faceted search proposes no preferred suggestion, leaving the user unassisted.

[Busca facetada ajuda a filtrar mas o usuário tem que ter algum conhecimento sobre os dados para aplicar filtros que funcionem]

Knowledge graph fact contextualization augments a given knowledge graph fact (edge) with additional facts that help the user understand its context. A supervised fact contextualization method (NFCM) trains a neural network on hand-crafted features and human-annotated data enriched with meta-data extracted from Wikipedia. This supervised machine-learning process is aimed at producing short natural language descriptions of a fact. Instead, we assist the graph-query construction in an unsupervised manner.

[Como esses fatos adicionais são acrescentados? Wikipedia para obter descrições? Que tipos de metadados seriam esses?]

3 PROBLEM FORMULATION

A knowledge graph is a set of entities and relationships among them. Knowledge graphs are represented as directed labelled multi-graphs, in which nodes model entities, edges relationships among them, and labels the names of entities and relationships.

[Considera Labels mas não considera Literais (outros valores além dos rótulos). A formalização das definições é compreensível, pode servir de base para minha proposta]

Definition 3.2 (Exemplar Queries). The answers to a query Q is a set of subgraphs of the knowledge graph that is similar to Q for some similarity relation ∼, i.e., A ={A | A⊑K ∧ A∼Q }

The above definition depends on the specific choice of the similarity ∼.

[Usou o predicado em comum como similaridade. Poderia usar graph embeddings]

Suggesting Query Expansions. A graph-query suggestion system needs to select only a subset of the possible expansions to be presented to the user. Indeed, the straightforward approach, which generates all possible expansions through all the neighbouring relationships around the initial query, overloads the user with too many options. ... We formulate the problem in a ranking-retrieval fashion by defining a relevance function ρ on the possible relationships that can be added to the current query Q to obtain the expansion Q ′ on the knowledge graph K :⟨V,E,L⟩. We denote the set of candidate expansion edges as Eδ .

[Top K mais semelhantes]

Problem 1 (Graph Query Suggestion). Given a knowledge graph K :⟨V,E,L⟩, a number k >0, and an initial query Q, retrieve the top-k edges set Eδ ⊂E, |Eδ |≤k, ranked according to relevance function ρ. The elements from Eδ are returned as suggested query expansions.

4 GRAPH-QUERY SUGGESTION MODEL

4.1 Bag-of-Labels Model for Graph-Query

One of our core contributions is a simple, yet powerful, modelling that bridges the gap between knowledge graphs and the well-known retrieval models applied to documents.

[Inspirado em IR, modelos de linguagem para palavras-chave]

We first establish a parallel between graph-query-suggestion and query expansion for keyword queries with reference to language models. The language model in keyword queries relates the relevance of a document D to the intent expressed by a keyword query q. The intent behind a query is defined by a distribution over the words from which the user samples the content of the query. In language models, each document D in a collection C is described by a multinomial distribution ˆp(w |D) over each keyword w. Under such model, the likelihood ˆp(D|q) for a query q=⟨w1,w2...wn⟩ is proportional to
ˆp(q|D) ˆp(D), where ˆp(D) is a prior on D. Intuitively, the user’s query describes a summary of a document through a sample of words drawn from a distribution over the document collection. Therefore, identifying which new keyword w′ could be added to q turns into estimating the probability ˆp(w ′|q). Hence, we estimate the probabilistic model Mq (a multinomial distribution over keywords) that generated q as well as the one generating D, i.e., MD.

[A consulta em palavras chaves representa os documentos recuperados. Para uma lista de palavras chaves é possível calcular a probabilidade de uma palavra do vocabulário fazer parte da lista para recuperar um documento]

To bridge the gap between traditional search models and graph-search, we propose an adapted representation for a graph. In particular, we note that a graph query describes a set of relationships, which are characterized by the respective edge labels. As such, a graph can be modelled as follows:

Definition 4.1 (Bag of Labels Model of a Graph). Given a graph G :⟨VG,EG,LG⟩,G ⊑K, its adapted representation as a multiset (or bag) is Bag(G):{lm(l ) | ⟨v1,v2,l⟩ ∈ EK ∧ (v1 ∈ VG ∨ v2 ∈ VG) }. m(l) represents the cardinality of label l in the Bag(G) (duplicate edges are preserved) as |{⟨v1,v2,l⟩∈EK ∧(v1∈VG∨v2∈VG)}|.

[Bag of Words - CBOW ... Bag of Labels]

Under this definition, two graphs are similar if they contain similar bags of edge labels. Note that an edge label can appear multiple times around nodes in a graph, so the frequency of edge labels is as important as the frequency of keywords in the corresponding bag-of-words model. Furthermore, we do not include only edges that appear in the current graph, but also edges on the fringe that connects the graph-nodes with the surrounding portion of the knowledge graph.

This choice has two positive effects: (1) it enriches the description of Bag(Q) and Bag(G), and (2) allows the model to be applicable even when Q contains just a single node.

4.2 Baseline Scoring-Functions

We present two baseline techniques for computing ρ(Q,e) as the likelihood of choosing the edge e with label ℓ(e)=l, given a graph query Q with some modeled edge-label distribution M, i.e., ˆp(l |MQ ). The first method is based on maximum likelihood estimation, where the score of a label l is proportional to its relative frequency around the graph-query and in the entire repository, i.e., frequent labels are considered more likely to be part of a query.

[Rótulos mais frequentes no KG são mais indicados para fazer parte da consulta (*1)]

The second technique favours distinctive labels, i.e., labels that are frequent around the query, but infrequent in the dataset. This score is computed with the KL-divergence.

[Rótulos menos frequentes no KG porém mais frequentes ao redor do nó são mais indicados para fazer parte da consulta (*2)]

4.3 Scoring with Pseudo-Relevance Feedback

Next, we describe a model-based approach inspired by the pseudo-relevance feedback framework. Under this model, we estimate the likelihood of a candidate expansion-edge based on its relative frequency within a pseudo-relevance set of graphs retrieved by the original query. Based on that, a new model is estimated (Mr el ) to evaluate the likelihood of a candidate query expansion.

[Usar o resultado da consulta anterior para calcular a relevância do predicado (*3)]

Surprise-based Heuristic We complete our study including a scoring technique for expansions based on the concept of surprise. This heuristic is still implemented within the pseudo-relevance feedback framework. ... Note that, opposed to the earlier models, this score counts the number of documents in
which a term appears, but is not affected if in some documents the term is more or less frequent.

5 EXPERIMENTAL EVALUATION

Here we show that suggesting labels based only on their absolute frequency in the graph does not provide any useful information. The same is true when we favour expansions based only on node popularity or proximity: they result more frequently in unhelpful suggestions. In contrast, our method based on KL-divergence with PRF inclusion provides relevant suggestions even when the query contains one edge. Therefore, the KL-divergence with PRF is the most appropriate score for graph-query suggestion and it is able to effectively support users unfamiliar with the graph schema.

[*1 e *2 são o baseline para a proposta que é o *3]

5.1 Experimental settings

Dataset: We validate our approach on Freebase.

Implementation: We implement our four methods: MLE: The maximum likelihood estimation (Eq. 1); KL: The KL-divergence method (Eq. 2); MLE-rel: The maximum likelihood estimation with PRF (Eq. 3); KL-rel: The KL-divergence method with PRF (Eq. 4).
Additionally, we implement three baseline methods

Queries: For the first set (QALD-7), we obtained graph queries by translating questions and answers from the QALD-7 dataset. We manually selected 65 queries covering diverse topics and with a clear exploratory intent, having potentially multiple answers and an explicit mapping into Freebase nodes and edges.

We then assigned to each graph-query a descriptive, yet unbiased, sentence to describe the exploratory information need. For instance, in the previous example, the assigned exploratory need was “Academic information about Albert Einstein”

[A pergunta do QALD-7 em si não é exploratória. Talvez se combinar um conjunto de perguntas sobre o mesmo assunto ou a mesma entidade pode se tornar exploratório. ]

The second set (KG contextualization) is a larger set of 325 facts designed to test fact contextualization methods [ 43 ]. While this set has more queries, it is designed for a different task, where the query is limited to a single edge and answers are expected to help the user understand the context of such fact. Moreover, all the queries in this set are people-centric (i.e., facts that involve a person).

[Olhar esse outro trabalho para verificar se atende a contextualização]

Running time:

[Não houve de fato uma demostração do tempo de resposta]

5.2 User assessment

For the QALD-7 query set, we computed the top-20 edge expansion suggestions for each query and each method, i.e., for each graph query we obtained a list of 20 distinct edges that could be added to it.

[K = 20]

Through crowdsourcing, for each query and candidate expansion, we collected human judgments assessing their relevance on a four-point scale: irrelevant (0), uninteresting (1), fairly interesting (2), really interesting (3).

[feedback do usuário em relação ao resultado]

The query set for KG contextualization provides judgments with a similar format also obtained through crowdsourcing. In their case, facts are scored based not only on the edge-types but also on the target entity involved.

... the user intent can be described by a very small number of edges. This finding is consistent with the other query set, where less than 10% of the answers are judged relevant.

[Pouca expansão uma vez que são necessários poucas arestas para atender ao usuário]

5.3 Ranking quality

We compare first the ranked suggestions for the QUALD-7 dataset in terms of Normalized Discounted Cumulative Gain (NDCG) from top-1 to top-20, for the case of queries composed by one single entity (Figure 2), one edge (Figure 3), and two or more edges (Figure 4).

As expected, knowing just the entity of interest is too little information to predict the user interest. In such cases, an effective strategy could maximize suggestion-list diversity, in order to cover many different aspects and help the user disambiguate their intention.

...

5.4 Measuring user effort

We estimate the user effort spared on obtaining the desired graph query. We assume each query’s target graph is obtained by selecting the edges with the highest user rating. The initial query is the fact mentioned by keyword queries in the QALD-7 dataset (e.g., for “Academic information on A. Einstein” we consider ⟨A. Einstein, advisor, A. Kleiner⟩). The overall effort compares the number of suggestions required to retrieve all the edges from the initial query to the total number of possible edge types that a user would be required to inspect without our system (i.e., the number of edge-types for each entity).

For instance, for the query above, the final query graph would contain facts about the advisor, the PhD degree, and the university of affiliation. Considering that the entity A. Einstein has 41 edge-types around it and presenting to the user just one instance of each type, the user would be required to inspect all of them to identify the desired edges. With our best scoring (KL-rel), the first relevant fact (⟨A. Einstein, education, PhD⟩) was within the top-5 suggestions, while the information about the department and the university were among the top-6 in the subsequent set of suggestions. This allows the user to limit the inspection to just 11 edge types, instead of a total of 50.

[A expansão sugestão fez o usuário manipular 1/5 das arestas possíveis]

6 CONCLUSIONS

[43] Nikos Voskarides, Edgar Meij, Ridho Reinanda, Abhinav Khaitan, Miles Osborne, Giorgio Stefanoni, Prabhanjan Kambadur, and Maarten de Rijke. 2018. Weaklysupervised Contextualization of Knowledge Graph Facts. In SIGIR. 765–774. https://doi.org/10.1145/3209978.3210031

Pesquisa de Doutorado da Veronica

Pesquisar este blog

Graph-Query Suggestions for Knowledge Graph Exploration - Leitura de Artigo

Marcadores

Comentários

Postar um comentário

Postagens mais visitadas deste blog

Connected Papers: Uma abordagem alternativa para revisão da literatura

Knowledge Graph Embedding with Triple Context - Leitura de Abstract

Exploratory Search: From Finding to Understanding - Leitura de Artigo