Nikos Voskarides, Edgar Meij, Ridho Reinanda, Abhinav Khaitan, Miles Osborne, Giorgio Stefanoni, Prabhanjan Kambadur, and Maarten de Rijke. 2018. Weakly-supervised Contextualization of Knowledge Graph Facts. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR '18). Association for Computing Machinery, New York, NY, USA, 765–774. https://doi.org/10.1145/3209978.3210031
ABSTRACT
...
When presenting a KG fact to the user, providing other facts that are pertinent to that main fact can enrich the user experience and support exploratory information needs. KG fact contextualization is the task of augmenting a given KG fact with additional and useful KG facts. The task is challenging because of the large size of KGs; discovering other relevant facts even in a small neighborhood of the given fact results in an enormous amount of candidates.
[Context here means neighboring facts that are relevant to the "central" fact]
We introduce a neural fact contextualization method (NFCM) to address the KG fact contextualization task. NFCM first generates a set of candidate facts in the neighborhood of a given fact and then ranks the candidate facts using a supervised learning to rank model.
[Rank the neighboring facts to identify the most relevant ones. Fact contextualization is framed as a task]
Evaluation using human assessors shows that it significantly outperforms several competitive baselines.
[Evaluation with human assessors]
1 INTRODUCTION
Knowledge graphs (KGs) have become essential for applications such as search, query understanding, recommendation and question answering because they provide a unified view of real-world entities and the facts (i.e., relationships) that hold between them [6, 7, 22, 34].
[Facts are just the relationships between entities, or between entities and concepts]
Previous work has focused on augmenting entity cards with facts that are centered around, i.e., one-hop away from, the main entity of the query [17].
[Query star-join]
..., we can exploit the richness of the KG by providing query-specific additional facts that increase the user’s understanding of the fact as a whole, and that are not necessarily centered around only one of the entities.
[Context beyond qualifiers]
Query-specific relevant facts can also be used in other applications to enrich the user experience.
[In exploratory search]
In this paper, we address the task of KG fact contextualization, that is, given a KG fact that consists of two entities and a relation that connects them, retrieve additional facts from the KG that are relevant to that fact.
[Couldn't additional facts cause information overload?]
We propose a neural fact contextualization method (NFCM), a method that first generates a set of candidate facts that are part of {1,2}-hop paths from the entities of the main fact. NFCM then ranks the candidate facts by how relevant they are for contextualizing the main fact.
[The ranking function was learned with a supervised method. It is then applied to the triples resulting from the star-joins over the two entities involved. The star-joins follow the pattern V->?u->?w or ?w->?u->V]
We estimate our learning to rank model using supervised data. The ranking model combines (i) features we automatically learn from data and (ii) those that represent the query-candidate facts with a set of hand-crafted features we devised or adjusted for this task.
[Features both learned automatically and hand-crafted/adjusted]
2 PROBLEM STATEMENT
Let E = En ∪ Ec be a set of entities, where En and Ec are disjoint sets of non-CVT and CVT entities, respectively.
We define a fact as a path in K that either: (i) consists of 1 triple, s0 ∈ E and t0 ∈ En (i.e., s0 may be a CVT entity), or (ii) consists of 2 triples, s0, t1 ∈ En and t0 = s1 ∈ Ec (i.e., t0 = s1 must be a CVT entity). A fact of type (i) can be an attribute of a fact of type (ii), iff they have a common CVT entity (see Figure 2 for an example).
Let R be a set of relationships where a relationship r ∈ R is a label for a set of facts that share the same predicates but differ in at least one entity. For example, spouseOf is the label of the fact depicted in the top part of Figure 2 and consists of two triples. Our definition of a relationship corresponds to direct relationships between entities, i.e., one-hop paths or two-hop paths through a CVT entity. For the remainder of this paper, we refer to a specific fact f as r⟨s, t⟩, where r ∈ R and s, t ∈ E.
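The two fact shapes above (a single triple ending at a non-CVT entity, or a two-triple path through a CVT — compound value type — mediator node) can be sketched as a small validity check. A minimal sketch; entity names and the CVT set are illustrative, not from the paper:

```python
def is_fact(path, cvt):
    """Check whether a path (a list of (s, p, t) triples) is a fact per
    Section 2: (i) one triple whose target t0 is non-CVT (the source s0
    may be CVT), or (ii) two triples whose shared middle node t0 = s1 is
    a CVT entity and whose endpoints s0, t1 are non-CVT."""
    if len(path) == 1:
        _, _, t0 = path[0]
        return t0 not in cvt                  # case (i): t0 must be non-CVT
    if len(path) == 2:
        (s0, _, t0), (s1, _, t1) = path
        return (t0 == s1 and t0 in cvt        # case (ii): middle node is CVT
                and s0 not in cvt and t1 not in cvt)
    return False

# Illustrative example: a spouseOf fact mediated by a marriage CVT node
cvt = {"m.marriage_obama"}
spouse_fact = [("Barack_Obama", "spouse", "m.marriage_obama"),
               ("m.marriage_obama", "person", "Michelle_Obama")]
```

Here `is_fact(spouse_fact, cvt)` holds, while the first triple alone does not form a fact, since its target is a CVT entity.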
2.2 Task definition
Given a query fact fq and a KG K, we aim to find a set of other, relevant facts from K. Specifically, we want to enumerate and rank a set of candidate facts F = {fc : fc ⊆ K, fc ≠ fq} based on their relevance to fq.
In this section we describe our proposed neural fact contextualization method (NFCM) which works in two steps. First, given a query fact fq , we enumerate a set of candidate facts F = { fc : fc ⊆ K }. Second, we rank the facts in F by relevance to fq to obtain a final ranked list F ′ using a supervised learning to rank model.
3.1 Enumerating KG facts
... [exclusions] (i) CVT entities are not counted as hops, (ii) we do not include fq in F as it is trivial, and (iii) to reduce the search space, we do not expand intermediate neighbors that represent an entity class or a type (e.g., “actor”) as these can have millions of neighbors.
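The candidate enumeration with exclusions (ii) and (iii) can be sketched over a toy adjacency list. This is a rough sketch under my own assumptions — the graph representation, the undirected expansion, and all names are illustrative, and exclusion (i) (not counting CVT entities as hops) is not modeled:

```python
from collections import defaultdict

def enumerate_candidates(triples, query_fact, type_entities, max_hops=2):
    """Sketch of Section 3.1: collect triples on {1,2}-hop paths from the
    query-fact entities, skipping the query fact itself (exclusion ii) and
    never expanding class/type nodes (exclusion iii)."""
    adj = defaultdict(list)
    for s, p, t in triples:
        adj[s].append((s, p, t))
        adj[t].append((s, p, t))       # expand in both directions
    s_q, _, t_q = query_fact
    candidates, frontier = set(), {s_q, t_q}
    for _ in range(max_hops):
        next_frontier = set()
        for e in frontier:
            if e in type_entities:     # exclusion (iii): don't expand type nodes
                continue
            for triple in adj[e]:
                if triple == query_fact:   # exclusion (ii): skip the query fact
                    continue
                candidates.add(triple)
                next_frontier.update({triple[0], triple[2]} - {e})
        frontier = next_frontier
    return candidates
```

For example, with query fact `("A", "r1", "B")` and a type node `"Person"`, the enumeration reaches 2-hop neighbors of A and B but never triples that are only reachable through `"Person"`.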
3.2 Fact ranking
For each candidate fact fc ∈ F, we create a pair (fq, fc) ... and score it using a function u : (fq, fc) → [0, 1] ⊂ R (higher values indicate higher relevance). We then obtain a ranked list of facts F′ by sorting the facts in F based on their score.
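The scoring-and-sorting step can be sketched as below. The toy entity-overlap scorer is purely a stand-in for NFCM's learned neural model, not the paper's actual function:

```python
def rank_facts(query_fact, candidates, score):
    """Sketch of Section 3.2: score each (query, candidate) pair with a
    function u mapping to [0, 1], then sort by descending score to get F'.
    `score` stands in for the learned ranking model."""
    return sorted(candidates, key=lambda fc: score(query_fact, fc), reverse=True)

def toy_score(fq, fc):
    """Illustrative stand-in scorer: Jaccard overlap of the facts' entities."""
    ents_q, ents_c = {fq[0], fq[2]}, {fc[0], fc[2]}
    return len(ents_q & ents_c) / len(ents_q | ents_c)
```

With this stand-in, candidates sharing both entities with the query fact rank first and candidates sharing none rank last; NFCM replaces `toy_score` with a supervised model over learned and hand-crafted features.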
In this section we describe the setup of our experiments that aim to answer the following research questions:
Our dataset consists of query facts, candidate facts, and a relevance label for each query-candidate fact pair.
Gathering relevance labels for our task is challenging due to the size and heterogeneous nature of KGs, i.e., having a large number of facts and relationship types. Therefore, we turn to distant supervision [23] to gather relevance labels at scale. We choose to get a supervision signal from Wikipedia for the following reasons: (i) it has a high overlap of entities with the KG we use, and (ii) facts that are in KGs are usually expressed in Wikipedia articles alongside other, related facts.
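The distant-supervision idea can be illustrated roughly as follows. This entity-mention heuristic is a simplification I am assuming for illustration only, not the paper's actual Wikipedia alignment procedure:

```python
def weak_label(candidate_fact, article_text, query_entities):
    """Simplified distant-supervision signal: weakly label a candidate fact
    as relevant if the entity it introduces (beyond the query-fact entities)
    is mentioned in Wikipedia text about the query fact."""
    new_entities = {candidate_fact[0], candidate_fact[2]} - set(query_entities)
    return any(e.replace("_", " ") in article_text for e in new_entities)
```

The appeal of such a signal is exactly the paper's point (ii): facts expressed in KGs tend to be stated in Wikipedia articles next to related facts, so mention-based labels can be gathered at scale without human annotation.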
In order to evaluate the performance of NFCM on the KG fact contextualization task, we perform crowdsourcing to collect a human-curated evaluation dataset.
• very relevant: I would include the candidate fact in the description of the query fact; the candidate fact provides additional context to the query fact.
• somewhat relevant: I would include the candidate fact in the description of the query fact, but only if there is space.
• irrelevant: I would not include the candidate fact in the description of the query fact.
To the best of our knowledge, there is no previously published method that addresses the task introduced in this paper. Therefore, we devise a set of intuitive baselines that are used to showcase that our task is not trivial.
The models described in Section 3.2 are implemented in TensorFlow v1.4.1 [1]. Table 5 lists the hyperparameters of NFCM. We tune the variable hyperparameters of this table on the validation set and optimize for NDCG@5.
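NDCG@5 over the graded labels above (e.g., irrelevant = 0, somewhat relevant = 1, very relevant = 2) can be computed as in this sketch, using the standard log-discount formulation (the exact gain variant the paper uses is not specified here):

```python
import math

def ndcg_at_k(relevances, k=5):
    """NDCG@k over a ranked list of graded relevance labels: DCG of the
    list divided by the DCG of the ideal (descending) ordering."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0; pushing a very relevant fact below irrelevant ones lowers the score, which is what makes NDCG@5 a sensible tuning target for a ranked list of contextualizing facts.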
In our first experiment, we compare NFCM to a set of heuristic baselines we derived to answer RQ1. ... We conclude that the task we define in this paper is not trivial to solve and simple heuristic functions are not sufficient.
In our second experiment we compare NFCM with distant supervision and aim to answer RQ2. ... conclude that learning ranking functions (and in particular NFCM) based on the signal gathered from distant supervision is beneficial for this task.
6 RELATED WORK
[The task of generating facts for contextualization would be novel and interesting for exploratory search. And it is based only on the entities of the triple.]
The specific task we introduce in this paper has not been addressed before, but there is related work in three main areas: entity relationship explanation, distant supervision, and fact ranking.
6.1 Relationship Explanation
Explanations for relationships between pairs of entities can be provided in two ways: structurally, i.e., by providing paths or sub-graphs in a KG containing the entities, or textually, by ranking or generating text snippets that explain the connection.
[Paths would be the most suitable for exploratory search over the graph]
6.3 Fact Ranking
In fact ranking, the goal is to rank a set of attributes with respect to an entity. Hasibi et al. [17] consider fact ranking as a component for entity summarization for entity cards. They approach fact ranking as a learning to rank problem. They learn a ranking model based on importance, relevance, and other features relating a query and the facts.
Graph matching involves matching two graphs and discovering the patterns of relationships between them to infer their similarity [11]. Although our task can be considered as comparing a small query subgraph (i.e., query triples) and a knowledge graph, the goal is different from graph matching which mainly concerns aligning two graphs rather than enhancing one query graph.
[The approach expands the result with more triples but selects triples based on the computed relevance]
Our work differs from the work discussed above in the following major ways. First, we enrich a query fact between two entities by providing relevant additional facts in the context of the query fact, taking into account both the entities and the relation of the query fact. Second, we rank whole facts from the KG instead of just entities.
@inproceedings{voskarides2018weakly,
  author = {Voskarides, Nikos and Meij, Edgar and Reinanda, Ridho and Khaitan, Abhinav and Osborne, Miles and Stefanoni, Giorgio and Kambadur, Prabhanjan and de Rijke, Maarten},
  title = {Weakly-Supervised Contextualization of Knowledge Graph Facts},
  year = {2018},
  isbn = {9781450356572},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3209978.3210031},
  doi = {10.1145/3209978.3210031},
  booktitle = {The 41st International ACM SIGIR Conference on Research \& Development in Information Retrieval},
  pages = {765--774},
  numpages = {10},
  keywords = {distant supervision, fact contextualization, knowledge graphs},
  location = {Ann Arbor, MI, USA},
  series = {SIGIR '18}
}