Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space - Paper Reading

Tran H.N., Takasu A. (2019) Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space. In: Doucet A., Isaac A., Golub K., Aalberg T., Jatowt A. (eds) Digital Libraries for Open Knowledge. TPDL 2019. Lecture Notes in Computer Science, vol 11799. Springer, Cham. https://doi.org/10.1007/978-3-030-30760-8_14

Abstract.  

... In recent years, the knowledge graph has emerged as a universal data format for representing knowledge about heterogeneous entities and their relationships. The knowledge graph can be modeled by knowledge graph embedding methods, which represent entities and relations as embedding vectors in semantic space, then model the interactions between these embedding vectors. 

... In this paper, we propose to analyze these semantic structures based on the well-studied word embedding space and use them to support data exploration.

We also define the semantic queries, which are algebraic operations between the embedding vectors in the knowledge graph embedding space, to solve queries such as similarity and analogy between the entities on the original datasets. We then design a general framework for data exploration by semantic queries and discuss the solution to some traditional scholarly data exploration tasks. 

1 Introduction

In recent years, digital libraries have moved towards open science and open access with several large scholarly datasets being constructed. 

Notably, instead of using knowledge graphs directly in some tasks, we can model them by knowledge graph embedding methods, which represent entities and relations as embedding vectors in semantic space, then model the interactions between them to solve the knowledge graph completion task. 

2 Related Work

2.1 Knowledge graph for scholarly data
2.2 Knowledge graph embedding
2.3 Word embedding

3 Theoretical analysis

3.2 Semantic query

We are mainly concerned with the following two structures of the embedding space.

  Semantic similarity structure: Semantically similar entities are close to each other in the embedding space, and vice versa. This structure can be identified by a vector similarity measure, such as the dot product between two embedding vectors.     


Semantic direction structure: There exist semantic directions in the embedding space, by which only one semantic aspect changes while all other aspects stay the same. It can be identified by a vector difference, such as the subtraction between two embedding vectors.  
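
The two structures can be illustrated with a small sketch. The entity names and the 2-d vectors below are invented for illustration and do not come from the paper; the sketch only shows the dot product acting as a similarity measure and a vector difference acting as a direction.

```python
# Toy 2-d embeddings; entity names and values are made up for illustration.
emb = {
    "AuthorA": [0.9, 0.1],
    "AuthorB": [0.8, 0.2],
    "VenueX": [0.1, 0.9],
}


def dot(u, v):
    """Dot product, used here as the vector similarity measure."""
    return sum(a * b for a, b in zip(u, v))


def difference(u, v):
    """Vector difference u - v, read as a candidate semantic direction."""
    return [a - b for a, b in zip(u, v)]


# Similarity structure: nearby vectors score higher than distant ones.
sim_close = dot(emb["AuthorA"], emb["AuthorB"])
sim_far = dot(emb["AuthorA"], emb["VenueX"])

# Direction structure: the offset that leads from AuthorA to AuthorB.
direction = difference(emb["AuthorB"], emb["AuthorA"])
```

In this toy space `sim_close` is larger than `sim_far`, and `direction` is the offset that the later analogy queries would add to a starting vector.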

 

Definition 1. Semantic queries on knowledge graph embedding space are defined as the algebraic operations between the knowledge graph embedding vectors to approximate a given data exploration task on the original dataset.

4 Semantic query framework

 

Task processing: converting data exploration tasks to algebraic operations on the embedding space by following task-specific conversion templates. Some important tasks and their conversion templates are discussed in Section 5.

5 Exploration tasks and semantic queries conversion

5.1 Similar entities

Given an entity e ∈ E, find entities that are similar to e. For example, given AuthorA, find authors, papers, and venues that are similar to AuthorA. Note that we can restrict the search to specific entity types. This is a traditional task in scholarly data exploration, whereas the tasks below are new.
Semantic query: We can solve this task by looking for the entities with the highest similarity to e.
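
A minimal sketch of this query, assuming the dot product as the similarity measure. The function `most_similar`, its `candidates` parameter, and the toy embeddings are hypothetical names and values introduced here for illustration.

```python
def dot(u, v):
    """Dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def most_similar(query, embeddings, k=2, candidates=None):
    """Return the k entities with the highest dot product to `query`.

    `candidates` optionally restricts the search, e.g. to one entity type.
    """
    q = embeddings[query]
    names = embeddings.keys() if candidates is None else candidates
    scores = {n: dot(q, embeddings[n]) for n in names if n != query}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy embeddings (3-d) for authors, a paper, and venues.
emb = {
    "AuthorA": [0.9, 0.1, 0.0],
    "AuthorB": [0.8, 0.2, 0.1],
    "Paper1": [0.5, 0.5, 0.0],
    "VenueX": [0.0, 0.1, 0.9],
    "VenueY": [0.3, 0.2, 0.4],
}

top_any = most_similar("AuthorA", emb, k=2)
top_venues = most_similar("AuthorA", emb, k=1, candidates=["VenueX", "VenueY"])
```

Passing `candidates` is one simple way to restrict the result to a given entity type, as the task description allows.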


5.2 Similar entities with bias

Given an entity e ∈ E and some positive bias entities A = {a1, ..., ak} known as expected results, find entities that are similar to e following the bias in A. For example, given AuthorA and some successfully collaborating authors, find other similar authors that may also result in good collaborations with AuthorA.
Semantic query: We can solve this task by looking for the entities with the highest similarity to both e and A. For example, denoting the arithmetic mean of the embedding vectors in A as Ā, the query ranks entities by their similarity to both e and Ā.
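
One plausible reading of this query is sketched below. It scores each candidate x by dot(e, x) + dot(Ā, x), which equals dot(e + Ā, x). This scoring rule is an assumption made for illustration, not a formula quoted from the paper, and the function name and toy vectors are hypothetical.

```python
def dot(u, v):
    """Dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean(vectors):
    """Component-wise arithmetic mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def similar_with_bias(e, bias, embeddings, k=1):
    """Rank entities by similarity to both e and the mean of the bias set."""
    a_bar = mean([embeddings[a] for a in bias])
    # Assumed scoring: dot(e, x) + dot(a_bar, x) == dot(e + a_bar, x).
    target = [x + y for x, y in zip(embeddings[e], a_bar)]
    excluded = {e, *bias}
    scores = {n: dot(target, v) for n, v in embeddings.items() if n not in excluded}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy embeddings; AuthorB plays the role of a past collaborator.
emb = {
    "AuthorA": [1.0, 0.0],
    "AuthorB": [0.0, 1.0],
    "AuthorC": [0.6, 0.6],
    "AuthorD": [0.9, -0.5],
}

suggested = similar_with_bias("AuthorA", ["AuthorB"], emb)
```

In this toy data AuthorD is closest to AuthorA by plain similarity, but the bias toward AuthorB moves AuthorC to the top.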
 
 


5.3 Analogy query 

Given an entity e ∈ E, positive bias A = {a1, ..., ak}, and negative bias B = {b1, ..., bk}, find entities that are similar to e following the biases in A and B. The essence of this task is tracing along a semantic direction defined by the positive and negative biases. For example, starting with AuthorA, we can trace along the expertise direction to find authors that are similar to AuthorA but with higher or lower expertise.
Semantic query: We can solve this task by looking for the entities with the highest similarity to e and A but not B. For example, denoting the arithmetic means of the embedding vectors in A and B as Ā and B̄, respectively, note that Ā − B̄ defines the semantic direction along the positive and negative biases.
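
A minimal sketch of the analogy query, assuming candidates are ranked by their dot product with e + Ā − B̄. This ranking rule, the function name, and the toy vectors (where the second dimension loosely stands for expertise) are assumptions for illustration only.

```python
def dot(u, v):
    """Dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean(vectors):
    """Component-wise arithmetic mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def analogy_query(e, positive, negative, embeddings, k=1):
    """Shift e along the direction mean(positive) - mean(negative), then rank."""
    a_bar = mean([embeddings[a] for a in positive])
    b_bar = mean([embeddings[b] for b in negative])
    target = [x + a - b for x, a, b in zip(embeddings[e], a_bar, b_bar)]
    excluded = {e, *positive, *negative}
    scores = {n: dot(target, v) for n, v in embeddings.items() if n not in excluded}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy embeddings: [topic, expertise].
emb = {
    "AuthorA": [1.0, 0.5],
    "AuthorC": [1.0, 0.9],
    "AuthorD": [1.0, 0.1],
    "Senior1": [0.2, 0.9],
    "Junior1": [0.2, 0.1],
}

higher = analogy_query("AuthorA", ["Senior1"], ["Junior1"], emb)
lower = analogy_query("AuthorA", ["Junior1"], ["Senior1"], emb)
```

Swapping the positive and negative sets reverses the direction, so the same function traces toward higher or lower expertise.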


5.4 Analogy browsing 

This task is an extension of the analogy query task above: it traces along multiple semantic directions defined by multiple pairs of positive and negative biases. It can be implemented as an interactive data analysis tool. For example, starting with AuthorA, we can trace to authors with higher expertise, then continue tracing to a new domain to find all authors similar to AuthorA with high expertise in the new domain. As another example, starting with Paper1, we can trace to papers with higher quality, then continue tracing to a new domain to look for papers similar to Paper1 with high quality in the new domain.
Semantic query: We can solve this task by simply repeating the analogy query with each pair of positive and negative biases. Note that we can also combine different operations in different orders to support flexible browsing.
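
The repetition can be sketched as a loop in which the top result of each analogy query becomes the starting entity of the next one. Using the top result as the next start is one possible design choice, not something the paper prescribes. All names and vectors are hypothetical, with dimensions loosely read as [expertise, domain X, domain Y].

```python
def dot(u, v):
    """Dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean(vectors):
    """Component-wise arithmetic mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def analogy_query(e, positive, negative, embeddings):
    """Return the single best entity after shifting e along the bias direction."""
    a_bar = mean([embeddings[a] for a in positive])
    b_bar = mean([embeddings[b] for b in negative])
    target = [x + a - b for x, a, b in zip(embeddings[e], a_bar, b_bar)]
    excluded = {e, *positive, *negative}
    scores = {n: dot(target, v) for n, v in embeddings.items() if n not in excluded}
    return max(scores, key=scores.get)


def browse(start, steps, embeddings):
    """Apply one analogy query per (positive, negative) pair, chaining results."""
    path = [start]
    for positive, negative in steps:
        path.append(analogy_query(path[-1], positive, negative, embeddings))
    return path


# Hypothetical toy embeddings: [expertise, domain X, domain Y].
emb = {
    "AuthorA": [0.2, 1.0, 0.0],
    "AuthorB": [0.9, 1.0, 0.0],
    "AuthorC": [0.9, 0.0, 1.0],
    "AuthorD": [0.2, 0.0, 1.0],
    "ExpertHi": [1.0, 0.5, 0.5],
    "ExpertLo": [0.0, 0.5, 0.5],
    "DomainX": [0.5, 1.0, 0.0],
    "DomainY": [0.5, 0.0, 1.0],
}

steps = [
    (["ExpertHi"], ["ExpertLo"]),  # first trace toward higher expertise
    (["DomainY"], ["DomainX"]),    # then trace from domain X to domain Y
]
path = browse("AuthorA", steps, emb)
```

An interactive tool could expose each `(positive, negative)` pair as one browsing step and let the user reorder or repeat steps freely.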

Other sources

This paper is related to the benchmark below.

KG20C: A scholarly knowledge graph benchmark dataset

Generated from MAG (Microsoft Academic Graph). Queries answered with embeddings:

Who may work at this organization?    
Where may this author work at?    
Who may write this paper?    
What papers may this author write?    
Which papers may cite this paper?    
Which papers may this paper cite?    
Which papers may belong to this domain?    
Which may be the domains of this paper?    
Which papers may publish in this conference?    
Which conferences may this paper publish in?

https://www.kaggle.com/tranhungnghiep/kg20c-scholarly-knowledge-graph/version/88
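
Questions such as those above are link prediction queries: fix two elements of a triple and rank candidates for the third. The sketch below uses a DistMult-style trilinear score as a stand-in scoring function; it is not necessarily the model used by the paper or the benchmark. The relation name `author_write_paper` and all vectors are hypothetical.

```python
def distmult_score(head, relation, tail):
    """DistMult-style score: sum of element-wise products of h, r, and t."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def rank_heads(relation, tail, candidates, entities, relations):
    """Rank candidate head entities for the query (?, relation, tail)."""
    r, t = relations[relation], entities[tail]
    scores = {c: distmult_score(entities[c], r, t) for c in candidates}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy embeddings.
entities = {
    "Author1": [1.0, 0.0],
    "Author2": [0.0, 1.0],
    "Paper1": [0.9, 0.1],
}
relations = {"author_write_paper": [1.0, 1.0]}

# "Who may write this paper?" becomes ranking heads for (?, author_write_paper, Paper1).
ranking = rank_heads(
    "author_write_paper", "Paper1", ["Author1", "Author2"], entities, relations
)
```

The inverse question ("What papers may this author write?") would rank tail candidates for a fixed head in the same way.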

 
