A Survey on Knowledge Graphs: Representation, Acquisition and Applications - Reading Notes

Ji, S., Pan, S., Cambria, E., Marttinen, P., & Philip, S. Y. (2021). A survey on knowledge graphs: Representation, acquisition, and applications. IEEE Transactions on Neural Networks and Learning Systems, 33(2), 494-514.
 
Abstract — [the survey is organized into four topics:] 1) knowledge graph representation learning, 2) knowledge acquisition and completion, 3) temporal knowledge graph, and 4) knowledge-aware applications, and summarizes recent breakthroughs and perspective directions to facilitate future research.
 
We further explore several emerging topics, including meta relational learning, commonsense reasoning, and temporal knowledge graphs. 
 
To facilitate future research on knowledge graphs, we also provide a curated collection of datasets and open-source libraries on different tasks. 
 
[Look at the datasets]

I. INTRODUCTION
 
A knowledge graph is a structured representation of facts, consisting of entities, relationships, and semantic descriptions. Entities can be real-world objects and abstract concepts, relationships represent relations between entities, and semantic descriptions of entities and their relationships contain types and properties with a well-defined meaning. Property graphs, or attributed graphs, in which nodes and relations have properties or attributes, are also widely used.
 
[The definition is framed in terms of facts. They also relate it to the notion of a KB right afterwards, but do not draw a direct connection to semantic networks]
 
For simplicity and following the trend of the research community, this paper uses the terms knowledge graph and knowledge base interchangeably.
 
Recent advances in knowledge-graph-based research focus on knowledge representation learning (KRL) or knowledge graph embedding (KGE) by mapping entities and relations into low-dimensional vectors while capturing their semantic meanings [5], [9]. Specific knowledge acquisition tasks include knowledge graph completion (KGC), triple classification, entity recognition, and relation extraction.
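 
As a minimal sketch of the KGE idea, a TransE-style model scores a triple (h, r, t) by how close h + r lands to t in the embedding space; the entities, relation, and dimensions below are illustrative placeholders, not taken from the survey.

```python
import numpy as np

# TransE-style scoring sketch: entities and relations live in the same
# low-dimensional space, and a triple (h, r, t) is plausible when h + r
# is close to t. All names and sizes here are illustrative.
rng = np.random.default_rng(0)
dim = 50
entities = {e: rng.normal(size=dim) for e in ["Paris", "France", "Berlin"]}
relations = {r: rng.normal(size=dim) for r in ["capitalOf"]}

def score(h, r, t):
    # Lower distance = more plausible under the translational assumption.
    return np.linalg.norm(entities[h] + relations[r] - entities[t])

print(score("Paris", "capitalOf", "France"))
```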
 
II. OVERVIEW
 
 
V. TEMPORAL KNOWLEDGE GRAPH

Current knowledge graph research mostly focuses on static knowledge graphs, where facts do not change with time, while the temporal dynamics of knowledge graphs are less explored. However, temporal information is of great importance because structured knowledge only holds within a specific period, and the evolution of facts follows a time sequence. Recent research has begun to incorporate temporal information into KRL and KGC, termed the temporal knowledge graph in contrast to the earlier static knowledge graph. Research efforts have been made to learn temporal and relational embeddings simultaneously.
 
[Temporal context being incorporated into KRL approaches. Could the same approach be applied to other kinds of context? There was already a similar movement in Databases]
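 
A hedged sketch of joint temporal and relational embedding, in the spirit of TTransE-style models: timestamps get their own embeddings and enter the translational score, so the same fact can score differently in different periods. All names and values are illustrative.

```python
import numpy as np

# TTransE-style sketch: the score becomes h + r + tau ≈ t, where tau is
# a timestamp embedding, so facts are scored relative to a time period.
rng = np.random.default_rng(1)
dim = 50
ent = {e: rng.normal(size=dim) for e in ["Obama", "USA"]}
rel = {r: rng.normal(size=dim) for r in ["presidentOf"]}
time = {y: rng.normal(size=dim) for y in [2009, 2017]}

def temporal_score(h, r, t, tau):
    # Lower = more plausible for that (fact, time) combination.
    return np.linalg.norm(ent[h] + rel[r] + time[tau] - ent[t])

# A fact only holds within a specific period; scores should reflect that.
print(temporal_score("Obama", "presidentOf", "USA", 2009))
print(temporal_score("Obama", "presidentOf", "USA", 2017))
```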

B. Entity Dynamics

Real-world events change entities' states and, consequently, affect the corresponding relations. To improve temporal scope inference, the contextual temporal profile model [181] formulates the temporal scoping problem as state change detection and utilizes the context to learn state and state-change vectors.
 
[Temporal context applied to entities / nodes]
 
C. Temporal Relational Dependency

There exist temporal dependencies in relational chains that follow the timeline, for example:
wasBornIn -> graduateFrom -> workAt -> diedIn.
 
[A semantic rule could infer temporal context when information is missing]
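 
Following the note above, a minimal sketch of such a rule: treat the chain as a total order and bound a missing timestamp by its observed neighbors. The relation names come from the paper's example; the helper and data are hypothetical.

```python
# The chain wasBornIn -> graduateFrom -> workAt -> diedIn implies a
# temporal order, so a missing timestamp can be bounded by its neighbors.
CHAIN = ["wasBornIn", "graduateFrom", "workAt", "diedIn"]

def bound_missing(events):
    """events: dict relation -> year or None. Returns (lower, upper)
    bounds for each missing relation, inferred from the chain order."""
    bounds = {}
    for i, rel in enumerate(CHAIN):
        if events.get(rel) is None:
            before = [events[r] for r in CHAIN[:i] if events.get(r) is not None]
            after = [events[r] for r in CHAIN[i + 1:] if events.get(r) is not None]
            bounds[rel] = (max(before, default=None), min(after, default=None))
    return bounds

print(bound_missing({"wasBornIn": 1950, "graduateFrom": None,
                     "workAt": 1975, "diedIn": 2010}))
# -> {'graduateFrom': (1950, 1975)}
```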
 
VI. KNOWLEDGE-AWARE APPLICATIONS

Rich structured knowledge can be useful for AI applications. However, how to integrate such symbolic knowledge into the computational frameworks of real-world applications remains a challenge. Knowledge graph applications are twofold: 1) in-KG applications such as link prediction and named entity recognition; and 2) out-of-KG applications, including relation extraction and more downstream knowledge-aware applications such as question answering and recommendation systems.
 
[Does not comment on Exploratory Search]
 
B. Question Answering

Knowledge-graph-based question answering (KG-QA) answers natural language questions with facts from knowledge graphs. Neural network-based approaches represent questions and answers in distributed semantic space, and some also conduct symbolic knowledge injection for commonsense reasoning.
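 
A compact sketch of the distributed-semantics idea in KG-QA (illustrative only): embed the question and candidate answer entities in a shared space and rank candidates by similarity. The vectors below are random placeholders; in practice they would come from a question encoder and a KG embedding model.

```python
import numpy as np

# Embedding-based KG-QA sketch: rank candidate answer entities by their
# similarity to the question vector in a shared semantic space.
rng = np.random.default_rng(3)
dim = 50
question_vec = rng.normal(size=dim)  # placeholder for an encoded question
candidates = {"Paris": rng.normal(size=dim), "Berlin": rng.normal(size=dim)}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

best = max(candidates, key=lambda e: cosine(question_vec, candidates[e]))
print("Predicted answer:", best)
```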
 
VII. FUTURE DIRECTIONS
 
C. Interpretability

Interpretability of knowledge representation and injection is a vital issue for knowledge acquisition and real-world applications. ... However, recent neural models have limitations in transparency and interpretability, although they have achieved impressive performance. Some methods combine black-box neural models with symbolic reasoning by incorporating logical rules to increase interpretability. Interpretability can convince people to trust predictions. Thus, further work should go into interpretability, improving the reliability of predicted knowledge.
 
[Would this be explainability?]
 
APPENDIX D

KRL MODEL TRAINING

Open world assumption (OWA) and closed world assumption (CWA) [214] are considered when training knowledge representation learning models. Under the OWA, a negative sample set F′ is generated during training by corrupting the golden triple set F, with specific sampling strategies designed to reduce the number of false negatives. Mini-batch optimization with stochastic gradient descent (SGD) is then carried out to minimize a given loss function.
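 
A minimal sketch of this training regime, assuming a TransE-style score, a margin-based loss, and per-triple SGD instead of mini-batches for brevity; all sizes and the toy golden set F are illustrative.

```python
import numpy as np

# Sketch of KRL training under the OWA: corrupt golden triples to build
# negatives, then minimize a margin-based loss with SGD (toy sizes).
rng = np.random.default_rng(2)
n_ent, n_rel, dim, margin, lr = 100, 10, 50, 1.0, 0.01
E = rng.normal(size=(n_ent, dim))  # entity embeddings
R = rng.normal(size=(n_rel, dim))  # relation embeddings
golden = [(0, 0, 1), (2, 1, 3)]    # toy golden triple set F

def score(h, r, t):
    # TransE-style distance; lower = more plausible.
    return np.linalg.norm(E[h] + R[r] - E[t])

for epoch in range(100):
    for (h, r, t) in golden:
        # OWA corruption: replace head or tail with a random entity.
        if rng.random() < 0.5:
            h2, t2 = int(rng.integers(n_ent)), t
        else:
            h2, t2 = h, int(rng.integers(n_ent))
        loss = margin + score(h, r, t) - score(h2, r, t2)
        if loss > 0:
            # Gradients of the L2 translational score for both triples.
            g_pos = (E[h] + R[r] - E[t]) / (score(h, r, t) + 1e-9)
            g_neg = (E[h2] + R[r] - E[t2]) / (score(h2, r, t2) + 1e-9)
            E[h] -= lr * g_pos
            E[t] += lr * g_pos
            R[r] -= lr * (g_pos - g_neg)
            E[h2] += lr * g_neg
            E[t2] -= lr * g_neg
```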
 
A. Open and Closed World Assumption

The CWA assumes that unobserved facts are false. By contrast, the OWA makes the more relaxed assumption that unobserved facts may be either missing or false. The OWA generally has an advantage over the CWA because of the inherent incompleteness of knowledge graphs. RESCAL [49] is a typical model trained under the CWA, while more models are formulated under the OWA.
 
C. Negative Sampling

Several heuristic sampling distributions have been proposed for corrupting head or tail entities. The most widely applied one is uniform sampling [16], [17], [39], which replaces entities uniformly at random, but it can produce false-negative labels. More effective negative sampling strategies are required to learn semantic representations and improve predictive performance.
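 
A hedged sketch of uniform corruption with a simple filter against false negatives: candidates that collide with a golden triple are rejected and resampled. The function and data are hypothetical.

```python
import random

# Uniform negative sampling with filtering: a corrupted triple that
# happens to appear in the golden set would be a false negative, so it
# is rejected and resampled (up to a fixed number of tries).
def corrupt(triple, entity_ids, golden_set, max_tries=10):
    h, r, t = triple
    for _ in range(max_tries):
        if random.random() < 0.5:
            cand = (random.choice(entity_ids), r, t)  # replace head
        else:
            cand = (h, r, random.choice(entity_ids))  # replace tail
        if cand not in golden_set:
            return cand
    return cand  # give up filtering after max_tries

golden = {(0, 0, 1), (2, 1, 3)}
print(corrupt((0, 0, 1), list(range(100)), golden))
```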
 
[Training and the CWA/OWA question]
 
APPENDIX F

DATASETS AND LIBRARIES

In this section, we introduce and list useful resources of knowledge graph datasets and open-source libraries.

A. Datasets

Many public datasets have been released. We introduce and summarize general, domain-specific, task-specific, and temporal datasets.

1) General Datasets: Datasets with general ontological knowledge include WordNet [234], Cyc [235], DBpedia [236], YAGO [237], Freebase [238], NELL [73] and Wikidata [239]. It is hard to compare them within a table as their ontologies are different.
 
2) Domain-Specific Datasets: Some knowledge bases on specific domains are designed and collected to evaluate domain-specific tasks. Notable domains include life science, health care, and scientific research, covering complex domains and relations such as compounds, diseases, and tissues. Examples of domain-specific knowledge graphs are ResearchSpace, a cultural heritage knowledge graph; UMLS [240], a unified medical language system; SNOMED CT, a commercial clinical terminology; and a medical knowledge graph from Yidu Research.
 
More biological databases with domain-specific knowledge include STRING, a database of protein-protein interaction networks; SKEMPI, a Structural Kinetic and Energetic database of Mutant Protein Interactions [241]; the Protein Data Bank (PDB), containing biological molecular data [242]; Gene Ontology, a resource that describes protein function; and DrugBank, a pharmaceutical knowledge base [243], [244].

3) Task-Specific Datasets: A popular way of generating task-specific datasets is to sample subsets from large general datasets. Statistics of several datasets for tasks on the knowledge graph itself are listed in Table VIII. Notice that WN18 and FB15k suffer from test set leakage [55]. For KRL with auxiliary information and other downstream knowledge-aware applications, texts and images are also collected, for example, WN18-IMG [71] with sampled images, and textual relation extraction datasets including the SemEval 2010 dataset, NYT [245], and Google-RE. IsaCore [246], an analogical closure of Probase for opinion mining and sentiment analysis, is built by common knowledge base blending and multi-dimensional scaling. Recently, the FewRel dataset [247] was built to evaluate the emerging few-shot relation classification task. There are also more datasets for specific tasks, such as the cross-lingual DBP15K [128] and DWY100K [127] for entity alignment, and the multi-view knowledge graphs YAGO26K-906 and DB111K-174 [119] with instances and ontologies.
 
We provide an online collection of knowledge graph publications, together with links to some open-source implementations of them, hosted at
https://shaoxiongji.github.io/knowledge-graphs/.


 
