Ji, S., Pan, S., Cambria, E., Marttinen, P., & Philip, S. Y. (2021). A survey on knowledge graphs: Representation, acquisition, and applications. IEEE Transactions on Neural Networks and Learning Systems, 33(2), 494-514.
Abstract (excerpt) — The survey reviews 1) knowledge graph representation learning, 2) knowledge acquisition and completion, 3) temporal knowledge graphs, and 4) knowledge-aware applications, and summarizes recent breakthroughs and perspective directions to facilitate future research.
We further explore several emerging topics, including meta relational learning, commonsense reasoning, and temporal knowledge graphs.
To facilitate future research on knowledge graphs, we also provide a curated collection of datasets and open-source libraries on different tasks.
[Look at the datasets]
I. INTRODUCTION
A knowledge graph is a structured representation of facts, consisting of entities, relationships, and semantic descriptions. Entities can be real-world objects and abstract concepts, relationships represent relations between entities, and semantic descriptions of entities and their relationships contain types and properties with a well-defined meaning. Property graphs or attributed graphs are widely used, in which nodes and relations have properties or attributes.
[The definition is based on facts. They also relate knowledge graphs to knowledge bases right afterwards, but do not establish a direct connection to semantic networks]
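A minimal sketch of the two representations described above, with made-up entity and relation names purely for illustration: facts as (head, relation, tail) triples, and the same knowledge in a property-graph style where nodes and edges carry attributes.

```python
# Facts as (head, relation, tail) triples -- the basic knowledge graph view.
triples = [
    ("Albert_Einstein", "wasBornIn", "Ulm"),
    ("Albert_Einstein", "graduateFrom", "ETH_Zurich"),
]

# The same knowledge in a property-graph style: nodes and edges have attributes.
nodes = {
    "Albert_Einstein": {"type": "Person", "birthYear": 1879},
    "Ulm": {"type": "City", "country": "Germany"},
}
edges = [
    {"source": "Albert_Einstein", "target": "Ulm",
     "relation": "wasBornIn", "date": "1879-03-14"},
]
```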
For simplicity and following the trend of the research community, this paper uses the terms knowledge graph and knowledge base interchangeably.
Recent advances in knowledge-graph-based research focus on knowledge representation learning (KRL) or knowledge graph embedding (KGE) by mapping entities and relations into low-dimensional vectors while capturing their semantic meanings [5], [9]. Specific knowledge acquisition tasks include knowledge graph completion (KGC), triple classification, entity recognition, and relation extraction.
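To make the idea of mapping entities and relations into low-dimensional vectors concrete, here is a hedged sketch of a translational scoring function in the style of TransE, one of the embedding models the survey covers. The vocabulary, embedding dimensionality, and random initialization are illustrative assumptions, not the survey's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50  # embedding dimensionality (illustrative choice)

# Toy vocabularies; real knowledge graphs have millions of entities.
entities = ["Albert_Einstein", "Ulm", "ETH_Zurich"]
relations = ["wasBornIn", "graduateFrom"]

# Each entity and relation is mapped to a low-dimensional vector.
ent_emb = {e: rng.normal(size=dim) for e in entities}
rel_emb = {r: rng.normal(size=dim) for r in relations}

def transe_score(head, relation, tail):
    """Translational score: plausible triples have h + r close to t,
    so a smaller distance means a more plausible fact."""
    h, r, t = ent_emb[head], rel_emb[relation], ent_emb[tail]
    return np.linalg.norm(h + r - t)

print(transe_score("Albert_Einstein", "wasBornIn", "Ulm"))
```

In practice these vectors are learned (see the training sketch in Appendix D below), and tasks such as KGC and triple classification are cast as ranking or thresholding over this kind of score.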
II. OVERVIEW
V. TEMPORAL KNOWLEDGE GRAPH
Current knowledge graph research mostly focuses on static knowledge graphs, where facts do not change with time, while the temporal dynamics of a knowledge graph are less explored. However, temporal information is of great importance because structured knowledge only holds within a specific period, and the evolution of facts follows a time sequence. Recent research has begun to incorporate temporal information into KRL and KGC, termed the temporal knowledge graph in contrast to the previous static knowledge graph. Research efforts have been made to learn temporal and relational embeddings simultaneously.
[Temporal context being incorporated into KRL approaches. Could the same approach apply to other kinds of context? There has already been such a movement in Databases]
B. Entity Dynamics
Real-world events change entities’ state, and consequently, affect the corresponding relations. To improve temporal scope inference, the contextual temporal profile model [181] formulates the temporal scoping problem as state change detection and utilizes the context to learn state and state change vectors.
[Temporal context applied to entities / nodes]
C. Temporal Relational Dependency
There exist temporal dependencies in relational chains following the timeline, for example,
wasBornIn -> graduateFrom -> workAt -> diedIn.
[A semantic rule could infer temporal context when information is missing]
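As a rough illustration of the note above (not a method from the survey): a hand-written ordering rule over the relation chain can bound the time window of a fact whose timestamp is missing. The relation order, function name, and example years are all hypothetical.

```python
# Illustrative temporal ordering rule over the relation chain mentioned above.
RELATION_ORDER = ["wasBornIn", "graduateFrom", "workAt", "diedIn"]

def temporal_bounds(known_times, missing_relation):
    """Return (lower, upper) year bounds implied by the ordering rule.

    known_times: dict mapping relation name -> known year.
    missing_relation: the relation whose timestamp is missing.
    """
    idx = RELATION_ORDER.index(missing_relation)
    earlier = [y for r, y in known_times.items()
               if RELATION_ORDER.index(r) < idx]
    later = [y for r, y in known_times.items()
             if RELATION_ORDER.index(r) > idx]
    lower = max(earlier) if earlier else None
    upper = min(later) if later else None
    return lower, upper

# Example: birth and death years are known, the graduation year is missing.
print(temporal_bounds({"wasBornIn": 1879, "diedIn": 1955}, "graduateFrom"))
# -> (1879, 1955)
```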
VI. KNOWLEDGE-AWARE APPLICATIONS
Rich structured knowledge can be useful for AI applications. However, how to integrate such symbolic knowledge into the computational framework of real-world applications remains a challenge. Applications of knowledge graphs fall into two categories: 1) in-KG applications such as link prediction and named entity recognition; and 2) out-of-KG applications, including relation extraction and more downstream knowledge-aware applications such as question answering and recommendation systems.
[Does not comment on Exploratory Search]
B. Question Answering
Knowledge-graph-based question answering (KG-QA) answers natural language questions with facts from knowledge graphs. Neural network-based approaches represent questions and answers in distributed semantic space, and some also conduct symbolic knowledge injection for commonsense reasoning.
VII. FUTURE DIRECTIONS
C. Interpretability
Interpretability of knowledge representation and injection is a vital issue for knowledge acquisition and real-world applications. ... However, recent neural models have limitations in transparency and interpretability, although they have achieved impressive performance. Some methods combine black-box neural models with symbolic reasoning by incorporating logical rules to increase interpretability. Interpretability can convince people to trust predictions. Thus, further work should go into interpretability and improve the reliability of predicted knowledge.
[Would this be explainability?]
APPENDIX D
KRL MODEL TRAINING
The open world assumption (OWA) and the closed world assumption (CWA) [214] are considered when training knowledge representation learning models. During training, a negative sample set F' is randomly generated by corrupting a golden triple set F under the OWA. Mini-batch optimization and stochastic gradient descent (SGD) are carried out to minimize a certain loss function. Under the OWA, negative samples are generated with specific sampling strategies designed to reduce the number of false negatives.
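A minimal sketch of one mini-batch SGD step under the OWA, assuming a TransE-style distance and a margin-based ranking loss, which is one common choice; the embedding sizes, learning rate, and batch construction are illustrative, not the survey's configuration.

```python
import torch

# Illustrative hyperparameters.
num_entities, num_relations, dim, margin = 1000, 50, 100, 1.0
ent = torch.nn.Embedding(num_entities, dim)
rel = torch.nn.Embedding(num_relations, dim)
opt = torch.optim.SGD(list(ent.parameters()) + list(rel.parameters()), lr=0.01)

def distance(h, r, t):
    # TransE-style distance: smaller means more plausible.
    return torch.norm(ent(h) + rel(r) - ent(t), dim=-1)

def train_step(pos, neg):
    """pos, neg: (h, r, t) index tensors for golden and corrupted triples."""
    opt.zero_grad()
    # Margin ranking loss: push golden triples below corrupted ones by a margin.
    loss = torch.clamp(margin + distance(*pos) - distance(*neg), min=0).mean()
    loss.backward()
    opt.step()
    return loss.item()

# One toy mini-batch: golden triples plus negatives obtained by tail corruption.
h = torch.randint(0, num_entities, (32,))
r = torch.randint(0, num_relations, (32,))
t = torch.randint(0, num_entities, (32,))
t_neg = torch.randint(0, num_entities, (32,))
print(train_step((h, r, t), (h, r, t_neg)))
```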
A. Open and Closed World Assumption
The CWA assumes that unobserved facts are false. By contrast, the OWA makes the more relaxed assumption that unobserved facts can be either missing or false. Generally, the OWA has an advantage over the CWA because of the inherently incomplete nature of knowledge graphs. RESCAL [49] is a typical model trained under the CWA, while more models are formulated under the OWA.
C. Negative Sampling
Several heuristics for the sampling distribution have been proposed to corrupt head or tail entities. The most widely applied one is uniform sampling [16], [17], [39], which replaces entities uniformly at random, but it leads to sampling false-negative labels. More effective negative sampling strategies are required to learn semantic representations and improve predictive performance.
[Training and the CWA/OWA question]
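A hedged sketch of uniform negative sampling as described above; the helper name and the toy fact set are made up for the example. It also shows why false negatives arise: the uniformly drawn replacement entity may accidentally produce another true fact.

```python
import random

def uniform_corrupt(triple, all_entities, true_triples):
    """Uniformly corrupt the head or the tail of a golden triple.

    Because the replacement entity is drawn uniformly, the corrupted triple
    may accidentally be another true fact -- a false negative -- which is
    exactly the weakness of uniform sampling noted above.
    """
    h, r, t = triple
    if random.random() < 0.5:
        h = random.choice(all_entities)   # corrupt the head
    else:
        t = random.choice(all_entities)   # corrupt the tail
    corrupted = (h, r, t)
    is_false_negative = corrupted in true_triples
    return corrupted, is_false_negative

entities = ["Albert_Einstein", "Ulm", "ETH_Zurich", "Berlin"]
facts = {("Albert_Einstein", "wasBornIn", "Ulm"),
         ("Albert_Einstein", "workAt", "Berlin")}
print(uniform_corrupt(("Albert_Einstein", "wasBornIn", "Ulm"), entities, facts))
```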
APPENDIX F
DATASETS AND LIBRARIES
In this section, we introduce and list useful resources: knowledge graph datasets and open-source libraries.
A. Datasets
Many public datasets have been released. We introduce and summarize general, domain-specific, task-specific, and temporal datasets.
1) General Datasets: Datasets with general ontological knowledge include WordNet [234], Cyc [235], DBpedia [236], YAGO [237], Freebase [238], NELL [73] and Wikidata [239]. It is hard to compare them within a table as their ontologies are different.
2) Domain-Specific Datasets: Some knowledge bases on specific domains are designed and collected to evaluate domain-specific tasks. Notable domains include life science, health care, and scientific research, covering complex domains and relations such as compounds, diseases, and tissues. Examples of domain-specific knowledge graphs are ResearchSpace, a cultural heritage knowledge graph; UMLS [240], a unified medical language system; SNOMED CT, a commercial clinical terminology; and a medical knowledge graph from Yidu Research.
More biological databases with domain-specific knowledge include STRING, a database of protein-protein interaction networks; SKEMPI, a structural kinetic and energetic database of mutant protein interactions [241]; the Protein Data Bank (PDB), containing biological molecular data [242]; Gene Ontology, a gene ontology resource that describes protein function; and DrugBank, a pharmaceutical knowledge base [243], [244].
3) Task-Specific Datasets: A popular way of generating task-specific datasets is to sample subsets from large general datasets. Statistics of several datasets for tasks on the knowledge graph itself are listed in Table VIII. Notice that WN18 and FB15k suffer from test set leakage [55]. For KRL with auxiliary information and other downstream knowledge-aware applications, texts and images are also collected, for example, WN18-IMG [71] with sampled images, and textual relation extraction datasets including SemEval 2010, NYT [245], and Google-RE. IsaCore [246], an analogical closure of Probase for opinion mining and sentiment analysis, is built by common knowledge base blending and multi-dimensional scaling. Recently, the FewRel dataset [247] was built to evaluate the emerging few-shot relation classification task. There are also datasets for more specific tasks, such as the cross-lingual DBP15K [128] and DWY100K [127] for entity alignment, and the multi-view knowledge graphs YAGO26K-906 and DB111K-174 [119] with instances and ontologies.
We provide an online collection of knowledge graph publications, together with links to some open-source implementations of them, hosted at
https://shaoxiongji.github.io/knowledge-graphs/.