Knowledge Graphs @ ACM Computing Surveys

Knowledge Graphs @ ACM Computing Surveys - Leitura de Artigo

Link https://dl.acm.org/doi/pdf/10.1145/3447772

Abstract

In this article, we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After some opening remarks, we motivate and contrast various graph-based data models, as well as languages used to query and validate knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We conclude with high-level future research directions for knowledge graphs.

1 INTRODUCTION

Graphs allow maintainers to postpone the definition of a schema, allowing the data to evolve in a more flexible manner [4].

Aqui me lembra a expressão Schema Later (não seria Schema Less)

Graph query languages support not only standard relational operators (joins, unions, projections, etc.), but also navigational operators for finding entities connected through arbitrary-length paths [4].

Em SPARQL os operadores navigacioanais são da versão 1.1. Em Cypher já haviam operadores desde as primeiras versões. Vide 2.2.3. EM grafos existe algoritimos clássicos apra identificar caminhos(BFS, DFS, ...)

Ontologies [18, 52, 89] and rules [59, 70] can be used to define and reason about the semantics of the terms used in the graph.

Definition: knowledge graph is a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent potentially different relations between these entities. {It} conforms to a graph-based data model, which may be a directed edge-labelled graph, a heterogeneous graph, a property graph, and so on

Open (BabelNet, DBPedia, Freebase, Wikidata, YAGO, ...) or Enterprise knowledge graph

Simple statements can be accumulated as edges in the data graph. For quantified statements, a more expressive way to represent knowledge—such as ontologies or rules—is required. Deductive methods can then be used to entail and accumulate further knowledge

Simples Statements envolveriam instâncias enquanto que Quantified Statement envolveriam conceitos / classes.

2 DATA GRAPHS

2.1 Models

2.1.1 Directed Edge-labelled Graphs = del graph = multi-relational graph

Modelling data in this way offers more flexibility for integrating new sources of data, compared to the standard relational model, where a schema must be defined upfront and followed at each step. While other structured data models such as trees (XML, JSON, etc.) would offer similar flexibility, graphs do not require organising the data hierarchically.... They also allow cycles to be represented and queried ... A standard data model based on del graphs is the Resource Description Framework (RDF)

Multi-relacional quando entre dois nós podem haver mais de uma relação (arestas) diferenciadas por seus rótulos

2.1.2 Heterogeneous Graphs = heterogeneous information network ...

is a graph where each node and edge is assigned one type. Heterogeneous graphs are thus akin to del graphs—with edge labels corresponding to edge types—but where the type of node forms part of the graph model itself, rather than being expressed as a special relation {rdf:type}. An edge is called homogeneous if it is between two nodes of the same type (e.g., borders); otherwise it is called heterogeneous (e.g., capital). Heterogeneous graphs allow for partitioning nodes according to their type, for example, for the purposes of machine learning tasks [61, 142, 154]. However, unlike del graphs, they typically assume a one-to-one relation between nodes and types...

Uma diferença entre del e heterogêneo é que no último os nós são tipados e assumem somente um tipo possível.

2.1.3 (Lableled) Property Graphs.

A property graph allows a set of property–value pairs and a label to be associated with nodes and edges, offering additional flexibility when modelling data ... Though not yet standardised, property graphs are used in popular graph databases, such as Neo4j... these additional details likewise require more intricate query languages, formal semantics, and inductive techniques versus simpler graph models ...

No LPG as arestas podem ter propriedades do tipo chave/valor, nos del graphs somente os nós possuem propriedades. No LPG é possível associar contexto a cada relação entre os nós.

2.1.4 Graph Dataset.

A graph dataset allows for managing several graphs and consists of a set of named graphs and a default graph. Each named graph is a pair of a graph ID and a graph. The default graph is a graph without an ID and is referenced “by default” if a graph ID is not specified ... Graph datasets are useful for managing and querying data from multiple sources [48], where each source can be managed as a separate graph, allowing individual graphs to be queried, updated, removed, and so on, as needed.

Other Graph Data Models exist with nodes that may contain individual edges or even nested graphs (a.k.a. hypernodes). Likewise, hypergraphs allow edges that connect sets rather than pairs of nodes.

Nonetheless, data can typically be converted from one model to another; in our view, a knowledge graph can thus adopt any such graph data model.

Não tem uma definição para hiper relacional

2.2 Querying

2.2.1 Graph Patterns

homomorphism-based semantics, which allows multiple variables to be mapped to the same term ... (this semantics is adopted by SPARQL); and isomorphism-based semantics, which requires variables on nodes and/or edges to be mapped to unique terms, ... (this semantics is adopted by Cypher for edge variables).

No caso do SPARQL teria que incluir uma cláusula para filtrar onde ?vn1 != ?vn2 (exemplo da figura 5)

2.2.2 Complex Graph Patterns

operators

projection (π, a.k.a. SELECT),

selection (σ, a.k.a. WHERE or FILTER),

union (∪, a.k.a. UNION),

difference (−, a.k.a. EXCEPT),

inner joins (⋈, a.k.a. NATURAL JOIN),

left outer join (⟕, a.k.a. LEFT OUTER JOIN or OPTIONAL),

anti-join (⊳, a.k.a. NOT EXISTS)

Complex graph patterns can give rise to duplicate results.... Query languages typically offer two semantics: bag semantics preserves duplicates according to the multiplicity of the underlying mappings, while set semantics (a.k.a. DISTINCT) removes duplicates from the results.

2.2.3 Navigational Graph Patterns.

regular path query (x,r,y)

r is a constant (an edge label)

r− (inverse)

r* (Kleene star: 0-or-more)

r1 | r2 (disjunction)

r1 · r2 (concatenation)

In fact, since a cycle is present, an infinite number of paths are potentially matched. For this reason, restricted semantics are often applied, returning only the shortest paths, or paths without repeated nodes or edges (as in the case of Cypher) Rather than returning paths, another option is to instead return the (finite) set of pairs of nodes connected by a matching path (as in the case of SPARQL 1.1).

2.3 Validation

2.3.1 Shapes Graphs.

A shape targets a set of nodes in a data graph and specifies constraints on those nodes. The shape’s target can be specified manually, using a query, and so on. A shapes graph is then formed from a set of interrelated shapes. Shapes graphs can be depicted as UML-like class diagrams ... defining constraints on four interrelated shapes. Each shape - denoted with a box ... - is associated with a set of constraints.
When declaring shapes, the data modeller may not know in advance the entire set of properties that some nodes can have. An open shape allows the node to have additional properties not specified by the shape, while a closed shape does not.

2.3.2 Conformance.

A node conforms to a shape if it satisfies all of the constraints of the shape. The conformance of a node to a shape may depend on the conformance of other nodes to other shapes; ... A graph is valid with respect to a shapes graph (and its targets) if and only if every node that each shape targets conforms to that shape; ...

2.3.3 Other Features.

Two shapes languages with such features have been proposed for RDF graphs: Shape Expressions (ShEx); and SHACL (Shapes Constraint Language). These languages also support additional features; for example, SHACL supports constraints expressed using graph queries in the SPARQL language. More details about ShEx and SHACL can be found in the book by Labra Gayo et al. [75]. Similar ideas have been proposed by Angles [3] for property graphs.

2.4 Context

Definition: By context, we herein refer to the scope of truth, and thus talk about the context in which some data are held to be true

2.4.1 Direct Representation.

The first way to represent context is to consider it as data no different from other data. ... a number of specifications have been proposed to directly represent context in a more standard way, including the Time Ontology [23] for temporal context, the PROV Data Model [39] for provenance, and so on.

2.4.2 Reification.

Often, we may wish to directly define the context of edges themselves ... three forms of reification for modelling temporal context [50]: RDF reification [24], n-ary relations [24], and singleton properties [91]. Unlike in a direct representation, e is seen as denoting an edge in the graph, not a flight.

2.4.3 Higher-arity Representation.

We can also use higher-arity representations—that extend the graph model—for encoding context. ... named graph, property graph and RDF*

2.4.4 Annotations.

While the previous alternatives are concerned with representing context, annotations allow for defining contexts, which enables automated context-aware processing of data. ...

2.4.5 Other Contextual Frameworks.

Other frameworks for modelling and reasoning about context in graphs include that of contextual knowledge repositories [58].... A similar framework, proposed by Schuetz et al. [120], is based on OLAP-like operations over contextual dimensions

3 DEDUCTIVE KNOWLEDGE

Given the data as premises and some general rules about the world that we may know a priori , we can use a deductive process to derive new data, allowing us to know more than what is explicitly given to us by the data. ... Machines do not have inherent deductive faculties, but rather need entailment regimes to formalise the logical consequence of a given set of premises. Once instructed in this manner, machines can (often) apply deductions with a precision, efficiency, and scale beyond human performance. ... which potentially complex entailments can be expressed and automated. Though we could leverage a number of logical frameworks for these purposes - such as First-order Logic, Datalog, Prolog, Answer Set Programming, and so on - we focus on ontologies, which constitute a formal representation of knowledge that, importantly for us, can be represented as a graph; in other words, ontologies can be seen as knowledge graphs with well-defined meaning.

3.1 Ontologies

To enable entailment, we must be precise about what the terms we use mean.
In computing, an ontology is then a concrete, formal representation—a convention—on what terms mean within the scope in which they are used (e.g., a given domain). Like all conventions, the usefulness of an ontology depends on how broadly and consistently it is adopted and how detailed it is. Knowledge graphs that use a shared ontology will be more interoperable. Given that ontologies are formal representations, they can further be used to automate entailment.

3.1.1 Interpretations.

We thus interpret the data graph as another graph - what we here call the domain graph - composed of real-world entities connected by real-world relations. The process of interpretation, here, involves mapping the nodes and edges in the data graph to nodes and edges of the domain graph.

We can thus abstractly define an interpretation [7] of a data graph as the combination of a domain graph and a mapping from the terms (nodes and edge-labels) of the data graph to those of the domain graph. The domain graph follows the same model as the data graph. We refer to the nodes of the domain graph as entities and the edges of the domain graph as relations.

3.1.2 Assumptions.

Why is this abstract notion of interpretation useful? The distinction between nodes/edges and entities/relations becomes clear when we define the meaning of ontology features and entailment.

Under the Closed World Assumption (CWA) - which asserts that what is not known is assumed false - without further knowledge the answer is no. Conversely, under the Open World Assumption (OWA), it is possible for the relation to exist without being described by the graph.

Under the Unique Name Assumption (UNA), which states that no two nodes can map to the same entity .... Conversely, under the No Unique Name Assumption (NUNA), we can only say that there is at least one ... may be the same entity with two “names” (i.e., two nodes referring to the same entity).

These assumptions define which interpretations are valid and which interpretations satisfy which data graphs. The UNA forbids interpretations that map two nodes to the same entity, while the NUNA does not. Under CWA, an interpretation that contains an edge ... in its domain graph can only satisfy a data graph from which we can entail .... . Under OWA, an interpretation containing the edge .... can satisfy a data graph not entailing .... so long it does not contradict that edge.

Ontologies typically adopt the NUNA and OWA, i.e., the most general case, which considers that data may be incomplete, and two nodes may refer to the same entity.

3.1.3 Semantic Conditions. ... subproperty of

3.1.4 Individuals. .... We may further assert that two terms refer to the same entity (same as) ... or that two terms refer to different entities (different from)

Ontology Features for Individuals

3.1.5 Properties. Properties denote terms that can be used as edge-labels

We may also associate classes with properties by defining their domain and range. We may further state that a pair of properties are equivalent, inverses, or disjoint, or define a particular property to denote a transitive, symmetric, asymmetric, reflexive, or irreflexive relation.

Ontology Features for Property Axioms

3.1.6 Classes. .... group nodes in a graph into classes ... (classes, subClassOf)

Ontology Features for Class Axioms and Definitions

3.2 Semantics and Entailment

3.2.2 Entailment. We say that one graph entails another if and only if any model of the former graph is also a model of the latter graph

3.3 Reasoning

Given two graphs, deciding if the first entails the second ... is undecidable:

No (finite) algorithm for such entailment can exist that halts on all inputs with the correct true/false answer [53]. However, we can provide practical reasoning algorithms for ontologies that (1) halt on any input ontology but may miss entailments, returning false instead of true, (2) always halt with the correct answer but only accept input ontologies with restricted features, or (3) only return correct answers for any input ontology but may never halt on certain inputs.

3.3.1 Rules.

A straightforward way to implement reasoning is through inference rules (or simply rules), composed of a body (if) and a head (then). Both the body and head are given as graph patterns. A rule indicates that if we can replace the variables of the body with terms from the data graph and form a subgraph of a given data graph, then using the same replacement of variables in the head will yield a valid entailment. The head must typically use a subset of the variables appearing in the body to ensure that the conclusion leaves no variables unreplaced.

Inferências do tipo SE->ENTÃO

Rules of this form correspond to (positive) Datalog in databases, Horn clauses in logic programming, and so on.

Rules can be used for reasoning in a number of ways. Materialisation applies rules recursively to a graph, adding entailments back to the graph until nothing new can be added. The materialised graph can then be treated as any other graph; however, the materialised graph may become unfeasibly large to manage.

Another strategy is to use rules for query rewriting, which extends an input query to find entailed solutions.

3.3.2 Description Logics.

Description Logics (DLs) hold an important place in the logical formalisation of knowledge graphs: They were initially introduced as a way to formalise the meaning of frames [85] and semantic networks [107] (which can be seen as predecessors of knowledge graphs) and also heavily influenced OWL.

4 INDUCTIVE KNOWLEDGE

Inductive reasoning generalises patterns from input observations, which are used to generate novel but potentially imprecise predictions. ... Predictions may thus have a level of confidence

We then refer to knowledge acquired inductively as inductive knowledge, which includes both the models that encode patterns and the predictions made by those models.

Supervised methods learn a function (a.k.a. model) to map a set of example inputs to their labelled outputs; the model can then be applied to unlabelled inputs. To avoid costly labelling, some supervised methods can generate the input–output pairs automatically from the (unlabelled) input, which are then fed into a supervised process to learn a model; herein, we refer to this approach as self-supervision. Alternatively, unsupervised processes do not require labelled input–output pairs, but rather apply a predefined function (typically statistical in nature) to map inputs to outputs

unsupervised - KG analytics: well-known algorithms are used to detect communities or clusters, find central nodes and edges

self-supervised - KG embeddings: learn a low-dimensional numerical model of elements of a knowledge graph.

supervised - GNN: The structure of graphs can also be directly leveraged for learning through graph neural networks.

self-supervised - symbolic learning: logical formulae in the form of rules or axioms ... Essão não seria o Dedutivo?

4.1 Graph Analytics

4.1.1 Graph Algorithms.

Centrality analysis: degree, betweenness, closeness, Eigenvector, PageRank, HITS, Katz

Community detection: minimum-cut algorithms, label propagation, Louvain modularity

Connectivity analysis aims to estimate how well-connected and resilient the graph is: measuring graph density or k-connectivity, detecting strongly connected components and weakly connected components, computing spanning trees or minimum cuts,

Node similarity aims to find nodes that are similar to other nodes by virtue of how they are connected within their neighbourhood. Node similarity metrics may be computed using structural equivalence, random walks, diffusion kernels,

Graph summarisation aims to extract high-level structures from a graph, often based on quotient graphs, where nodes in the input graph are merged while preserving the edges between the input nodes

Many such techniques have been proposed and studied for simple graphs or directed graphs without edge labels. An open research challenge in the context of knowledge graphs is to adapt such algorithms for graph models such as del graphs, heterogeneous graphs and property graphs

4.1.2 Graph Processing Frameworks.

Various graph parallel frameworks have been proposed for large-scale graph processing, including Apache Spark (GraphX), GraphLab, Pregel, Signal–Collect, Shark.

4.2 Knowledge Graph Embeddings

Machine learning can be used for directly refining a knowledge graph [100]; or for downstream tasks using the knowledge graph, such as recommendation [155], information extraction [135], question answering [60], query relaxation [139], query approximation [45], and so on.

A first attempt to represent a graph using vectors would be to use a one-hot encoding, generating a vector of length |L| x|V| for each node—with |V | the number of nodes in the input graph and |L| the number of edge labels—placing a one at the corresponding index to indicate the existence of the respective edge in the graph, or zero otherwise. Such a representation will, however, typically result in large and sparse vectors, which will be detrimental for most machine learning models.

Para cada nó gerar um vetor com dimensão d = |L| x|V| do tipo one-hot, ou seja, com 0 e 1

The main goal of knowledge graph embedding techniques is to create a dense representation of the graph (i.e., embed the graph) in a continuous, low-dimensional vector space that can then be used for machine learning tasks.

Será que a manipulação deste tipo de representação armazenada em banco de dados seria mais eficiente?

The dimensionality d of the embedding is fixed and typically low (often, e.g., 50 ≥ d ≥ 1000). Typically the graph embedding is composed of an entity embedding for each node: a vector with d dimensions that we denote by e; and a relation embedding for each edge label: (typically) a vector with O(d) dimensions that we denote by r. The overall goal of these vectors is to abstract and preserve latent structures in the graph.

Given a data graph, the goal is then to compute the embeddings of dimension d that maximise the plausibility of positive edges (typically edges in the graph) and minimise the plausibility of negative examples (typically edges in the graph with a node or edge label changed such that they are no longer in the graph) according to the given scoring function. The resulting embeddings can then be seen as models learned through self-supervision that encode (latent) features of the graph, mapping input edges to output plausibility scores. Embeddings can then be used for a number of low-level tasks.

The plausibility scoring function can be used to assign confidence to edges (possibly extracted from an external source) or to complete edges with missing nodes/edge labels (a.k.a. link prediction). Additionally, embeddings will typically assign similar vectors to similar terms and can thus be used for similarity measures.

Translational models where relations are seen as translating subject entities to object entities.

Translational models interpret edge labels as transformations from subject nodes (a.k.a. the source or head) to object nodes (a.k.a. the target or tail)

TransE

Over all positive edges s->p->o, TransE learns vectors es, rp, and eo aiming to make es +rp as close as possible to eo. Conversely, if the edge is negative, then TransE attempts to learn a representation that keeps es +rp away from eo.

many variants of TransE have been investigated, typically using a distinct hyperplane (e.g., TransH [144]) or vector space (e.g., TransR [77], TransD [64]) for each type of relation. Recently, RotatE [130] proposes translational embeddings in complex space, which allows to capture more characteristics of relations, such as direction, symmetry, inversion, antisymmetry, and composition.

Embeddings have also been proposed in non-Euclidean space;

Tensor decomposition models that extract latent factors approximating the graph’s structure.

A tensor is a multidimensional numeric field that generalises scalars (0-order tensors), vectors (1-order tensors), and matrices (2-order tensors) towards arbitrary dimension/order. Tensor decomposition involves decomposing a tensor into more “elemental” tensors (e.g., of lower order) from which the original tensor can be recomposed (or approximated) by a fixed sequence of basic operations. These elemental tensors can be seen as capturing latent factors in the original tensor. There are many approaches to tensor decomposition ...

vetores ... (n vetores) = matrizes ... (n matrizes) = tensores

To compute knowledge graph embeddings with such techniques, a graph can be encoded as a one-hot 3-order tensor G with |V |×|L|×|V | elements, where the element (G)ijk =1 if the ith node links to the kth node with the jth edge label (otherwise (G)ijk =0).

DistMult [152] is a seminal method for computing knowledge graph embeddings based on rank
decompositions, where each entity and relation is associated with a vector of dimension d, such
that for an edge <s,p,o> , a plausibility scoring function F is defined, where (es)i, (rp)i and (eo)i denote the ith elements of vectors es, rp, eo, respectively. The goal, then, is to learn vectors for each node and edge label that maximise the plausibility of positive edges and minimise the plausibility of negative edges. ... DistMult does not capture edge direction.

Rather than use a vector as a relation embedding, RESCAL [93] uses a matrix, which allows for combining values from es and eo across all dimensions and thus can capture (e.g.) edge direction. However, RESCAL incurs a higher cost in terms of space and time than DistMult. Recently, ComplEx [132] and HolE [92] both use vectors for relation and entity embeddings, but ComplEx uses complex vectors, while HolE uses a circular correlation operator (on reals) [57] to capture edge-direction.

4.2.3 Neural Models.

A number of approaches rather use neural networks to learn knowledge graph embeddings with non-linear scoring functions for plausibility. An early neural model was Semantic Matching Energy (SME) [41], which learns parameters (a.k.a. weights: w, w′) for two functions—fw(es,rp) and дw′ (eo,rp)—such that the dot product of the result of both functions gives the plausibility score. Both linear and bilinear variants of fw and дw′ are proposed.

Another early proposal was Neural Tensor Networks (NTN) [123], which maintains a tensor W of weights and computes plausibility scores by combining the outer product es ⊗W ⊗eo with rp and a standard neural layer over es and eo. The tensor W yields a high number of parameters, limiting scalability [140].

Multi Layer Perceptron (MLP) [31] is a simpler model, where es, rp, and eo are concatenated and fed into a hidden layer to compute the plausibility score.

More recent models use convolutional kernels. ConvE [29] generates a matrix from es and rp by “wrapping” each vector over several rows and concatenating both matrices, over which (2D) convolutional layers generate the embeddings. A disadvantage is that wrapping vectors imposes
an arbitrary two-dimensional structure on the embeddings. HypER [8] also uses convolutions, but avoids such wrapping by applying a fully connected layer (called the “hypernetwork”) to rp to generate relation-specific convolutional filters through which the embeddings are generated.

4.2.4 Language Models.

Embedding techniques were first explored as a way to represent natural language within machine learning frameworks, with word2vec [83] and GloVe [102] being two seminal approaches. Both approaches compute embeddings for words based on large corpora of text such that words used in similar contexts (e.g., “frog,” “toad”) have similar vectors. Approaches for language embeddings can be applied for graphs. However, while graphs consist of an unordered set of sequences of three terms (i.e., a set of edges), text in natural language consists of arbitrary-length sequences of terms (i.e., sentences of words). Along these lines, RDF2Vec [109] performs biased random walks on the graph and records the paths traversed as “sentences,” which are then fed as input into the word2vec [83] model.

KGloVe [22] is based on the GloVe model. Much like how the original GloVe model [102] considers words that co-occur frequently in windows of text to be more related, KGloVe uses personalised PageRank to determine the most related nodes to a given node, whose results are then fed into the GloVe model.

4.2.5 Entailment-aware Models.

The embeddings thus far consider the data graph alone. But what if an ontology or set of rules is provided? One may first consider using constraint rules to refine the predictions made by embeddings. Wang et al. [141] use functional and inverse-functional definitions as constraints (under UNA); for example, if we define that an event can have at most one value for venue, then the plausibility of edges that would assign multiple venues to an event is lowered.

More recent approaches rather propose joint embeddings that consider both the data graph and rules.

4.3 Graph Neural Networks

Rather than compute numerical representations for graphs, an alternative is to define custom
machine learning architectures for graphs. Most such architectures are based on neural networks [145] given that a neural network is already a directed weighted graph, where nodes serve as artificial neurons, and edges serve as weighted connections (axons). However, the topology of a traditional (fully connected feed-forward) neural network is quite homogeneous, having sequential layers of fully connected nodes. Conversely, the topology of a data graph is typically more heterogeneous.

A graph neural network (GNN) [117] is a neural network where nodes are connected to their
neighbours in the data graph. Unlike embeddings, GNNs support end-to-end supervised learning
for specific tasks: Given a set of labelled examples, GNNs can be used to classify elements of the
graph or the graph itself.

4.3.1 Recursive Graph Neural Networks.

Recursive graph neural networks (RecGNNs) are the seminal approach to graph neural networks [117, 124]. The approach is conceptually similar to the abstraction illustrated in Figure 17, where messages are passed between neighbours towards recursively computing some result. However, rather than define the functions used to decide the messages to pass, we rather give labelled examples and let the framework learn the functions.

4.3.2 Convolutional Graph Neural Networks.

Convolutional neural networks (CNNs) have gained a lot of attention, in particular, for machine learning tasks involving images

Both GNNs and CNNs work over local regions of the input data: GNNs operate over a node and its neighbours in the graph, while (in the case of images) CNNs operate over a pixel and its neighbours in the image. Following this intuition, a number of convolutional graph neural networks (ConvGNNs) [145]—a.k.a. graph convolutional networks (GCNs) [71]—have been proposed, where the transition function is implemented by means of convolutions. A benefit of CNNs is that the same kernel can be applied over all the regions of an image, but this creates a challenge .... the neighbourhoods of different nodes in a graph can be diverse. An alternative is to use an attention mechanism [136] to learn the nodes whose features are most important to the current node.

Aside from architectural considerations, there are two main differences between RecGNNs and ConvGNNs. First, RecGNNs aggregate information from neighbours recursively up to a fixpoint, whereas ConvGNNs typically apply a fixed number of convolutional layers. Second, RecGNNs typically use the same function/parameters in uniform steps, while different convolutional layers of a ConvGNN can apply different kernels/weights at each distinct step.

4.4 Symbolic Learning

The supervised techniques discussed thus far learn numerical models that are hard to interpret; ... Embeddings further suffer from the out-of-vocabulary problem, where they are often unable to provide results for inputs involving previously unseen nodes or edge-labels. ... An alternative is to use symbolic learning to learn hypotheses in a logical (symbolic) language that “explain” sets of positive and negative edges. Such hypotheses are interpretable ...

4.4.1 Rule Mining.

Rule mining, in the general sense, refers to discovering meaningful patterns in the form of rules from large collections of background knowledge. In the context of knowledge graphs, we assume a set of positive and negative edges as given. The goal of rule mining is to identify new rules that entail a high ratio of positive edges from other positive edges, but entail a low ratio of negative edges from positive edges.

rules are not assumed to hold in all cases, but rather are associated with measures of how well they conform to the positive and negative edges. In more detail, we call the edges entailed by a rule and the set of positive edges (not including the entailed edge itself) the positive entailments of that rule. The number of entailments that are positive is called the support for the rule, while the ratio of a rule’s entailments that are positive is called the confidence for the rule [127]. The goal is to find rules with both high support and confidence.

While similar tasks have been explored for relational settings with Inductive Logic Programming (ILP) [27], when dealing with an incomplete knowledge graph (under OWA), it is not immediately clear how to define negative edges. A common heuristic is to adopt a Partial Completeness Assumption (PCA) [36], which considers the set of positive edges to be those contained in the data graph, and the set of negative examples <x,p,y'> to be the set of all edges not in the graph but where there exists a node y such that <x,p,y> is in the graph.

An influential rule-mining system for graphs is AMIE [36, 37], which adopts the PCA measure of confidence and builds rules in a top-down fashion [127] starting with rule heads of the form ⇒ <?x, country, ?y> for each edge label. For each such rule head, three types of refinements are considered,... Combining refinements gives rise to an exponential search space that can be pruned.

The RuLES system [54] also learns non-monotonic rules and extends the confidence measure to consider the plausibility scores of knowledge graph embeddings for entailed edges not appearing in the graph. In lieu of PCA, the CARL system [101] uses knowledge of the cardinalities of relations to find negative edges, while d’Amato et al. [25] use ontologically entailed negative edges for measuring the confidence of rules generated by an evolutionary algorithm.

4.4.2 Axiom Mining.

Aside from rules, more general forms of axioms—expressed in logical languages such as DLs (see Section 3.3.2)—can be mined from a knowledge graph. We can divide these approaches into two categories: those mining specific axioms and more general axioms.

Among works mining specific types of axioms, disjointness axioms are a popular target ...

(general axioms) A prominent such system is DL-Learner [20], which is based on algorithms for class learning (a.k.a. concept learning), whereby given a set of positive nodes and negative nodes, the goal is to find a logical class description that divides the positive and negative sets.

Aqui teria potencial de gerar um "esquema" para o grafo.

5 SUMMARY AND CONCLUSION

General trends include: (1) the use of knowledge graphs to integrate and leverage data from diverse sources at large scale; and (2) the combination of deductive (rules, ontologies, etc.) and inductive techniques (machine learning, analytics, etc.) to represent and accumulate knowledge.

Particularly interesting topics for knowledge graphs arise from the intersections of areas. In the intersection of data graphs and deductive knowledge, we emphasise emerging topics such as formal semantics for property graphs, with languages that can take into account the meaning of labels and property–value pairs on nodes and edges [74]; and reasoning and querying over contextual data, to derive conclusions and results valid in a particular setting [58, 120, 156].

In the intersection of data graphs and inductive knowledge, we highlight topics such as similarity-based query relaxation, allowing to find approximate answers to exact queries based on numerical representations (e.g., embeddings) [139]; shape induction, to extract and formalise inherent patterns in the knowledge graph as constraints [82]; and contextual knowledge graph embeddings that provide numeric representations of nodes and edges that vary with time, place, and so on [67, 154]. Finally, in the intersection of deductive and inductive knowledge, we mention the topics of entailment-aware knowledge graph embeddings [28, 43], which incorporate rules and/or ontologies when computing plausibility; expressive graph neural networks proven capable of complex classification analogous to expressive ontology languages [11]; as well as further advances on rule and axiom mining, allowing to extract symbolic, deductive representations from the knowledge graphs [20, 37]

[12] L. Bellomarini, E. Sallinger, and G. Gottlob. 2018. The Vadalog system: Datalog-based reasoning for knowledge graphs. Proc. oVLDB Endow. 11, 9 (2018)

[45] W. L. Hamilton, P. Bajaj, M. Zitnik, D. Jurafsky, and J. Leskovec. 2018. Embedding logical queries on knowledge graphs. In Proc. of NIPS.

[58] M. Homola and L. Serafini. 2012. Contextualized knowledge repositories for the semantic web. J. Web Seman. 12 (2012).[60]
[67] S. M. Kazemi, R. Goel, K. Jain, I. Kobyzev, A. Sethi, P. Forsyth, and P. Poupart. 2019. Relational representation learning for dynamic (knowledge) graphs: A survey. CoRR abs/1905.11485 (2019).

[100] H. Paulheim. 2017. Knowledge graph refinement: A survey of approaches and evaluation methods. Seman. Web J. 8, 3 (2017).[120]
[139] M. Wang, R. Wang, J. Liu, Y. Chen, L. Zhang, and G. Qi. 2018. Towards empty answers in SPARQL: Approximating querying with RDF embedding. In Proc. of ISWC.[154]
[155] F. Zhang, N. J. Yuan, D. Lian, X. Xie, and W. Ma. 2016. Collaborative knowledge base embedding for recommender systems. In Proc. of SIGKDD.

[156] A. Zimmermann, N. Lopes, A. Polleres, and U. Straccia. 2012. A general framework for representing, reasoning and querying with annotated semantic web data. J. Web Seman. 12 (Mar. 2012).

Pesquisa de Doutorado da Veronica

Pesquisar este blog

Knowledge Graphs @ ACM Computing Surveys - Leitura de Artigo

Marcadores

Comentários

Postar um comentário

Postagens mais visitadas deste blog

Connected Papers: Uma abordagem alternativa para revisão da literatura

Knowledge Graph Embedding with Triple Context - Leitura de Abstract

Exploratory Search: From Finding to Understanding - Leitura de Artigo