Microsoft: DAT278x
From Graph to Knowledge Graph – Algorithms and Applications
Module 5: Knowledge Graph Inference and Applications
Knowledge Graph Inference
Task: based on known facts (existing triples in the KG), infer new triples, i.e., link prediction or adding knowledge (KG completion), with or without external sources.
We need knowledge graph inference because the knowledge graph itself is largely incomplete.
1) Within existing KG
1.1 Modeling approaches
What is statistical relational learning? From its name we can tell it is about building statistical models for relational data.
The adjacency tensor of a KG (m relation types, three dimensions) generalizes the adjacency matrix of a plain graph (a single relation type, a two-dimensional matrix).
To predict unobserved triples, each prediction value is a probability between zero and one.
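A minimal sketch (not from the course, with made-up entities and relations) of the adjacency-tensor view: a KG with N entities and M relation types becomes a binary N x N x M tensor Y, where y_ijk = 1 if the triple (entity_i, relation_k, entity_j) is observed.

    # Sketch of the adjacency-tensor representation of a toy KG (illustrative data).
    import numpy as np

    entities = ["Seattle", "Washington", "USA"]      # N = 3
    relations = ["located_in", "capital_of"]         # M = 2
    ent_idx = {e: i for i, e in enumerate(entities)}
    rel_idx = {r: k for k, r in enumerate(relations)}

    triples = [("Seattle", "located_in", "Washington"),
               ("Washington", "located_in", "USA")]

    # Y[i, j, k] = 1 means the triple (entity_i, relation_k, entity_j) is observed.
    Y = np.zeros((len(entities), len(entities), len(relations)), dtype=np.int8)
    for s, r, o in triples:
        Y[ent_idx[s], ent_idx[o], rel_idx[r]] = 1

    print(Y[:, :, rel_idx["located_in"]])   # the adjacency matrix of one relation type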
1.2 Graph feature model
If we assume all y_ijk are conditionally independent given the observed graph features and additional parameters, we call it a graph feature model.
Similar entities are likely to be related: local similarity (common neighbors), global similarity (random walks, e.g., the PageRank algorithm), and quasi-local similarity (the Path Ranking Algorithm).
In general, global similarity approaches generate significantly better results than local similarity approaches. However, they are also computationally much more expensive, especially in knowledge graph scenarios: with multiple relations in play, a single node may have a huge number of neighbors, which makes global approaches significantly more expensive, if not completely prohibitive. To mitigate this computational cost there is a third, quasi-local approach, which can be seen as random walks with bounded length -> the Path Ranking Algorithm (PRA).
PRA: for a source-target node pair s and t, we first find all possible relation paths between s and t, where a relation path is defined as a sequence of relation types. The path length must be smaller than a predefined number L.
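As an illustration of this quasi-local idea, here is a small sketch (toy graph and relation names are made up) that enumerates all relation paths of length at most L between a source and a target; the real PRA additionally weights each path by its random-walk probability and uses the paths as features.

    # Sketch: enumerate relation paths of length <= L between s and t (PRA's quasi-local idea).
    def relation_paths(graph, s, t, L):
        """graph: {node: [(relation, neighbor), ...]}; returns a set of relation-type sequences."""
        paths = []

        def dfs(node, rel_path):
            if len(rel_path) > L:
                return
            if node == t and rel_path:
                paths.append(tuple(rel_path))
            for rel, nxt in graph.get(node, []):
                dfs(nxt, rel_path + [rel])

        dfs(s, [])
        return set(paths)

    toy_graph = {
        "Alice": [("born_in", "Seattle"), ("works_for", "Microsoft")],
        "Seattle": [("located_in", "Washington")],
        "Microsoft": [("headquartered_in", "Washington")],
    }
    print(relation_paths(toy_graph, "Alice", "Washington", L=2))
    # {('born_in', 'located_in'), ('works_for', 'headquartered_in')}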
1.3 Latent feature models (KG embeddings)
If we assume all y_ijk are conditionally independent given latent features associated with the subject, the object, and the relation type, plus additional parameters, we call it a latent feature model (knowledge graph embeddings).
Low-dimensional representations: K is the dimension of the embedding space.
N K-dimensional vectors, one per node in the graph = graph embedding
N K-dimensional vectors, one per entity, plus M K-dimensional vectors, one per relation type = KG embedding
The embedding representations of entities and relations are trained by defining a scoring function on triples, such that the score of a correct triple is greater than the score of corrupted triples for a given relation.
The positive triples are obtained from the knowledge graph, while the negative triples are negative samples generated from the positive triples, by randomly replacing one entity or the relation type in an existing positive triple.
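A small sketch of this negative-sampling step (entities and triples are made up; here only an entity is replaced, which is the most common variant, though the relation type can also be corrupted):

    # Sketch of negative sampling by corrupting observed triples.
    import random

    def corrupt(triple, all_entities, positive_set):
        s, r, o = triple
        while True:
            e = random.choice(all_entities)
            candidate = (e, r, o) if random.random() < 0.5 else (s, r, e)
            if candidate not in positive_set:   # keep only unseen (assumed false) triples
                return candidate

    positives = {("Seattle", "located_in", "Washington"),
                 ("Washington", "located_in", "USA")}
    entities = ["Seattle", "Washington", "USA"]
    print(corrupt(("Seattle", "located_in", "Washington"), entities, positives))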
Entity vector: the variations across models come from how it is initialized. Most models initialize it randomly, just like embedding models in many other domains. However, since each entity in the knowledge graph is usually associated with some natural language representation, at least the entity's name, some models initialize the entity vector as the average of pre-trained word embeddings.
The last component is the embedding scoring function, which is the objective that the learning algorithm tries to optimize.
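Several of the models below use a margin-based ranking loss. A generic sketch (not the exact formulation of any single model): the score of a true triple should exceed the score of its corrupted counterpart by at least a margin gamma.

    # Sketch of a margin-based ranking loss over (positive, corrupted) score pairs.
    def margin_ranking_loss(pos_scores, neg_scores, gamma=1.0):
        return sum(max(0.0, gamma - p + n) for p, n in zip(pos_scores, neg_scores))

    print(margin_ranking_loss([2.0, 0.1], [0.5, 0.3]))   # 0.0 + 1.2 ≈ 1.2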
Bordes and his colleagues proposed TransE in 2013, a method that models relationships by interpreting them as translations operating on entity vectors. In 2015, Yang and her colleagues from Microsoft Research generalized the previous works under a unified learning framework and also demonstrated a simple bilinear formulation, abbreviated as DistMult, which outperformed previous models on the link prediction task on knowledge graphs.
RESCAL
This model uses a vector to represent each entity and an asymmetric matrix to represent each relation. The relation matrix R in RESCAL is asymmetric, which takes into account whether a latent component occurs as a subject or as an object. RESCAL is similar to methods used in recommendation systems and to traditional tensor factorization approaches.
Entity representation : Vector (initialized randomly)
Relation representation : Matrix
Entity-relation interaction : Bilinear
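A sketch of the RESCAL bilinear interaction, f(s, r, o) = e_s^T R_r e_o, with one K x K (generally asymmetric) matrix per relation type (values below are random, for illustration only):

    # Sketch of the RESCAL bilinear score: f(s, r, o) = e_s^T R_r e_o.
    import numpy as np

    K = 4
    rng = np.random.default_rng(0)
    e_s, e_o = rng.normal(size=K), rng.normal(size=K)   # entity vectors
    R_r = rng.normal(size=(K, K))                       # relation matrix (asymmetric)

    print(e_s @ R_r @ e_o)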
STRUCTURED EMBEDDING
Entity representation : Vector (initialized randomly)
Relation representation : two Matrices (one for subject and other for object)
Entity-relation interaction : Linear
Scoring function: margin-based ranking loss (1-norm distance in the transformed embedding space)
NTN
Entity representation : Vector (initialized with pre-trained word vector)
Relation representation : Tensor (Matrix & vector)
Entity-relation interaction : Bilinear + Linear
Scoring function: margin-based ranking loss
TransE
Entity representation : Vector
Relation representation : Vector (Modeling relations as translations)
Entity-relation interaction : Linear
Scoring function: margin-based ranking loss
It's inspired by the results of word2vec work by Mikolov, which showed that relationships between words can be computed by their vector differences in the embedding space.
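A sketch of the TransE idea: a triple (s, r, o) is plausible when e_s + r ≈ e_o, so the (1- or 2-norm) distance in the embedding space is used as the score. The vectors below are made up so that the translation is exact.

    # Sketch of the TransE distance: d(s, r, o) = || e_s + r - e_o ||.
    import numpy as np

    def transe_distance(e_s, r, e_o, norm=1):
        return np.linalg.norm(e_s + r - e_o, ord=norm)

    e_s = np.array([0.25, 0.5, 0.0, 0.75])
    r   = np.array([0.25, 0.0, 0.5, 0.0])
    e_o = np.array([0.5, 0.5, 0.5, 0.75])
    print(transe_distance(e_s, r, e_o))   # 0.0 for a perfectly translated triple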
DISTMULT
Entity representation : Vector
Relation representation : Vector
Entity-relation interaction : simplification of the Bilinear
Scoring function: margin-based ranking loss
The relation vector is used as the diagonal of a diagonal matrix that represents the relation, giving a bilinear interaction between the entities and the relation. DistMult has the same number of relation parameters as TransE, and it is a very efficient and effective model that normally outperforms TransE, as well as the more expressive models discussed earlier, on the link prediction task.
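A sketch of the DistMult score: with the relation vector as the diagonal of R_r, the bilinear form reduces to f(s, r, o) = sum_k e_s[k] * r[k] * e_o[k] (note this score is symmetric in s and o; the vectors below are illustrative).

    # Sketch of the DistMult score (diagonal bilinear form).
    import numpy as np

    def distmult_score(e_s, r, e_o):
        return np.sum(e_s * r * e_o)

    e_s = np.array([0.1, 0.2, 0.3])
    r   = np.array([1.0, 0.5, 0.0])
    e_o = np.array([0.3, 0.1, 0.9])
    print(distmult_score(e_s, r, e_o))   # 0.1*1*0.3 + 0.2*0.5*0.1 + 0 ≈ 0.04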
RESCAL and Structured Embedding both use matrices to represent a relation. The difference: RESCAL uses a bilinear interaction between the entities and the relation, hence one matrix per relation; Structured Embedding uses a linear interaction between entity and relation, hence two matrices per relation, one operating on the subject entity and the other on the object entity.
The last two models, TransE and DistMult, have the simplest representation of relations: a single vector each. The overall trend in knowledge graph embedding modeling is to gradually simplify the representation of relations.
With all these models, a natural question is: which one is the best?
Unfortunately, there is no unified answer. The best model is clearly both dataset dependent and task dependent, and we may also need to consider the trade-off between result precision and model complexity.
• Entity Recommendation
1) Co-occurrence based
Example: Harvard University (entity type University), related people (alumni and teachers), other universities (that are also frequently searched)
Given Y, which X (one or more) should be recommended?
Co-occurrence(X, Y) = Freq_Together(X, Y) / Freq_Total(Y)
Co-occurrence is usually a very strong user-provided signal and reflects the wisdom of the crowd.
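A small sketch of the co-occurrence score above, computed from a made-up "query log" where each entry is the set of entities linked in one query (the same counting works for sessions, clicks, or documents):

    # Sketch: Co-occurrence(X, Y) = Freq(X together with Y) / Freq(Y), on illustrative data.
    from collections import Counter
    from itertools import combinations

    query_entities = [
        {"Harvard University", "MIT"},
        {"Harvard University", "Stanford University"},
        {"Harvard University", "MIT"},
        {"MIT"},
    ]

    freq = Counter()
    pair_freq = Counter()
    for ents in query_entities:
        freq.update(ents)
        pair_freq.update(frozenset(p) for p in combinations(sorted(ents), 2))

    def cooccurrence(x, y):
        return pair_freq[frozenset((x, y))] / freq[y]

    print(cooccurrence("MIT", "Harvard University"))   # 2 / 3 ≈ 0.67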
Based on the source of the co-occurrence venue, or where the co-occurrence is collected, they can be categorized into three groups.
1.1) Search related user behavior signal (Within Queries, Across Queries, User Url Clicks)
Within Queries: "flight from Seattle to Las Vegas" (locations), "computer vision machine learning" (topics from computer science) ... entity linking + frequency
Across Queries: queries within a search session or a period of time from the same user (Browser, IP)
1.2) Wikipedia content, which can be exploited by graph link analysis (Wikipedia pages (including Wikidata); Wikipedia categories/templates; Wikipedia revision histories) - external structured sources
All Wikipedia articles are entities in many public knowledge bases such as Wikidata and DBpedia.
These Wikipedia articles are also an important and indispensable source for many well-known in-house knowledge bases such as Google's Knowledge Graph and Microsoft's Satori knowledge graph.
By leveraging the hyperlink structure of Wikipedia, we can use graph link analysis to generate an effective and low-cost measure of semantic relatedness. Based on such a semantic relatedness measure we can obtain the top related entities. A simple and intuitive explanation is to consider how many directly linking entities two given entities share: if a large portion of the linked entities of the two entities is shared, the two entities are semantically similar to each other. We can consider this a specific type of co-occurrence called co-mentioning.
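A sketch of this shared-link relatedness, using Jaccard overlap of linked-entity sets for simplicity (the link sets below are made up; production systems often use the Milne-Witten measure over Wikipedia inlinks instead):

    # Sketch of link-based semantic relatedness via overlap of linked-entity sets.
    def link_relatedness(links_a, links_b):
        links_a, links_b = set(links_a), set(links_b)
        if not links_a or not links_b:
            return 0.0
        return len(links_a & links_b) / len(links_a | links_b)

    # Made-up link sets for illustration.
    links = {
        "Harvard University": {"Cambridge, Massachusetts", "Ivy League", "John Harvard"},
        "MIT": {"Cambridge, Massachusetts", "Computer Science", "Engineering"},
    }
    print(link_relatedness(links["Harvard University"], links["MIT"]))   # 1 / 5 = 0.2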
Any single Wikipedia page can belong to one or more categories or templates.
For any given entity pair, we can extract the categories they belong to and compute the relatedness based on the paths found along the category taxonomy; in this case, co-occurrence means that two entities share a good portion of their respective category hierarchies when tracing them upward.
1.3) All kinds of general web documents.
For example, when people are searching for one conference like WWW or the World Wide Web conference, what are related conferences that we can recommend?
The two co-occurrence sources mentioned previously would not work here, since such academic-intent queries are tail queries and quite sparse, and many conferences may not have Wikipedia pages yet. - a specific domain requires a specific approach
So we decided to use research labs' publication pages or academic researchers' home pages to extract useful signals. Here we use the assumption that the work of the same researcher or the same research lab usually lies in a specific domain, so the venues where they publish are highly related or similar to each other. From these web documents we extract the conference venues and count a co-occurrence whenever two conferences appear on the same web page.
2) Similarity based
Cosine Similarity: Score(X, Y) = Similarity(X, Y) / Sum_Similarity(*, Y)
Textual similarity: in general, the more a word appears in the current document compared with its frequency in the whole corpus, the higher its TF-IDF value. However, this only captures word similarity at the surface level and suffers from the vocabulary mismatch problem: it cannot detect semantically equivalent terms, such as "USA" and "the United States of America", since they share no terms in common.
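A small sketch of TF-IDF textual similarity using scikit-learn (documents are made up), illustrating the vocabulary mismatch problem: "USA" and "United States of America" share no surface terms, so their cosine similarity is zero despite meaning the same thing.

    # Sketch of TF-IDF cosine similarity and the vocabulary mismatch problem.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["USA", "United States of America", "USA flag"]
    tfidf = TfidfVectorizer().fit_transform(docs)
    print(cosine_similarity(tfidf[0], tfidf[1]))   # 0.0: no shared terms despite same meaning
    print(cosine_similarity(tfidf[0], tfidf[2]))   # > 0: shares the surface term "USA"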
Distributional similarity = distributional semantics. It is based on the very old distributional hypothesis: terms that appear in similar contexts (i.e., with similar distributions) have similar meanings. This hypothesis and the resulting approach generate embeddings, which project entities into a low-dimensional latent space that can semantically represent the entity.
Embeddings:
Word Embeddings: Mikolov in 2013. ... It automatically learns male-female relationships from the induced word vector representation, which demonstrates that the word vectors capture syntactic regularities. ... Two novel model architectures: continuous bag-of-words (CBOW) and continuous skip-gram. ... It also made a large improvement in accuracy on tasks measuring syntactic and semantic word similarities.
So now we have word embeddings, how can we generate entity embeddings, then?
The most straightforward approach is to first perform entity linking on documents, then treat each entity as a single word, and then learn the representations using Word2Vec or GloVe.
Similarity is then calculated in the embedding space on the vector representations of entities.
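A sketch of this recipe with gensim's Word2Vec: after entity linking, each entity mention is replaced by a single token (the "ENT:..." format and the tiny corpus below are assumptions for illustration; a real corpus would have millions of sentences).

    # Sketch: learn entity embeddings by treating each linked entity as a single token.
    from gensim.models import Word2Vec

    sentences = [
        ["ENT:Harvard_University", "is", "located", "in", "ENT:Cambridge_Massachusetts"],
        ["ENT:MIT", "is", "located", "in", "ENT:Cambridge_Massachusetts"],
        ["ENT:Harvard_University", "and", "ENT:MIT", "are", "universities"],
    ]
    model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1, epochs=50)
    print(model.wv.most_similar("ENT:Harvard_University", topn=2))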
• Question Answering (NLP and IR)
Question answering: using a computer system to understand natural language questions posed by humans and automatically generate answers.
Closed-domain question answering deals with questions within a specific domain. ... Domain-specific knowledge can be stored in formalized ontologies.
Open-domain question answering handles questions about almost anything, so it has to leverage general ontologies and world knowledge, and it also requires a lot more data.
There are usually a few sub-steps in a KB-based QA system: understanding the question (question analysis, phrase mapping, and disambiguation), then constructing a query against the knowledge base to retrieve the answers from it.
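As an illustration of what the query-construction step ultimately produces, here is a sketch that runs a structured (SPARQL) query against a public knowledge base (Wikidata) for the question "What is the capital of France?"; the query is hand-written here, whereas a real KB-QA system would generate it automatically from the parsed question.

    # Illustrative end product of query construction: a SPARQL query against Wikidata.
    import requests

    query = """
    SELECT ?capitalLabel WHERE {
      wd:Q142 wdt:P36 ?capital .   # France (Q142) --capital(P36)--> ?capital
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    """
    resp = requests.get("https://query.wikidata.org/sparql",
                        params={"query": query, "format": "json"})
    print(resp.json()["results"]["bindings"][0]["capitalLabel"]["value"])   # Paris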
There are various ways to ask the same question. How do we handle such diverse language expressions that are actually asking about exactly the same thing?
A general knowledge graph has very rich relation information. The search space can be very large, as some entities may have too many immediate neighbors.
Some questions require more than one simple query; composite results across multiple simple queries are needed. How can we effectively construct queries to handle such composite relationships?
Need to map questions to the predicates defined in KB
KG based
To use semantic parsing to query a knowledge base, here we compare two approaches.
1) Generic semantic parsing, then matching to the knowledge graph ontology. This requires two steps: first translate the natural language question into a general-purpose lambda formula, then match it against the knowledge base ontology to generate a knowledge-base-specific lambda formula.
2) Knowledge-base-specific semantic parsing.
See also: embeddings [Bordes et al., EMNLP-2014]
Web based with KG enrichment
Semantic parsing is a very difficult task due to potential ontology mismatch, which means we may not be able to construct the proper lambda-form query to query the knowledge base.
The knowledge base is largely incomplete; there could be missing entities, relations, or attribute values, so the question cannot be answered with the existing information in the knowledge base.
This approach tries to understand the question through question type detection, NER parsing, and candidate ranking steps, by directly interacting with a set of web documents. Entity linking is used during the understanding stage. The approach can generate better answer candidates by mapping answers to entities in the knowledge base; it can also merge mentions of the same entity into a single candidate, and it can leverage the rich information about entities in the knowledge base.