Video -> https://youtu.be/ZyYec3X4pkY
Abstract: Search engines and other information systems have started to evolve from retrieving documents to providing more intelligent information access. However, the evolution is still in its infancy due to computers’ limited ability in representing and understanding human language. This talk will present my work addressing these challenges with knowledge graphs.
The first part is about utilizing entities from knowledge graphs to improve search. I will discuss how we build better text representations with entities and how the entity-based text representations improve text retrieval.
The second part is about better text understanding through modeling entity salience (importance), as well as how the improved text understanding helps search under both feature-based and neural ranking settings. This talk concludes with future directions towards the next generation of intelligent information systems.
Bag of Words
Vocabulary mismatch, shallow understanding, and writing queries requires knowledge (content oriented).
- KG (Structured Semantics) and Semantics: Entity-oriented search (Entity retrieval) + Semantic Search
node = entity (concrete and abstract), has attributes
Entity linking maps query (and document) mentions to KG entities.
Queries and documents are represented as bags of entities, and matching happens in the entity space.
Move from words to entities.
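A minimal sketch of the bag-of-entities idea, assuming a toy dictionary-based entity linker; the surface forms, entity ids, and the overlap score below are illustrative, not the linker or ranking function from the talk:

```python
from collections import Counter

# Toy surface-form -> KG entity id dictionary (illustrative only).
ENTITY_LINKS = {
    "barack obama": "Q76",
    "obama": "Q76",
    "united states": "Q30",
    "president": "Q11696",
}

def link_entities(text):
    """Very naive entity linking: match known surface forms in the text."""
    text = text.lower()
    return [eid for surface, eid in ENTITY_LINKS.items() if surface in text]

def bag_of_entities(text):
    """Represent a text as a multiset (bag) of linked KG entities."""
    return Counter(link_entities(text))

def exact_entity_match_score(query, document):
    """Exact match in the entity space: overlap of query and document entity bags."""
    q, d = bag_of_entities(query), bag_of_entities(document)
    return sum(min(q[e], d[e]) for e in q)

doc = "Barack Obama served as president of the United States."
print(exact_entity_match_score("Obama president", doc))  # counts shared entities
```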
The graph is sparse because similar entities may not be connected.
Connect all entities by their similarity in the embedding space.
Soft match (embeddings) x exact match (words or entities)
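A small sketch contrasting exact match with soft match, assuming dense entity embeddings are available; the vectors below are made up for illustration, not taken from any real model:

```python
import numpy as np

# Hypothetical entity embeddings (in practice learned, e.g. from KG structure or text).
EMB = {
    "Q76":    np.array([0.9, 0.1, 0.0]),   # Barack Obama
    "Q11696": np.array([0.7, 0.3, 0.1]),   # President of the United States
    "Q30":    np.array([0.2, 0.8, 0.1]),   # United States
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def soft_match_score(query_entities, doc_entities):
    """Soft match: each query entity contributes its best embedding similarity
    to any document entity, instead of requiring an exact id match."""
    return sum(max(cosine(EMB[q], EMB[d]) for d in doc_entities)
               for q in query_entities)

query_entities = ["Q11696"]        # query links to "President of the United States"
doc_entities = ["Q76", "Q30"]      # document mentions Obama and the United States
print(soft_match_score(query_entities, doc_entities))  # > 0 even with no exact overlap
```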
Ranking Performance
Searching for one or more concepts is typical in search; searching for relations is more common in Q&A.
Lessons learned: combined approaches
- Large Scale Text Understanding: Entity salience (ranking)
Bag-of-Words x Bag-of-Entities: both are still just a bag of isolated things, a shallow understanding.
More than counting the frequency of "things": identify their importance, the central entities (centrality).
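One simple way to go beyond frequency is to score entities by how central they are in a co-occurrence graph. The sketch below uses degree centrality over sentence-level co-occurrence as an illustrative proxy; it is not the salience model from the talk:

```python
from collections import defaultdict
from itertools import combinations

def entity_salience(sentences_entities):
    """sentences_entities: one list of entity ids per sentence.
    Salience here = degree in the sentence co-occurrence graph."""
    degree = defaultdict(int)
    for entities in sentences_entities:
        for a, b in combinations(set(entities), 2):
            degree[a] += 1
            degree[b] += 1
    return dict(sorted(degree.items(), key=lambda kv: -kv[1]))

doc = [
    ["Q76", "Q30"],        # sentence 1: Obama, United States
    ["Q76", "Q11696"],     # sentence 2: Obama, President
    ["Q30"],               # sentence 3: United States alone
]
print(entity_salience(doc))  # Q76 comes out as the most central / salient entity
```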
Hubness problem of semantic similarity; see the paper [Xu et al. 2015].
In a very high-dimensional embedding space, anything that is not similar has approximately the same distance. (To reduce information overload, this may not be a problem.)
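A quick numeric illustration of this concentration effect (not from the talk): cosine similarities between random high-dimensional vectors cluster tightly around zero, so "not similar" pairs all look roughly equidistant.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_spread(dim, n=1000):
    """Sample random unit vectors and report mean/std of pairwise cosines."""
    x = rng.normal(size=(n, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    sims = (x @ x.T)[np.triu_indices(n, k=1)]
    return sims.mean(), sims.std()

for dim in (10, 100, 1000):
    mean, std = cosine_spread(dim)
    print(f"dim={dim:5d}  mean cosine={mean:+.3f}  std={std:.3f}")
    # The spread shrinks roughly like 1/sqrt(dim): distances concentrate.
```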
Separate similarity into several different ranges to model how other entities are connected (not only how similar they are): related, unrelated, conflicting.
I did not understand the calculation formula; a rough illustration of the similarity-ranges idea is sketched below.
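Since the exact formula is not reconstructed here, the sketch below only illustrates the general idea: soft-binning cosine similarities into a few Gaussian kernels so that different similarity ranges are counted separately. The bin centers and widths are arbitrary assumptions, not the parameters used in the talk.

```python
import numpy as np

def soft_bin_similarities(similarities, centers=(-0.5, 0.0, 0.5, 1.0), sigma=0.2):
    """Split similarity values into soft ranges: each Gaussian kernel counts
    how much mass falls near its center (e.g. conflicting / unrelated /
    related / same), instead of reducing everything to one similarity number."""
    sims = np.asarray(similarities)
    return [float(np.exp(-((sims - c) ** 2) / (2 * sigma ** 2)).sum())
            for c in centers]

# Similarities of one query entity to all document entities.
print(soft_bin_similarities([0.95, 0.4, -0.1, 0.05]))
```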
Similarity != Relevance
- Deep Learning
The more an entity contributes, the more relevant it is.
Ranking is learned from document, query, and entity embeddings.
Combination of entity retrieval and information retrieval; a toy sketch of combining both signals follows below.
Semantics: Structured (KG) and Distributed (Embeddings)
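A toy sketch of how a learned ranker might combine word-level and entity-level match signals into one score. The two features and the fixed linear weights are placeholders for whatever features and neural architecture the real system learns:

```python
import numpy as np

def match_features(query_words, doc_words, query_entities, doc_entities):
    """Two illustrative features: word overlap and entity overlap."""
    word_overlap = len(set(query_words) & set(doc_words)) / max(len(query_words), 1)
    entity_overlap = len(set(query_entities) & set(doc_entities)) / max(len(query_entities), 1)
    return np.array([word_overlap, entity_overlap])

def rank_score(features, weights=np.array([0.4, 0.6])):
    """A learned ranker would fit these weights (or a deeper network) from
    relevance labels or clicks; here they are fixed for illustration."""
    return float(weights @ features)

feats = match_features(
    ["obama", "president"], ["barack", "obama", "served", "as", "president"],
    ["Q76", "Q11696"], ["Q76", "Q30", "Q11696"],
)
print(rank_score(feats))
```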