
Deep Learning for Matching in Search and Recommendation - Chapter 2: Traditional Matching Models - Reading Notes

Learning to Match

Matching Function

The two objects x and y, and their relationship, can be described with a set of features θ(x, y).
The matching function f(x, y) can be a linear combination of features:

f(x, y) = ⟨w, θ(x, y)⟩

where w is the parameter vector. It can also be a generalized linear model, a tree model, or a neural network.
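
As a minimal sketch (assuming a hand-crafted feature vector θ(x, y) represented with numpy; the values are hypothetical), the linear matching function is just a dot product with the parameter vector w:

```python
import numpy as np

def linear_match(w, theta_xy):
    """Linear matching function f(x, y) = <w, theta(x, y)>."""
    return float(np.dot(w, theta_xy))

# Hypothetical example: three hand-crafted features describing the pair (x, y),
# e.g., term overlap, click count, and a popularity signal.
theta_xy = np.array([0.7, 0.2, 1.0])
w = np.array([1.5, -0.3, 0.8])
print(linear_match(w, theta_xy))  # a single matching score
```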

Supervised learning can be employed to learn the parameters of the matching function f. Supervised learning for matching typically consists of two phases: offline learning and online matching. 

Learning is conducted to choose a matching function f ∈ F that can perform the best in matching.

Similar to other supervised learning problems, we can define the goal of learning to match as minimizing a loss function, which represents how accurately the matching function performs on the training data as well as the test data.

The pointwise loss function is defined only on one instance, i.e., a source object and a target object. Suppose that there is a pair of objects (x, y) with the true matching degree of r. Further, suppose the predicted matching degree of (x, y) given by the matching model is f(x, y). The pointwise loss function is defined as a measure representing the disagreement between the matching degrees ... As an example of the pointwise loss, Mean Square Error (MSE) is a widely used loss function. ... Another example is the cross-entropy loss function. The cross-entropy loss function assumes that r ∈ {0, 1}, where 1 indicates relevant and 0 otherwise.
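
A minimal sketch of the two pointwise losses mentioned above (MSE and cross-entropy), assuming numpy arrays of true matching degrees r and predicted scores f(x, y); the toy values below are illustrative only:

```python
import numpy as np

def mse_loss(r_true, f_pred):
    """Pointwise MSE between true and predicted matching degrees."""
    r_true, f_pred = np.asarray(r_true, float), np.asarray(f_pred, float)
    return float(np.mean((r_true - f_pred) ** 2))

def cross_entropy_loss(r_true, f_pred, eps=1e-12):
    """Pointwise cross-entropy; assumes r in {0, 1} and f_pred in (0, 1)."""
    r_true = np.asarray(r_true, float)
    f_pred = np.clip(np.asarray(f_pred, float), eps, 1 - eps)
    return float(-np.mean(r_true * np.log(f_pred) + (1 - r_true) * np.log(1 - f_pred)))

# Toy (x, y) pairs with true labels and predicted matching degrees.
print(mse_loss([1, 0, 1], [0.9, 0.2, 0.6]))
print(cross_entropy_loss([1, 0, 1], [0.9, 0.2, 0.6]))
```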

The pairwise loss function is defined as a measure representing the disagreement between the matching degrees and the order relation.

The total empirical loss on the training data is the sum of the losses over the ordered object pairs.
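
The text does not fix a specific pairwise loss; a common instantiation is the hinge (margin) loss over ordered pairs (x, y+, y-), where y+ should match x better than y-. A minimal sketch, assuming the model scores are already computed:

```python
import numpy as np

def pairwise_hinge_loss(f_pos, f_neg, margin=1.0):
    """Hinge-style pairwise loss: penalizes f(x, y+) not exceeding f(x, y-) by a margin."""
    f_pos, f_neg = np.asarray(f_pos, float), np.asarray(f_neg, float)
    return float(np.mean(np.maximum(0.0, margin - (f_pos - f_neg))))

# Scores for two ordered pairs; the second pair is mis-ordered and incurs a loss.
print(pairwise_hinge_loss([2.0, 0.3], [0.5, 0.8]))
```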

In search and recommendation, a source object (e.g., a query or a user) is usually related to multiple target objects (e.g., multiple documents or items). The evaluation measures for search and recommendation usually treat a list of target objects as a whole. It is reasonable, therefore, to define the loss function over a list of target objects, called listwise loss function.

The listwise loss function is defined as a measure to represent the disagreement between the true matching degrees and the predicted matching degrees.
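
The chapter does not single out one listwise loss here; a common choice is a ListNet-style cross-entropy between the distributions induced (via softmax) by the true and predicted matching degrees over the whole list of target objects. A minimal sketch:

```python
import numpy as np

def listwise_softmax_loss(r_true, f_pred):
    """ListNet-style listwise loss: cross-entropy between the softmax
    distributions induced by true and predicted matching degrees."""
    r_true, f_pred = np.asarray(r_true, float), np.asarray(f_pred, float)
    p_true = np.exp(r_true - r_true.max()); p_true /= p_true.sum()
    p_pred = np.exp(f_pred - f_pred.max()); p_pred /= p_pred.sum()
    return float(-np.sum(p_true * np.log(p_pred + 1e-12)))

# One source object (query or user) with a list of three target objects.
print(listwise_softmax_loss([2.0, 1.0, 0.0], [1.5, 1.2, 0.1]))
```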

Learning to rank (Li, 2011; Liu, 2009) is to learn a function represented as g(x, y) where x and y can be query and document in search and user and item in recommendation respectively. In search, for example, the ranking function g(x, y) may contain features about the relations between x and y, as well as features on x and features on y. In contrast, the matching function f(x, y) only contains features about the relations between x and y. Usually the matching function f(x, y) is trained first and then the ranking function g(x, y) is trained with f(x, y) being a feature. For ranking, determination of the order of multiple objects is the key, while for matching, determination of the relation between two objects is the key.

  • Learning to Match
    • Matching degree between query and document
    • Challenge: mismatch
  • Learning to Rank
    • Ranking list of documents
    • Challenge: correct ranking at the top

Matching Models in Search

D = {(q1, d1, r1), (q2, d2, r2), ..., (qN, dN, rN)} is given as training data.

Queries are submitted to the search system independently, documents associated with a query are retrieved with the query words, and the relevance of a document with respect to a query is determined by the contents of the query and the document.

The goal of learning to match for search is to automatically learn a matching model represented as a scoring function f(q, d) (or as a conditional probability distribution P(r | q, d)). The learning problem can be formalized as the minimization of a loss function. The learned model must have the generalization capability to conduct matching on unseen test data.

The goal of learning to match for recommendation is to learn the underlying matching model f(ui, ij) that can make predictions on the ratings (interactions) of the zero entries in matrix R ... The learning problem can be formalized as minimizing the regularized empirical loss function.

the fundamental challenge to matching in search and recommendation is the mismatch between objects from two different spaces (query and document, as well as user and item). One effective approach to dealing with the challenge is to represent the two objects in matching in a common space, and to perform the task of matching in the common space. As the space may not have an explicit definition, it is often referred to as “latent space”.

There are three spaces: query space, document space, and latent space, and there exist semantic gaps between the query space and document space. Queries and documents are first mapped to the latent space, and then matching is conducted in the latent space. Two mapping functions specify the mappings from the query space and document space into the latent space. The uses of different types of mapping functions (e.g., linear and non-linear) and similarity measures in the latent space (e.g., inner product and Euclidean distance) lead to different types of matching models.


The matching score between q and d is defined as the similarity between the mapped vectors (representations) of q and d in the latent space, i.e., φ(q) and φ'(d).

Before the prevalence of deep learning, most methods were "shallow", in the sense that linear functions and the inner product were adopted as the mapping function and the similarity measure, respectively.
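
A minimal sketch of such a "shallow" latent-space model: linear mappings (matrices W_q and W_d, with hypothetical sizes) project the query and the document into the latent space, and the inner product serves as the matching score:

```python
import numpy as np

def latent_match(q, d, W_q, W_d):
    """Shallow latent-space matching: map query and document with linear
    functions and take the inner product in the latent space."""
    phi_q = W_q @ q      # query representation in the latent space
    phi_d = W_d @ d      # document representation in the latent space
    return float(phi_q @ phi_d)

# Hypothetical sizes: bag-of-words vectors of dimension 5, latent dimension 3.
rng = np.random.default_rng(0)
q, d = rng.random(5), rng.random(5)
W_q, W_d = rng.random((3, 5)), rng.random((3, 5))
print(latent_match(q, d, W_q, W_d))
```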

Latent Space Models in Search

Methods for search that perform matching in a latent space include Partial Least Square (PLS) (Rosipal and Krämer, 2006), Regularized Matching in Latent Space (RMLS) (Wu et al., 2013b), and Supervised Semantic Indexing (SSI) (Bai et al., 2009, 2010).

A complete introduction to semantic matching in search can be found in Li and Xu (2014).

PLS assumes that the mapping functions are orthonormal matrices. When the training data size is large, learning becomes hard because it requires solving an SVD, which has high time complexity. To address this issue, Wu et al. (2013b) propose a new method called Regularized Matching in Latent Space (RMLS), ... The learning in RMLS is also a non-convex optimization problem, so there is no guarantee that a globally optimal solution can be found.

Supervised Semantic Indexing (SSI)

A special assumption can be made in PLS and RMLS; that is, the query space and the document space have the same dimensions. For example, when both queries and documents are represented as bag-of-words vectors, they have the same dimensions in the query and document spaces. ... SSI makes exactly this assumption.

The addition of the identity matrix means that SSI makes a tradeoff between the use of a low-dimensional latent space and the use of a classical Vector Space Model (VSM). The diagonal of the matrix W gives a score to each term that occurs in both the query and the document.
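
A minimal sketch of an SSI-style score of the form qᵀ(UᵀV + I)d, where the low-rank part UᵀV provides latent-space matching and the identity matrix contributes the VSM-like term-matching component; the exact factorized form and the sizes below are assumptions for illustration:

```python
import numpy as np

def ssi_score(q, d, U, V):
    """SSI-style score q^T (U^T V + I) d: low-rank latent matching plus a
    VSM-like exact term-matching component from the identity matrix."""
    W = U.T @ V + np.eye(q.shape[0])
    return float(q @ W @ d)

# Hypothetical vocabulary of size 6 and latent dimension 2.
rng = np.random.default_rng(1)
q, d = rng.random(6), rng.random(6)
U, V = rng.random((2, 6)), rng.random((2, 6))
print(ssi_score(q, d, U, V))
```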

Latent Space Models in Recommendation

Biased Matrix Factorization (BMF) is a model proposed for predicting the ratings of users (Koren et al., 2009), i.e., it formalizes recommendation as a regression task. ... As it is a non-convex optimization problem, alternating least squares (He et al., 2016b) or stochastic gradient descent (Koren et al., 2009) is typically employed, neither of which is guaranteed to find a global optimum.
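
A minimal sketch of the BMF prediction rule, r̂(u, i) = μ + b_u + b_i + p_u · q_i (global mean, user bias, item bias, and the inner product of the latent vectors); the parameter values below are hypothetical:

```python
import numpy as np

def bmf_predict(mu, b_u, b_i, p_u, q_i):
    """Biased MF prediction: global mean + user bias + item bias + latent inner product."""
    return mu + b_u + b_i + float(np.dot(p_u, q_i))

# Hypothetical learned parameters for one user-item pair.
print(bmf_predict(mu=3.5, b_u=0.2, b_i=-0.1,
                  p_u=np.array([0.3, -0.5]), q_i=np.array([0.8, 0.1])))
```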

Factored Item Similarity Model (FISM) (Kabbur et al., 2013) adopts the assumption of item-based collaborative filtering, i.e., users will prefer items that are similar to what they have chosen so far. To do so, FISM uses the items that a user has chosen to represent the user and projects the combined items into the latent space.
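
A simplified sketch of the FISM idea: the user is represented by the normalized sum of the embeddings of the items they have chosen, and the score is the inner product with the target item's embedding. This omits the bias terms and the two separate item-embedding tables of the full model; the values are hypothetical:

```python
import numpy as np

def fism_score(rated_item_embs, target_item_emb, alpha=0.5):
    """FISM-style score: represent the user by the normalized sum of the
    embeddings of chosen items, then take the inner product with the target item."""
    rated_item_embs = np.asarray(rated_item_embs, float)
    p_u = rated_item_embs.sum(axis=0) / (len(rated_item_embs) ** alpha)
    return float(np.dot(p_u, target_item_emb))

# Hypothetical embeddings of two previously chosen items and one candidate item.
print(fism_score([[0.2, 0.7], [0.5, 0.1]], np.array([0.4, 0.3])))
```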

Factorization Machine (FM) ... The input to FM is a feature vector x = [x1, x2, . . . , xn] that can contain any features for representing the matching function, as described above. Consequently, FM casts the matching problem as a supervised learning problem. FM is a very general model in the sense that feeding different input features into the model will lead to different formulations of the model.
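
A minimal sketch of a degree-2 Factorization Machine, ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, computed with the usual O(kn) reformulation; the feature vector and parameters below are hypothetical:

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Degree-2 Factorization Machine: global bias + linear term + pairwise
    feature interactions factorized through the latent vectors in V (n x k)."""
    linear = w0 + float(np.dot(w, x))
    xv = V.T @ x                                      # shape (k,)
    pairwise = 0.5 * float(np.sum(xv ** 2 - (V ** 2).T @ (x ** 2)))
    return linear + pairwise

# Hypothetical feature vector (e.g., one-hot user, one-hot item, a context feature).
x = np.array([1.0, 0.0, 1.0, 0.5])
rng = np.random.default_rng(2)
print(fm_score(x, w0=0.1, w=rng.random(4), V=rng.random((4, 2))))
```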

Further Reading

Topic models can also be utilized to address the mismatch problem. A simple and effective approach is to use a linear combination of term matching score and topic matching score (Hofmann, 1999). Probabilistic topic models are also employed to smooth document language models (or query language models) (Wei and Croft, 2006; Yi and Allan, 2009). 

Li, H. and J. Xu (2014). “Semantic matching in search”. Foundations and Trends in Information Retrieval. 7(5): 343–469.

====================================================================

Definitions

linear model

Supervised learning for matching: offline learning + online matching. 

pointwise loss function: Mean Square Error (MSE), Cross-entropy loss function 

bilinear function 

====================================================================

Presentation -> https://youtu.be/JqkvH7lEfak

The matching function f(x, y), where x belongs to the space X and y belongs to the space Y, depends on a weight vector w and on the set of features that describe x and y (denoted θ(x, y)).

Supervised learning: offline learning + online matching. For the offline phase there is a dataset D of size N (large!) in which the degree of similarity rij between xi and yj (or a true/false indication that they are similar) is known. The learned matching function f can then predict the degree of similarity between any previously unseen pair x and y.

To find a good weight vector w, one must minimize the loss function L and include a regularizer (Ω) to avoid overfitting.

Some loss functions: pointwise, pairwise, listwise.

Learning to rank is related to learning to match. In ranking, what matters is the order, and the matching score can be one of the features used in the ranking operation.

For search, the matching function is f(query, document).

Latent space: the query and the document live in different spaces; representing them in a common latent space makes them comparable.
