Deep Learning for Matching in Search and Recommendation - Chapter 2: Traditional Matching Models - Reading Notes
Learning to Match
Matching Function
The two objects x and y, and their relationship, can be described with a set of features Φ(x, y).
The matching function f(x, y) can be a linear combination of features: f(x, y) = ⟨w, Φ(x, y)⟩,
where w is the parameter vector. It can also be a generalized linear model, a tree model, or a neural network.
Supervised learning can be employed to learn the parameters of the matching function f. Supervised learning for matching typically consists of two phases: offline learning and online matching.
Learning is conducted to choose a matching function f ∈ F that can perform the best in matching.
Similar to other supervised learning problems, we can define the goal of learning to match as minimizing a loss function, which represents how much accuracy the matching function can achieve on the training data as well as the test data.
The pointwise loss function is defined only on one instance, i.e., a source object and a target object. Suppose that there is a pair of objects (x, y) with the true matching degree of r. Further, suppose the predicted matching degree of (x, y) given by the matching model is f(x, y). The pointwise loss function is defined as a measure representing the disagreement between the matching degrees ... As an example of the pointwise loss, Mean Square Error (MSE) is a widely used loss function. ... Another example is the cross-entropy loss function. The cross-entropy loss function assumes that r ∈ {0, 1}, where 1 indicates relevant and 0 otherwise.
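As an illustration of the two pointwise losses mentioned above, here is a minimal NumPy sketch (the function names and toy numbers are my own, not from the book):

```python
import numpy as np

def mse_loss(r_true, r_pred):
    """Pointwise squared error between true and predicted matching degrees."""
    r_true, r_pred = np.asarray(r_true, float), np.asarray(r_pred, float)
    return np.mean((r_true - r_pred) ** 2)

def cross_entropy_loss(r_true, p_pred, eps=1e-12):
    """Pointwise cross-entropy; r_true in {0, 1}, p_pred = predicted P(r = 1 | x, y)."""
    r_true = np.asarray(r_true, float)
    p_pred = np.clip(np.asarray(p_pred, float), eps, 1 - eps)
    return -np.mean(r_true * np.log(p_pred) + (1 - r_true) * np.log(1 - p_pred))

# toy example: three (x, y) pairs with known matching degrees
print(mse_loss([1.0, 0.0, 1.0], [0.9, 0.2, 0.6]))
print(cross_entropy_loss([1, 0, 1], [0.9, 0.2, 0.6]))
```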
The pairwise loss function is defined as a measure representing the disagreement between the predicted matching degrees and the true order relation between two target objects of the same source object. The total empirical loss on the training data is the sum of the losses over the ordered object pairs.
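A minimal sketch of one common pairwise loss, the hinge form (the margin of 1.0 and the function name are illustrative assumptions, not the book's definition):

```python
import numpy as np

def pairwise_hinge_loss(f_pos, f_neg, margin=1.0):
    """For each ordered pair, the score of the preferred target f(x, y+) should
    exceed the score of the less preferred target f(x, y-) by at least `margin`."""
    f_pos, f_neg = np.asarray(f_pos, float), np.asarray(f_neg, float)
    return np.mean(np.maximum(0.0, margin - (f_pos - f_neg)))

# scores for preferred targets (y+) and less preferred targets (y-)
print(pairwise_hinge_loss([2.0, 0.3], [1.5, 0.8]))  # the second pair is mis-ordered
```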
In search and recommendation, a source object (e.g., a query or a user) is usually related to multiple target objects (e.g., multiple documents or items). The evaluation measures for search and recommendation usually treat a list of target objects as a whole. It is reasonable, therefore, to define the loss function over a list of target objects, called listwise loss function.
The listwise loss function is defined as a measure representing the disagreement between the true matching degrees and the predicted matching degrees over the whole list of target objects.
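One concrete listwise loss is the cross entropy between the distribution induced by the true matching degrees and the one induced by the predicted scores (a ListNet-style choice, used here only as an illustrative assumption):

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, float)
    e = np.exp(z - z.max())
    return e / e.sum()

def listwise_ce_loss(r_true, f_pred):
    """Cross entropy between the list-level distributions induced by the
    true matching degrees and by the predicted scores."""
    p_true, p_pred = softmax(r_true), softmax(f_pred)
    return -np.sum(p_true * np.log(p_pred + 1e-12))

# one query with four candidate documents
print(listwise_ce_loss([3, 2, 0, 0], [2.5, 2.0, 0.5, 0.1]))
```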
Learning to rank (Li, 2011; Liu, 2009) is to learn a function represented as g(x, y), where x and y can be a query and a document in search, or a user and an item in recommendation, respectively. In search, for example, the ranking function g(x, y) may contain features about the relations between x and y, as well as features on x and features on y. In contrast, the matching function f(x, y) only contains features about the relations between x and y. Usually the matching function f(x, y) is trained first, and then the ranking function g(x, y) is trained with f(x, y) as one of its features. For ranking, the key is to determine the order of multiple objects; for matching, the key is to determine the relation between two objects.
- Learning to Match
  - Matching degree between query and document
  - Challenge: mismatch
- Learning to Rank
  - Ranking list of documents
  - Challenge: correct ranking at the top
Matching Models in Search
D = {(q_1, d_1, r_1), (q_2, d_2, r_2), ..., (q_N, d_N, r_N)} are given as training data.
Queries are submitted to the search system independently, documents associated with a query are retrieved with the query words, and the relevance of a document with respect to a query is determined by the contents of the query and the document.
The goal of learning to match for search is to automatically learn a matching model, represented as a scoring function f(q, d) (or as a conditional probability distribution P(r | q, d)). The learning problem can be formalized as minimization of a loss function. The learned model must have the generalization capability to conduct matching on unseen test data.
The goal of learning to match for recommendation is to learn the underlying matching model f(u_i, i_j) that can make predictions on the ratings (interactions) of the zero entries in the matrix R ... The learning problem can be formalized as minimizing the regularized empirical loss function.
the fundamental challenge to matching in search and recommendation is the mismatch between objects from two different spaces (query and document, as well as user and item). One effective approach to dealing with the challenge is to represent the two objects in matching in a common space, and to perform the task of matching in the common space. As the space may not have an explicit definition, it is often referred to as “latent space”.
There are three spaces: query space, document space, and latent space, and there exist semantic gaps between the query space and the document space. Queries and documents are first mapped to the latent space, and then matching is conducted in the latent space. Two mapping functions specify the mappings from the query space and the document space into the latent space. The uses of different types of mapping functions (e.g., linear and non-linear) and similarity measures in the latent space (e.g., inner product and Euclidean distance) lead to different types of matching models.
The matching score between q and d is defined as the similarity between the mapped vectors (representations) of q and d in the latent space, i.e., φ(q) and φ′(d).
Before the prevalence of deep learning, most methods were "shallow", in the sense that linear functions and inner products were adopted as the mapping functions and the similarity measure, respectively.
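In that "shallow" setting the model reduces to two linear projection matrices and an inner product in the latent space. A minimal sketch of this generic form (the matrix names Lq and Ld, dimensions, and random values are my own illustrative assumptions; this is the shared shape of PLS/RMLS-style models, not any specific trained method):

```python
import numpy as np

rng = np.random.default_rng(0)
dim_q, dim_d, dim_latent = 1000, 5000, 64   # query space, document space, latent space

Lq = rng.normal(size=(dim_latent, dim_q))   # phi(q)  = Lq @ q  (linear mapping)
Ld = rng.normal(size=(dim_latent, dim_d))   # phi'(d) = Ld @ d  (linear mapping)

def match_score(q, d):
    """f(q, d) = <phi(q), phi'(d)>: inner product of the two mapped vectors."""
    return (Lq @ q) @ (Ld @ d)

q = rng.random(dim_q)   # e.g., a bag-of-words query vector
d = rng.random(dim_d)   # e.g., a bag-of-words document vector
print(match_score(q, d))
```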
Latent Space Models in Search
methods for search that perform matching in a latent space, including Partial Least Square (PLS) (Rosipal and Krämer, 2006), Regularized Matching in Latent Space (RMLS) (Wu et al., 2013b), and Supervised Semantic Indexing (SSI) (Bai et al., 2009, 2010).
A complete introduction to semantic matching in search can be found in Li and Xu (2014).
PLS assumes that the mapping functions are orthonormal matrices. When the training data size is large, learning becomes hard because it needs to solve SVD, which is of high time complexity. To address the issue, Wu et al. (2013b) propose a new method called Regularized Matching in Latent Space (RMLS), ... The learning in RMLS is also a non-convex optimization problem. There is no guarantee that a globally optimal solution can be found.
Supervised Semantic Indexing (SSI)
A special assumption can be made in PLS and RMLS; that is, the query space and the document space have the same dimensions. For example, when both queries and documents are represented as bag-of-words vectors, they have the same dimensions in the query and document spaces. ... SSI makes exactly this assumption.
The addition of the identity matrix means that SSI makes a tradeoff between the use of a low-dimensional latent space and the use of a classical Vector Space Model (VSM). The diagonal of the matrix W gives a score to each term that occurs in both the query and the document.
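A sketch of an SSI-style scoring function, assuming the "low rank plus identity" parameterization W = UᵀV + I (the vocabulary size, latent dimension, and random weights are illustrative placeholders, not learned parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, k = 10000, 100          # shared query/document dimension, latent dimension

U = rng.normal(0, 0.01, size=(k, vocab))
V = rng.normal(0, 0.01, size=(k, vocab))

def ssi_score(q, d):
    """f(q, d) = q^T W d with W = U^T V + I:
    the low-rank part matches q and d in the latent space,
    the identity part adds a classical VSM term-matching score q . d."""
    return (U @ q) @ (V @ d) + q @ d

q = np.zeros(vocab); q[[3, 17]] = 1.0      # toy bag-of-words query
d = np.zeros(vocab); d[[17, 42]] = 1.0     # toy bag-of-words document
print(ssi_score(q, d))
```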
Latent Space Models in Recommendation
Biased Matrix Factorization (BMF) is a model proposed for predicting the ratings of users (Koren et al., 2009), i.e., formalizing recommendation as a regression task. ... As it is a non-convex optimization problem, alternating least squares (He et al., 2016b) or stochastic gradient descent (Koren et al., 2009) is typically employed, neither of which guarantees finding a global optimum.
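A minimal BMF sketch with the biased prediction (global mean, user bias, item bias, plus inner product of latent factors) trained by SGD on the regularized squared error; the hyperparameters and toy data are illustrative assumptions:

```python
import numpy as np

def train_bmf(ratings, n_users, n_items, k=16, lr=0.01, reg=0.02, epochs=20, seed=0):
    """ratings: list of (user, item, rating).
    Prediction: r_hat = mu + b_u + b_i + p_u . q_i (Koren et al., 2009)."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])          # global rating mean
    bu, bi = np.zeros(n_users), np.zeros(n_items)     # user and item biases
    P = rng.normal(0, 0.1, (n_users, k))               # user latent factors
    Q = rng.normal(0, 0.1, (n_items, k))               # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])  # prediction error
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            P[u], Q[i] = P[u] + lr * (e * Q[i] - reg * P[u]), Q[i] + lr * (e * P[u] - reg * Q[i])
    return mu, bu, bi, P, Q

mu, bu, bi, P, Q = train_bmf([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)], n_users=2, n_items=2)
print(mu + bu[1] + bi[1] + P[1] @ Q[1])   # predicted rating for the unobserved (user 1, item 1)
```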
Factored Item Similarity Model (FISM) (Kabbur et al., 2013) adopts the assumption of item-based collaborative filtering, i.e., users will prefer items that are similar to what they have chosen so far. To do so, FISM uses the items that a user has chosen to represent the user and projects the combined items into the latent space.
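A sketch of a FISM-style prediction, where the user is represented by aggregating the latent vectors of the items they have interacted with (the exponent alpha and the variable names are illustrative; the user bias term is omitted for brevity):

```python
import numpy as np

def fism_score(user_items, target, P, Q, bi, alpha=0.5):
    """r_hat(u, i) = b_i + |R_u \ {i}|^(-alpha) * sum_{j in R_u \ {i}} p_j . q_i
    P: 'source' item factors used to represent the user;
    Q: 'target' item factors; bi: item biases."""
    rated = [j for j in user_items if j != target]   # exclude the target item itself
    if not rated:
        return bi[target]
    user_vec = P[rated].sum(axis=0) / (len(rated) ** alpha)
    return bi[target] + user_vec @ Q[target]

rng = np.random.default_rng(0)
n_items, k = 50, 8
P, Q, bi = rng.normal(0, 0.1, (n_items, k)), rng.normal(0, 0.1, (n_items, k)), np.zeros(n_items)
print(fism_score(user_items=[2, 7, 19], target=5, P=P, Q=Q, bi=bi))
```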
Factorization Machine (FM) ... The input to FM is a feature vector x = [x1, x2, . . . , xn] that can contain any features for representing the matching function, as described above. Consequently, FM casts the matching problem as a supervised learning problem. FM is a very general model in the sense that feeding different input features into the model will lead to different formulations of the model.
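A sketch of the second-order FM prediction, using the standard O(kn) reformulation of the pairwise interaction term (the weights here are random placeholders rather than learned parameters, and the feature layout is only an example):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j,
    with the pairwise term computed as
    0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]."""
    linear = w0 + w @ x
    interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
    return linear + interactions

rng = np.random.default_rng(0)
n, k = 12, 4                       # n features (e.g., user id, item id, context), k factors
x = rng.random(n)                  # one feature vector describing a (user, item) pair
w0, w, V = 0.0, rng.normal(0, 0.1, n), rng.normal(0, 0.1, (n, k))
print(fm_predict(x, w0, w, V))
```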
Further Reading
Topic models can also be utilized to address the mismatch problem. A simple and effective approach is to use a linear combination of term matching score and topic matching score (Hofmann, 1999). Probabilistic topic models are also employed to smooth document language models (or query language models) (Wei and Croft, 2006; Yi and Allan, 2009).
Li, H. and J. Xu (2014). “Semantic matching in search”. Foundations and Trends in Information Retrieval. 7(5): 343–469.
====================================================================
Definitions
linear model
Supervised learning for matching: offline learning + online matching.
pointwise loss function: Mean Square Error (MSE), Cross-entropy loss function
bilinear function
====================================================================
Presentation -> https://youtu.be/JqkvH7lEfak
The matching function f(x, y), where x belongs to space X and y belongs to space Y, depends on a weight vector w and on the set of features describing x and y (denoted Φ(x, y)).
Supervised learning: offline learning + online matching. For the offline stage there is a dataset D of size N (large!) in which the matching degree r_ij between x_i and y_j (or a true/false indication that they match) is known. The learned matching function f can then predict the matching degree of any previously unseen pair x and y.
To find a good weight vector w, the loss function L must be minimized, together with a regularizer (Ω) to avoid overfitting.
Some loss functions: pointwise, pairwise, listwise.
Learning to rank is related to learning to match. In ranking, the order is what matters, and the matching score can be one of the features used for ranking.
For search, the matching function is f(query, document).
Latent space: the query and the document live in different spaces; representing them in a common latent space makes them comparable.