- The paper introduces Deep Retrieval (DR), which learns a retrievable structure over a discrete latent space for efficient candidate retrieval in recommendation systems.
- It employs per-layer MLPs with beam search over a learned D-layer, K-node-per-layer structure to model user-item interactions without relying on the assumptions of conventional ANN methods.
- Experimental results on MovieLens-20M and Amazon Books show DR's competitive retrieval accuracy, and live experiments show significant improvements in user engagement metrics.
The paper introduces Deep Retrieval (DR), a novel method designed for large-scale recommendation systems, addressing the challenge of efficiently retrieving relevant candidates in sub-linear time. DR distinguishes itself from traditional approaches by directly learning a retrievable structure from user-item interaction data, such as clicks, without relying on the Euclidean space assumption inherent in Approximate Nearest Neighbor (ANN) algorithms.
DR encodes all candidate items into a discrete latent space, where the latent codes for these candidates are model parameters learned in conjunction with the other neural network parameters so as to maximize a single objective function. Once the model is trained, a beam search is executed over the learned structure to retrieve the top candidates for re-ranking.
Key components and design considerations of DR include:
- A structure model consisting of D layers, each with K nodes. Each layer uses a multi-layer perceptron (MLP) and K-class softmax to output a distribution over its K nodes.
- An item-to-path mapping $\pi: \mathcal{V} \to [K]^D$, where $\mathcal{V}$ is the set of all items. A path $c = (c_1, \ldots, c_D)$ is a forward index traversal over the columns of the $D \times K$ node matrix; each path has length $D$, with index values in $\{1, 2, \ldots, K\}$. Consequently, there are $K^D$ possible paths, each representing a cluster of items (a minimal sketch of this bookkeeping follows the list).
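As referenced above, a minimal sketch of the item-to-path bookkeeping. The sizes are hypothetical toy values, and node indices are 0-based here (the text uses $1, \ldots, K$):

```python
import random

D, K = 3, 4            # hypothetical toy depth and width; production systems use far larger K
num_items = 10         # hypothetical catalogue size

# pi maps each item id to one path: a length-D tuple of node indices (0-based here).
# Paths are initialised randomly; in DR they are learned jointly with the model parameters.
pi = {item: tuple(random.randrange(K) for _ in range(D)) for item in range(num_items)}

print(pi[0])           # e.g. (2, 0, 3) -- the path (cluster) item 0 currently belongs to
print(K ** D)          # 64 possible paths in this toy setting (K^D in general)
```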
The model learns a probability distribution over the paths given user inputs, concurrently with a mapping from items to paths. During the serving phase, beam search is employed to identify the most probable paths and the items associated with them.
The probability of a path $c$ given a user $x$, denoted $p(c \mid x, \theta)$, is constructed layer by layer:
- The initial layer takes the user embedding $\mathrm{emb}(x)$ as input and outputs a probability $p(c_1 \mid x, \theta_1)$ over the $K$ nodes, based on parameters $\theta_1$.
- Each subsequent layer $d$ concatenates the user embedding $\mathrm{emb}(x)$ with the embeddings of the preceding nodes $\mathrm{emb}(c_1), \ldots, \mathrm{emb}(c_{d-1})$ as input to an MLP, which outputs $p(c_d \mid x, c_1, \ldots, c_{d-1}, \theta_d)$ over the $K$ nodes of layer $d$, based on parameters $\theta_d$.
- The path probability is then the product of the per-layer outputs, $p(c \mid x, \theta) = \prod_{d=1}^{D} p(c_d \mid x, c_1, \ldots, c_{d-1}, \theta_d)$, where:
- $p(c \mid x, \theta)$ is the probability of path $c$ given user $x$.
- $D$ is the number of layers.
- $c_d$ is the node in layer $d$ along path $c$.
- $x$ is the user.
- $\theta_d$ are the parameters of layer $d$.
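A minimal PyTorch sketch of this layer-by-layer construction. The class name, embedding sizes, and MLP widths are illustrative assumptions, not the paper's exact architecture; node indices are 0-based:

```python
import torch
import torch.nn as nn

class DRStructureModel(nn.Module):
    """Models p(c | x, theta) as a product of per-layer K-class softmaxes."""

    def __init__(self, D: int = 3, K: int = 100, emb_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.D, self.K = D, K
        # One embedding table per layer for the chosen node c_d (a hypothetical choice).
        self.node_emb = nn.ModuleList(nn.Embedding(K, emb_dim) for _ in range(D))
        # Layer d sees emb(x) concatenated with the d previously chosen node embeddings.
        self.mlps = nn.ModuleList(
            nn.Sequential(nn.Linear((d + 1) * emb_dim, hidden), nn.ReLU(), nn.Linear(hidden, K))
            for d in range(D)
        )

    def log_prob(self, user_emb: torch.Tensor, path: torch.Tensor) -> torch.Tensor:
        """log p(c | x, theta) for a batch: user_emb is [B, emb_dim], path is [B, D] (int64)."""
        inputs, log_p = [user_emb], 0.0
        for d in range(self.D):
            logits = self.mlps[d](torch.cat(inputs, dim=-1))                  # [B, K]
            log_p = log_p + torch.log_softmax(logits, dim=-1).gather(
                -1, path[:, d:d + 1]).squeeze(-1)                             # add log p(c_d | x, c_<d)
            inputs.append(self.node_emb[d](path[:, d]))                       # emb(c_d) feeds layer d+1
        return log_p
```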
To enhance the model's capacity to express multi-aspect information, DR allows each item $y_i$ to be assigned to $J$ different paths $\{c_{i,1}, \ldots, c_{i,J}\}$. The multi-path structure objective is defined as:
$$\mathcal{Q}_{\mathrm{str}}(\theta, \pi) = \sum_{i=1}^{N} \log\left( \sum_{j=1}^{J} p\big(c_{i,j} = \pi_j(y_i) \mid x_i, \theta\big) \right),$$
where $N$ is the number of training instances $(x_i, y_i)$ and the probability of belonging to multiple paths is the sum of the probabilities of belonging to the individual paths.
To prevent the model from collapsing by allocating all items to a single path, a penalized likelihood function is introduced:
$$\mathcal{Q}_{\mathrm{pen}}(\theta, \pi) = \mathcal{Q}_{\mathrm{str}}(\theta, \pi) - \alpha \cdot \sum_{c \in [K]^D} f(|c|),$$
where $\alpha$ is the penalty factor, $|c|$ denotes the number of items allocated to path $c$, and $f$ is an increasing and convex function (e.g., $f(|c|) = |c|^4/4$).
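A sketch of the penalized objective under the notation above. It assumes `path_probs[i, j]` already holds $p(c_{i,j} \mid x_i, \theta)$ for item $y_i$'s $J$ paths (e.g. computed with the structure model sketched earlier), `path_sizes` holds the item count $|c|$ for every path, and the penalty factor value is purely illustrative:

```python
import torch

def penalized_objective(path_probs: torch.Tensor,
                        path_sizes: torch.Tensor,
                        alpha: float = 1e-7) -> torch.Tensor:
    """Q_pen = sum_i log(sum_j p(c_{i,j} | x_i, theta)) - alpha * sum_c |c|^4 / 4.

    path_probs: [N, J] probabilities of each item's J assigned paths.
    path_sizes: [num_paths] item counts |c| for the paths in use.
    """
    q_str = torch.log(path_probs.sum(dim=1)).sum()       # multi-path structure likelihood
    penalty = (path_sizes.float() ** 4 / 4).sum()        # f(|c|) = |c|^4 / 4
    return q_str - alpha * penalty
```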
In the inference stage, beam search is used to retrieve the most probable paths. At each layer, the algorithm keeps the top $B$ nodes among all successors of the $B$ nodes selected at the previous layer, and returns the top $B$ paths at the final layer.
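A minimal beam-search sketch, assuming access to a hypothetical per-layer scoring function `layer_log_probs(user, prefix)` that returns the $K$ log-probabilities of the next node given the nodes chosen so far (in the model sketch above, this is one MLP-plus-softmax call):

```python
import heapq
from typing import Callable, List, Tuple

def beam_search(user,
                layer_log_probs: Callable[[object, Tuple[int, ...]], List[float]],
                D: int, K: int, B: int) -> List[Tuple[float, Tuple[int, ...]]]:
    """Return the top-B length-D paths (by log-probability) for this user."""
    beams = [(0.0, ())]                                   # (cumulative log-prob, path prefix)
    for _ in range(D):
        candidates = []
        for log_p, prefix in beams:
            scores = layer_log_probs(user, prefix)        # K scores for the next node
            for k in range(K):
                candidates.append((log_p + scores[k], prefix + (k,)))
        beams = heapq.nlargest(B, candidates)             # keep only the best B prefixes
    return beams
```

Each layer examines at most $B \cdot K$ successors, so retrieval cost grows with $D$, $B$, and $K$ rather than with the total number of items, which is what makes the retrieval sub-linear.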
The model is trained with an Expectation-Maximization (EM)-style algorithm that alternates between optimizing the model parameters $\theta$ for a fixed mapping $\pi$ (E-step) and updating the mapping $\pi$ to maximize the objective function given the updated parameters (M-step).
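A toy, self-contained illustration of this alternation, reusing the `DRStructureModel` sketch above. All sizes are hypothetical, the user embeddings and clicks are random, and the re-assignment step simply picks each item's $J$ most probable paths by brute-force enumeration, which is a simplification of the paper's penalized, beam-search-based update:

```python
import torch

D, K, J, emb_dim, num_items, num_users = 2, 4, 2, 8, 20, 50   # toy sizes (illustrative only)
model = DRStructureModel(D=D, K=K, emb_dim=emb_dim)
user_emb = torch.randn(num_users, emb_dim)                    # hypothetical frozen user embeddings
clicks = torch.randint(0, num_items, (num_users,))            # one clicked item per user (toy data)
pi = torch.randint(0, K, (num_items, J, D))                   # initial item -> J paths mapping
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

all_paths = torch.cartesian_prod(*[torch.arange(K)] * D)      # K^D paths; enumerable only in a toy

for step in range(10):
    # E-step analogue: fix pi, ascend the multi-path likelihood in theta (penalty omitted for brevity).
    opt.zero_grad()
    log_p = torch.stack([model.log_prob(user_emb, pi[clicks, j]) for j in range(J)], dim=1)
    loss = -torch.logsumexp(log_p, dim=1).sum()               # -sum_i log(sum_j p(c_{i,j} | x_i))
    loss.backward()
    opt.step()

    # M-step analogue: fix theta, re-assign each item to its J best-scoring paths.
    with torch.no_grad():
        for item in range(num_items):
            users = user_emb[clicks == item]
            if len(users) == 0:
                continue
            scores = torch.stack([model.log_prob(users, p.expand(len(users), D)).sum()
                                  for p in all_paths])
            pi[item] = all_paths[scores.topk(J).indices]
```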
To further improve performance, the DR model is jointly trained with a re-ranking model, specifically a softmax model with output size $V$, where $V$ is the total number of items. The final objective is the sum of the penalized likelihood and the softmax objective: $\mathcal{Q} = \mathcal{Q}_{\mathrm{pen}} + \mathcal{Q}_{\mathrm{softmax}}$.
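A sketch of the combined loss, assuming a full-softmax re-ranking head over the $V$ items; the names and sizes are illustrative, and for very large $V$ a sampled softmax would typically be substituted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V, emb_dim = 1000, 32                          # hypothetical catalogue and embedding sizes
softmax_head = nn.Linear(emb_dim, V)           # re-ranking model producing logits over all V items

def joint_loss(q_pen: torch.Tensor, user_emb: torch.Tensor, clicked: torch.Tensor) -> torch.Tensor:
    """Negative of Q = Q_pen + Q_softmax, suitable for gradient-descent training."""
    # Q_softmax is the log-likelihood of the clicked items under the softmax re-ranker;
    # cross_entropy returns its negative, summed over the batch.
    q_softmax = -F.cross_entropy(softmax_head(user_emb), clicked, reduction="sum")
    return -(q_pen + q_softmax)
```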
The paper includes experiments conducted on two public datasets, MovieLens-20M and Amazon Books, to evaluate the performance of DR. The results demonstrate that DR achieves performance comparable to brute-force retrieval methods while maintaining sub-linear computational complexity. Furthermore, DR was deployed in a live production recommendation system with hundreds of millions of users and items, where it significantly outperformed a well-tuned ANN baseline in terms of engagement metrics such as video finish rate (+3.0%), app view time (+0.87%), and second-day retention (+0.036%).