Embedding in Recommender Systems: A Survey (2310.18608v2)

Published 28 Oct 2023 in cs.IR and cs.AI

Abstract: Recommender systems have become an essential component of many online platforms, providing personalized recommendations to users. A crucial aspect is embedding techniques that convert high-dimensional discrete features, such as user and item IDs, into low-dimensional continuous vectors and can enhance recommendation performance. Applying embedding techniques captures complex entity relationships and has spurred substantial research. In this survey, we provide an overview of the recent literature on embedding techniques in recommender systems. This survey covers embedding methods like collaborative filtering, self-supervised learning, and graph-based techniques. Collaborative filtering generates embeddings capturing user-item preferences, excelling in sparse data. Self-supervised methods leverage contrastive or generative learning for various tasks. Graph-based techniques like node2vec exploit complex relationships in network-rich environments. Addressing the scalability challenges inherent to embedding methods, our survey delves into innovative directions within the field of recommendation systems. These directions aim to enhance performance and reduce computational complexity, paving the way for improved recommender systems. Among these innovative approaches, we will introduce Auto Machine Learning (AutoML), hash techniques, and quantization techniques in this survey. We discuss various architectures and techniques and highlight the challenges and future directions in these aspects. This survey aims to provide a comprehensive overview of the state-of-the-art in this rapidly evolving field and serve as a useful resource for researchers and practitioners working in the area of recommender systems.

The survey paper titled "Embedding in Recommender Systems: A Survey" offers an extensive review of embedding techniques in recommendation systems. Embeddings are utilized to transform high-dimensional, discrete feature spaces, such as user and item identifiers, into low-dimensional, continuous vector spaces, thereby improving recommendation performance through the capture of intricate relationships between entities.
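
The mapping from discrete identifiers to dense vectors can be pictured as a lookup table. The following is a minimal sketch, not the survey's implementation: the helper names (`build_embedding_table`, `lookup`, `dot`), the ID strings, and the dimension of 4 are all hypothetical, and the vectors are random rather than learned.

```python
import random

def build_embedding_table(ids, dim, seed=0):
    """Give each discrete ID a row index and a small random dense vector."""
    rng = random.Random(seed)
    index = {raw_id: row for row, raw_id in enumerate(ids)}
    table = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in ids]
    return index, table

def lookup(index, table, raw_id):
    """Return the dense vector stored for a raw ID."""
    return table[index[raw_id]]

def dot(u, v):
    """Inner product, used here as a simple user-item affinity score."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical IDs; in a trained system the vectors would be learned.
user_index, user_table = build_embedding_table(["u1", "u2", "u3"], dim=4, seed=1)
item_index, item_table = build_embedding_table(["i1", "i2"], dim=4, seed=2)

score = dot(lookup(user_index, user_table, "u2"),
            lookup(item_index, item_table, "i1"))
```

In practice the table rows are trained so that the score reflects observed interactions; the lookup step itself stays the same.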

The paper systematically investigates the wide array of embedding methodologies in recommender systems, categorizing them into distinct approaches:

Collaborative Filtering (CF) Methods: CF typically employs Matrix Factorization (MF) and Factorization Machines (FM) to capture user-item interaction patterns. MF, through techniques such as FunkSVD, decomposes the interaction matrix into low-dimensional user and item factors that share one latent space. FM generalizes MF by modeling interactions among arbitrary features through factorized parameters, which helps with the sparse data prevalent in recommendation tasks. The paper discusses variants such as SVD++ and NSVD, which extend MF, and DeepFM, which combines FM with a neural network to capture deeper interactions between features.
Self-Supervised Learning (SSL) Techniques: Embedding-focused SSL methods use large-scale unlabeled data to improve representation learning in recommender systems. These techniques are divided into contrastive and generative methods. Contrastive methods, such as SimCLR, maximize agreement between augmented views of the same instance while separating views of different instances. Generative methods, exemplified by BERT-like approaches, reconstruct masked parts of a sequence, refining embeddings through prediction tasks.
Graph-Based Approaches: The survey explores graph embeddings within recommendation systems and categorizes them based on graph structures: homogeneous, bipartite, heterogeneous, and hypergraphs. Methods like LightGCN are highlighted for their efficient propagation of graph signals to improve recommendations across collaborative networks. Embeddings from knowledge graphs and social network graphs are given significant attention due to their ability to enrich the semantic understanding of user-item interactions.
Scalability Techniques: As recommender systems grow, embedding tables run into memory and compute limits. Techniques covered in the paper to alleviate these issues include Auto Machine Learning (AutoML) frameworks, hashing, and quantization methods. AutoML attempts to automate hyperparameter tuning and the selection of embedding sizes. Hashing approaches such as Bloom filters and hash embeddings map a large ID space onto a smaller set of shared slots while maintaining essential data semantics, whereas quantization compresses embeddings to improve computational efficiency without significantly compromising accuracy.
Future Directions: The paper outlines future research avenues, such as improving dynamic graph embeddings, learning fair embeddings in recommendation contexts, and better representing edge features in graph embeddings. It also discusses the potential of large language models (LLMs) to enrich embeddings with semantic context from user interactions.
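
To make the matrix factorization idea from the collaborative filtering item concrete, here is a small FunkSVD-style sketch that fits user and item factors by stochastic gradient descent on observed ratings only. It is an illustration under assumptions, not the paper's code: the function names, the toy ratings, and the hyperparameters (`dim`, `lr`, `reg`, `epochs`) are all hypothetical.

```python
import random

def train_mf(ratings, n_users, n_items, dim=2, lr=0.02, reg=0.01,
             epochs=1000, seed=0):
    """Fit user factors P and item factors Q so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][k] * Q[i][k] for k in range(dim))
            err = r - pred
            for k in range(dim):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q

def mse(P, Q, ratings):
    """Mean squared error over the observed (user, item, rating) triples."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(a * b for a, b in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return total / len(ratings)

# Toy data: (user index, item index, rating). Purely illustrative.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]

before = mse(*train_mf(ratings, 3, 3, epochs=0), ratings)
after = mse(*train_mf(ratings, 3, 3), ratings)
```

The rows of `P` and `Q` are the learned user and item embeddings; unobserved pairs are scored with the same inner product.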
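
The graph signal propagation attributed to LightGCN above can be sketched as follows. This assumes the commonly described formulation: each layer averages neighbor embeddings with symmetric degree normalization, with no feature transformation or nonlinearity, and the final embedding is the uniform mean over all layers. The function name, the toy graph, and the requirement that interaction pairs be unique are assumptions of this sketch.

```python
from math import sqrt

def lightgcn_propagate(user_emb, item_emb, interactions, num_layers=2):
    """Propagate embeddings over a user-item bipartite graph.

    interactions: list of unique (user index, item index) pairs.
    Returns layer-averaged user and item embeddings.
    """
    n_users, n_items, dim = len(user_emb), len(item_emb), len(user_emb[0])
    user_deg, item_deg = [0] * n_users, [0] * n_items
    for u, i in interactions:
        user_deg[u] += 1
        item_deg[i] += 1

    cur_u = [row[:] for row in user_emb]
    cur_i = [row[:] for row in item_emb]
    sum_u = [row[:] for row in user_emb]   # running sum, starts at layer 0
    sum_i = [row[:] for row in item_emb]

    for _ in range(num_layers):
        nxt_u = [[0.0] * dim for _ in range(n_users)]
        nxt_i = [[0.0] * dim for _ in range(n_items)]
        for u, i in interactions:
            w = 1.0 / sqrt(user_deg[u] * item_deg[i])  # symmetric normalization
            for k in range(dim):
                nxt_u[u][k] += w * cur_i[i][k]
                nxt_i[i][k] += w * cur_u[u][k]
        cur_u, cur_i = nxt_u, nxt_i
        for u in range(n_users):
            for k in range(dim):
                sum_u[u][k] += cur_u[u][k]
        for i in range(n_items):
            for k in range(dim):
                sum_i[i][k] += cur_i[i][k]

    scale = 1.0 / (num_layers + 1)  # uniform weight over layers 0..num_layers
    final_u = [[x * scale for x in row] for row in sum_u]
    final_i = [[x * scale for x in row] for row in sum_i]
    return final_u, final_i

# Toy graph: two users, two items, three interactions.
users, items = lightgcn_propagate(
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, -1.0]],
    [(0, 0), (1, 0), (1, 1)],
)
```

Only the initial embeddings would be trainable in this scheme; the propagation itself has no parameters.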
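
Two of the scalability ideas above can be illustrated in a few lines. The sketch below uses the plain hashing trick (a stable hash modulo a bucket count) rather than the Bloom filter or hash embedding methods the survey names, and a simple symmetric int8 scheme as a stand-in for quantization in general. All function names, the bucket count, and the sample values are hypothetical.

```python
import zlib

def hash_bucket(raw_id, num_buckets):
    """Map an arbitrary string ID to one of num_buckets shared rows.

    zlib.crc32 is used because it is stable across runs, unlike the
    built-in hash() for strings. Distinct IDs may collide.
    """
    return zlib.crc32(raw_id.encode("utf-8")) % num_buckets

def quantize_int8(vec):
    """Symmetric uniform quantization of a float vector to integer codes."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vec]  # each code lies in [-127, 127]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

# Hypothetical usage: a huge ID space shares an 8-row table.
row = hash_bucket("user_123", num_buckets=8)

vector = [0.5, -1.0, 0.25]
codes, scale = quantize_int8(vector)
approx = dequantize(codes, scale)
```

Hashing trades collisions for a fixed table size, while quantization trades a bounded rounding error (about half of `scale` per component here) for smaller storage.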

The survey aims to serve as a resource for researchers by methodically consolidating state-of-the-art embedding techniques in recommendation systems, providing insights into current challenges and proposing directions for future innovations. It also highlights the importance of integrating modern computational strategies like AutoML and LLMs to further push the boundaries of embedding-based recommendation performance.

Authors (9)

Xiangyu Zhao (192 papers)
Maolin Wang (29 papers)
Xinjian Zhao (8 papers)
Jiansheng Li (6 papers)
Shucheng Zhou (1 paper)
Dawei Yin (165 papers)
Qing Li (430 papers)
Jiliang Tang (204 papers)
Ruocheng Guo (62 papers)

Citations (3)
