
RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders (2408.02354v3)

Published 5 Aug 2024 in cs.IR and cs.LG

Abstract: Scalability is a major challenge in modern recommender systems. In sequential recommendations, full Cross-Entropy (CE) loss achieves state-of-the-art recommendation quality but consumes excessive GPU memory with large item catalogs, limiting its practicality. Using a GPU-efficient locality-sensitive hashing-like algorithm for approximating the large tensor of logits, this paper introduces a novel RECE (REduced Cross-Entropy) loss. RECE significantly reduces memory consumption while allowing one to enjoy the state-of-the-art performance of full CE loss. Experimental results on various datasets show that RECE cuts training peak memory usage by up to 12 times compared to existing methods while retaining or exceeding performance metrics of CE loss. The approach also opens up new possibilities for large-scale applications in other domains.

Authors (4)
  1. Danil Gusak (3 papers)
  2. Gleb Mezentsev (4 papers)
  3. Ivan Oseledets (187 papers)
  4. Evgeny Frolov (18 papers)
Citations (1)

Summary

RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders

In the paper "RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders," the authors address the scalability challenges inherent in contemporary recommender systems when dealing with extensive item catalogs. They propose an innovative RECE (REduced Cross-Entropy) loss that maintains the high performance of full Cross-Entropy (CE) loss while significantly reducing GPU memory usage during training.

Problem Statement

Sequential recommenders have increasingly adopted sophisticated models, particularly those inspired by Transformer architectures, to predict users' next item choices based on their interaction histories. Despite the effectiveness of full CE loss in training such models to state-of-the-art (SOTA) results, its applicability is limited by substantial memory overhead, especially with larger catalogs. The paper introduces RECE to overcome this memory bottleneck while maintaining the robust performance metrics of full CE loss.
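To see why the full logits tensor is the bottleneck, a rough back-of-the-envelope calculation helps. The sizes below are illustrative assumptions, not the paper's experimental settings: with a batch of 128 sequences of length 200 and a catalog of one million items, full CE must materialize one logit per catalog item at every sequence position.

```python
# Illustrative arithmetic (assumed shapes, not the paper's settings):
# full CE materializes a logit for every catalog item at every sequence position.
batch_size, seq_len, catalog_size = 128, 200, 1_000_000
bytes_per_float = 4  # fp32

logits_bytes = batch_size * seq_len * catalog_size * bytes_per_float
print(f"Full-CE logits tensor: {logits_bytes / 2**30:.1f} GiB")  # ~95.4 GiB, before gradients
```

An activation tensor of this size, plus the matching gradient, easily exceeds the memory of a single GPU; this is the cost that RECE is designed to avoid.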

Proposed Method: RECE

The authors propose a GPU-efficient algorithm akin to locality-sensitive hashing to approximate the large tensor of logits otherwise required by full CE loss. The RECE algorithm selectively computes critical logits that are most informative for the learning process and approximates the softmax distribution over these elements. This selection is based on a fast, GPU-friendly search for maximum inner products. The key components of RECE include:

  1. Locality-sensitive hashing for angular distance: Generates random vectors to index both the output sequences and item embeddings.
  2. Bucketing and sorting: Items are grouped into buckets, sorted, and divided into manageable chunks.
  3. Approximate gradient preservation: Prioritizes computation of logits with the largest absolute gradient values to retain the most useful information for classification.
  4. Scalability: Allows for multiple rounds and chunk processing to ensure better gradient estimation and lower memory usage.

The resulting memory requirements are shown to scale with $\sqrt{\min(C, s \cdot l)}$, where $C$ is the catalog size and $s \cdot l$ is the number of sequence positions in a training batch (batch size times sequence length), reflecting a significant reduction compared to the full CE loss.
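To make the selection procedure concrete, the following is a minimal PyTorch sketch of the general pattern: angular-LSH bucketing of both sequence outputs and item embeddings, followed by a cross-entropy restricted to the true target plus the items that share a query's bucket (a proxy for the largest, most informative logits). It is a simplified illustration under assumed shapes, with a single hash round and no chunking, not the authors' GPU-optimized implementation.

```python
import torch
import torch.nn.functional as F

def lsh_buckets(x, proj):
    """Angular-LSH bucket ids: argmax over random projections and their negations."""
    scores = x @ proj                                           # (N, n_buckets // 2)
    return torch.cat([scores, -scores], dim=-1).argmax(dim=-1)  # (N,)

def reduced_ce(hidden, item_emb, targets, n_buckets=16, seed=0):
    """Cross-entropy over the target item plus items sharing the query's LSH bucket."""
    n, d = hidden.shape
    g = torch.Generator().manual_seed(seed)
    proj = torch.randn(d, n_buckets // 2, generator=g)

    q_buckets = lsh_buckets(hidden, proj)    # bucket id per sequence position
    i_buckets = lsh_buckets(item_emb, proj)  # bucket id per catalog item

    total, count = hidden.new_zeros(()), 0
    for b in range(n_buckets):
        q_idx = (q_buckets == b).nonzero(as_tuple=True)[0]
        i_idx = (i_buckets == b).nonzero(as_tuple=True)[0]
        if q_idx.numel() == 0:
            continue
        q, tgt = hidden[q_idx], targets[q_idx]
        # Logit of the true target, always kept so the loss is well defined.
        pos = (q * item_emb[tgt]).sum(-1, keepdim=True)          # (m, 1)
        if i_idx.numel() > 0:
            neg = q @ item_emb[i_idx].T                          # (m, bucket size)
            # Avoid counting the target twice if it landed in the same bucket.
            neg = neg.masked_fill(i_idx.unsqueeze(0) == tgt.unsqueeze(1), float("-inf"))
            logits = torch.cat([pos, neg], dim=1)
        else:
            logits = pos
        labels = torch.zeros(q_idx.numel(), dtype=torch.long)    # target sits in column 0
        total = total + F.cross_entropy(logits, labels, reduction="sum")
        count += q_idx.numel()
    return total / max(count, 1)

# Illustrative usage with hypothetical sizes.
hidden = torch.randn(4096, 64)        # flattened (batch * seq_len, dim) transformer outputs
item_emb = torch.randn(50_000, 64)    # item embedding table
targets = torch.randint(50_000, (4096,))
print(reduced_ce(hidden, item_emb, targets))
```

Because the softmax gradient for a negative item grows with its predicted probability, restricting the computation to near-maximum inner products retains most of the useful gradient signal, while memory now scales with bucket size rather than the full catalog.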

Experimental Results

To validate the effectiveness of RECE, the paper includes extensive evaluations across four datasets: BeerAdvocate, Behance, Amazon Kindle Store, and Gowalla, chosen for their varied catalog sizes and user interaction densities. The experiments compare the SASRec model enhanced with RECE against several baselines: SASRec with full CE loss, Binary Cross-Entropy with multiple negative samples (BCE+), and recent SOTA sampling methods.

Key findings include:

  • On the BeerAdvocate dataset, with a relatively small catalog, RECE performs comparably to the best alternative methods in both memory and quality metrics.
  • For datasets with larger catalogs, such as Behance and Kindle Store, RECE achieves similar performance levels while reducing peak memory usage by up to 12 times.
  • On Gowalla, the dataset with the largest catalog in the paper, RECE could either surpass the performance of other methods by up to 8.19% in NDCG@10 or achieve equivalent quality with a six-fold reduction in memory usage.

Implications and Future Work

The introduction of RECE has significant implications for the design and implementation of large-scale recommender systems. Practically, it enables the deployment of high-performance models in memory-constrained environments, making it feasible to handle extensive item catalogs more efficiently. Theoretically, RECE opens new avenues in loss function optimization, suggesting that similar approaches might be adapted for other models or domains beyond sequential recommenders, such as natural language processing or search systems.

Future research could explore extending RECE to other commonly used loss functions or integrating it into different model architectures beyond SASRec. Additionally, examining the applicability of RECE in real-time recommendation scenarios and diverse application domains would further solidify its utility and versatility.

In conclusion, this work proposes a practical solution to a critical limitation in the field of recommender systems, bridging the gap between high-performance and scalable model training. The RECE loss function stands out as a promising tool for advancing recommender system research and applications, particularly in the context of large item catalogs.

Acknowledgments

The authors acknowledge the support from the Basic Research Program at the National Research University Higher School of Economics (HSE University).

References:

Gusak, D., Mezentsev, G., Oseledets, I., & Frolov, E. (2024). RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders. In CIKM '24: 33rd ACM International Conference on Information and Knowledge Management. Boise, USA. DOI: XXXX/YYYY.