Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (1711.07601v1)

Published 21 Nov 2017 in cs.IR, cs.LG, cs.PF, and cs.SI

Abstract: User experience in modern content discovery applications critically depends on high-quality personalized recommendations. However, building systems that provide such recommendations presents a major challenge due to a massive pool of items, a large number of users, and requirements for recommendations to be responsive to user actions and generated on demand in real-time. Here we present Pixie, a scalable graph-based real-time recommender system that we developed and deployed at Pinterest. Given a set of user-specific pins as a query, Pixie selects in real-time from billions of possible pins those that are most related to the query. To generate recommendations, we develop the Pixie Random Walk algorithm that utilizes the Pinterest object graph of 3 billion nodes and 17 billion edges. Experiments show that recommendations provided by Pixie lead to up to 50% higher user engagement when compared to the previous Hadoop-based production system. Furthermore, we develop a graph pruning strategy that leads to an additional 58% improvement in recommendations. Last, we discuss system aspects of Pixie, where a single server executes 1,200 recommendation requests per second with 60 millisecond latency. Today, systems backed by Pixie contribute to more than 80% of all user engagement on Pinterest.

Authors (8)
  1. Chantat Eksombatchai (2 papers)
  2. Pranav Jindal (2 papers)
  3. Jerry Zitao Liu (2 papers)
  4. Yuchen Liu (156 papers)
  5. Rahul Sharma (88 papers)
  6. Charles Sugnet (1 paper)
  7. Mark Ulrich (1 paper)
  8. Jure Leskovec (233 papers)
Citations (192)

Summary

Overview of the Pixie Real-Time Recommender System

The paper "Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time" presents a scalable, graph-based real-time recommendation system deployed at Pinterest. Pixie addresses two central challenges of real-time recommendation: scalability and responsiveness. It serves Pinterest’s catalog of more than 3 billion items to more than 200 million users by running random walks over a large graph of pins and boards.

Core Contributions and Methodology

Pixie generates its recommendations by running the Pixie Random Walk algorithm on a bipartite graph of pins and the boards into which Pinterest users have organized them. The full graph contains about 7 billion nodes; after pruning, the graph of 3 billion nodes and 17 billion edges cited in the abstract fits in roughly 120GB of RAM on AWS machines. The algorithm’s significant features include:

  1. Graph-based Recommendation: The system expands on previous collaborative filtering techniques by initiating random walks on a large-scale Pinterest object graph, leveraging over 17 billion edges to identify connections between items.
  2. Real-time Responsiveness: Through efficient graph traversal, Pixie answers recommendation requests with a latency of about 60 milliseconds, fast enough to keep pace with user interactions and content updates.
  3. Graph Pruning: A strategic graph pruning approach results in a more topically focused and computationally manageable graph, leading to a 58% improvement in recommendation quality while reducing the graph’s size sixfold.
  4. Algorithmic Novelty: The Pixie Random Walk introduces several innovations. These include user-specific bias integration, a weighted multi-query system to capture user behavior comprehensively, and early stopping mechanisms for efficiency. The multi-hit booster within the walk algorithm prioritizes items relevant to multiple user queries, enhancing recommendation relevance.
  5. High-throughput Execution: Each server within the deployment handles 1,200 recommendation requests per second, and requests are processed in parallel to meet the demands of Pinterest’s large active user base.
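
The random walk, multi-query weighting, multi-hit booster, and early stopping listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Pinterest's implementation: the function names, the dictionary-based toy graph, the fixed `restart_prob`, and the `(n_p, n_v)` early-stopping parameters are all hypothetical. The booster is shown as the square of the summed square roots of per-query visit counts, which is one rule that favors pins reached from several queries. User-specific bias is omitted; edges are chosen uniformly at random.

```python
import math
import random
from collections import Counter


def random_walk_counts(pin_to_boards, board_to_pins, query_pin, n_steps,
                       restart_prob=0.5, early_stop=None, rng=None):
    """Count pin visits over n_steps pin -> board -> pin hops from query_pin.

    early_stop is an optional (n_p, n_v) pair: stop as soon as n_p pins
    have each been visited n_v times. Both names are assumptions.
    """
    rng = rng or random.Random()
    visits = Counter()
    n_high = 0  # number of pins that have reached n_v visits
    current = query_pin
    for _ in range(n_steps):
        # Restart with a fixed probability so the walk stays near the query.
        if rng.random() < restart_prob:
            current = query_pin
        boards = pin_to_boards.get(current)
        if not boards:  # dead end: go back to the query pin
            current = query_pin
            continue
        board = rng.choice(boards)                  # hop pin -> board
        current = rng.choice(board_to_pins[board])  # hop board -> pin
        visits[current] += 1
        if early_stop is not None:
            n_p, n_v = early_stop
            if visits[current] == n_v:
                n_high += 1
                if n_high >= n_p:
                    break
    return visits


def combine_multi_hit(per_query_counts):
    """Boost pins hit from several queries: (sum of sqrt(count))^2."""
    root_sums = Counter()
    for counts in per_query_counts:
        for pin, count in counts.items():
            root_sums[pin] += math.sqrt(count)
    return {pin: s * s for pin, s in root_sums.items()}


def pixie_recommend(pin_to_boards, board_to_pins, query_weights, total_steps,
                    top_k=10, early_stop=None, seed=None):
    """Split total_steps across weighted query pins and rank visited pins."""
    rng = random.Random(seed)
    total_weight = sum(query_weights.values())
    per_query = []
    for query_pin, weight in query_weights.items():
        steps = int(total_steps * weight / total_weight)
        per_query.append(random_walk_counts(
            pin_to_boards, board_to_pins, query_pin, steps,
            early_stop=early_stop, rng=rng))
    scores = combine_multi_hit(per_query)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy bipartite graph: two boards sharing the pin "p2".
    pin_to_boards = {"p1": ["b1"], "p2": ["b1", "b2"], "p3": ["b2"]}
    board_to_pins = {"b1": ["p1", "p2"], "b2": ["p2", "p3"]}
    print(pixie_recommend(pin_to_boards, board_to_pins,
                          {"p1": 2.0, "p3": 1.0}, total_steps=300, seed=0))
```

In this sketch a pin visited 4 times from one query and 9 times from another scores (2 + 3)^2 = 25 rather than 13, so it ties with a pin visited 25 times from a single query. Allocating steps in proportion to query weight is likewise a simplifying assumption.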

Evaluation and Performance

Empirical results demonstrate Pixie’s efficacy relative to traditional content-based or text-based methods. Specifically, it achieves up to 50% higher user engagement than the previous Hadoop-based production system, a result substantiated by offline experiments and online A/B testing. Its real-time operation and its accuracy in predicting user engagement make it useful across Pinterest’s application scenarios, from Homefeed personalization to board recommendations.

Practical and Theoretical Implications

Pixie’s real-time graph-based approach has practical implications for industrial recommender systems, many of which still rely on precomputed results. Pixie shows that real-time recommendation is feasible for web-scale applications and presents a scalable architecture capable of rapid adaptation to user interaction patterns without sacrificing engagement quality.

Theoretically, Pixie underscores the benefits of combining classic collaborative filtering logic with random walks on large graphs computed at request time. This marks an evolution in large-scale recommendation system design and highlights the potential for further improvements through graph-based methodologies.

Future Directions and Developments

Pixie’s deployment at Pinterest opens avenues for future research and developments such as incorporating diverse node types into the graph to encapsulate richer semantic information, enhancing the system's adaptability and depth of personalization. Moreover, integrating machine learning methods, such as deep embedding approaches with Pixie's graph traversal techniques, might further refine recommendation precision.

Overall, Pixie’s approach and operational success offer a blueprint for other large-scale recommendation systems seeking to leverage graph-theoretic methods in real-time environments while maintaining high engagement and computational efficiency.
