
Counterfactual Inference for Consumer Choice Across Many Product Categories (1906.02635v2)

Published 6 Jun 2019 in cs.LG, econ.EM, and stat.ML

Abstract: This paper proposes a method for estimating consumer preferences among discrete choices, where the consumer chooses at most one product in a category, but selects from multiple categories in parallel. The consumer's utility is additive in the different categories. Her preferences about product attributes as well as her price sensitivity vary across products and are in general correlated across products. We build on techniques from the machine learning literature on probabilistic models of matrix factorization, extending the methods to account for time-varying product attributes and products going out of stock. We evaluate the performance of the model using held-out data from weeks with price changes or out of stock products. We show that our model improves over traditional modeling approaches that consider each category in isolation. One source of the improvement is the ability of the model to accurately estimate heterogeneity in preferences (by pooling information across categories); another source of improvement is its ability to estimate the preferences of consumers who have rarely or never made a purchase in a given category in the training data. Using held-out data, we show that our model can accurately distinguish which consumers are most price sensitive to a given product. We consider counterfactuals such as personally targeted price discounts, showing that using a richer model such as the one we propose substantially increases the benefits of personalization in discounts.

Summary

  • The paper introduces a Nested Factorization model that efficiently pools data across product categories to capture nuanced consumer heterogeneity.
  • The paper employs Bayesian variational inference to fit the model at scale and uses the fitted model for counterfactual predictions in scenarios like personalized price discounts.
  • The paper demonstrates the practical impact of modeling cross-category interdependencies, offering actionable insights for targeted promotions and inventory management.

Counterfactual Inference for Consumer Choice Across Many Product Categories

The paper "Counterfactual Inference for Consumer Choice Across Many Product Categories," by Donnelly, Ruiz, Blei, and Athey, presents a method for estimating consumer preferences across many product categories at once. The research aims to improve the predictive power of consumer choice models by pooling information across categories and by allowing both product preferences and price sensitivity to vary across consumers.

Methodology and Model

The authors propose a model grounded in probabilistic matrix factorization, extended to account for time-varying product attributes and changes in product availability, such as items going out of stock. This Nested Factorization model is designed to overcome the limitations of models that treat each category in isolation by pooling information across categories. It builds on a nested logit framework to produce realistic substitution patterns, while allowing considerably more flexible consumer-specific preference heterogeneity.
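As a rough illustration of how a factorized utility can feed a two-stage (nested) choice rule, the sketch below computes purchase probabilities for one consumer in one category. The functional form, the nesting weight `lam`, and every name (`theta_u`, `alpha`, `gamma_u`, `beta`) are assumptions made for this example, not the paper's exact specification.

```python
import numpy as np

def nested_choice_probs(theta_u, alpha, gamma_u, beta, price, available, lam=1.0):
    """Two-stage choice probabilities for one consumer in one category.

    All names and the functional form are illustrative assumptions.
    theta_u:   (K,) consumer latent preference vector
    alpha:     (J, K) item latent attribute vectors
    gamma_u:   (M,) consumer latent price-sensitivity vector
    beta:      (J, M) item latent price-response vectors
    price:     (J,) current prices (these vary over time in practice)
    available: (J,) boolean mask, False for out-of-stock items
    lam:       weight on the inclusive value in the buy/no-buy stage
    Returns (p_buy, p_item), where p_item sums to p_buy.
    """
    price = np.asarray(price, dtype=float)
    available = np.asarray(available, dtype=bool)
    if not available.any():
        # Nothing in stock: the consumer cannot buy in this category
        return 0.0, np.zeros(len(price))

    # Factorized utility: preference match minus a consumer-by-item price effect
    u = alpha @ theta_u - (beta @ gamma_u) * np.log(price)
    # Out-of-stock items cannot be chosen
    u = np.where(available, u, -np.inf)

    # Stage 2: which item, given a purchase in the category (stable softmax)
    m = u.max()
    inclusive = m + np.log(np.exp(u - m).sum())
    p_item_given_buy = np.exp(u - inclusive)

    # Stage 1: buy in the category vs. an outside option with utility 0
    p_buy = 1.0 / (1.0 + np.exp(-lam * inclusive))
    return p_buy, p_buy * p_item_given_buy
```

Because the consumer vectors are shared across categories in a model of this kind, purchases in one category can inform the utilities computed for another, which is the pooling idea described above.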

A central contribution is the use of the fitted model for counterfactual inference, for example to assess personalized price discounts. Estimation relies on Bayesian variational inference, which scales efficiently to large data sets such as those collected from grocery store loyalty programs.
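To make the personalized-discount counterfactual concrete, the sketch below compares each consumer's predicted purchase probability at the regular price and at a discounted price, then targets the consumers with the largest expected revenue gain. It assumes a simple binary logit for a single product with already-estimated per-consumer parameters; the function names and the revenue criterion are hypothetical simplifications, not the paper's procedure.

```python
import numpy as np

def purchase_prob(base_utility, price_sens, price):
    """Binary logit purchase probability against an outside option with utility 0."""
    return 1.0 / (1.0 + np.exp(-(base_utility - price_sens * np.log(price))))

def pick_discount_targets(base_utility, price_sens, price, discount, n_targets):
    """Rank consumers by the expected revenue change from a personal discount.

    base_utility, price_sens: (N,) per-consumer estimates (assumed given)
    price:     regular price of the product
    discount:  fractional discount, e.g. 0.2 for 20% off
    Returns (indices of the n_targets best consumers, per-consumer gain).
    """
    base_utility = np.asarray(base_utility, dtype=float)
    price_sens = np.asarray(price_sens, dtype=float)
    discounted = price * (1.0 - discount)

    p_regular = purchase_prob(base_utility, price_sens, price)
    p_discount = purchase_prob(base_utility, price_sens, discounted)

    # Expected revenue per consumer under each pricing scenario
    gain = p_discount * discounted - p_regular * price
    order = np.argsort(-gain)  # largest gain first
    return order[:n_targets], gain

# Three consumers with equal baseline utility but different price sensitivity
targets, gain = pick_discount_targets(
    base_utility=[0.0, 0.0, 0.0], price_sens=[0.1, 3.0, 1.0],
    price=2.0, discount=0.2, n_targets=1)
```

In this toy example the discount pays off only for the most price-sensitive consumer, which illustrates why accurate individual-level elasticity estimates matter for targeting.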

Performance Evaluation and Findings

Empirical validation uses scanner data from a supermarket, exploiting the variation provided by weekly price changes and stock-out events. The model outperforms traditional mixed logit and nested logit models, which consider each category in isolation, in several respects:

  • Predictive Accuracy: The model predicts held-out data substantially better, especially in weeks with price or availability changes, which serve as a test of counterfactual prediction.
  • Capacity to Personalize: It accurately estimates individual-level preferences and price elasticities, which traditional models struggle to do, particularly for consumers who have rarely or never purchased in a given category in the training data.
  • Cross-Category Interdependence: Because preferences are allowed to be correlated across categories, a consumer's purchases in one category inform the estimates of her preferences in others.
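One common way to score predictions of this kind is the mean log-likelihood of the choices observed in held-out weeks. The sketch below is a minimal version of that metric; the array layout (one row per shopping occasion, one column per option including "no purchase") and the function name are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np

def mean_heldout_loglik(probs, chosen, eps=1e-12):
    """Average log-probability a model assigned to the observed choices.

    probs:  (N, J) predicted choice probabilities, rows sum to 1
    chosen: (N,) index of the option actually chosen on each occasion
    eps:    floor that avoids log(0) for choices the model ruled out
    """
    probs = np.asarray(probs, dtype=float)
    chosen = np.asarray(chosen, dtype=int)
    picked = probs[np.arange(len(chosen)), chosen]
    return float(np.mean(np.log(np.clip(picked, eps, 1.0))))

# Two hypothetical models scored on the same two held-out occasions
chosen = [0, 1]
model_a = [[0.5, 0.5], [0.25, 0.75]]
model_b = [[0.4, 0.6], [0.5, 0.5]]
score_a = mean_heldout_loglik(model_a, chosen)
score_b = mean_heldout_loglik(model_b, chosen)
```

A higher (less negative) score means the model placed more probability on what consumers actually did; restricting the rows to weeks with price changes or stock-outs turns the same metric into a check on counterfactual predictions.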

Implications

The findings suggest that rich hierarchical models add practical value to consumer choice analysis. Retailers and marketers could use them to design more effective strategies for targeted promotions, personalized recommendations, or inventory management.

Theoretically, this work bridges gaps between conventional demand estimation techniques in economics and advanced machine learning methods. The integration of latent factor models into hierarchical choice frameworks presents a promising direction for future research in consumer behavior analysis, especially in rapidly digitizing retail environments.

Future Directions

The paper opens avenues for further exploration, including deploying similar models in dynamic pricing strategies and investigating the impact of broader macroeconomic changes on category-level demand. Additionally, extending these methods to assess complementarity and substitution patterns in more depth across broader product assortments could further improve retail decision-making.

In summary, this research highlights the role of advanced statistical techniques in informing business decisions and contributes to the evolving discourse on consumer choice modeling. Its application to real-world data underscores both its practical relevance and its potential to refine economic theories of consumer behavior.