A Semi-Synthetic Dataset Generation Framework for Causal Inference in Recommender Systems (2202.11351v1)

Published 23 Feb 2022 in cs.IR

Abstract: Accurate recommendation and reliable explanation are two key issues for modern recommender systems. However, most recommendation benchmarks concern only the prediction of user-item ratings and omit the underlying causes behind the ratings. For example, the widely used Yahoo!R3 dataset contains little information on the causes of the user-movie ratings. One solution could be to conduct surveys that ask users to provide such information. In practice, however, user surveys can hardly avoid compliance issues and sparse responses, which greatly hinder the exploration of causality-based recommendation. To better support studies of causal inference and explanation in recommender systems, we propose a novel semi-synthetic data generation framework for recommender systems, in which causal graphical models with missingness are employed to describe the causal mechanism of practical recommendation scenarios. To illustrate the use of our framework, we construct a semi-synthetic dataset with Causal Tags And Ratings (CTAR), based on movies and their descriptive tags and rating information collected from a famous movie rating website. Using the collected data and the causal graph, the user-item-ratings and their corresponding user-item-tags are automatically generated, providing the reasons (selected tags) why a user rates an item. Descriptive statistics and baseline results for the CTAR dataset are also reported. The proposed data generation framework is not limited to recommendation, and the released APIs can be used to generate customized datasets for other research tasks.
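
The general idea in the abstract (tags act as causes of ratings, and a missingness mechanism decides which ratings are observed) can be illustrated with a toy sketch. This is not the paper's framework, causal graph, or released API: the function name `generate_semi_synthetic`, the causal structure, and every parameter and probability below are assumptions chosen only for illustration, and the "real" item tags that a semi-synthetic dataset would take from collected data are replaced here by random placeholders.

```python
import random


def generate_semi_synthetic(n_users=50, n_items=40, n_tags=8, seed=0):
    """Toy generator: a selected tag causes the rating; the rating affects missingness.

    All structure and numbers here are illustrative assumptions, not the paper's model.
    """
    rng = random.Random(seed)
    # Placeholder for the "real" part: each item carries three descriptive tags.
    item_tags = [rng.sample(range(n_tags), 3) for _ in range(n_items)]
    # Assumed latent cause: each user's affinity for every tag, in [0, 1).
    user_pref = [[rng.random() for _ in range(n_tags)] for _ in range(n_users)]

    records = []
    for u in range(n_users):
        for i in range(n_items):
            # Cause: the item tag this user likes most is the selected "reason".
            reason = max(item_tags[i], key=lambda t: user_pref[u][t])
            # Effect: a 1-5 rating driven by affinity for that tag, plus noise.
            score = user_pref[u][reason] + rng.gauss(0, 0.1)
            rating = min(5, max(1, 1 + round(4 * score)))
            # Missingness: higher ratings are more likely to be observed.
            p_observed = 0.1 + 0.15 * (rating - 1)
            observed = rng.random() < p_observed
            records.append(
                {"user": u, "item": i, "tag": reason,
                 "rating": rating, "observed": observed}
            )
    return records


def mean_rating(records):
    """Average rating over a list of records."""
    return sum(r["rating"] for r in records) / len(records)


data = generate_semi_synthetic()
seen = [r for r in data if r["observed"]]
print("mean rating, all pairs:     ", round(mean_rating(data), 3))
print("mean rating, observed pairs:", round(mean_rating(seen), 3))
```

Because the full table of ratings and reasons is known by construction, the gap between the two printed means shows how much the assumed missingness mechanism biases what a standard benchmark would observe.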

Authors (10)
  1. Yan Lyu (21 papers)
  2. Sunhao Dai (22 papers)
  3. Peng Wu (119 papers)
  4. Quanyu Dai (39 papers)
  5. Yuhao Deng (10 papers)
  6. Wenjie Hu (33 papers)
  7. Zhenhua Dong (76 papers)
  8. Jun Xu (398 papers)
  9. Shengyu Zhu (26 papers)
  10. Xiao-Hua Zhou (30 papers)
Citations (1)
