
A Content-Driven Micro-Video Recommendation Dataset at Scale (2309.15379v1)

Published 27 Sep 2023 in cs.IR

Abstract: Micro-videos have recently gained immense popularity, sparking critical research in micro-video recommendation with significant implications for the entertainment, advertising, and e-commerce industries. However, the lack of large-scale public micro-video datasets poses a major challenge for developing effective recommender systems. To address this challenge, we introduce a very large micro-video recommendation dataset, named "MicroLens", consisting of one billion user-item interaction behaviors, 34 million users, and one million micro-videos. This dataset also contains various raw modality information about videos, including titles, cover images, audio, and full-length videos. MicroLens serves as a benchmark for content-driven micro-video recommendation, enabling researchers to utilize various modalities of video information for recommendation, rather than relying solely on item IDs or off-the-shelf video features extracted from a pre-trained network. Our benchmarking of multiple recommender models and video encoders on MicroLens has yielded valuable insights into the performance of micro-video recommendation. We believe that this dataset will not only benefit the recommender system community but also promote the development of the video understanding field. Our datasets and code are available at https://github.com/westlake-repl/MicroLens.

Authors (8)
  1. Yongxin Ni (15 papers)
  2. Yu Cheng (354 papers)
  3. Xiangyan Liu (10 papers)
  4. Junchen Fu (14 papers)
  5. Youhua Li (8 papers)
  6. Xiangnan He (200 papers)
  7. Yongfeng Zhang (163 papers)
  8. Fajie Yuan (33 papers)
Citations (21)

Summary

  • The paper presents the MicroLens dataset, offering one billion user-item interactions and multimodal content for micro-video recommendation research.
  • It details a robust methodology involving seed video collection, expansion, and strict filtering to ensure high data quality and diversity.
  • Benchmark evaluations reveal that end-to-end video content models surpass traditional feature-based methods, underscoring the need for specialized video encoders.

Large-Scale Dataset for Micro-Video Recommendation: MicroLens

The paper, "A Content-Driven Micro-Video Recommendation Dataset at Scale," presents the MicroLens dataset, addressing a significant gap in the availability of large-scale datasets for micro-video recommendation research. The dataset comprises one billion user-item interactions involving 34 million users and one million micro-videos, and includes a comprehensive suite of raw modalities for each video: titles, cover images, audio, and full-length video content. This enables more nuanced content-driven recommendation approaches.

Dataset Overview and Construction

The construction of the MicroLens dataset marks a meaningful step forward for recommender systems research in the micro-video domain. The dataset was curated over a year-long process, ensuring a diverse collection of videos from an online platform. It goes beyond traditional benchmarks such as MovieLens, Tenrec, and KuaiRec by offering raw video content, which is pivotal for developing models that learn directly from video data rather than relying on item identifiers or pre-extracted features.

The dataset's construction involved several stages: seed video collection, dataset expansion, and stringent filtering to eliminate duplicates and uphold quality standards. Interaction data was gathered from public user comments, which provide implicit yet robust indicators of user preference. This methodological choice prioritizes user privacy while capturing engagement more comprehensively than clicks or likes alone.
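The deduplication and filtering steps above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the `Comment` record, the `build_interactions` helper, and the minimum-interaction threshold are all assumptions for the sake of the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Comment:
    user_id: int
    video_id: int
    timestamp: int

def build_interactions(comments, min_user_interactions=5):
    """Turn public comment records into implicit user-item interactions:
    deduplicate repeat comments on the same video (keeping the earliest)
    and drop low-activity users, a common k-core-style filter. The
    paper's exact thresholds and rules may differ."""
    # One interaction per (user, video) pair; earliest timestamp wins.
    seen = {}
    for c in sorted(comments, key=lambda c: c.timestamp):
        seen.setdefault((c.user_id, c.video_id), c)
    # Group by user, then filter out users with too few interactions.
    by_user = {}
    for (u, _), c in seen.items():
        by_user.setdefault(u, []).append(c)
    return {u: sorted(cs, key=lambda c: c.timestamp)
            for u, cs in by_user.items() if len(cs) >= min_user_interactions}
```

The per-user chronological ordering also makes the output directly usable for sequential recommenders, which consume each user's interaction history in time order.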

Benchmark Results and Model Evaluation

The authors benchmarked a range of models to evaluate the potential of MicroLens, categorizing them into item ID-based models (IDRec) and content-driven models (VIDRec, VideoRec). Notably, the VIDRec approach, which augments item IDs with pre-extracted video features, does not consistently outperform traditional ID-based methods in accuracy. In contrast, VideoRec models trained end-to-end (E2E) on raw video content achieve superior performance, underscoring the value of the dataset's raw modalities over reliance on pre-extracted video features.
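Comparing such models requires a common ranking metric. A minimal sketch of hit rate at K (HR@K) under a held-out-item protocol is shown below; the paper's exact evaluation protocol and metric choices may differ.

```python
import numpy as np

def hit_rate_at_k(scores, target_items, k=10):
    """HR@K with one held-out item per user: the fraction of users
    whose held-out item appears in the top-K of their score vector.

    scores       -- (num_users, num_items) model scores
    target_items -- held-out item index for each user
    """
    # argpartition finds the k largest scores per row without a full sort.
    topk = np.argpartition(-scores, k, axis=1)[:, :k]
    hits = (topk == np.asarray(target_items)[:, None]).any(axis=1)
    return float(hits.mean())
```

Because only membership in the top-K matters for HR@K, the partial partition suffices; a rank-sensitive metric such as NDCG@K would additionally need the top-K entries sorted.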

Insights into Video Understanding and Recommender Systems

The paper identifies a notable gap between current video understanding technologies and their effective application in recommender systems. Experiments with various video encoders indicate that pre-trained models, although beneficial, require retraining to align with the specific needs of recommendation tasks. The semantic representations learned for common video classification tasks do not transfer to recommendation contexts without further fine-tuning, which motivates research on optimizing these encoders for recommendation use cases.
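One common way to adapt a pre-trained encoder to a recommendation objective is partial fine-tuning: freeze most of the encoder and retrain only its top layers together with the recommendation head. The sketch below illustrates this in PyTorch; the `VideoRecModel` class, the stand-in encoder, the dimensions, and the choice to unfreeze only the last encoder layer are all illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class VideoRecModel(nn.Module):
    """Hypothetical E2E recommender: an encoder maps video input to a
    vector, a projection head aligns it with user embeddings, and the
    dot product scores user-item affinity."""
    def __init__(self, encoder, enc_dim, num_users, dim=64):
        super().__init__()
        self.encoder = encoder
        self.proj = nn.Linear(enc_dim, dim)
        self.user_emb = nn.Embedding(num_users, dim)

    def forward(self, user_ids, video_input):
        item_vec = self.proj(self.encoder(video_input))
        return (self.user_emb(user_ids) * item_vec).sum(-1)

# Stand-in for a pre-trained video encoder (real ones ingest frames).
encoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256))
model = VideoRecModel(encoder, enc_dim=256, num_users=1000)

# Freeze the encoder except its final layer; the head stays trainable.
for name, p in model.encoder.named_parameters():
    p.requires_grad = name.startswith("2.")
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```

Passing only trainable parameters to the optimizer keeps optimizer state small; how many encoder layers to unfreeze is a tuning decision that trades adaptation capacity against compute and overfitting risk.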

Future Implications and Research Directions

MicroLens advances the frontier of research in both the recommendation and video understanding fields by creating a fertile platform for innovation. It holds the promise of serving as a benchmark for developing multimodal recommendation algorithms and potentially enabling the emergence of foundation models akin to GPT in the text domain but tailored for the rich, multimodal space of video recommendation. The authors note that while substantial progress has been made, the semantic gap between video understanding and recommendation tasks signifies a continued need for dedicated research to bridge these fields effectively.

In summary, the introduction of the MicroLens dataset not only fills a critical gap in available resources for recommendation research but also redefines possibilities within the field by prioritizing raw video content as a primary modality. The insights garnered from this work emphasize the necessity of adapting video technologies for recommendation use, presenting both a challenge and an opportunity for future advancements in AI.