Bridging Language and Items for Retrieval and Recommendation (2403.03952v1)
Abstract: This paper introduces BLaIR, a series of pretrained sentence embedding models specialized for recommendation scenarios. BLaIR is trained to learn correlations between item metadata and potential natural language contexts, which is useful for retrieving and recommending items. To pretrain BLaIR, we collect Amazon Reviews 2023, a new dataset comprising over 570 million reviews and 48 million items across 33 categories, significantly expanding the scope of previous versions. We evaluate the generalization ability of BLaIR across multiple domains and tasks, including a new task named complex product search, which refers to retrieving relevant items given long, complex natural language contexts. Leveraging LLMs like ChatGPT, we construct a corresponding semi-synthetic evaluation set, Amazon-C4. Empirical results on the new task, as well as on conventional retrieval and recommendation tasks, demonstrate that BLaIR exhibits strong text and item representation capacity. Our datasets, code, and checkpoints are available at: https://github.com/hyp1231/AmazonReviews2023.
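The abstract describes learning correlations between item metadata and natural language contexts, the standard setup for a contrastive dual-encoder. The sketch below illustrates that idea with a symmetric in-batch InfoNCE objective over paired embeddings; it is a minimal illustration, not the paper's implementation — the encoder here is replaced by precomputed toy embeddings, and the `temperature` value and the `info_nce` helper are assumptions for illustration only.

```python
# Minimal sketch of a contrastive alignment objective between paired
# context embeddings and item-metadata embeddings, using in-batch
# negatives. Embeddings are stand-in random vectors, not a real encoder.
import numpy as np

def info_nce(context_emb, item_emb, temperature=0.05):
    """Symmetric InfoNCE loss; matched pairs sit on the diagonal."""
    c = context_emb / np.linalg.norm(context_emb, axis=1, keepdims=True)
    i = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    logits = c @ i.T / temperature           # (B, B) cosine similarities
    labels = np.arange(len(c))               # positives on the diagonal
    # context -> item direction
    lp_ci = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # item -> context direction
    lp_ic = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return (-lp_ci[labels, labels].mean() - lp_ic[labels, labels].mean()) / 2

rng = np.random.default_rng(0)
aligned = rng.normal(size=(4, 8))
low = info_nce(aligned, aligned)             # perfectly aligned pairs: low loss
high = info_nce(aligned, rng.normal(size=(4, 8)))  # mismatched pairs: higher loss
```

At retrieval time the same encoders score a query context against all item-metadata embeddings by cosine similarity, which is what the complex product search task evaluates.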