Using Interventions to Improve Out-of-Distribution Generalization of Text-Matching Recommendation Systems (2210.10636v2)

Published 7 Oct 2022 in cs.IR and cs.LG

Abstract: Given a user's input text, text-matching recommender systems output relevant items by comparing the input text to available items' descriptions, such as product-to-product recommendation on e-commerce platforms. As users' interests and item inventory are expected to change, it is important for a text-matching system to generalize to data shifts, a task known as out-of-distribution (OOD) generalization. However, we find that the popular approach of fine-tuning a large, base LLM on paired item relevance data (e.g., user clicks) can be counter-productive for OOD generalization. For a product recommendation task, fine-tuning obtains worse accuracy than the base model when recommending items in a new category or for a future time period. To explain this generalization failure, we consider an intervention-based importance metric, which shows that a fine-tuned model captures spurious correlations and fails to learn the causal features that determine the relevance between any two text inputs. Moreover, standard methods for causal regularization do not apply in this setting, because unlike in images, there exist no universally spurious features in a text-matching task (the same token may be spurious or causal depending on the text it is being matched to). For OOD generalization on text inputs, therefore, we highlight a different goal: avoiding high importance scores for certain features. We do so using an intervention-based regularizer that constrains the causal effect of any token on the model's relevance score to be similar to that under the base model. Results on an Amazon product dataset and three question recommendation datasets show that our proposed regularizer improves generalization for both in-distribution and OOD evaluation, especially in difficult scenarios when the base model is not accurate.
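
The intervention-based idea described in the abstract can be pictured with a minimal sketch. It assumes a relevance model exposed as a callable `model(query_ids, item_ids)` returning a scalar score, uses token masking as the intervention, and penalizes the squared gap between the fine-tuned and base models' per-token effects; these choices (names, masking intervention, squared-error penalty) are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def token_effect(model, query_ids, item_ids, token_pos, mask_id):
    """Estimate one token's causal effect on the relevance score by an
    ablation-style intervention: mask the token and measure the score change."""
    original_score = model(query_ids, item_ids)      # scalar relevance score (assumed interface)
    intervened = query_ids.clone()
    intervened[:, token_pos] = mask_id               # intervene on a single query token
    return original_score - model(intervened, item_ids)

def intervention_regularizer(ft_model, base_model, query_ids, item_ids,
                             token_positions, mask_id):
    """Keep the fine-tuned model's per-token causal effects close to the
    frozen base model's effects (a squared-error penalty, for illustration)."""
    penalty = 0.0
    for pos in token_positions:
        eff_ft = token_effect(ft_model, query_ids, item_ids, pos, mask_id)
        with torch.no_grad():                        # base model stays frozen
            eff_base = token_effect(base_model, query_ids, item_ids, pos, mask_id)
        penalty = penalty + F.mse_loss(eff_ft, eff_base)
    return penalty / max(len(token_positions), 1)

# During fine-tuning, this penalty would be added to the usual relevance loss
# with a weighting hyperparameter, e.g.:
#   loss = relevance_loss + lambda_reg * intervention_regularizer(...)
```

The sketch captures the stated goal of avoiding inflated importance scores: if fine-tuning makes a token's effect on the relevance score drift far from its effect under the base model, the penalty pushes it back, without needing a global list of "spurious" tokens.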
