PSL: Rethinking and Improving Softmax Loss from Pairwise Perspective for Recommendation (2411.00163v1)
Abstract: Softmax Loss (SL) is widely applied in recommender systems (RS) and has demonstrated effectiveness. This work analyzes SL from a pairwise perspective, revealing two significant limitations: 1) the relationship between SL and conventional ranking metrics like DCG is not sufficiently tight; 2) SL is highly sensitive to false negative instances. Our analysis indicates that these limitations are primarily due to the use of the exponential function. To address these issues, this work extends SL to a new family of loss functions, termed Pairwise Softmax Loss (PSL), which replaces the exponential function in SL with other appropriate activation functions. While the revision is minimal, we highlight three merits of PSL: 1) it serves as a tighter surrogate for DCG with suitable activation functions; 2) it better balances data contributions; and 3) it acts as a specific BPR loss enhanced by Distributionally Robust Optimization (DRO). We further validate the effectiveness and robustness of PSL through empirical experiments. The code is available at https://github.com/Tiny-Snow/IR-Benchmark.
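The substitution the abstract describes can be illustrated with a minimal sketch. The abstract does not state the loss formula, so the pairwise form below, log Σ_j σ(d_j)^(1/τ) with d_j the negative score minus the positive score, is an assumption, as are the names `pairwise_loss`, `exp_act`, `tanh_act`, and the temperature `tau`. Setting σ = exp gives an SL-style loss, while a bounded choice such as tanh(d) + 1 stands in for one possible "appropriate activation function".

```python
import math


def exp_act(d):
    # Exponential activation: grows without bound as the score gap d grows.
    return math.exp(d)


def tanh_act(d):
    # Hypothetical bounded alternative: values stay within [0, 2].
    return math.tanh(d) + 1.0


def pairwise_loss(pos_score, neg_scores, activation, tau=1.0):
    """Assumed pairwise form: log of sum_j activation(d_j) ** (1 / tau),
    where d_j = neg_score_j - pos_score."""
    total = 0.0
    for neg_score in neg_scores:
        d = neg_score - pos_score
        total += activation(d) ** (1.0 / tau)
    return math.log(total)


# Toy scores: one positive item and two sampled negatives for a single user.
pos = 0.5
negs = [0.1, 0.9]
loss_exp = pairwise_loss(pos, negs, exp_act)
loss_tanh = pairwise_loss(pos, negs, tanh_act)
```

Under these assumptions, a negative scored far above the positive (for example, a false negative) contributes an unbounded term under `exp_act` but at most 2 under `tanh_act`, which is one way to read the abstract's claim about sensitivity to false negatives.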