Towards Robustness Analysis of E-Commerce Ranking System (2403.04257v1)

Published 7 Mar 2024 in cs.IR

Abstract: Information retrieval (IR) is a pivotal component in various applications. Recent advances in machine learning (ML) have enabled the integration of ML algorithms into IR, particularly in ranking systems. While there is a plethora of research on the robustness of ML-based ranking systems, these studies largely neglect commercial e-commerce systems and fail to establish a connection between real-world and manipulated query relevance. In this paper, we present the first systematic measurement study on the robustness of e-commerce ranking systems. We define robustness as the consistency of ranking outcomes for semantically identical queries. To analyze robustness quantitatively, we propose a novel metric that considers both ranking position and item-specific information, which existing metrics do not capture. Our large-scale measurement study with real-world data from e-commerce retailers reveals an open opportunity to measure and improve robustness, since semantically identical queries often yield inconsistent ranking results. Based on our observations, we propose several solution directions to enhance robustness, such as the use of large language models (LLMs). Note that the robustness issue discussed here does not constitute an error or oversight. Rather, when a vast array of choices exists, many products can be presented in various permutations, all of which could be equally appealing. However, this extensive selection may lead to customer confusion. As e-commerce retailers use various techniques to improve the quality of search results, we hope this research offers valuable guidance for measuring the robustness of their ranking systems.
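As a rough illustration of measuring ranking consistency for semantically identical queries, the sketch below compares two ranked result lists with extrapolated rank-biased overlap (RBO; Webber, Moffat, and Zobel, 2010), a standard top-weighted similarity measure for rankings. It is not the metric proposed in the paper: RBO uses ranking positions only and ignores the item-specific information the authors' metric adds. The product IDs, the example queries, the persistence parameter p = 0.9, and truncating both lists to the shorter length are illustrative assumptions.

```python
from typing import Sequence


def rbo_ext(ranking_a: Sequence[str], ranking_b: Sequence[str], p: float = 0.9) -> float:
    """Extrapolated rank-biased overlap (Webber et al., 2010) of two rankings.

    Both lists are cut to the shorter length k, and each list is assumed to
    contain no duplicate items. Returns a value in [0, 1]; 1.0 means the two
    top-k lists are identical.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    k = min(len(ranking_a), len(ranking_b))
    if k == 0:
        raise ValueError("both rankings must be non-empty")

    seen_a: set[str] = set()
    seen_b: set[str] = set()
    overlap = 0  # |A[:d] & B[:d]| at the current depth d
    weighted_agreement = 0.0
    for d in range(1, k + 1):
        item_a, item_b = ranking_a[d - 1], ranking_b[d - 1]
        # Update the prefix overlap incrementally instead of recomputing sets.
        if item_a == item_b:
            overlap += 1
        else:
            overlap += int(item_a in seen_b)
            overlap += int(item_b in seen_a)
        seen_a.add(item_a)
        seen_b.add(item_b)
        # Agreement at depth d, weighted so that top positions count most.
        weighted_agreement += (1 - p) * p ** (d - 1) * overlap / d

    # Extrapolate: assume agreement beyond depth k stays at its depth-k value.
    return weighted_agreement + (overlap / k) * p ** k


if __name__ == "__main__":
    # Hypothetical top-5 product IDs returned for two paraphrased queries.
    results_q1 = ["sku1", "sku2", "sku3", "sku4", "sku5"]  # "running shoes for men"
    results_q2 = ["sku2", "sku1", "sku3", "sku6", "sku4"]  # "men's running shoes"
    print(f"RBO = {rbo_ext(results_q1, results_q2):.3f}")
```

A score near 1 means the two paraphrases produced nearly the same top-ranked products; lower scores signal the kind of inconsistency the abstract describes. Lowering p concentrates the weight on the first few positions, which may better reflect what shoppers actually see.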

Authors (6)
  1. Ningfei Wang
  2. Yupin Huang
  3. Han Cheng
  4. Jiri Gesi
  5. Xiaojie Wang
  6. Vivek Mittal
Citations (1)
