
Session-level Normalization and Click-through Data Enhancement for Session-based Evaluation (2401.12445v1)

Published 23 Jan 2024 in cs.IR

Abstract: Since a user usually has to issue a sequence of queries and examine multiple documents to resolve a complex information need in a search session, researchers have paid much attention to evaluating search systems at the session level rather than the single-query level. Most existing session-level metrics evaluate each query separately and then aggregate the query-level scores using a session-level weighting function. These metrics assume that all queries in the session should be involved and that their order is fixed. However, if a search system could satisfy the user with her first few queries, she might not need to issue any subsequent ones. Besides, in most real-world search scenarios, explicit feedback from real users is lacking, so only implicit feedback, such as users' clicks, can be used as relevance labels for offline evaluation. Such implicit feedback may differ from the real relevance in a search session, as some documents may be overlooked under an earlier query but identified after later reformulations. To address these issues, we make two assumptions about session-based evaluation, which explicitly describe an ideal session-search system and how to enhance click-through data when computing session-level evaluation metrics. Based on these assumptions, we design a session-level metric called Normalized U-Measure (NUM). NUM evaluates a session as a whole and uses an ideal session to normalize the result of the actual session. It also infers session-level relevance labels from implicit feedback. Experiments on two public datasets demonstrate the effectiveness of NUM by comparing it with existing session-based metrics in terms of correlation with user satisfaction and intuitiveness. We also conduct ablation studies to explore whether these assumptions hold.
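
The two ideas in the abstract (session-level labels inferred from clicks, and normalization by an ideal session) can be illustrated with a toy sketch. This is not the NUM definition from the paper, which the abstract does not give. It assumes, purely for illustration, that a document clicked under any query counts as relevant for the whole session, that the session is scored as one concatenated result list with a log-based position discount, and that the ideal session returns every relevant document at the top of a single list. All function names and the session layout (`results`, `clicks`) are hypothetical.

```python
import math


def session_level_labels(session):
    """Assumption: a document clicked under any query is relevant session-wide."""
    clicked = set()
    for query in session:
        clicked.update(query["clicks"])
    return clicked


def session_gain(ranked_docs, relevant):
    """Sum a log-discounted gain over the first occurrence of each relevant doc."""
    seen = set()
    gain = 0.0
    for position, doc in enumerate(ranked_docs, start=1):
        if doc in relevant and doc not in seen:
            seen.add(doc)
            gain += 1.0 / math.log2(position + 1)
    return gain


def normalized_session_score(session):
    """Score the whole session and divide by the score of a toy ideal session."""
    relevant = session_level_labels(session)
    if not relevant:
        return 0.0
    # Treat the session as a single list: results of query 1, then query 2, ...
    concatenated = [doc for query in session for doc in query["results"]]
    actual = session_gain(concatenated, relevant)
    # Toy ideal session: every relevant document ranked first, in one list.
    ideal = session_gain(sorted(relevant), relevant)
    return actual / ideal


# "d2" was shown but not clicked under the first query; it was clicked later.
session = [
    {"results": ["d1", "d2", "d3"], "clicks": []},
    {"results": ["d2", "d4", "d5"], "clicks": ["d2", "d4"]},
]
score = normalized_session_score(session)
```

In this sketch the click on "d2" under the second query also credits its earlier appearance under the first query, which mirrors the abstract's point that a document may be overlooked at first and identified after a reformulation. Because the k-th distinct relevant document can never appear before position k in the concatenated list, the score stays between 0 and 1.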

Authors (3)
  1. Haonan Chen
  2. Zhicheng Dou
  3. Jiaxin Mao