
Categorical, Ratio, and Professorial Data: The Case for Reciprocal Rank (2312.12672v1)

Published 20 Dec 2023 in cs.IR

Abstract: Search engine results pages are usually abstracted as binary relevance vectors and hence are categorical data, meaning that only a limited set of operations is permitted, most notably tabulation of occurrence frequencies, with determination of medians and averages not possible. To compare retrieval systems it is thus usual to make use of a categorical-to-numeric effectiveness mapping. A previous paper has argued that any desired categorical-to-numeric mapping may be used, provided only that there is an argued connection between each category of SERP and the score that is assigned to that category by the mapping. Further, once that plausible connection has been established, then the mapped values can be treated as real-valued observations on a ratio scale, allowing the computation of averages. This article is written in support of that point of view, and to respond to ongoing claims that SERP scores may only be averaged if very restrictive conditions are imposed on the effectiveness mapping.
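The mapping-then-averaging pipeline the abstract describes can be illustrated with a small sketch. The code below is not taken from the paper; it assumes reciprocal rank as the categorical-to-numeric mapping (the measure named in the title) and uses hypothetical binary relevance vectors. Each SERP (a categorical observation) is mapped to a numeric score, and those mapped scores are then averaged across queries, which is exactly the step whose legitimacy the paper defends.

```python
# Minimal sketch (illustrative only): map binary relevance vectors to
# reciprocal rank scores, then average the mapped scores over queries.

from typing import Sequence


def reciprocal_rank(serp: Sequence[int]) -> float:
    """Categorical-to-numeric mapping: return 1/k, where k is the 1-based
    rank of the first relevant document, or 0.0 if the SERP contains no
    relevant document."""
    for rank, is_relevant in enumerate(serp, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(serps: Sequence[Sequence[int]]) -> float:
    """Average the mapped scores across queries, treating them as
    real-valued observations that may legitimately be averaged."""
    return sum(reciprocal_rank(s) for s in serps) / len(serps)


if __name__ == "__main__":
    # Three hypothetical SERPs, one per query (1 = relevant, 0 = not relevant).
    serps = [
        [0, 1, 0, 0],  # first relevant document at rank 2 -> RR = 0.5
        [1, 0, 0, 0],  # first relevant document at rank 1 -> RR = 1.0
        [0, 0, 0, 0],  # no relevant document               -> RR = 0.0
    ]
    print(mean_reciprocal_rank(serps))  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```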

