Consistent Difficulty Calibration Across LSC Editions

Establish a standardized protocol for calibrating task difficulty across editions of the ACM Lifelog Search Challenge (LSC), so that interactive lifelog retrieval systems can be compared fairly year over year under the live competitive format.

Background

The paper compares interactive lifelog retrieval systems across LSC'22, LSC'23, and LSC'24, noting that task difficulty can vary from year to year, especially for subjective or open-ended tasks. Averaging scores per task type mitigates this variability, but it does not amount to a fully consistent difficulty calibration, which remains challenging.
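
To make the mitigation concrete, here is a minimal sketch of per-task-type score averaging: raw task scores are grouped by (edition, task type) and averaged within each bucket, so comparisons rest on aggregates rather than individual tasks. The record layout, task-type names, and scores below are hypothetical illustrations, not data from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (edition, task_type, task_id, score).
# Field names and values are illustrative, not taken from the LSC papers.
results = [
    ("LSC'22", "known-item", "KIS-01", 0.82),
    ("LSC'22", "known-item", "KIS-02", 0.64),
    ("LSC'22", "ad-hoc",     "ADH-01", 0.47),
    ("LSC'23", "known-item", "KIS-01", 0.91),
    ("LSC'23", "ad-hoc",     "ADH-01", 0.55),
    ("LSC'23", "ad-hoc",     "ADH-02", 0.40),
]

def per_task_type_averages(records):
    """Average raw task scores within each (edition, task_type) bucket,
    so systems are compared on aggregates rather than single tasks."""
    buckets = defaultdict(list)
    for edition, task_type, _task_id, score in records:
        buckets[(edition, task_type)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}

for (edition, task_type), avg in sorted(per_task_type_averages(results).items()):
    print(f"{edition} {task_type}: {avg:.2f}")
```

Averaging over several tasks of the same type dampens the effect of any single unusually easy or hard task, but it cannot correct for an edition whose task pool is harder overall, which is the calibration gap the paper identifies.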

The authors highlight that ensuring fully comparable difficulty across editions would require re-running historical systems on current tasks with the same users and interfaces, which is infeasible in the live competition setup. They therefore explicitly identify the lack of an agreed-upon calibration protocol as an open challenge.
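
To illustrate what such a protocol would have to operate on, the sketch below applies one naive candidate normalization: z-scoring raw task scores within each edition. This is purely an assumption-laden illustration, not a method proposed in the paper; as the comments note, score-level rescaling removes edition-wide shifts in average difficulty but cannot account for changes in users, interfaces, or task semantics, which is precisely why the authors consider the problem open.

```python
from statistics import mean, pstdev

# Hypothetical per-task scores for a single edition; purely illustrative.
edition_scores = {"T01": 0.82, "T02": 0.64, "T03": 0.47, "T04": 0.91}

def z_normalize(scores):
    """Z-score each task's result against the edition's own mean/stddev.
    This removes edition-level shifts in average difficulty but cannot
    correct for changes in users, interfaces, or the task pool itself."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    if sigma == 0:  # all tasks scored identically; nothing to rescale
        return {task: 0.0 for task in scores}
    return {task: (s - mu) / sigma for task, s in scores.items()}

print(z_normalize(edition_scores))
```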

References

"Although we mitigate this by averaging scores across multiple tasks and reporting per-task-type performance, a fully consistent difficulty calibration across years remains an open challenge."

The State-of-the-Art in Lifelog Retrieval: A Review of Progress at the ACM Lifelog Search Challenge Workshop 2022-24 (2506.06743 - Tran et al., 7 Jun 2025) in Section Conclusion, Subsection Limitations