Documenting end-to-end reproducibility protocols for prior LLM-based query generation studies

Identify and document the precise end-to-end experimental procedures—including prompt issuance, output handling, query extraction, dataset preparation, and baseline selection—required to fully reproduce the experiments of Wang et al. (2023) and Alaniz et al. (2023) on LLM-based Boolean query generation.

Background

Staudinger et al. (2024) encountered multiple issues while attempting to reproduce these prior studies and attributed them to insufficient methodological detail in the original publications. Without fully specified protocols, replication is hindered and comparative evaluation across models and datasets remains unreliable.
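To make concrete what a fully specified protocol might capture for the prompt issuance, output handling, and query extraction steps, the following is a minimal Python sketch. It is illustrative only and is not the method of Wang et al. (2023) or Alaniz et al. (2023): the names `RunRecord`, `extract_query`, and `make_record`, and the extraction rule itself (take the first line containing an uppercase Boolean operator), are assumptions introduced here.

```python
import hashlib
import json
import re
from dataclasses import asdict, dataclass
from typing import Optional

# Assumed rule: a line counts as a Boolean query if it contains an
# uppercase AND / OR / NOT operator. The original studies may differ.
BOOLEAN_OPS = re.compile(r"\b(AND|OR|NOT)\b")
EXTRACTION_RULE = "first line containing AND/OR/NOT"


def extract_query(raw_output: str) -> Optional[str]:
    """Return the first line of the model output that looks like a Boolean query."""
    for line in raw_output.splitlines():
        candidate = line.strip().strip("`")
        if BOOLEAN_OPS.search(candidate):
            return candidate
    return None


@dataclass
class RunRecord:
    """Hypothetical record of one prompt/response pair, kept for later replication."""
    model: str
    prompt: str
    raw_output: str
    extracted_query: Optional[str]
    extraction_rule: str

    def to_json(self) -> str:
        payload = asdict(self)
        # Hash the prompt so that silent prompt edits are detectable.
        payload["prompt_sha256"] = hashlib.sha256(
            self.prompt.encode("utf-8")
        ).hexdigest()
        return json.dumps(payload, sort_keys=True)


def make_record(model: str, prompt: str, raw_output: str) -> RunRecord:
    """Bundle the prompt, the unmodified output, and the extracted query."""
    return RunRecord(
        model=model,
        prompt=prompt,
        raw_output=raw_output,
        extracted_query=extract_query(raw_output),
        extraction_rule=EXTRACTION_RULE,
    )
```

Storing the unmodified model output next to the extracted query and a description of the extraction rule lets a later reader check whether a different extraction choice would have changed the evaluated query.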

References

While extending the setups by Wang et al. (2023) and Alaniz et al. (2023), we ran into several issues and were unable to fully reproduce the publications, as not enough information was given by the authors.

— A Reproducibility and Generalizability Study of Large Language Models for Query Generation (2411.14914 - Staudinger et al., 2024) in Section 5 Discussion
