Feasibility of collecting gold-standard validation labels exceeding one-quarter of the sample
Determine, for applied economics studies that use large language models to automate measurement of text-based economic concepts for downstream estimation, the proportion of applications in which it is practically feasible—given time and cost constraints—to collect gold-standard validation labels on more than 25% of the study sample.
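One way to make the feasibility criterion concrete is a simple budget check, sketched below. This is an illustration only, not a method from the cited paper: the class `LabelingBudget`, the functions, and every cost and time input are hypothetical, and real applications would need their own figures for annotation cost and annotator time.

```python
import math
from dataclasses import dataclass


@dataclass
class LabelingBudget:
    """Hypothetical resource constraints for one application (illustrative only)."""
    cost_per_label: float     # currency units per gold-standard label
    minutes_per_label: float  # annotator minutes per label
    max_cost: float           # total money available for labeling
    max_hours: float          # total annotator hours available


def min_labels_exceeding(sample_size: int, share: float = 0.25) -> int:
    """Smallest label count strictly greater than share * sample_size."""
    return math.floor(share * sample_size) + 1


def is_feasible(sample_size: int, budget: LabelingBudget, share: float = 0.25) -> bool:
    """True if labeling more than `share` of the sample fits both constraints."""
    n_labels = min_labels_exceeding(sample_size, share)
    total_cost = n_labels * budget.cost_per_label
    total_hours = n_labels * budget.minutes_per_label / 60.0
    return total_cost <= budget.max_cost and total_hours <= budget.max_hours


def feasible_share(applications) -> float:
    """Fraction of (sample_size, budget) pairs for which labeling is feasible."""
    applications = list(applications)
    if not applications:
        raise ValueError("need at least one application")
    flags = [is_feasible(n, b) for n, b in applications]
    return sum(flags) / len(flags)
```

Given a list of applications with assumed sample sizes and budgets, `feasible_share` returns the fraction that pass the check, which is the quantity the open question asks about. The result depends entirely on the assumed inputs.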
References
The share of applications in which it would actually be feasible (from a time and cost perspective) to collect gold-standard labels for more than a quarter of the sample is an open question.
Large Language Models: An Applied Econometric Framework (arXiv:2412.07031, Ludwig et al., 9 Dec 2024), Section 5, Subsection "Monte Carlo Simulations based on Congressional Legislation"