Replicability of earnings-call–based CapEx forecasting using Llama-2
Determine whether Llama-2 can reproduce the baseline predictive relationship between LLM-generated signals from earnings call transcripts and firms’ capital expenditures two quarters ahead. If it cannot, establish whether the failure to replicate is attributable to Llama-2’s limited ability to handle long-context inputs.
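One way to start probing the long-context explanation is to separate transcripts that plausibly fit within Llama-2's 4,096-token context window from those that do not, and then compare the predictive relationship across the two groups. The following is a minimal sketch under stated assumptions, not the procedure used in the cited paper: the function names, the `prompt_overhead` allowance, and the words-to-tokens ratio of 1.3 are all hypothetical, and a real test would count tokens with the model's own tokenizer.

```python
# Llama-2's published context window, in tokens.
LLAMA2_CONTEXT_TOKENS = 4096


def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough token estimate from a whitespace word count.

    The 1.3 tokens-per-word ratio is an assumed heuristic, not a measured value.
    """
    return int(len(text.split()) * tokens_per_word)


def split_by_context_fit(
    transcripts: dict[str, str],
    prompt_overhead: int = 200,  # assumed allowance for instructions around the transcript
    limit: int = LLAMA2_CONTEXT_TOKENS,
) -> tuple[list[str], list[str]]:
    """Partition call IDs into those that fit the context window and those that overflow."""
    fits, overflows = [], []
    for call_id, text in transcripts.items():
        if estimate_tokens(text) + prompt_overhead <= limit:
            fits.append(call_id)
        else:
            overflows.append(call_id)
    return fits, overflows


# Toy usage with synthetic text standing in for real transcripts.
transcripts = {"short_call": "word " * 100, "long_call": "word " * 5000}
fits, overflows = split_by_context_fit(transcripts)
```

If the baseline relationship were to hold for the calls in `fits` but not for those in `overflows`, that would be consistent with the long-context explanation, though it would not rule out other differences between Llama-2 and the baseline model.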
References
We were unable to replicate the baseline results using Llama-2, potentially due to limitations in its ability to handle long-context inputs. As a result, we do not include out-of-sample test results for the earnings call exercise.
— A Test of Lookahead Bias in LLM Forecasts
(2512.23847 - Gao et al., 29 Dec 2025) in Section “Prompt Earnings Call Transcripts to Predict Capex,” footnote (near Table \ref{tab:llm_capex})