Replicability of earnings-call–based CapEx forecasting using Llama-2

Determine whether Llama-2 can reproduce the baseline predictive relationship between LLM-generated signals from earnings call transcripts and firms' capital expenditures two quarters ahead, and ascertain whether any failure to replicate stems from Llama-2's limited ability to handle long-context inputs.

Background

The paper’s second forecasting exercise follows Jha et al. (2024) by using earnings call transcripts to predict firms’ capital expenditures two quarters ahead. The authors successfully reproduce the baseline finding with Llama-3.3, showing that the LLM-generated prediction score strongly forecasts future investment and that LAP amplifies this relationship.

However, when attempting to extend the analysis with Llama-2 to enable an out-of-sample test for this task, the authors report that they could not replicate the baseline results, and they tentatively attribute the failure to Llama-2's long-context limitations. Consequently, they omit out-of-sample results for the earnings call exercise, leaving both the replicability of the finding and the specific cause of failure unresolved.
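One concrete way to probe the long-context hypothesis is to compare estimated transcript lengths against each model's context window: Llama-2 models accept 4,096 tokens, while Llama-3.3 supports roughly 128K. The sketch below is illustrative, not the paper's actual preprocessing; the sample transcripts and the 4-characters-per-token heuristic are assumptions for demonstration.

```python
# Sketch: flag earnings-call transcripts likely to exceed Llama-2's
# 4,096-token context window (vs. Llama-3.3's ~128K window).
# The example transcripts and the chars-per-token heuristic are
# illustrative assumptions, not the paper's pipeline.

LLAMA2_CONTEXT = 4_096     # tokens, Llama-2 family
LLAMA33_CONTEXT = 128_000  # tokens, Llama-3.3

def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def flag_over_context(transcripts: dict, context: int) -> list:
    """Return IDs of transcripts whose estimated length exceeds the window."""
    return [tid for tid, text in transcripts.items()
            if approx_tokens(text) > context]

transcripts = {
    "short_call": "word " * 2_000,   # ~2,500 estimated tokens
    "long_call": "word " * 20_000,   # ~25,000 estimated tokens
}
print(flag_over_context(transcripts, LLAMA2_CONTEXT))   # ['long_call']
print(flag_over_context(transcripts, LLAMA33_CONTEXT))  # []
```

If most transcripts in the sample exceed 4,096 estimated tokens, silent truncation by Llama-2 would plausibly degrade the prediction score, consistent with the authors' conjecture.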

References

"We were unable to replicate the baseline results using Llama-2, potentially due to limitations in its ability to handle long-context inputs. As a result, we do not include out-of-sample test results for the earnings call exercise."

Gao et al., "A Test of Lookahead Bias in LLM Forecasts" (arXiv:2512.23847, 29 Dec 2025), Section "Prompt Earnings Call Transcripts to Predict Capex," footnote near Table \ref{tab:llm_capex}.