Mechanism behind the outlier performance on the Bitcoin hash rate question

Determine the causal mechanism behind the anomalous outperformance of the biased GPT-4-Turbo augmentation, relative to both the superforecasting GPT-4-Turbo augmentation and the control condition, on Forecasting Question 3, which asked for Bitcoin's network hash rate per second (in TH/s) on December 31, 2023.

Background

In exploratory analyses, the authors observed that removing Question 3 (the Bitcoin network hash rate forecast) changed the overall pattern of results: the superforecasting LLM augmentation then outperformed the biased augmentation. With Question 3 included, the biased augmentation condition exhibited significantly higher accuracy, which the authors attribute to participants in that condition making much higher forecasts, possibly reflecting confusion about the target metric.
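
The leave-one-question-out check described above can be sketched as follows. This is a minimal illustration only: the error values are invented to reproduce the qualitative pattern (they are not the paper's data), the error metric is left abstract (lower is better), and the helper name `rank_conditions` is hypothetical.

```python
from statistics import mean

# Hypothetical per-question mean errors (lower is better).
# These numbers are made up for illustration; they are NOT from the paper.
errors = {
    "control":          {"Q1": 1.00, "Q2": 1.00, "Q3": 9.0, "Q4": 1.00},
    "superforecasting": {"Q1": 0.70, "Q2": 0.75, "Q3": 8.0, "Q4": 0.70},
    "biased":           {"Q1": 0.85, "Q2": 0.90, "Q3": 2.0, "Q4": 0.85},
}


def rank_conditions(errors, exclude=None):
    """Return condition names sorted from lowest to highest mean error.

    `exclude` optionally names one question to drop before averaging.
    """
    scores = {
        cond: mean(err for q, err in per_question.items() if q != exclude)
        for cond, per_question in errors.items()
    }
    return sorted(scores, key=scores.get)


with_q3 = rank_conditions(errors)
without_q3 = rank_conditions(errors, exclude="Q3")

print("All questions:   ", with_q3)
print("Without Q3:      ", without_q3)
```

With these invented values, a single question with very large errors is enough to put the biased condition first overall, while dropping it puts the superforecasting condition first. That is the kind of sensitivity the exploratory analysis is probing; it does not by itself identify why the forecasts on that question differed.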

The authors note suggestive evidence of misunderstanding around Question 3: the median prediction in the biased augmentation condition was five orders of magnitude higher, and participants in that condition were at most half as likely to provide hash rate forecasts that looked like forecasts of the Bitcoin USD spot price. However, they explicitly state that they are unsure of the exact mechanism behind this anomaly.

References

While we remain unsure what exactly the mechanism behind this finding is, we argue that given the fact of this anomaly on our results, the exploratory analyses present a plausible approach to understanding our data, suggesting that superforecasting LLM augmentation improves significantly upon the control, while also finding that the biased LLM augmentation similarly improves upon the control while underperforming the more targeted superforecasting prompt.

AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy (2402.07862 - Schoenegger et al., 12 Feb 2024) in Discussion section