Uncertainty about GPT-4 training data contents and numerical rounding behavior
Determine the composition of GPT-4’s training data and quantify the extent to which large language models round or otherwise transform continuous numerical values during generation.
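One way to make the rounding question concrete, even without access to training data or source code, is to probe the model behaviorally: feed it continuous values with a known number of decimal places and measure how much precision survives in its output. The sketch below is illustrative only and is not the paper's method; it assumes the official `openai` Python client (v1+), an `OPENAI_API_KEY` in the environment, and hypothetical helper names (`decimals`, `probe_rounding`) and model choice.

```python
# Minimal sketch (assumption, not from the paper): estimate how many decimal
# places a chat model preserves when asked to echo a continuous value.
import random
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def decimals(s: str) -> int:
    """Count decimal places in the first number found in a string."""
    m = re.search(r"-?\d+\.(\d+)", s)
    return len(m.group(1)) if m else 0

def probe_rounding(n_trials: int = 20, model: str = "gpt-4") -> float:
    """Mean number of decimal places retained when the model echoes a value."""
    retained = []
    for _ in range(n_trials):
        x = f"{random.uniform(-10, 10):.6f}"  # always 6 decimal places in the prompt
        resp = client.chat.completions.create(
            model=model,
            temperature=0,
            messages=[{
                "role": "user",
                "content": f"Repeat this number exactly, with no other text: {x}",
            }],
        )
        retained.append(decimals(resp.choices[0].message.content))
    return sum(retained) / len(retained)

if __name__ == "__main__":
    print(f"Mean decimal places retained: {probe_rounding():.2f}")
```

Comparing the retained precision across prompt formats (plain decimals vs. scientific notation, or values embedded in forecasting prompts) would give a rough, model-level estimate of how aggressively continuous variables are rounded, without resolving the underlying training-data question.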
References
It is unclear what is in the training data, or to what degree LLMs round continuous variables, as OpenAI has been secretive about the training data and has not shared the source code for ChatGPT-4.
— Can Base ChatGPT be Used for Forecasting without Additional Optimization?
(arXiv:2404.07396, Pham et al., 11 Apr 2024), in Section: Predicting Macroeconomic Variables