Extent of memorization effects in LLM-based embedding forecasts
Determine the extent to which memorization affects financial forecasts that rely on embeddings generated by large language models such as GPT-4o, by quantifying how much the embedding representations encode training-period outcomes or lookahead information that contaminates forecasting evaluations.
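One way to operationalize this quantification is a linear probe: regress realized outcomes on the embeddings and compare out-of-sample fit for dates before versus after the model's training cutoff. A persistently higher probe fit on pre-cutoff dates would suggest the embeddings encode memorized outcomes rather than genuine predictive signal. The sketch below is hypothetical and uses synthetic data in place of actual LLM embeddings; the leakage construction, dimensions, and thresholds are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 16  # hypothetical: 400 dates, 16-dim embeddings

# Synthetic stand-ins for LLM embeddings. Pre-cutoff embeddings are
# deliberately contaminated: they linearly encode the realized outcome
# (lookahead leakage). Post-cutoff embeddings carry no such signal.
outcomes_pre = rng.normal(size=n)
outcomes_post = rng.normal(size=n)
leak_direction = rng.normal(size=d)
emb_pre = rng.normal(size=(n, d)) + np.outer(outcomes_pre, leak_direction)
emb_post = rng.normal(size=(n, d))

def probe_r2(X, y):
    """Out-of-sample R^2 of a least-squares linear probe (50/50 split)."""
    half = len(y) // 2
    beta, *_ = np.linalg.lstsq(X[:half], y[:half], rcond=None)
    resid = y[half:] - X[half:] @ beta
    return 1.0 - resid.var() / y[half:].var()

r2_pre = probe_r2(emb_pre, outcomes_pre)
r2_post = probe_r2(emb_post, outcomes_post)
print(f"probe R^2 pre-cutoff:  {r2_pre:.2f}")
print(f"probe R^2 post-cutoff: {r2_post:.2f}")
```

The diagnostic is the gap between the two R^2 values: under no memorization, pre- and post-cutoff probe fits should be comparable, whereas a large pre-cutoff advantage flags contamination of the evaluation period.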
References
Finally, using embeddings along with a supervised step has been proposed by \citet{chenExpectedReturnsLarge2022}, though it remains unknown to what extent the memorization problem affects forecasts using LLM embeddings.
— The Memorization Problem: Can We Trust LLMs' Economic Forecasts?
(2504.14765 - Lopez-Lira et al., 20 Apr 2025) in Subsection: Related Literature