LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law (2402.00795v4)

Published 1 Feb 2024 in cs.LG and cs.AI

Abstract: Pretrained LLMs are surprisingly effective at performing zero-shot tasks, including time-series forecasting. However, understanding the mechanisms behind such capabilities remains highly challenging due to the complexity of the models. We study LLMs' ability to extrapolate the behavior of dynamical systems whose evolution is governed by principles of physical interest. Our results show that LLaMA 2, an LLM trained primarily on text, achieves accurate predictions of dynamical system time series without fine-tuning or prompt engineering. Moreover, the accuracy of the learned physical rules increases with the length of the input context window, revealing an in-context version of the neural scaling law. Along the way, we present a flexible and efficient algorithm for extracting probability density functions of multi-digit numbers directly from LLMs.
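
The abstract's final claim, extracting probability density functions of multi-digit numbers directly from an LLM, can be illustrated with a short sketch. The code below is a minimal, hypothetical example and not the authors' algorithm: it chains per-digit next-token probabilities from a Hugging Face causal language model into a distribution over the next n-digit value in a numeric series. The checkpoint name, the helper `next_value_pdf`, and the assumption that each digit "0"-"9" maps to a single vocabulary token are all illustrative assumptions.

```python
# Minimal, hypothetical sketch (not the authors' implementation): build a
# probability distribution over the next n-digit value by chaining per-digit
# next-token probabilities from a causal language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# Assumes each digit "0"-"9" corresponds to a single vocabulary token.
DIGIT_IDS = [tokenizer.convert_tokens_to_ids(d) for d in "0123456789"]

@torch.no_grad()
def next_value_pdf(prompt: str, n_digits: int = 2) -> dict[str, float]:
    """Return P(value | prompt) over all n_digits-digit strings."""
    pdf: dict[str, float] = {}

    def recurse(prefix: str, prob: float) -> None:
        if len(prefix) == n_digits:
            pdf[prefix] = prob
            return
        ids = tokenizer(prompt + prefix, return_tensors="pt").input_ids
        logits = model(ids).logits[0, -1]            # next-token logits
        p_digits = torch.softmax(logits, dim=-1)[DIGIT_IDS]
        p_digits = p_digits / p_digits.sum()         # renormalize over digits only
        for d, p in zip("0123456789", p_digits.tolist()):
            recurse(prefix + d, prob * p)

    recurse("", 1.0)
    return pdf

# Example: distribution over the next two-digit value of a comma-separated series.
pdf = next_value_pdf("12,17,23,31,", n_digits=2)
print(sorted(pdf.items(), key=lambda kv: -kv[1])[:5])
```

Restricting the softmax to digit tokens and renormalizing is one plausible way to turn free-form token probabilities into a proper distribution over numbers; how the paper handles tokenization and normalization may differ from this sketch.
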

Authors (4)
  1. Toni J. B. Liu (6 papers)
  2. Nicolas Boullé (32 papers)
  3. Raphaël Sarfati (7 papers)
  4. Christopher J. Earls (9 papers)
Citations (8)