Ultimate limit on learning non-Markovian behavior: Fisher information rate and excess information (2310.03968v1)
Abstract: We address the fundamental limits of learning the unknown parameters of any stochastic process from time-series data, and we derive exact closed-form expressions for how optimal inference scales with observation length. Given a parametrized class of candidate models, the Fisher information of the observed sequence probabilities bounds the variance of model estimation from finite data: by the Cramér–Rao inequality, no unbiased estimator can have variance below the inverse Fisher information. As the sequence length increases, this minimal variance decays as the inverse of the length, with constant coefficient set by the Fisher information rate. We derive a simple closed-form expression for this information rate, even for processes of infinite Markov order. We furthermore obtain the exact analytic lower bound on model variance from the observation-induced metadynamics among belief states. We identify ephemeral, exponential, and more general modes of convergence of the finite-length (myopic) information rate to its asymptotic value. Surprisingly, the myopic information rate converges to the asymptotic Fisher information rate with exactly the same relaxation timescales that govern the convergence of the myopic entropy rate to the process's Shannon entropy rate. We illustrate these results with a sequence of examples that highlight qualitatively distinct features of stochastic processes that shape optimal learning.
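To make the central quantities concrete: for words w of length ℓ, the Fisher information of the sequence distribution is I_ℓ(θ) = Σ_w Pr(w) (∂_θ ln Pr(w))², and the Cramér–Rao bound reads Var(θ̂) ≥ 1/I_ℓ(θ). The Python sketch below is our illustration, not code from the paper: it evaluates I_ℓ, its myopic increments I_ℓ − I_{ℓ−1}, and the resulting variance bound for a simple parametrized two-state Markov chain. The fixed transition probability Q, the finite-difference derivative, and the function names are all illustrative choices.

```python
import numpy as np
from itertools import product

# Minimal numerical sketch (ours, not from the paper): for a parametrized
# two-state Markov chain, compute the Fisher information I_L of the
# length-L word distribution, its myopic increments I_L - I_{L-1}
# (which converge to the asymptotic Fisher information rate), and the
# Cramer-Rao lower bound 1/I_L on estimator variance.

Q = 0.4  # fixed 1 -> 0 transition probability (an illustrative choice)

def word_prob(word, theta):
    """Stationary probability of a binary word under T(theta)."""
    T = np.array([[1 - theta, theta],
                  [Q, 1 - Q]])
    pi = np.array([Q, theta]) / (Q + theta)  # stationary distribution
    p = pi[word[0]]
    for a, b in zip(word, word[1:]):
        p *= T[a, b]
    return p

def fisher_information(L, theta, h=1e-5):
    """I_L = sum_w Pr(w) (d/dtheta ln Pr(w))^2, via central differences."""
    total = 0.0
    for word in product([0, 1], repeat=L):
        p = word_prob(word, theta)
        dlnp = (np.log(word_prob(word, theta + h))
                - np.log(word_prob(word, theta - h))) / (2 * h)
        total += p * dlnp**2
    return total

theta = 0.3
prev = 0.0
for L in range(1, 9):
    I_L = fisher_information(L, theta)
    print(f"L={L}: I_L={I_L:.5f}  myopic rate={I_L - prev:.5f}  "
          f"CR bound 1/I_L={1 / I_L:.5f}")
    prev = I_L
```

Because this toy chain has Markov order 1, the myopic increments settle to a constant after a few steps, and the Cramér–Rao column then shrinks in proportion to 1/L, the asymptotic scaling described in the abstract. For the non-Markovian (hidden Markov) processes the paper treats, the increments instead relax to the rate over longer, process-dependent timescales.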