Prediction from compression for models with infinite memory, with applications to hidden Markov and renewal processes (2404.15454v1)

Published 23 Apr 2024 in math.ST, cs.IT, math.IT, and stat.TH

Abstract: Consider the problem of predicting the next symbol given a sample path of length n, whose joint distribution belongs to a distribution class that may have long-term memory. The goal is to compete with the conditional predictor that knows the true model. For both hidden Markov models (HMMs) and renewal processes, we determine the optimal prediction risk in Kullback-Leibler divergence up to universal constant factors. Extending existing results for finite-order Markov models [HJW23] and drawing ideas from universal compression, the proposed estimator has a prediction risk bounded by the redundancy of the distribution class plus a memory term that accounts for the long-range dependency of the model. Notably, for HMMs with bounded state and observation spaces, a polynomial-time estimator based on dynamic programming is shown to achieve the optimal prediction risk Θ(log n / n); prior to this work, the only known result of this type was an O(1/log n) bound obtained via Markov approximation [Sha+18]. Matching minimax lower bounds are obtained by connecting prediction risk to redundancy and mutual information through a reduction argument.
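To pin down the objective in the abstract, here is one standard formalization of the KL prediction risk (the class P, the estimator Q-hat, and the path X^n below are our shorthand, following the conventions of [HJW23] rather than notation quoted from this page): the estimator is judged against the oracle conditional distribution in expected KL divergence, worst case over the class.

```latex
\[
\mathrm{Risk}_n(\hat{Q})
  \;=\; \sup_{P \in \mathcal{P}} \;
  \mathbb{E}_{X^n \sim P}\!\left[
    D_{\mathrm{KL}}\!\left(
      P(X_{n+1} \in \cdot \mid X^n)
      \,\middle\|\,
      \hat{Q}(\cdot \mid X^n)
    \right)
  \right]
\]
```

The Θ(log n / n) claim for bounded HMMs concerns the minimax value of this quantity over the HMM class.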

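The dynamic programming mentioned for HMMs rests on the classical forward recursion: maintain the filtering distribution over hidden states, then push it one step forward to obtain the next-symbol law. The sketch below illustrates that recursion for a known HMM, i.e., the oracle conditional predictor the estimator competes with; it is not the paper's estimator, which must additionally handle the unknown model via compression-based ideas. All function names and toy parameters are our own illustrative assumptions.

```python
import numpy as np

def next_symbol_distribution(T, B, pi, obs):
    """P(next symbol | obs) for a known HMM, via the forward recursion.

    T   : (S, S) transition matrix, T[i, j] = P(state j at t+1 | state i at t)
    B   : (S, O) emission matrix,   B[i, y] = P(symbol y | state i)
    pi  : (S,)   initial state distribution
    obs : nonempty list of observed symbols in {0, ..., O-1}
    """
    # Filtering step: alpha tracks P(state_t | obs_1..t), renormalized
    # at every step so long sample paths do not underflow.
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    for y in obs[1:]:
        alpha = (alpha @ T) * B[:, y]
        alpha /= alpha.sum()
    # Prediction step: push the state belief one step forward, then
    # marginalize over the hidden state to get the next-symbol law.
    return (alpha @ T) @ B   # shape (O,), sums to 1

# Toy usage: 2 hidden states, binary observations.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])
print(next_symbol_distribution(T, B, pi, [0, 0, 1, 0]))
```

Each update costs O(S^2) time, so a full pass over a length-n path takes O(nS^2): the kind of polynomial-time filtering computation that dynamic-programming approaches to HMM prediction build on.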
References (36)
  1. Kweku Abraham, Elisabeth Gassiat and Zacharie Naulet “Fundamental limits for learning hidden Markov model parameters” In IEEE Transactions on Information Theory 69.3 IEEE, 2022, pp. 1777–1794
  2. Grigory Alexandrovich, Hajo Holzmann and Anna Leister “Nonparametric identification and maximum likelihood estimation for hidden Markov models” In Biometrika 103.2 Oxford University Press, 2016, pp. 423–434
  3. Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade and Matus Telgarsky “Tensor decompositions for learning latent variable models”, 2014 arXiv:1210.7559 [cs.LG]
  4. Aditya Bhaskara, Moses Charikar, Ankur Moitra and Aravindan Vijayaraghavan “Smoothed Analysis of Tensor Decompositions” In CoRR abs/1311.3651, 2013 arXiv: http://arxiv.org/abs/1311.3651
  5. John J. Birch “Approximations for the Entropy for Functions of Markov Chains” In The Annals of Mathematical Statistics 33.3 Institute of Mathematical Statistics, 1962, pp. 930–938
  6. Avrim Blum, Adam Kalai and Hal Wasserman “Noise-tolerant learning, the parity problem, and the statistical query model” In Journal of the ACM (JACM) 50.4 ACM New York, NY, USA, 2003, pp. 506–519
  7. Imre Csiszár and Paul C. Shields “Redundancy rates for renewal and other processes” In IEEE Transactions on Information Theory 42.6, 1996, pp. 2065–2072 DOI: 10.1109/18.556596
  8. Lee D. Davisson, Robert J. McEliece, Michael B. Pursley and Mark S. Wallace “Efficient universal noiseless source codes” In IEEE Transactions on Information Theory 27.3, 1981, pp. 269–279 DOI: 10.1109/TIT.1981.1056355
  9. L. Davisson “Universal noiseless coding” In IEEE Transactions on Information Theory 19.6, 1973, pp. 783–795 DOI: 10.1109/TIT.1973.1055092
  10. Yohann De Castro, Elisabeth Gassiat and Claire Lacour “Minimax Adaptive Estimation of Nonparametric Hidden Markov Models” In Journal of Machine Learning Research 17.111, 2016, pp. 1–43 URL: http://jmlr.org/papers/v17/15-381.html
  11. Yohann De Castro, Elisabeth Gassiat and Sylvain Le Corff “Consistent estimation of the filtering and marginal smoothing distributions in nonparametric hidden Markov models” In IEEE Transactions on Information Theory 63.8 IEEE, 2017, pp. 4758–4777
  12. Moein Falahatgar, Alon Orlitsky, Venkatadheeraj Pichapati and Ananda Theertha Suresh “Learning Markov distributions: Does estimation trump compression?” In 2016 IEEE International Symposium on Information Theory (ISIT), 2016, pp. 2689–2693 DOI: 10.1109/ISIT.2016.7541787
  13. Meir Feder, Neri Merhav and Michael Gutman “Universal prediction of individual sequences” In IEEE Transactions on Information Theory 38.4 IEEE, 1992, pp. 1258–1270
  14. Vitaly Feldman, Will Perkins and Santosh Vempala “On the complexity of random satisfiability problems with planted solutions” In Proceedings of the forty-seventh annual ACM symposium on Theory of Computing, 2015, pp. 77–86
  15. Philippe Flajolet and Wojciech Szpankowski “Analytic variations on redundancy rates of renewal processes” In IEEE Transactions on Information Theory 48.11, 2002, pp. 2911–2921 DOI: 10.1109/TIT.2002.804115
  16. Elisabeth Gassiat “Universal Coding and Order Identification by Model Selection Methods” Springer, 2018
  17. Yanjun Han, Soham Jana and Yihong Wu “Optimal prediction of Markov chains with and without spectral gap” In Advances in Neural Information Processing Systems 34, 2021, pp. 11233–11246
  18. Yanjun Han, Soham Jana and Yihong Wu “Optimal prediction of Markov chains with and without spectral gap” In IEEE Transactions on Information Theory 69.6, 2023, pp. 3920–3959
  19. David Haussler, Jyrki Kivinen and Manfred K Warmuth “Sequential prediction of individual sequences under general loss functions” In IEEE Transactions on Information Theory 44.5 IEEE, 1998, pp. 1906–1925
  20. Yi Hao, Alon Orlitsky and Venkatadheeraj Pichapati “On learning Markov chains” In Advances in Neural Information Processing Systems, 2018, pp. 648–657
  21. Godfrey H Hardy and Srinivasa Ramanujan “Asymptotic formulæ in combinatory analysis” In Proceedings of the London Mathematical Society 2.1 Wiley Online Library, 1918, pp. 75–115
  22. Qingqing Huang, Rong Ge, Sham Kakade and Munther Dahleh “Minimal realization problems for hidden Markov models” In IEEE Transactions on Signal Processing 64.7 IEEE, 2015, pp. 1896–1904
  23. Pravesh K. Kothari, Ryuhei Mori, Ryan O'Donnell and David Witmer “Sum of squares lower bounds for refuting any CSP” In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2017, pp. 132–145
  24. R.E. Krichevsky and V.K. Trofimov “The performance of universal encoding” In IEEE Transactions on Information Theory 27.2, 1981, pp. 199–207 DOI: 10.1109/TIT.1981.1056331
  25. Luc Lehéricy “Nonasymptotic control of the MLE for misspecified nonparametric hidden Markov models” In Electronic Journal of Statistics 15.2 The Institute of Mathematical Statistics and the Bernoulli Society, 2021, pp. 4916–4965
  26. David A Levin and Yuval Peres “Markov chains and mixing times” American Mathematical Soc., 2017
  27. L. Mirsky “Symmetric gauge functions and unitarily invariant norms” In Quarterly Journal of Mathematics 11, 1960, pp. 50–59 URL: https://api.semanticscholar.org/CorpusID:120585992
  28. Elchanan Mossel and Sébastien Roch “Learning nonsingular phylogenies and hidden Markov models” In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, 2005, pp. 366–375
  29. Yury Polyanskiy and Yihong Wu “Information Theory: From Coding to Learning” http://www.stat.yale.edu/~yw562/teaching/itbook-export.pdf Cambridge University Press, 2024
  30. J. Rissanen “Universal coding, information, prediction, and estimation” In IEEE Transactions on Information Theory 30.4, 1984, pp. 629–636 DOI: 10.1109/TIT.1984.1056936
  31. Vatsal Sharan, Sham Kakade, Percy Liang and Gregory Valiant “Learning Overcomplete HMMs” In Advances in Neural Information Processing Systems (NeurIPS), 2017
  32. Vatsal Sharan, Sham Kakade, Percy Liang and Gregory Valiant “Prediction with a short memory” In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018, pp. 1074–1087
  33. G.W. Stewart “On the Continuity of the Generalized Inverse” In SIAM Journal on Applied Mathematics 17.1, 1969, pp. 33–45 DOI: 10.1137/0117004
  34. “Practically Solving LPN” In 2021 IEEE International Symposium on Information Theory (ISIT), 2021, pp. 2399–2404 DOI: 10.1109/ISIT45174.2021.9518109
  35. Qun Xie and Andrew R Barron “Asymptotic minimax regret for data compression, gambling, and prediction” In IEEE Transactions on Information Theory 46.2 IEEE, 2000, pp. 431–445
  36. Yuhong Yang and Andrew Barron “Information-theoretic determination of minimax rates of convergence” In Annals of Statistics 27.5, 1999, pp. 1564–1599
