
Properties of the Strong Data Processing Constant for Rényi Divergence (2403.10656v2)

Published 15 Mar 2024 in cs.IT and math.IT

Abstract: Strong data processing inequalities (SDPI) are an important object of study in Information Theory and have been well studied for $f$-divergences. Universal upper and lower bounds have been provided along with several applications, connecting them to impossibility (converse) results, concentration of measure, hypercontractivity, and so on. In this paper, we study Rényi divergence and the corresponding SDPI constant, whose behavior seems to deviate from that of ordinary $\Phi$-divergences. In particular, one can find examples showing that the universal upper bound relating its SDPI constant to the one of Total Variation does not hold in general. In this work, we prove, however, that the universal lower bound involving the SDPI constant of the Chi-square divergence does indeed hold. Furthermore, we also provide a characterization of the distribution that achieves the supremum when $\alpha$ is equal to $2$ and consequently compute the SDPI constant for Rényi divergence of the general binary channel.
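For orientation, the objects in the abstract have standard definitions: for an order $\alpha \in (0,1) \cup (1,\infty)$ and distributions $P, Q$ on a finite alphabet, the Rényi divergence is $D_\alpha(P\|Q) = \frac{1}{\alpha-1} \log \sum_x P(x)^{\alpha} Q(x)^{1-\alpha}$, and the SDPI constant of a channel $W$ is the tightest contraction factor $\sup_{P \neq Q} D_\alpha(PW\|QW) / D_\alpha(P\|Q)$ (the paper's exact notation may differ).

The sketch below numerically estimates the $\alpha = 2$ constant for a binary-input channel by a brute-force grid search over pairs of input distributions. It is a minimal illustration of the definition above, not the paper's closed-form characterization; the function names and the binary-symmetric-channel example are chosen here for illustration.

```python
import numpy as np

def renyi_div(p, q, alpha=2.0):
    """Rényi divergence D_alpha(p || q) between distributions on a finite alphabet."""
    return np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0)

def sdpi_estimate(W, alpha=2.0, grid=200):
    """Brute-force estimate of sup_{P != Q} D_alpha(PW || QW) / D_alpha(P || Q)
    for a binary-input channel W given as a 2 x n row-stochastic matrix.
    The supremum may be approached near degenerate inputs, so the grid stays
    slightly inside the boundary; refine it for a sharper estimate."""
    ts = np.linspace(1e-3, 1.0 - 1e-3, grid)
    best = 0.0
    for p in ts:
        for q in ts:
            if abs(p - q) < 1e-9:
                continue  # D_alpha(P || Q) = 0 when P = Q
            P = np.array([p, 1.0 - p])
            Q = np.array([q, 1.0 - q])
            ratio = renyi_div(P @ W, Q @ W, alpha) / renyi_div(P, Q, alpha)
            best = max(best, ratio)
    return best

# Example: binary symmetric channel with crossover probability 0.1.
delta = 0.1
W = np.array([[1.0 - delta, delta],
              [delta, 1.0 - delta]])
print(sdpi_estimate(W, alpha=2.0))
```

By the data processing inequality for Rényi divergence, every ratio in the search is at most $1$, so the estimate lands in $(0, 1]$. As a sanity check, the chi-square SDPI constant of this channel is $(1 - 2\delta)^2 = 0.64$, and by the universal lower bound the paper proves, the $\alpha = 2$ estimate should come out no smaller than this value.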

