Properties of the Strong Data Processing Constant for Rényi Divergence (2403.10656v2)
Abstract: Strong data processing inequalities (SDPIs) are an important object of study in information theory and have been studied extensively for $f$-divergences. Universal upper and lower bounds are known, along with several applications connecting them to impossibility (converse) results, concentration of measure, hypercontractivity, and related topics. In this paper, we study R\'enyi divergence and the corresponding SDPI constant, whose behavior deviates from that of ordinary $f$-divergences. In particular, one can construct examples showing that the universal upper bound relating its SDPI constant to that of total variation does not hold in general. We prove, however, that the universal lower bound involving the SDPI constant of the chi-square divergence does hold. Furthermore, we characterize the distribution that achieves the supremum when $\alpha = 2$ and, as a consequence, compute the SDPI constant for R\'enyi divergence of the general binary channel.
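To make the object of study concrete, here is a minimal numerical sketch, not taken from the paper, that estimates the SDPI (contraction) constant for R\'enyi divergence of a binary channel by brute-force grid search, assuming the standard definition $\eta_\alpha(W) = \sup_{P \neq Q} D_\alpha(PW \,\|\, QW) / D_\alpha(P \,\|\, Q)$. The function names and the grid resolution are illustrative choices, not the paper's method.

```python
import numpy as np

def renyi_div(p, q, alpha):
    """Rényi divergence D_alpha(p || q) of two discrete distributions:
    D_alpha = log(sum_x p(x)^alpha * q(x)^(1-alpha)) / (alpha - 1), alpha != 1."""
    mask = p > 0  # terms with p(x) = 0 contribute nothing for alpha > 1
    return np.log(np.sum(p[mask] ** alpha * q[mask] ** (1.0 - alpha))) / (alpha - 1.0)

def sdpi_constant_estimate(W, alpha, grid=400):
    """Grid-search estimate of
    eta_alpha(W) = sup_{P != Q} D_alpha(PW || QW) / D_alpha(P || Q)
    over binary input distributions P = (p, 1-p), Q = (q, 1-q)."""
    ts = np.linspace(1e-4, 1.0 - 1e-4, grid)  # stay inside the open simplex
    best = 0.0
    for p in ts:
        P = np.array([p, 1.0 - p])
        for q in ts:
            if abs(p - q) < 1e-9:
                continue  # the ratio is only defined for P != Q
            Q = np.array([q, 1.0 - q])
            # PW and QW are the output distributions of the channel W
            ratio = renyi_div(P @ W, Q @ W, alpha) / renyi_div(P, Q, alpha)
            best = max(best, ratio)
    return best

# Binary symmetric channel with crossover probability 0.1 (rows = input symbols).
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(sdpi_constant_estimate(W, alpha=2.0))
```

For comparison, for the binary symmetric channel with crossover $\delta = 0.1$, the classical sandwich for $f$-divergences would place the constant between $\eta_{\chi^2}(W) = (1 - 2\delta)^2 = 0.64$ and $\eta_{\mathrm{TV}}(W) = 1 - 2\delta = 0.8$; the abstract's point is that for R\'enyi divergence only the lower comparison is guaranteed in general.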