
Partial information decomposition: redundancy as information bottleneck (2405.07665v2)

Published 13 May 2024 in cs.IT, math.IT, and stat.ML

Abstract: The partial information decomposition (PID) aims to quantify the amount of redundant information that a set of sources provides about a target. Here, we show that this goal can be formulated as a type of information bottleneck (IB) problem, termed the "redundancy bottleneck" (RB). The RB formalizes a tradeoff between prediction and compression: it extracts information from the sources that best predict the target, without revealing which source provided the information. It can be understood as a generalization of "Blackwell redundancy", which we previously proposed as a principled measure of PID redundancy. The "RB curve" quantifies the prediction--compression tradeoff at multiple scales. This curve can also be quantified for individual sources, allowing subsets of redundant sources to be identified without combinatorial optimization. We provide an efficient iterative algorithm for computing the RB curve.

References (39)
  1. P. L. Williams and R. D. Beer, Nonnegative decomposition of multivariate information, arXiv preprint arXiv:1004.2515  (2010).
  2. A. Kolchinsky, A Novel Approach to the Partial Information Decomposition, Entropy 24, 403 (2022).
  3. P. L. Williams, Information dynamics: Its theory and application to embodied cognitive systems, Ph.D. (2011).
  4. N. Tishby, F. Pereira, and W. Bialek, The information bottleneck method, in 37th Allerton Conf on Communication (1999).
  5. Y. Wang, J. M. L. Ribeiro, and P. Tiwary, Past–future information bottleneck for sampling molecular reaction coordinate simultaneously with thermodynamics and kinetics, Nature Communications 10, 3573 (2019), publisher: Nature Publishing Group.
  6. A. Kolchinsky, B. D. Tracey, and D. H. Wolpert, Nonlinear information bottleneck, Entropy 21, 1181 (2019a).
  7. I. Fischer, The conditional entropy bottleneck, Entropy 22, 999 (2020).
  8. Z. Goldfeld and Y. Polyanskiy, The information bottleneck problem and its applications in machine learning, IEEE Journal on Selected Areas in Information Theory 1, 19 (2020).
  9. R. Ahlswede and J. Körner, Source Coding with Side Information and a Converse for Degraded Broadcast Channels, IEEE Transactions on Information Theory , 9 (1975).
  10. H. Witsenhausen and A. Wyner, A conditional entropy bound for a pair of discrete random variables, IEEE Transactions on Information Theory 21, 493 (1975).
  11. A. Kolchinsky, B. D. Tracey, and S. Van Kuyk, Caveats for information bottleneck in deterministic scenarios, in ICLR 2019 (2019) arXiv: 1808.07593.
  12. B. Rodríguez Gálvez, R. Thobaben, and M. Skoglund, The convex information bottleneck lagrangian, Entropy 22, 98 (2020).
  13. E. Benger, S. Asoodeh, and J. Chen, The cardinality bound on the information bottleneck representations is tight, in 2023 IEEE International Symposium on Information Theory (ISIT) (IEEE, 2023) pp. 1478–1483.
  14. B. C. Geiger and I. S. Fischer, A comparison of variational bounds for the information bottleneck functional, Entropy 22, 1229 (2020).
  15. K. A. Murphy and D. S. Bassett, Machine-learning optimized measurements of chaotic dynamical systems via the information bottleneck, Physical Review Letters 132, 197201 (2024).
  16. N. Slonim, N. Friedman, and N. Tishby, Multivariate Information Bottleneck, Neural Computation 18, 10.1162/neco.2006.18.8.1739 (2006).
  17. C. Shannon, The lattice theory of information, Transactions of the IRE Professional Group on Information Theory 1, 105 (1953).
  18. W. McGill, Multivariate information transmission, Transactions of the IRE Professional Group on Information Theory 4, 93 (1954).
  19. F. M. Reza, An Introduction to Information Theory (Dover Publications, Inc, 1961).
  20. H. K. Ting, On the amount of information, Theory of Probability & Its Applications 7, 439 (1962).
  21. T. Han, Linear dependence structure of the entropy space, Information and Control 29, 337 (1975).
  22. R. W. Yeung, A new outlook on shannon’s information measures, IEEE transactions on information theory 37, 466 (1991).
  23. A. J. Bell, The co-information lattice, in Proceedings of the Fifth International Workshop on Independent Component Analysis and Blind Signal Separation: ICA, Vol. 2003 (2003).
  24. A. F. Gomes and M. A. Figueiredo, Orders between channels and implications for partial information decomposition, Entropy 25, 975 (2023).
  25. V. Griffith and C. Koch, Quantifying synergistic mutual information, in Guided Self-Organization: Inception (Springer, 2014) pp. 159–190.
  26. V. Griffith and T. Ho, Quantifying redundant information in predicting a target random variable, Entropy 17, 4644 (2015).
  27. N. Bertschinger and J. Rauh, The blackwell relation defines no lattice, in 2014 IEEE International Symposium on Information Theory (IEEE, 2014) pp. 2479–2483.
  28. D. Blackwell, Equivalent comparisons of experiments, The annals of mathematical statistics , 265 (1953).
  29. P. Venkatesh and G. Schamberg, Partial information decomposition via deficiency for multivariate gaussians, in 2022 IEEE International Symposium on Information Theory (ISIT) (IEEE, 2022) pp. 2892–2897.
  30. T. Mages, E. Anastasiadi, and C. Rohner, Non-negative decomposition of multivariate information: From minimum to blackwell specific information, 10.20944/preprints202403.0285.v2  (2024).
  31. L. Le Cam, Sufficiency and approximate sufficiency, The Annals of Mathematical Statistics , 1419 (1964).
  32. P. K. Banerjee and G. Montufar, The Variational Deficiency Bottleneck, in 2020 International Joint Conference on Neural Networks (IJCNN) (IEEE, Glasgow, United Kingdom, 2020) pp. 1–8.
  33. I. Csiszár and F. Matus, Information projections revisited, IEEE Transactions on Information Theory 49, 1474 (2003).
  34. N. Ay, Confounding ghost channels and causality: a new approach to causal information flows, Vietnam journal of mathematics 49, 547 (2021).
  35. A. Kolchinsky and L. M. Rocha, Prediction and modularity in dynamical systems, arXiv preprint arXiv:1106.3703  (2011).
  36. S. Hidaka and M. Oizumi, Fast and exact search for the partition with minimal information loss, PLoS One 13, e0201126 (2018).
  37. L. E. Dubins, On extreme points of convex sets, Journal of Mathematical Analysis and Applications 5, 237 (1962).
  38. T. M. Cover and J. A. Thomas, Elements of information theory (John Wiley & Sons, 2006).
  39. R. Timo, A. Grant, and G. Kramer, Lossy broadcasting with complementary side information, IEEE Transactions on Information Theory 59, 104 (2012).

Summary

  • The paper introduces the redundancy bottleneck (RB), reframing Blackwell redundancy as an information bottleneck problem that trades off prediction of the target against revealing which source supplied the information.
  • It presents an efficient iterative algorithm for computing the RB curve.
  • The approach enables multi-scale analysis and identification of redundant source subsets without combinatorial optimization, with practical relevance for fields like neuroscience and machine learning.

Exploring the Redundancy Bottleneck: A Novel Perspective on Information Theory

Introduction to the Concepts

How information about a target variable is distributed among multiple sources is a central question in information theory, often explored through frameworks such as Partial Information Decomposition (PID) and the Information Bottleneck (IB). Let's unpack these concepts step by step.

The Information Bottleneck (IB) Method

The IB method extracts from one variable (X) the information that is relevant for predicting another variable (Y). It introduces a bottleneck variable (Q), a compressed representation of X that retains only the information needed about Y. The quality of this compression is measured by two terms:

  1. Compression of X: captured by the mutual information I(X;Q), which measures how much of X is retained in Q.
  2. Prediction of Y: captured by the mutual information I(Y;Q), which measures how well Q predicts Y.

Varying the balance between these two terms traces out a trade-off curve, so the amount of information extracted can be tuned to the needs of a given task.
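
To make these two terms concrete, here is a minimal numerical sketch (our own illustration, not code from the paper) that evaluates I(X;Q) and I(Y;Q) for a discrete joint distribution p(x, y) and a stochastic encoder q(q|x). The variable names and the toy distribution are assumptions made for the example.

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in bits, given the joint distribution as a 2-D array."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

def ib_terms(p_xy, encoder):
    """Return (I(X;Q), I(Y;Q)) for an encoder with encoder[x, q] = q(q|x)."""
    p_x = p_xy.sum(axis=1)              # p(x)
    p_xq = p_x[:, None] * encoder       # p(x, q) = p(x) q(q|x)
    p_yq = p_xy.T @ encoder             # p(y, q) = sum_x p(x, y) q(q|x)
    return mutual_information(p_xq), mutual_information(p_yq)

# Toy example: X is two fair bits, Y is the first bit of X. An encoder that
# keeps only that bit compresses X to 1 bit while preserving all of I(X;Y).
p_xy = np.zeros((4, 2))
for x in range(4):
    p_xy[x, x >> 1] = 0.25
encoder = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
print(ib_terms(p_xy, encoder))          # -> approximately (1.0, 1.0)
```

Sweeping the relative weight placed on the two terms and optimizing the encoder at each setting is what traces out the trade-off curve mentioned above.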

Partial Information Decomposition (PID)

PID aims to decompose the information that a group of source variables provides about a target variable into components such as redundancy and synergy. Redundancy is the information about the target that all sources share, while synergy is the additional predictive power that emerges only when multiple sources are combined, beyond what any of them provides individually.
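
A standard toy example (ours, not the paper's) makes the contrast tangible: when a single bit is copied to both sources and the target, the information is fully redundant, whereas when the target is the XOR of two independent source bits, each source alone is useless and the information is purely synergistic.

```python
import numpy as np

def mi(joint):
    """I(A;B) in bits from a 2-D joint distribution."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    m = joint > 0
    return float(np.sum(joint[m] * np.log2(joint[m] / (pa @ pb)[m])))

def source_target_mis(p):
    """I(X1;Y), I(X2;Y), I((X1,X2);Y) from a joint array p[x1, x2, y]."""
    return mi(p.sum(axis=1)), mi(p.sum(axis=0)), mi(p.reshape(-1, p.shape[2]))

# Redundant case: one fair bit is copied to both sources and the target.
p_copy = np.zeros((2, 2, 2))
for b in range(2):
    p_copy[b, b, b] = 0.5

# Synergistic case: independent fair bits, target is their XOR.
p_xor = np.zeros((2, 2, 2))
for x1 in range(2):
    for x2 in range(2):
        p_xor[x1, x2, x1 ^ x2] = 0.25

print(source_target_mis(p_copy))  # (1.0, 1.0, 1.0): each source alone suffices
print(source_target_mis(p_xor))   # (0.0, 0.0, 1.0): only the pair predicts Y
```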

Bridging PID with Information Bottleneck: The Redundancy Bottleneck

The Redundancy Bottleneck (RB) provides a new bridge between PID and IB, starting from redundancy. Here is how it works and what it implies:

Basic Formulation

The key move of the Redundancy Bottleneck is to recast Blackwell redundancy as an IB-style optimization. The RB trades off how well the extracted information predicts the target against how much it reveals about which source provided it, thereby exposing redundancy at multiple scales across the sources.
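
The following is a schematic numerical sketch of that trade-off under our own simplifying assumptions (a uniform source index S, a bottleneck Q drawn from the selected source's value, prediction measured by I(Q;Y) and source leakage by I(Q;S)); the paper's exact functional and constraints may differ.

```python
import numpy as np

def mi(joint):
    """I(A;B) in bits from a 2-D joint distribution."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    m = joint > 0
    return float(np.sum(joint[m] * np.log2(joint[m] / (pa @ pb)[m])))

def rb_terms(p_x1x2y, encoders):
    """Prediction I(Q;Y) and source leakage I(Q;S) for a uniform source index S.

    p_x1x2y[x1, x2, y] is the joint over two binary sources and a binary target;
    encoders[s][x, q] gives q(q | S=s, X_s=x). Illustrative only.
    """
    n_q = encoders[0].shape[1]
    p_qy = np.zeros((n_q, 2))
    p_qs = np.zeros((n_q, 2))
    for s, enc in enumerate(encoders):
        for x1 in range(2):
            for x2 in range(2):
                for y in range(2):
                    w = 0.5 * p_x1x2y[x1, x2, y]   # p(S=s) * p(x1, x2, y)
                    x_s = (x1, x2)[s]              # value of the selected source
                    p_qy[:, y] += w * enc[x_s]
                    p_qs[:, s] += w * enc[x_s]
    return mi(p_qy), mi(p_qs)

# Fully redundant sources: both carry the same bit as the target. An encoder
# that simply copies the selected source predicts Y perfectly (1 bit) while
# revealing nothing about which source was consulted (0 bits about S).
p = np.zeros((2, 2, 2))
for b in range(2):
    p[b, b, b] = 0.5
copy_encoder = np.eye(2)
print(rb_terms(p, [copy_encoder, copy_encoder]))   # -> (1.0, 0.0)
```

In this fully redundant case, the extracted information is maximally predictive without betraying the source's identity, which is the intuition behind using such a trade-off to quantify redundancy.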

Implications of RB

  1. Multi-Scale Analysis: RB quantifies how redundancy varies across different prediction-compression trade-offs, offering a finer-grained view of how information is distributed among the sources.
  2. Source Identification: The RB curve can be evaluated for individual sources, so subsets of redundant sources can be identified without combinatorial optimization, a feature particularly useful for high-dimensional datasets.
  3. Iterative Optimization: The authors provide an efficient iterative algorithm for computing the RB curve, making the approach computationally feasible for larger and more complex problems.

Theoretical and Practical Significance

Viewing redundancy through the lens of the Information Bottleneck both deepens the theoretical picture and provides practical tools for data analysis. From neuroscience to machine learning, understanding how redundancy changes with the degree of compression can inform model interpretation and design, particularly in systems where redundancy affects performance.

Future Directions

Looking ahead, natural next steps include extending these ideas to other PID components such as synergy, and examining how changes to the set of source variables alter the observed redundancy. Applying the framework to real-world high-dimensional data could also open new avenues in both research and applications.

The Redundancy Bottleneck thus enriches the information-theoretic toolkit and offers concrete methods for real-world data problems where the fine structure of information flow and redundancy matters.
