Equitability, mutual information, and the maximal information coefficient (1301.7745v1)

Published 31 Jan 2013 in q-bio.QM, math.ST, stat.ME, stat.ML, and stat.TH

Abstract: Reshef et al. recently proposed a new statistical measure, the "maximal information coefficient" (MIC), for quantifying arbitrary dependencies between pairs of stochastic quantities. MIC is based on mutual information, a fundamental quantity in information theory that is widely understood to serve this need. MIC, however, is not an estimate of mutual information. Indeed, it was claimed that MIC possesses a desirable mathematical property called "equitability" that mutual information lacks. This was not proven; instead it was argued solely through the analysis of simulated data. Here we show that this claim, in fact, is incorrect. First we offer mathematical proof that no (non-trivial) dependence measure satisfies the definition of equitability proposed by Reshef et al. We then propose a self-consistent and more general definition of equitability that follows naturally from the Data Processing Inequality. Mutual information satisfies this new definition of equitability while MIC does not. Finally, we show that the simulation evidence offered by Reshef et al. was artifactual. We conclude that estimating mutual information is not only practical for many real-world applications, but also provides a natural solution to the problem of quantifying associations in large data sets.

Citations (574)

Summary

  • The paper challenges MIC's claimed equitability by arguing that its mathematical foundation is flawed.
  • It uses formal proofs and simulations to demonstrate that MIC violates the Data Processing Inequality (DPI) and therefore fails the self-equitability criterion derived from it.
  • The study reinforces the robustness of mutual information as a dependency measure, highlighting its practical advantages in data analysis.

Analysis of "Equitability, mutual information, and the maximal information coefficient"

This paper presents a critical examination of the Maximal Information Coefficient (MIC), a statistic introduced by Reshef et al. that purports to measure dependence between pairs of random variables equitably. The authors, Kinney and Atwal, challenge the claim that MIC possesses the mathematical property of "equitability," providing a rigorous argument that the claim does not hold.

Core Arguments and Proofs

The central critique targets the definition of equitability proposed by Reshef et al., which was stated heuristically rather than given a rigorous mathematical foundation. Kinney and Atwal formalize this definition and prove that no non-trivial dependence measure can satisfy it. They then introduce an alternative criterion, termed "self-equitability," derived from the Data Processing Inequality (DPI), a fundamental result in information theory; both notions are stated below. Mutual information satisfies DPI and is therefore self-equitable, and in particular is invariant under invertible transformations of either variable, a property MIC does not share.
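
For reference, the two notions at issue can be stated compactly. The following is a paraphrase of the paper's definitions (with g an arbitrary function and ↔ denoting a Markov chain):

```latex
% R^2-equitability (Kinney & Atwal's formalization of Reshef et al.'s
% criterion): for noisy functional relationships Y = f(X) + \eta, the
% measure's value must depend on f only through R^2:
D[X;Y] = g\!\left(R^2\big[f(X);Y\big]\right) \quad \text{for every } f

% Data Processing Inequality: for any Markov chain X \leftrightarrow Y \leftrightarrow Z,
I(X;Z) \le I(Y;Z)

% Self-equitability: a symmetric measure D is self-equitable iff
D[X;Y] = D[f(X);Y]
% whenever f is deterministic and X \leftrightarrow f(X) \leftrightarrow Y
% forms a Markov chain.
```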

Through formal arguments and explicit examples, the paper demonstrates that MIC violates DPI and therefore cannot be self-equitable: MIC can assign different scores to X and to a deterministic function f(X) through which Y depends on X. This inconsistency under reparameterization contradicts the behavior expected of an equitable dependence measure.
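
One step worth making explicit: when f is invertible, both X ↔ f(X) ↔ Y and f(X) ↔ X ↔ Y are Markov chains, so DPI applies in both directions and forces equality. Any measure that changes under an invertible reparameterization, as MIC does, therefore cannot satisfy DPI:

```latex
I(X;Y) \le I(f(X);Y) \quad \text{and} \quad I(f(X);Y) \le I(X;Y)
\;\Longrightarrow\; I(f(X);Y) = I(X;Y) \qquad (f \text{ invertible})
```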

Numerical Simulations and Examples

To substantiate their theoretical claims, the authors present simulations and toy examples in which MIC behaves inconsistently where mutual information does not. These include cases where MIC's score changes under an invertible transformation of one variable, and Markov-chain constructions on which MIC violates DPI while mutual information, by construction, cannot; a numerical sketch of the invariance check appears below.
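
As an illustration (a minimal sketch under assumed tooling, not the paper's code), a k-nearest-neighbor mutual information estimate, such as scikit-learn's mutual_info_regression, should be essentially unchanged when X is replaced by an invertible transformation of itself:

```python
# Minimal sketch (not the paper's code): a k-NN mutual information
# estimate is approximately invariant to an invertible transformation
# of one variable, as DPI requires of the true mutual information.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=5000)
y = np.sin(2.0 * x) + 0.3 * rng.normal(size=x.size)  # noisy relationship

x_t = np.exp(x)  # invertible (strictly monotonic) transformation of x

mi_raw = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]
mi_trans = mutual_info_regression(x_t.reshape(-1, 1), y, random_state=0)[0]
print(f"I(X;Y)    estimate: {mi_raw:.3f} nats")
print(f"I(f(X);Y) estimate: {mi_trans:.3f} nats")  # expect a similar value
```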

The paper also revisits MIC's performance on the simulated data that Reshef et al. offered as evidence of equitability. Kinney and Atwal argue that those results are artifacts of limited sample sizes and of the particular mutual information estimator and parameter settings used in the comparison; the style of comparison at issue is sketched below.
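
The flavor of that comparison can be sketched as follows (an illustrative setup under assumed tooling, not the authors' exact protocol): different functional relationships are generated at a matched signal-to-noise ratio, and mutual information is estimated for each.

```python
# Illustrative sketch of an equitability-style comparison (not the
# authors' exact protocol): estimate I(X;Y) for different functions f
# with noise scaled so each relationship has a comparable R^2.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(0.0, 1.0, size=n)

functions = {
    "linear":    lambda t: t,
    "quadratic": lambda t: t ** 2,
    "sine":      lambda t: np.sin(4.0 * np.pi * t),
}

for name, f in functions.items():
    fx = f(x)
    noise = fx.std() * rng.normal(size=n)  # SNR = 1, so R^2[f(X);Y] ~ 0.5
    y = fx + noise
    mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]
    print(f"{name:10s}  estimated I(X;Y): {mi:.3f} nats")
```

In comparisons of this kind, per the paper, apparent non-equitability of mutual information at small sample sizes largely reflects estimation error rather than a defect of the quantity itself.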

Implications and Future Directions

The analysis not only undermines MIC's standing as an equitable measure but also reinforces the practicality and utility of mutual information in diverse applications. The paper emphasizes that mutual information is both computationally tractable and theoretically well grounded, and that its estimation becomes increasingly reliable as data sets grow.

This research has implications for the evaluation and selection of statistical measures in data analysis, emphasizing the need for metrics that adhere to theoretical standards like DPI. It also suggests future inquiries into resolving estimation challenges associated with mutual information in sparse data settings.

Conclusion

Kinney and Atwal's work provides a comprehensive critique of the Maximal Information Coefficient and reaffirms the foundational role of mutual information in quantifying dependencies equitably. Their proposed self-equitability criterion, grounded in DPI, sets a theoretical standard for future research, providing a framework that balances computational feasibility with mathematical rigor. As data complexities continue to grow, such principled approaches are integral to refining how dependencies are measured and interpreted across scientific fields.