- The paper proposes a novel estimator of mutual information for mixed discrete-continuous variables, built on direct estimation of the Radon-Nikodym derivative.
- It leverages k-nearest-neighbor distances to achieve consistent estimates with lower variance and mean squared error than traditional methods.
- Numerical experiments demonstrate superior performance in high-dimensional and zero-inflated datasets, enhancing applications in feature selection and network inference.
Estimating Mutual Information for Discrete-Continuous Mixtures
The paper "Estimating Mutual Information for Discrete-Continuous Mixtures" tackles a significant challenge in information theory and machine learning: estimating mutual information (MI) for data consisting of both discrete and continuous variables. MI is a critical metric used across various domains for tasks including clustering, feature selection, and graph model inference. Traditional methods predominantly cater to purely discrete or purely continuous datasets, using the 3H estimator, which involves separately estimating the entropies for the variables involved and their joint distribution. However, this technique becomes ineffective in mixed-variable scenarios where entropy is undefined for some components.
The authors contribute a novel estimator that handles discrete-continuous mixtures by directly estimating the Radon-Nikodym derivative, circumventing the need to estimate the individual entropy terms. The proposed estimator is consistent and outperforms existing heuristic approaches, such as adding small Gaussian noise to every sample so that purely continuous estimators apply, or quantizing the continuous variables so that purely discrete estimators apply.
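To see why the Radon-Nikodym derivative is the right object: whenever the joint distribution $P_{XY}$ is absolutely continuous with respect to the product of the marginals $P_X \otimes P_Y$, mutual information can be written without any standalone entropy term as

$$I(X;Y) = \mathbb{E}_{P_{XY}}\!\left[\log \frac{dP_{XY}}{d(P_X \otimes P_Y)}\right],$$

and this representation remains well defined for discrete-continuous mixtures.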
The theoretical advances rest on an estimator that uses k-nearest-neighbor distances to infer mutual information. The authors rigorously prove the estimator's consistency and demonstrate its effectiveness on synthetic and real-world data. Numerical experiments support these claims, showing lower mean squared error than traditional estimators in varied settings, including zero-inflated datasets and high-dimensional scenarios.
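As an illustration of the k-nearest-neighbor construction, below is a minimal Python sketch of a KSG-style estimator adapted to mixtures. It is not the authors' reference implementation: the function name `mixed_knn_mi`, the max-norm metric, and the exact tie-handling conventions are assumptions made for this sketch.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def mixed_knn_mi(x, y, k=5):
    """k-NN mutual information for samples that may mix discrete and
    continuous values (a sketch, not a reference implementation)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    z = np.hstack([x, y])
    tree_z, tree_x, tree_y = cKDTree(z), cKDTree(x), cKDTree(y)

    xi = np.empty(n)
    for i in range(n):
        # Max-norm distance to the k-th nearest neighbor in the joint
        # space; query k+1 points because the point itself comes back first.
        dists, _ = tree_z.query(z[i], k=k + 1, p=np.inf)
        rho = dists[-1]

        if rho == 0.0:
            # Discrete tie: several samples coincide with z_i exactly, so
            # the multiplicity of the tied point replaces k.
            k_eff = len(tree_z.query_ball_point(z[i], r=0.0, p=np.inf)) - 1
        else:
            k_eff = k

        # Marginal neighbor counts within radius rho (self excluded).
        n_x = len(tree_x.query_ball_point(x[i], r=rho, p=np.inf)) - 1
        n_y = len(tree_y.query_ball_point(y[i], r=rho, p=np.inf)) - 1

        xi[i] = digamma(k_eff) + np.log(n) - np.log(n_x + 1) - np.log(n_y + 1)

    # Sample averages can dip slightly below zero; callers often clip at 0.
    return float(xi.mean())


if __name__ == "__main__":
    # Zero-inflated demo: a point mass at (0, 0) mixed with correlated
    # Gaussians, the kind of data where purely continuous estimators struggle.
    rng = np.random.default_rng(0)
    n = 2000
    x = rng.normal(size=n)
    y = 0.8 * x + 0.6 * rng.normal(size=n)
    zero = rng.random(n) < 0.4
    x[zero] = 0.0
    y[zero] = 0.0
    print("estimated MI (nats):", mixed_knn_mi(x, y, k=5))
```

The mixed-data detail is the `rho == 0` branch: at a discrete atom many samples coincide exactly, and the multiplicity of the tie stands in for `k`, which is how nearest-neighbor distances stay informative when part of the distribution carries point masses.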
A theoretical analysis shows that the proposed estimator remains consistent and exhibits low variance under assumptions typical of real-world data distributions. These assumptions include, among others, that the number of discrete points carrying probability mass is finite and that the log density ratio is integrable over the probability space. The variance analysis leverages an adaptation of the Efron-Stein inequality in conjunction with a careful breakdown of how the estimate changes when individual samples are replaced.
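For concreteness, the standard Efron-Stein inequality bounds the variance of a function of independent samples by the expected effect of swapping one sample at a time:

$$\operatorname{Var}\big(f(Z_1,\ldots,Z_n)\big) \;\le\; \frac{1}{2}\sum_{i=1}^{n} \mathbb{E}\Big[\big(f(Z_1,\ldots,Z_i,\ldots,Z_n) - f(Z_1,\ldots,Z_i',\ldots,Z_n)\big)^{2}\Big],$$

where $Z_i'$ is an independent copy of $Z_i$. Roughly, in nearest-neighbor analyses of this kind, each sample can be among the k nearest neighbors of only a limited number of other points, so each swap perturbs the statistic by a controlled amount.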
Practically, this method enables mutual information computation in diverse real-world applications such as computational biology, where data often combine continuous expression levels with discrete presence/absence states arising from technical limitations like dropout in single-cell RNA sequencing. The estimator's improved performance suggests greater reliability when using MI for tasks such as feature selection and gene network inference on noisy biological data.
Future research could extend the investigation to higher-dimensional discrete-continuous mixtures or integrate more advanced kernel methods tailored to mixed data. As data in machine learning and data science grow ever more complex and heterogeneous, robust estimators of this kind will enable better automated insights and decision-making support across domains.