Contrastive Log-Ratio Upper Bound (CLUB)
- CLUB is a method that upper bounds mutual information by contrasting conditional log-likelihoods of matched versus mismatched data pairs; the bound is tight, with zero gap, exactly when the variables are independent.
- It employs a variational approximation (vCLUB) that substitutes a neural network model $q_\theta(x|y)$ for the unknown conditional $p(x|y)$, enabling closed-form computations in exponential-family cases.
- The framework improves scalability via negative sampling and demonstrates superior performance in synthetic experiments, representation learning, and domain adaptation.
The Contrastive Log-ratio Upper Bound (CLUB) is a framework for estimating and minimizing mutual information (MI) in high-dimensional scenarios where only samples from the relevant distributions, rather than their explicit forms, are available. CLUB constructs a tight, variationally approximable upper bound on MI that enables stable and scalable MI minimization, a setting in which lower-bound estimators are inapplicable. The CLUB methodology addresses the estimation bias, computational scalability, and numerical instability that afflict earlier MI upper-bound strategies, thereby facilitating its use in representation learning, domain adaptation, and information bottleneck contexts (Cheng et al., 2020).
1. Formal Definition of CLUB
Let $x$ and $y$ be random variables with joint distribution $p(x,y)$ and marginals $p(x)$, $p(y)$. Mutual information is defined as

$$I(x;y) = \mathbb{E}_{p(x,y)}\!\left[\log \frac{p(x,y)}{p(x)\,p(y)}\right].$$
The CLUB estimator presumes access to the conditional density $p(x|y)$ and defines

$$I_{\mathrm{CLUB}}(x;y) := \mathbb{E}_{p(x,y)}\big[\log p(x|y)\big] - \mathbb{E}_{p(x)}\mathbb{E}_{p(y)}\big[\log p(x|y)\big].$$
Given i.i.d. samples $\{(x_i, y_i)\}_{i=1}^{N} \sim p(x,y)$, the empirical estimate is

$$\hat{I}_{\mathrm{CLUB}} = \frac{1}{N}\sum_{i=1}^{N} \log p(x_i|y_i) - \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \log p(x_j|y_i).$$
This estimator relies on explicit evaluation or modeling of $p(x|y)$, and operates by contrasting log-likelihoods across matched pairs $(x_i, y_i)$ and mismatched pairs $(x_j, y_i)$.
2. Upper Bound Derivation and Tightness
Define the gap $\Delta := I_{\mathrm{CLUB}}(x;y) - I(x;y) = \mathbb{E}_{p(y)}\big[\mathrm{KL}\big(p(x)\,\|\,p(x|y)\big)\big]$. The derivation proceeds as follows: by concavity of $\log$ and Jensen's inequality, $\log p(x) = \log \mathbb{E}_{p(y)}[p(x|y)] \ge \mathbb{E}_{p(y)}[\log p(x|y)]$, guaranteeing that $\Delta \ge 0$. Thus,

$$I(x;y) \le I_{\mathrm{CLUB}}(x;y),$$

with equality if and only if $x$ is independent of $y$, i.e., $p(x|y) = p(x)$. The bound is tight for independent variables and grows otherwise; the magnitude of $\Delta$ reflects the deviation from independence.
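As a numerical sanity check (a minimal sketch, not from the paper; the bivariate-Gaussian setup and all variable names are illustrative), the empirical estimator can be evaluated on correlated Gaussians, where both $p(x|y)$ and the true MI are known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, N = 0.8, 2000

# Jointly Gaussian pair: p(x|y) = N(rho*y, 1 - rho^2), true MI = -0.5*log(1 - rho^2)
y = rng.standard_normal(N)
x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(N)

def log_p_x_given_y(xv, yv):
    var = 1.0 - rho**2
    return -0.5 * np.log(2 * np.pi * var) - (xv - rho * yv) ** 2 / (2 * var)

# Matched term minus the full N x N mismatched contrast
# (entry (i, j) of the broadcasted array is log p(x_j | y_i)).
club_hat = log_p_x_given_y(x, y).mean() - log_p_x_given_y(x[None, :], y[:, None]).mean()
true_mi = -0.5 * np.log(1 - rho**2)
```

Because $\rho = 0.8$ makes the variables strongly dependent, the gap $\Delta$ is strictly positive and the estimate sits visibly above the true MI of roughly 0.51 nats.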
3. Variational Approximation: vCLUB
When $p(x|y)$ is unknown or intractable, CLUB is made practical via a variational approximation. A conditional model $q_\theta(x|y)$ (typically parameterized by neural networks) is introduced:

$$I_{\mathrm{vCLUB}}(x;y) := \mathbb{E}_{p(x,y)}\big[\log q_\theta(x|y)\big] - \mathbb{E}_{p(x)}\mathbb{E}_{p(y)}\big[\log q_\theta(x|y)\big],$$

with the sample estimator

$$\hat{I}_{\mathrm{vCLUB}} = \frac{1}{N}\sum_{i=1}^{N} \log q_\theta(x_i|y_i) - \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \log q_\theta(x_j|y_i).$$
For parametric exponential-family forms, such as Gaussians with learned mean and diagonal covariance, this estimator admits closed-form evaluation of $\log q_\theta(x|y)$ via diagonal Mahalanobis distances.
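As a concrete sketch of that closed form (illustrative only; `mu` and `logvar` stand in for the outputs a conditional network would produce), the diagonal-Gaussian log-density reduces to a normalizer plus a diagonal Mahalanobis term:

```python
import numpy as np

def diag_gaussian_logpdf(x, mu, logvar):
    """log N(x; mu, diag(exp(logvar))), summed over the last axis.

    The (x - mu)^2 * exp(-logvar) term is the squared Mahalanobis
    distance for a diagonal covariance matrix.
    """
    return -0.5 * np.sum(
        (x - mu) ** 2 * np.exp(-logvar) + logvar + np.log(2.0 * np.pi), axis=-1
    )

# At the mean with unit variance (logvar = 0) in d dimensions,
# the log-density is -0.5 * d * log(2*pi).
d = 3
val = diag_gaussian_logpdf(np.zeros(d), np.zeros(d), np.zeros(d))
```

In practice `mu` and `logvar` would be network outputs evaluated at $y$, and the same broadcasting works batch-wise.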
4. Theoretical Properties and Bias Analysis
The theoretical guarantees of CLUB are as follows:
- Exactness: $I(x;y) \le I_{\mathrm{CLUB}}(x;y)$, with equality if and only if $x$ and $y$ are independent.
- vCLUB as Upper Bound: Let $q_\theta(x,y) := q_\theta(x|y)\,p(y)$. If $\mathrm{KL}\big(p(x,y)\,\|\,q_\theta(x,y)\big) \le \mathrm{KL}\big(p(x)p(y)\,\|\,q_\theta(x,y)\big)$, then $I(x;y) \le I_{\mathrm{vCLUB}}(x;y)$.
- Approximation Error: If, in addition, the variational fit is exact, i.e., $q_\theta(x|y) = p(x|y)$ and $\mathrm{KL}\big(p(x|y)\,\|\,q_\theta(x|y)\big) = 0$, then $I_{\mathrm{vCLUB}}(x;y) = I_{\mathrm{CLUB}}(x;y)$, so the gap to the true mutual information reduces to $\Delta$.
A plausible implication is that strong variational modeling of $p(x|y)$ ensures not only the validity but also the tightness of the vCLUB upper bound relative to the true mutual information.
5. Scalable MI Minimization Training: Negative Sampling Scheme
The original estimator is quadratic in the batch size $N$, since it contrasts every $x_j$ against every $y_i$. CLUB–S and vCLUB–S accelerate computation via negative sampling, lowering the per-batch complexity from $O(N^2)$ to $O(N)$. Training involves two main phases:
- Conditional Model Update: Fit $q_\theta(x|y)$ to maximize the conditional log-likelihood over data batches.
- MI Minimization via Negative Sampling: For each positive pair $(x_i, y_i)$, one negative index $k_i \sim \mathrm{Uniform}\{1,\dots,N\}$ is sampled, constructing the mismatched pair $(x_{k_i}, y_i)$. Then,

$$\hat{I}_{\mathrm{vCLUB\text{-}S}} = \frac{1}{N}\sum_{i=1}^{N} \Big[\log q_\theta(x_i|y_i) - \log q_\theta(x_{k_i}|y_i)\Big]$$

is minimized with respect to the generative model parameters $\sigma$.
    Initialize the model p_σ(x, y) and the variational network q_θ(x|y).
    repeat
        Sample a batch {(x_i, y_i)}_{i=1}^N from p_σ.
        // (1) Update q_θ by maximizing the conditional log-likelihood
        L(θ) ← (1/N) Σ_{i=1}^N log q_θ(x_i|y_i)
        θ ← θ + η ∇_θ L(θ)
        // (2) Compute the one-negative vCLUB estimate
        for i = 1 … N do
            draw k_i ∼ Uniform({1, …, N})
            U_i ← log q_θ(x_i|y_i) − log q_θ(x_{k_i}|y_i)
        end for
        Î ← (1/N) Σ_{i=1}^N U_i
        // (3) Update σ by minimizing Î
        σ ← σ − η′ ∇_σ Î    (backpropagating through samples via reparameterization)
    until convergence
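The two training phases can be sketched end to end in plain NumPy (a toy, assumed setup: the data come from a fixed correlated Gaussian rather than a trainable $p_\sigma$, and the conditional-model update uses the closed-form maximum-likelihood fit of a linear-Gaussian $q_\theta$ instead of a gradient step; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
rho, N = 0.8, 4096
y = rng.standard_normal(N)
x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(N)

# Phase (1): fit q_theta(x|y) = N(a*y + b, v) by maximum likelihood
# (for a linear-Gaussian model this is just least squares).
a, b = np.polyfit(y, x, 1)
v = (x - (a * y + b)).var()

def log_q(xv, yv):
    return -0.5 * (np.log(2 * np.pi * v) + (xv - (a * yv + b)) ** 2 / v)

# Phase (2): one-negative vCLUB-S estimate, O(N) instead of O(N^2).
k = rng.integers(0, N, size=N)   # one negative index per positive pair
vclub_s = (log_q(x, y) - log_q(x[k], y)).mean()

# Full O(N^2) vCLUB estimate for comparison.
vclub_full = log_q(x, y).mean() - log_q(x[None, :], y[:, None]).mean()
```

In a real training loop, step (3) would then backpropagate the gradient of the sampled estimate through the samples to the generative parameters $\sigma$; here the sampled and full estimates simply agree up to Monte Carlo noise.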
This sampled variant is an unbiased estimator of the same vCLUB objective and is substantially more scalable; negative sampling thus improves both the statistical and the computational behavior of the bound.
6. Empirical Evaluation and Practical Performance
CLUB and its variational extensions have undergone validation on both synthetic and real-world benchmarks:
- Synthetic Estimation: On Gaussian and Cubic benchmark distributions with known ground-truth MI, CLUB attains the lowest bias and mean squared error compared to competing lower- and upper-bound MI estimators. CLUB–S incurs slightly higher variance but remains unbiased.
- Information Bottleneck (MNIST, latent dim = 256): CLUB and vCLUB achieve lower test classification error than DVB (VUB), MINE, NWJ, InfoNCE, and Leave-One-Out estimators. Negative sampling further enhances generalization.
- Unsupervised Domain Adaptation: In MNIST→MNIST-M and USPS→MNIST transfer settings, within a disentangled-representation objective that minimizes MI between feature components, CLUB–S provides the highest or near-highest target-domain accuracy, surpassing lower-bound and prior upper-bound estimators that suffer from numerical instability.
The aggregate findings demonstrate that CLUB delivers a tight, stable, and computationally efficient upper bound for MI estimation and minimization in high-dimensional deep learning tasks (Cheng et al., 2020).