
InfoNet: Neural Estimation of Mutual Information without Test-Time Optimization (2402.10158v1)

Published 15 Feb 2024 in cs.IT and math.IT

Abstract: Estimating mutual correlations between random variables or data streams is essential for intelligent behavior and decision-making. As a fundamental quantity for measuring statistical relationships, mutual information has been extensively studied and used for its generality and equitability. However, existing methods often lack either the efficiency needed for real-time applications (e.g., estimators that require test-time optimization of a neural network) or the differentiability required for end-to-end learning (e.g., histogram-based estimators). We introduce a neural network called InfoNet, which directly outputs mutual information estimates for data streams by leveraging the attention mechanism and the computational efficiency of deep learning infrastructures. By maximizing a dual formulation of mutual information through large-scale simulated training, our approach circumvents time-consuming test-time optimization and generalizes across distributions. We evaluate the effectiveness and generalization of the proposed mutual information estimation scheme on various families of distributions and applications. Our results demonstrate that InfoNet and its training process provide a graceful efficiency-accuracy trade-off and order-preserving properties. We will make the code and models available as a comprehensive toolbox to facilitate studies in different fields requiring real-time mutual information estimation.
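The abstract says InfoNet is trained by maximizing a dual formulation of mutual information, so that per-pair test-time optimization is replaced by a single forward pass of a pretrained network. As a point of reference for what such a dual objective looks like, below is a minimal sketch of the Donsker-Varadhan lower bound that critic-based neural estimators maximize. The `CriticMLP` network, sample sizes, optimizer settings, and toy Gaussian data are illustrative assumptions only; this is not the InfoNet architecture or its training setup.

```python
import math
import torch
import torch.nn as nn

class CriticMLP(nn.Module):
    """Hypothetical critic T(x, y); InfoNet itself uses an attention-based network."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        # x, y: (batch, 1) scalar streams; returns T(x, y) of shape (batch, 1).
        return self.net(torch.cat([x, y], dim=-1))

def dv_lower_bound(critic, x, y):
    """Donsker-Varadhan bound: E_{p(x,y)}[T] - log E_{p(x)p(y)}[exp(T)]."""
    joint_term = critic(x, y).mean()
    # Shuffling y approximates samples from the product of marginals p(x)p(y).
    y_shuffled = y[torch.randperm(y.shape[0])]
    marginal_term = (
        torch.logsumexp(critic(x, y_shuffled).squeeze(-1), dim=0) - math.log(x.shape[0])
    )
    return joint_term - marginal_term

# Toy usage: a correlated Gaussian pair with known ground-truth MI, maximizing the
# bound with respect to the critic (the slow, per-pair optimization that amortized
# estimators aim to avoid).
torch.manual_seed(0)
critic = CriticMLP()
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
for _ in range(500):
    x = torch.randn(512, 1)
    y = 0.8 * x + 0.6 * torch.randn(512, 1)  # correlation rho = 0.8
    loss = -dv_lower_bound(critic, x, y)     # ascend the lower bound
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    print("Estimated MI (nats):", dv_lower_bound(critic, x, y).item())
# Ground truth for this pair: -0.5 * log(1 - 0.8**2) ≈ 0.51 nats.
```

In InfoNet's setting, this inner optimization loop is what gets amortized: instead of fitting a critic for each pair of streams at test time, a single network is trained once over many simulated distributions and then produces an estimate in one forward pass.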
