The Normalized Cross Density Functional: A Framework to Quantify Statistical Dependence for Random Processes (2212.04631v3)

Published 9 Dec 2022 in cs.LG, cs.AI, cs.IT, and math.IT

Abstract: This paper presents a novel approach to measuring statistical dependence between two random processes (r.p.) using a positive-definite function called the Normalized Cross Density (NCD). NCD is derived directly from the probability density functions of two r.p. and constructs a data-dependent Hilbert space, the Normalized Cross-Density Hilbert Space (NCD-HS). By Mercer's Theorem, the NCD norm can be decomposed into its eigenspectrum, which we name the Multivariate Statistical Dependence (MSD) measure, and their sum, the Total Dependence Measure (TSD). Hence, the NCD-HS eigenfunctions serve as a novel embedded feature space, suitable for quantifying r.p. statistical dependence. In order to apply NCD directly to r.p. realizations, we introduce an architecture with two multiple-output neural networks, a cost function, and an algorithm named the Functional Maximal Correlation Algorithm (FMCA). With FMCA, the two networks learn concurrently by approximating each other's outputs, extending the Alternating Conditional Expectation (ACE) for multivariate functions. We mathematically prove that FMCA learns the dominant eigenvalues and eigenfunctions of NCD directly from realizations. Preliminary results with synthetic data and medium-sized image datasets corroborate the theory. Different strategies for applying NCD are proposed and discussed, demonstrating the method's versatility and stability beyond supervised learning. Specifically, when the two r.p. are high-dimensional real-world images and a white uniform noise process, FMCA learns factorial codes, i.e., the occurrence of a code guarantees that a specific training set image was present, which is important for feature learning.
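As a toy illustration of the alternating-approximation idea the abstract refers to, the sketch below estimates Rényi's maximal correlation between two discrete scalar variables using the classical Alternating Conditional Expectation (ACE) iteration, which the paper describes FMCA as extending to multivariate functions learned by neural networks. This is a minimal sketch under that assumption, not the paper's FMCA cost or architecture; the helper names (`conditional_mean`, `ace_maximal_correlation`) and the toy data are illustrative only.

```python
# Minimal sketch of classical ACE (Breiman & Friedman, 1985) for two discrete
# scalar variables. NOT the paper's FMCA: it only illustrates the alternating
# "each transform approximates the conditional expectation of the other" idea
# that FMCA generalizes to multivariate neural-network outputs.
import numpy as np

def conditional_mean(target, groups):
    """E[target | groups], computed as the mean of target within each discrete group."""
    out = np.empty(target.shape, dtype=float)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = target[mask].mean()
    return out

def ace_maximal_correlation(x, y, n_iters=50):
    """Estimate sup_{f,g} corr(f(X), g(Y)) by alternating conditional expectations."""
    theta = (y - y.mean()) / y.std()              # initial g(Y), zero mean, unit variance
    for _ in range(n_iters):
        phi = conditional_mean(theta, x)          # f(X) <- E[g(Y) | X]
        theta = conditional_mean(phi, y)          # g(Y) <- E[f(X) | Y]
        theta = (theta - theta.mean()) / theta.std()   # re-normalize g(Y)
    return np.corrcoef(phi, theta)[0, 1]

# Toy data: Y is a noisy, non-monotone function of X, so Pearson correlation is
# near zero while the maximal correlation recovered by ACE is close to one.
rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=20000)
y = (x - 5) ** 2 + rng.integers(0, 3, size=20000)
print("estimated maximal correlation:", ace_maximal_correlation(x, y))
```

In FMCA, by contrast, both transforms are multiple-output neural networks trained concurrently on realizations of the two processes, and the learned outputs approximate the dominant eigenfunctions of the NCD rather than a single pair of scalar functions.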
