Computational Hypergraph Discovery, a Gaussian Process framework for connecting the dots (2311.17007v1)
Abstract: Most scientific challenges can be framed as one of three levels of complexity of function approximation. Type 1: approximate an unknown function given input/output data. Type 2: consider a collection of variables and functions, some of which are unknown, indexed by the nodes and hyperedges of a hypergraph (a generalized graph whose edges can connect more than two vertices); given partial observations of the variables of the hypergraph (satisfying the functional dependencies imposed by its structure), approximate all the unobserved variables and unknown functions. Type 3: expanding on Type 2, if the hypergraph structure itself is unknown, use partial observations of the variables of the hypergraph to discover its structure and approximate its unknown functions. While most Computational Science and Engineering and Scientific Machine Learning challenges can be framed as Type 1 or Type 2 problems, many scientific problems can only be categorized as Type 3. Despite their prevalence, these Type 3 challenges have been largely overlooked because of their inherent complexity. Although Gaussian Process (GP) methods are sometimes perceived as well-founded but dated technology limited to Type 1 curve fitting, their scope has recently been expanded to Type 2 problems. In this paper, we introduce an interpretable GP framework for Type 3 problems, targeting the data-driven discovery and completion of computational hypergraphs. Our approach is based on a kernel generalization of Row Echelon Form reduction from linear to nonlinear systems, combined with variance-based analysis: variables are linked via GPs, and those contributing most to the data variance reveal the hypergraph's structure. We illustrate the scope and efficiency of the proposed approach with applications to (algebraic) equation discovery, network discovery (gene pathways, chemical, and mechanical), and raw data analysis.
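To make the "variables linked via GPs, ranked by variance contribution" idea concrete, here is a minimal sketch of a variance-attribution step for one node of a hypergraph. It fits the node's data with an additive Gaussian-kernel GP (one kernel per candidate ancestor) and scores each candidate by its share of the fitted signal energy. The function names, kernel choice, lengthscale, and noise level are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def gaussian_kernel(x, y, ell=1.0):
    # Gram matrix of a squared-exponential kernel on 1-D samples x, y.
    d2 = (x[:, None] - y[None, :]) ** 2
    return np.exp(-d2 / (2.0 * ell ** 2))

def variance_attribution(X, y, ell=1.0, noise=1e-2):
    """Fit y ~ sum_i f_i(X[:, i]) with one GP component per column,
    then score each candidate ancestor by the energy of its component
    (a stand-in for the paper's variance-based analysis)."""
    n, d = X.shape
    Ks = [gaussian_kernel(X[:, i], X[:, i], ell) for i in range(d)]
    K = sum(Ks) + noise * np.eye(n)
    alpha = np.linalg.solve(K, y)          # GP representer weights
    # alpha^T K_i alpha >= 0 measures how strongly variable i "activates".
    energies = np.array([alpha @ Ki @ alpha for Ki in Ks])
    return energies / energies.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2   # column 2 is irrelevant
scores = variance_attribution(X, y)
print(scores)                               # column 2 should score lowest
```

Pruning candidates whose score falls below a threshold, node by node, is one simple way such variance scores can unveil a sparse hypergraph structure.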