Hierarchical Correlation Reconstruction (HCR)
- Hierarchical Correlation Reconstruction (HCR) is a nonparametric framework that models joint and conditional probability densities using hierarchical orthonormal polynomial expansions.
- It provides closed-form estimation of mixed moments, enabling interpretable reconstruction of dependencies, uncertainty quantification, and effective conditional density modeling.
- HCR scales to high-dimensional data through basis truncation, regularization, and parallel computation, with applications in time series prediction, independence testing, and neural architectures.
Hierarchical Correlation Reconstruction (HCR) is a nonparametric, polynomial-basis framework for modeling and reconstructing joint and conditional probability densities in multivariate data, with applications spanning time series, conditional regression, statistical independence testing, neural architectures, and high-dimensional structure discovery. The HCR approach decomposes distributions into hierarchical orthonormal polynomial expansions, allowing systematic modeling of arbitrarily high-order statistical dependencies, direct closed-form parameter estimation, and interpretable mixed-moment representations. This enables efficient density prediction, uncertainty quantification, multilayer latent structure discovery, and scalable independence testing with computational complexity that is well-controlled by basis selection and regularization.
1. Mathematical Foundations, Formalism, and Hierarchical Expansion
The HCR methodology models the joint density of normalized variables $x = (x_1, \ldots, x_d) \in [0,1]^d$, replacing the original marginals with approximately uniform variables via empirical CDF transformation or parametric mapping. One selects an orthonormal polynomial basis on $[0,1]$—commonly rescaled Legendre polynomials, with $f_0(x) = 1$ and, for $j \geq 1$, $f_j(x) = \sqrt{2j+1}\,P_j(2x-1)$, where $P_j$ is the degree-$j$ Legendre polynomial (so $f_1(x) = \sqrt{3}(2x-1)$, $f_2(x) = \sqrt{5}(6x^2 - 6x + 1)$, and so on).
On $[0,1]^d$, the joint basis functions are $f_{\mathbf{j}}(x) = \prod_{i=1}^{d} f_{j_i}(x_i)$ for multi-indices $\mathbf{j} = (j_1, \ldots, j_d)$. The density expansion is $\rho(x) = \sum_{\mathbf{j} \in B} a_{\mathbf{j}} f_{\mathbf{j}}(x)$, where $B$ is a finite index set, often constrained by maximal degree or by the number of nonzero $j_i$ (hierarchical truncation). The normalization $a_{\mathbf{0}} = 1$ ensures $\int_{[0,1]^d} \rho(x)\,dx = 1$.
The coefficients $a_{\mathbf{j}}$ have an explicit interpretation as mixed moments (“moment-like” or “cumulant-like” parameters). Terms where only one $j_i$ is nonzero adjust the $i$-th marginal, pairwise nonzero terms calibrate correlations, and higher-order nonzero index patterns encode higher-order or cross-cumulant dependencies, yielding a natural hierarchy of correlation reconstruction (Duda, 2018, Duda, 2022, Duda, 8 May 2024).
Estimation of $a_{\mathbf{j}}$ is accomplished by mean-square optimal projection: $\hat{a}_{\mathbf{j}} = \frac{1}{n} \sum_{k=1}^{n} f_{\mathbf{j}}(x^k)$ for observations $x^1, \ldots, x^n$. The expansion is dense in $L^2([0,1]^d)$, so as $n \to \infty$ and $B \to \mathbb{N}^d$, the true density is recovered (Duda, 2018).
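A minimal sketch of this projection-based estimation for two variables in Python, assuming rescaled Legendre polynomials up to degree $m$ (the helper names `normalize_cdf`, `f`, `hcr_coefficients`, and `density` are illustrative, not from the cited papers):

```python
import numpy as np
from numpy.polynomial import legendre
from scipy.stats import rankdata

def normalize_cdf(x):
    """Map a sample to approximately uniform [0,1] via its empirical CDF (ranks)."""
    return (rankdata(x) - 0.5) / len(x)

def f(j, u):
    """Rescaled Legendre polynomial on [0,1]: f_j(u) = sqrt(2j+1) * P_j(2u - 1)."""
    coeffs = np.zeros(j + 1)
    coeffs[j] = 1.0
    return np.sqrt(2 * j + 1) * legendre.legval(2 * u - 1, coeffs)

def hcr_coefficients(x, y, m=4):
    """Estimate a[j, k] = mean_n f_j(u^n) f_k(v^n) for degrees 0..m (mean-square projection)."""
    u, v = normalize_cdf(x), normalize_cdf(y)
    fu = np.stack([f(j, u) for j in range(m + 1)])   # shape (m+1, n)
    fv = np.stack([f(k, v) for k in range(m + 1)])   # shape (m+1, n)
    return fu @ fv.T / len(x)                        # a[0, 0] == 1 by construction

def density(a, u, v):
    """Evaluate the reconstructed density rho(u, v) = sum_{j,k} a[j, k] f_j(u) f_k(v)."""
    m = a.shape[0] - 1
    fu = np.array([f(j, u) for j in range(m + 1)])
    fv = np.array([f(k, v) for k in range(m + 1)])
    return fu @ a @ fv
```

Because the marginals are rank-uniformized, the estimated `a[j, 0]` and `a[0, k]` entries stay close to their ideal values (1 for the constant term, near 0 otherwise), while the mixed entries `a[j, k]` with $j, k \geq 1$ carry the dependence structure.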
2. Conditional Density Modeling, Regression, and Interpretation
For conditional modeling, such as $\rho(y \mid x)$ for a scalar response $y$ given a feature vector $x$, HCR expands the conditional density in the same polynomial basis after normalizing $y$ (and the $x$ coordinates) to $[0,1]$:
$$\rho(y \mid x) = \sum_{j} a_j(x)\, f_j(y),$$
where $a_j(x)$ parameterizes the effect of $x$ on the $j$-th “moment-like” term of $y$. Each $a_j$ is learned by regressing the training responses $f_j(y^k)$ against the feature-transformed $x^k$ via
$$a_j(x) = \sum_{i} \beta_{ji}\, g_i(x),$$
where the $g_i$ are chosen feature functions (e.g., polynomial features or one-hot encodings for categorical data).
Feature selection and model regularization are implemented via canonical correlation analysis (CCA) and $\ell_1$ (lasso) sparsity (Duda, 2022). Calibration and normalization (e.g., nonnegative thresholding followed by renormalization) ensure positive and valid conditional densities even when the finite polynomial expansion produces negative values.
The result is an efficient, highly interpretable conditional density estimator that can capture multimodality, skewness, and higher-moment structure in the conditional laws of complex data (Duda et al., 2018, Duda et al., 2019, Duda, 2022).
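As a concrete illustration, here is a minimal sketch of the per-coefficient regression in Python (ordinary least squares here; lasso is a drop-in substitute), reusing the hypothetical `f` and `normalize_cdf` helpers from the Section 1 sketch and assuming simple linear feature functions $g(x) = (1, x_1, \ldots, x_p)$:

```python
import numpy as np

def fit_conditional_hcr(X_feat, y, m=4):
    """Regress each moment-like coefficient a_j(x) = g(x) @ beta[:, j-1], where
    g(x) = [1, x_1, ..., x_p] is a simple linear feature map (an assumption; any
    feature map, e.g. polynomial or one-hot, can be substituted)."""
    v = normalize_cdf(y)                               # response mapped to ~uniform [0, 1]
    G = np.column_stack([np.ones(len(y)), X_feat])     # feature matrix g(x^k)
    # Targets: f_j(v^k) for j = 1..m (j = 0 is the constant term and needs no regression).
    F = np.stack([f(j, v) for j in range(1, m + 1)], axis=1)
    beta, *_ = np.linalg.lstsq(G, F, rcond=None)       # solves all m targets at once
    return beta                                        # shape (1 + p, m)

def conditional_density(beta, x_feat, v_grid):
    """Evaluate rho(v | x) = 1 + sum_j a_j(x) f_j(v), then clip and renormalize so the
    finite expansion is a valid (nonnegative, unit-mass) density on the grid."""
    g = np.concatenate([[1.0], np.atleast_1d(x_feat)])
    a = g @ beta                                       # a_j(x) for j = 1..m
    rho = 1.0 + sum(a[j - 1] * f(j, v_grid) for j in range(1, len(a) + 1))
    rho = np.clip(rho, 1e-8, None)                     # nonnegativity calibration
    dv = v_grid[1] - v_grid[0]                         # uniform grid on [0, 1] assumed
    return rho / (rho.sum() * dv)                      # renormalize to unit mass
```

The clip-and-renormalize step corresponds to the calibration mentioned above; the returned density is over the normalized response $v$ and can be mapped back to the original $y$ scale through the empirical quantile function.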
3. Algorithms, Scalability, and Practical Implementation
HCR estimation is embarrassingly parallel at the level of separate coefficients due to orthogonality. For time series, each coefficient is updated via a direct average or, for nonstationary data, via an online exponentially weighted moving average: $a_{\mathbf{j}} \leftarrow (1-\lambda)\, a_{\mathbf{j}} + \lambda\, f_{\mathbf{j}}(x^t)$, where $f_{\mathbf{j}}(x^t)$ is evaluated on the new observation $x^t$ (Duda, 2018).
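For illustration, a sketch of that adaptive update for a single multi-index (reusing the hypothetical `f` helper from the Section 1 sketch; the forgetting rate `lam` is an assumption of this sketch):

```python
def ewma_update(a_j, new_obs_u, j_multi, lam=0.01):
    """Online update a_j <- (1 - lam) * a_j + lam * f_j(u^t), where
    f_j(u^t) = prod_i f_{j_i}(u_i^t) is evaluated on the new normalized observation."""
    value = 1.0
    for j_i, u_i in zip(j_multi, new_obs_u):
        value *= f(j_i, u_i)
    return (1.0 - lam) * a_j + lam * value
```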
For conditional modeling, linear least squares (optionally with lasso regularization) is used for each coefficient separately, so the total batch cost grows linearly in the number of samples for a fixed polynomial degree and feature count. In practice, coefficients up to degree 4–8 are sufficient; feature selection and CCA dimension reduction are used to ensure computational tractability on high-dimensional data (Duda, 2022).
In statistical independence testing, HCR is notably efficient: all pairwise or low-order mixed moments are computed in $O(n)$ time each, providing mutual information estimators and independence tests competitive with HSIC but with strictly linear scaling in sample size and with interpretability of the significant moment directions (Duda et al., 25 Aug 2025).
4. Applications: Multivariate Time Series, Conditional Prediction, and Beyond
HCR has been successfully applied in several domains:
- Time Series Prediction: Full probability density forecast of financial returns given lagged context, with direct estimation and polynomial basis enabling modeling of arbitrary joint structure and risk quantification beyond AR/MA/ARCH families, including non-Gaussian and nonstationary settings (Duda, 2018).
- Conditional Distribution Estimation: Prediction of conditional densities in applications such as bid-ask spread estimation (Duda et al., 2019), redshift prediction for AGNs (Duda, 2022), and income credibility evaluation (Duda et al., 2018). In all these, HCR provides interpretability at the level of which features (and their moments) modulate specific “moment-like” coefficients of the target, as well as uncertainty quantification and accurate imputation.
- High-Dimensional Structure Discovery: As a direct density modeling tool, HCR supports systematic exploration of higher-order dependencies and enables hierarchical latent factor models reminiscent of total-correlation-based approaches (see CorEx) (Steeg et al., 2014).
- Neural Architectures: Novel biologically-inspired neurons using joint distribution modeling with HCR enable multidirectional activation and conditional expectation propagation, extending or generalizing KAN (Kolmogorov-Arnold Networks) and standard multilayer perceptrons (MLPs). This supports propagation in any direction (inputs conditioned on outputs or vice versa), as well as higher-order moment/uncertainty transfer (Duda, 8 May 2024).
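To make the multidirectional propagation concrete: on $[0,1]$ one has $y = \tfrac{1}{2} + f_1(y)/(2\sqrt{3})$ and $\int_0^1 y\, f_k(y)\, dy = 0$ for $k \geq 2$, so the conditional expectation of a normalized output follows in closed form from the joint coefficients. Below is a minimal two-variable sketch reusing the hypothetical `f` and `hcr_coefficients` helpers from the Section 1 sketch; it is an illustrative reading of the propagation idea, not the exact architecture of the cited paper:

```python
import numpy as np

def propagate_expectation(a, u):
    """Propagate E[v | u] through an HCR 'neuron' holding joint coefficients a[j, k]:
    E[v | u] = 1/2 + (sum_j a[j, 1] f_j(u)) / (2 * sqrt(3) * sum_j a[j, 0] f_j(u)),
    since v = 1/2 + f_1(v)/(2*sqrt(3)) on [0, 1] and y-moments of higher f_k vanish."""
    m = a.shape[0] - 1
    fu = np.array([f(j, u) for j in range(m + 1)])
    denom = np.maximum(fu @ a[:, 0], 1e-6)   # marginal density rho(u), clipped to stay positive
    numer = fu @ a[:, 1]                     # coefficient of f_1(v) in rho(u, .)
    return 0.5 + numer / (2.0 * np.sqrt(3.0) * denom)
```

Propagating in the reverse direction, $E[u \mid v]$, uses the rows `a[0, :]` and `a[1, :]` instead of the columns, which is what makes the activation multidirectional; higher-degree rows and columns similarly propagate higher “moment-like” terms.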
5. Statistical Independence Testing and Mutual Information Estimation
HCR serves as a computationally scalable alternative to kernel-based dependency criteria (notably HSIC):
- Mixed-Moment Basis: Pairwise or higher-order mixed-moment features are computed across variables, with the null hypothesis of independence corresponding to vanishing (non-marginal) mixed moments.
- Mutual Information Estimation: A bias-corrected sum of squared nontrivial mixed moments approximates the mutual information between multivariate vectors, valid for weakly dependent or copula-close distributions: $I \approx \frac{1}{2} \sum_{\mathbf{j} \neq \mathbf{0}} \left( \hat{a}_{\mathbf{j}}^2 - \tfrac{1}{n} \right)$ (in nats, summing over mixed, non-marginal multi-indices).
- Linear Time Testing: Each mixed moment requires $O(n)$ work. Multiple testing and feature selection are handled via permutation strategies or order-statistics-based $p$-values. Empirically, detection power matches or exceeds that of HSIC at much lower computational cost, especially for large sample sizes $n$ (Duda et al., 25 Aug 2025); see the sketch after this list.
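A minimal sketch of the pairwise test and the weak-dependence mutual information approximation in Python, reusing the hypothetical `f` and `normalize_cdf` helpers from the Section 1 sketch (the $1/n$ subtraction reflects that each squared empirical moment has expectation about $1/n$ under independence; the permutation $p$-value is one standard calibration, shown as an assumption rather than the exact procedure of the cited paper):

```python
import numpy as np

def mixed_moments(u, v, m=3):
    """Nontrivial mixed moments a[j, k] = mean_n f_j(u^n) f_k(v^n), j, k >= 1; O(n) each."""
    fu = np.stack([f(j, u) for j in range(1, m + 1)])
    fv = np.stack([f(k, v) for k in range(1, m + 1)])
    return fu @ fv.T / len(u)

def mi_estimate(u, v, m=3):
    """Weak-dependence approximation I(U; V) ~ 1/2 * sum_{j,k >= 1} a[j, k]^2 (in nats),
    with a 1/n bias subtraction per moment; clipped at zero."""
    a = mixed_moments(u, v, m)
    return max(0.0, 0.5 * float(np.sum(a ** 2 - 1.0 / len(u))))

def independence_pvalue(x, y, m=3, n_perm=200, seed=0):
    """Permutation test on the statistic sum_{j,k} a[j, k]^2, which vanishes under independence."""
    rng = np.random.default_rng(seed)
    u, v = normalize_cdf(x), normalize_cdf(y)
    stat = np.sum(mixed_moments(u, v, m) ** 2)
    perm = np.array([np.sum(mixed_moments(u, rng.permutation(v), m) ** 2)
                     for _ in range(n_perm)])
    return (1 + np.sum(perm >= stat)) / (n_perm + 1)
```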
6. Hierarchical Structure and Comparative Perspectives
HCR's capacity to model and reconstruct dependencies in a hierarchical, moment-wise manner allows it to bridge cross-disciplinary techniques:
- Total Correlation and Layerwise Factorization: In discrete data, the CorEx approach seeks latent variables that capture multivariate dependencies by maximizing explained total correlation. Both HCR and CorEx employ hierarchical decomposition, with HCR using explicit mixed-moment polynomial expansions and CorEx using mutual-information-theoretic objectives with unsupervised hierarchical grouping (Steeg et al., 2014).
- Comparison to Classical Methods: Relative to AR/MA/ARCH, HCR admits universal approximation and direct modeling of higher-order and nonlinear dependencies, not requiring strict moment or distributional assumptions. Unlike kernel dependence tests, HCR's basis is explicit and transparent, with directly interpretable coefficients and order selection.
- High-Dimensional Structure Recovery: In high-dimensional covariance estimation, hierarchical approaches and two-step cleaning can recover block structure more robustly under non-i.i.d. noise or when population structure is hierarchical (García-Medina et al., 2022).
7. Limitations, Regularization, and Future Directions
- Basis Explosion in High Dimensions: The number of coefficients in the full tensor-product basis is $(m+1)^d$ for dimension $d$ and maximal polynomial degree $m$, i.e., exponential in $d$. Practicality requires restriction to pairwise or low-order terms; selective sparsity, lasso, or tensor decomposition are used to limit overfitting (see the sketch after this list).
- Nonnegativity of Densities: Polynomials of finite degree may dip negative; monotonic calibration and normalization are needed for valid density functions.
- Model Selection and Regularization: Cross-validation and regression penalties control overfitting and feature inflation, especially in conditional modeling with many predictors (Duda, 2022).
- Extensions: Online/adaptive learning, multidirectional neural architectures, biologically plausible information bottleneck training, inclusion of time as a variable, and integration with nonlinear or neural network predictors for moment functions (Duda, 2018, Duda, 8 May 2024).
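As an illustration of such truncation, restricting the basis to multi-indices with at most two nonzero entries and each nonzero degree at most $m$ reduces the coefficient count from $(m+1)^d$ to $1 + dm + \binom{d}{2} m^2$. A minimal sketch (the function name is illustrative):

```python
from itertools import combinations

def pairwise_basis(d, m):
    """Multi-indices with at most two nonzero entries and each nonzero degree <= m:
    1 + d*m + C(d, 2)*m^2 indices instead of the full (m+1)^d tensor-product basis."""
    basis = [(0,) * d]                              # constant (normalization) term
    for i in range(d):                              # single-variable (marginal) terms
        for p in range(1, m + 1):
            idx = [0] * d
            idx[i] = p
            basis.append(tuple(idx))
    for i, j in combinations(range(d), 2):          # pairwise dependence terms
        for p in range(1, m + 1):
            for q in range(1, m + 1):
                idx = [0] * d
                idx[i] = p
                idx[j] = q
                basis.append(tuple(idx))
    return basis
```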
HCR continues to offer a highly general, nonparametric, interpretable, and scalable paradigm for probabilistic modeling and inference in modern data analysis. By providing explicit control over which dependencies are modeled and rapid, closed-form estimation, it is positioned as a powerful tool in high-dimensional statistics, machine learning, and computational sciences (Duda, 2018, Duda et al., 2018, Duda et al., 2019, Duda, 2022, Duda, 8 May 2024, Duda et al., 25 Aug 2025).