
Tomographic Quantile Forests (TQF)

Updated 20 December 2025
  • Tomographic Quantile Forests are a nonparametric, tree-based regression method that estimates multivariate conditional distributions via directional quantile projections.
  • The method leverages the Cramér–Wold theorem and minimizes the sliced Wasserstein distance to capture arbitrary, nonconvex, and multimodal support shapes.
  • The framework integrates an augmented QRF++ model with alternating convex optimization to deliver efficient, uncertainty-quantified predictions in multivariate regression.

Tomographic Quantile Forests (TQF) are a nonparametric, tree-based regression approach for uncertainty-quantified prediction in multivariate response problems, designed to learn and reconstruct the full conditional distribution $P(y \mid x)$ of a vector-valued target using quantile estimation along arbitrary directions. TQF leverages all one-dimensional projections of the response, invoking the Cramér–Wold theorem to uniquely determine multivariate conditional laws, and reconstructs these distributions via sliced Wasserstein distance minimization. The framework integrates an augmented quantile forest model (“QRF++”) for efficient directional quantile regression with an alternating convex optimization procedure for distributional reconstruction, enabling flexible, nonconvex, and multimodal uncertainty representation without separately training models for each direction (Kanazawa, 18 Dec 2025).

1. Multivariate Regression and Problem Setting

Given covariates $x \in \mathbb{R}^p$ and multivariate responses $y \in \mathbb{R}^d$, the objective is to recover the conditional law $P(y \mid x)$, not merely its mean or marginal characteristics. TQF exploits the mathematical property that the laws of all projected variables $u = n^\top y$ for $n \in S^{d-1}$, the unit sphere in $\mathbb{R}^d$, determine the law of $y$ through the Cramér–Wold device. This setup translates the multivariate distributional estimation problem into a continuum of one-dimensional quantile regression tasks, establishing the foundation for conditional distribution learning in arbitrary directions.

2. Directional Quantile Estimation in TQF

The key modeling step is learning the conditional $\tau$-quantile function $Q_\tau(n^\top y \mid x)$ for every direction $n$ and quantile level $\tau \in (0,1)$, defined as:

$$Q_\tau(n^\top y \mid x) = \inf\{q : \Pr[n^\top y \leq q \mid x] \geq \tau\}.$$

Training aims to minimize the pinball (check) loss,

$$\rho_\tau(r) = \begin{cases} \tau r, & r \geq 0 \\ (\tau-1)\, r, & r < 0, \end{cases}$$

across augmented input data where directional and Fourier features are incorporated to capture all-projection dependence and higher-order distributional features.
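
As a concrete illustration, the check loss is a one-liner in NumPy; the following sketch (function name and example values are illustrative, not taken from the source) evaluates it for a candidate quantile prediction.

```python
import numpy as np

def pinball_loss(residuals, tau):
    """Check loss rho_tau applied elementwise to residuals r = u - q_hat."""
    r = np.asarray(residuals, dtype=float)
    return np.where(r >= 0, tau * r, (tau - 1.0) * r)

# Mean check loss for predicting the 0.9-quantile of u = n^T y as q_hat = 1.0
u = np.array([0.2, 1.5, 0.8])
print(pinball_loss(u - 1.0, tau=0.9).mean())
```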

TQF adopts the QRF++ backbone, an extension of Quantile Regression Forests, embedding $(x, n)$ as the input and using output targets comprising multiple quantiles and random Fourier features of $u = n^\top y$. In training, $G$ independent random projection directions are sampled for each data pair, and orthogonal rotations further augment the feature set. This produces $G \times N$ training records in total ($G$ per sample), supporting tree-based partitioning that is sensitive to both the input and the projection configuration.
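
The augmentation step can be sketched as follows, assuming uniformly sampled projection directions and a simple random-Fourier-feature construction; the function names, the frequency distribution, and the feature layout are illustrative choices rather than the exact recipe from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_directions(G, d):
    """Draw G approximately uniform directions on the unit sphere S^{d-1}."""
    v = rng.standard_normal((G, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def augment_dataset(X, Y, G=10, T=4):
    """Build G x N augmented records: inputs (x, n), targets u = n^T y plus
    random Fourier features cos(omega_t u), sin(omega_t u)."""
    N, d = Y.shape
    omegas = rng.standard_normal(T)  # random frequencies (illustrative choice)
    rows_in, rows_out = [], []
    for i in range(N):
        for n in sample_directions(G, d):
            u = float(n @ Y[i])
            fourier = np.concatenate([np.cos(omegas * u), np.sin(omegas * u)])
            rows_in.append(np.concatenate([X[i], n]))        # augmented input (x, n)
            rows_out.append(np.concatenate([[u], fourier]))  # directional target + features
    return np.array(rows_in), np.array(rows_out)

# Synthetic example: N = 100 samples, p = 3 covariates, d = 2 responses
X = rng.standard_normal((100, 3))
Y = rng.standard_normal((100, 2))
X_aug, Y_aug = augment_dataset(X, Y)
print(X_aug.shape, Y_aug.shape)  # (1000, 5) (1000, 9)
```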

Model symmetrization ensures the quantile function satisfies

$$Q_\tau(n^\top y \mid x) = -Q_{1-\tau}(-n^\top y \mid x)$$

by defining the symmetrized model

$$F_Q^*(x, n, \tau) = \frac{1}{2}\left[F_Q(x, n, \tau) - F_Q(x, -n, 1-\tau)\right],$$

where $F_Q$ is the raw quantile prediction forest.
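
The symmetrization simply wraps any raw directional quantile predictor. Below is a minimal sketch, assuming a generic callable f_q(x, n, tau) standing in for the trained QRF++ forest; the interface and the toy predictor are illustrative.

```python
import numpy as np
from scipy.stats import norm

def symmetrized_quantile(f_q, x, n, tau):
    """F_Q^*(x, n, tau): average the raw prediction with the reflected prediction
    at direction -n and level 1 - tau, enforcing the quantile symmetry identity."""
    n = np.asarray(n, dtype=float)
    return 0.5 * (f_q(x, n, tau) - f_q(x, -n, 1.0 - tau))

# Toy stand-in predictor: quantiles of a standard normal projection, ignoring x
toy_f_q = lambda x, n, tau: norm.ppf(tau)
print(symmetrized_quantile(toy_f_q, x=None, n=np.array([1.0, 0.0]), tau=0.9))
```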

3. Multivariate Distribution Reconstruction via Sliced Wasserstein Matching

After the forest is trained, for a test covariate $x_\mathrm{new}$, quantiles are predicted for each of $K$ randomly chosen directions $\{n_k\}_{k=1}^K$ at $M$ quantile levels:

$$Q_{k, m} = F_Q^*(x_\mathrm{new}, n_k, q_m), \quad m = 1, \ldots, M.$$

The reconstruction task is then cast as finding a discrete empirical measure $Z = \{(w_j, z_j)\}_{j=1}^J$ approximating $P(y \mid x_\mathrm{new})$ by minimizing the (discrete) sliced 1-Wasserstein loss:

$$L(Z) = \frac{1}{K}\sum_{k=1}^K \frac{1}{M}\sum_{m=1}^M \left| Q(n_k, q_m; Z) - Q_{k, m} \right|,$$

where $Q(n_k, q_m; Z)$ denotes the $q_m$-quantile of the projection $n_k^\top y$ under $Z$. The optimization alternates:

  • Weight step: solve the convex optimization over the weights $\{w_j\}$ for fixed supports;
  • Support step: fit a KDE to the current weighted cloud and resample the supports;
  • Ensemble merging: run parallel alternations, combine supports, and prune low-weight or redundant points via loss minimization.

The process is summarized in the following pseudocode:

Input:
  D_slice = { (n_k, q_m, Q_{k,m}) } for k = 1..K, m = 1..M
  N0 = initial support size, N1 = regular support size, E = ensemble size
Output:
  Weighted point cloud Z_merged

1. Initialize supports z_j (j = 1..N0)
2. Set uniform weights w_j ← 1/N0
3. Optimize {w_j} to minimize L({(w_j, z_j)}, D_slice)
4. Repeat until convergence:
    a. Fit a KDE to the current cloud Z
    b. Sample N1 new supports {z_j} from the KDE
    c. Optimize the weights {w_j} on the new support
    d. Update the loss and check for a decrease
5. For e = 1..E in parallel:
    a. Sample N1 supports {z_j^(e)} from the final KDE
    b. Optimize the weights {w_j^(e)} on {z_j^(e)}
    c. Collect all (w_j^(e), z_j^(e)) into Z_*
6. Prune Z_*:
    For each candidate size ℓ, keep the top-ℓ points by weight, renormalize, and compute L_ℓ
    Choose ℓ* minimizing L_ℓ
Return the top ℓ* points as Z_merged

This Quantile-Matching Empirical Measure (QMEM) procedure efficiently yields a support-weighted point cloud representing the learned $P(y \mid x)$.
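
The quantity driving both the weight and support steps is the sliced loss $L(Z)$. The sketch below evaluates it for a candidate weighted point cloud against forest-predicted quantiles; the weighted-quantile helper and all variable names are illustrative, and the convex weight solver itself is not reproduced here.

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """q-quantile of a discrete measure with the given support values and weights."""
    order = np.argsort(values)
    v, w = np.asarray(values)[order], np.asarray(weights)[order]
    cdf = np.cumsum(w) / np.sum(w)
    idx = np.searchsorted(cdf, q, side="left")
    return v[min(idx, len(v) - 1)]

def sliced_loss(weights, supports, directions, q_levels, target_quantiles):
    """Discrete sliced 1-Wasserstein-style loss L(Z) between the cloud
    Z = {(w_j, z_j)} and the forest-predicted quantiles Q_{k,m}."""
    K, M = target_quantiles.shape
    loss = 0.0
    for k, n_k in enumerate(directions):
        proj = supports @ n_k  # projections n_k^T z_j of the support points
        for m, q_m in enumerate(q_levels):
            loss += abs(weighted_quantile(proj, weights, q_m) - target_quantiles[k, m])
    return loss / (K * M)

# Toy example: J = 5 support points in d = 2, K = 3 directions, M = 4 levels
rng = np.random.default_rng(1)
Z_supports = rng.standard_normal((5, 2))
Z_weights = np.full(5, 0.2)
dirs = rng.standard_normal((3, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
levels = np.array([0.1, 0.25, 0.5, 0.75])
targets = rng.standard_normal((3, 4))  # stand-in for F_Q^* predictions
print(sliced_loss(Z_weights, Z_supports, dirs, levels, targets))
```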

4. Computational Complexity and Algorithmic Characteristics

The main training phase requires tree induction for $B$ trees across $G \times N$ samples, yielding $O(BGN\log(GN))$ complexity per coordinate due to ensemble-based splitting. Quantile predictions require $O(BK\log(GN))$ operations for $K$ query directions.

QMEM reconstruction involves a convex optimization in $J$ variables ($O(J^2)$ per solve, or $O(J^3)$ in the worst case), repeated over $I_\mathrm{alt}$ alternations and $E$ ensembles. With practical settings $J \sim 1000$ and $E \sim 20$, this results in manageable per-query cost.

Typically, $K \in [30, 100]$ directions suffice for $d \leq 5$; in higher dimensions, quasi-Monte Carlo sampling or spherical designs may be employed for improved projection coverage.

5. Theoretical Properties and Methodological Comparisons

TQF’s QMEM stage produces an empirical measure minimizing the sliced 1-Wasserstein distance to the model-predicted projected quantiles. The sliced Wasserstein loss is convex with respect to the weights, with global convergence and stability under gradient-based optimization.

Under standard forest honesty and sufficient data (with $N \to \infty$ and $G, K, M \to \infty$), consistency holds for the estimated directional quantiles, and the QMEM reconstruction converges in sliced Wasserstein distance.

Classical Directional Quantile Regression (DQR) fits separate (typically linear) models for each direction, intersecting quantile-defining halfspaces to obtain only convex central regions, which cannot represent nonconvex or multimodal conditional supports. TQF overcomes these limitations by modeling all directions simultaneously via a nonparametric forest, imposing no convexity or unimodality restriction, and capturing arbitrary support shapes (e.g., two moons, annuli, regions with holes). TQF thus generalizes DQR approaches by enabling efficient joint estimation and reconstruction without restrictive assumptions.

6. Practical Aspects and Implementation Guidance

Quantile regions at level $\tau$ and direction $n$ are obtained as the halfspace $\{y : n^\top y \leq F_Q^*(x, n, \tau)\}$. Central conditional regions emerge as intersections over multiple random directions.
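
A point therefore lies in such a central region exactly when it satisfies every sampled directional halfspace constraint. A minimal membership check under that definition, with a placeholder predictor interface (names are illustrative):

```python
import numpy as np

def in_central_region(y, x, directions, tau, f_q_star):
    """True if n^T y <= F_Q^*(x, n, tau) for every sampled direction n,
    i.e. y lies in the intersection of the directional quantile halfspaces."""
    return all(float(n @ y) <= f_q_star(x, n, tau) for n in directions)

# Toy usage: a stand-in predictor whose tau-quantile is 1.0 in every direction
dirs = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
toy_pred = lambda x, n, tau: 1.0
print(in_central_region(np.array([0.3, -0.4]), None, dirs, 0.9, toy_pred))  # True
```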

To sample from the reconstructed $\hat P(y \mid x)$, one simply draws a support point $z_j$ with probability $w_j$ from the final QMEM, optionally adding Gaussian noise for smoothness.
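
A minimal sampling sketch following these rules (the noise bandwidth and function name are illustrative):

```python
import numpy as np

def sample_from_qmem(weights, supports, n_samples, noise_scale=0.0, rng=None):
    """Draw from the reconstructed conditional law: pick support points z_j with
    probability w_j, optionally adding isotropic Gaussian noise for smoothing."""
    rng = rng if rng is not None else np.random.default_rng()
    idx = rng.choice(len(weights), size=n_samples, p=np.asarray(weights))
    samples = np.asarray(supports)[idx]
    if noise_scale > 0:
        samples = samples + noise_scale * rng.standard_normal(samples.shape)
    return samples

# Example: 1000 draws from a 3-point QMEM cloud in d = 2 with light smoothing
w = np.array([0.5, 0.3, 0.2])
z = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
print(sample_from_qmem(w, z, n_samples=1000, noise_scale=0.05).shape)  # (1000, 2)
```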

Essential hyperparameters include:

  • $B$ (number of trees in the forest): 50–200
  • $T$ (number of random Fourier frequencies): 0–10
  • $G, \tilde G$ (sample/feature augmentations): 5–20
  • $M, K$ (quantile and projection discretizations): 20–50
  • $N_0, N_1$ (support sizes for QMEM): $N_0 \sim 10$, $N_1 \sim 100$
  • $E$ (ensemble runs in QMEM): 10–20

Larger $G, \tilde G$ improve directional dependence capture but increase training cost. $M$ controls the granularity of quantile reconstruction, with higher values resolving finer features. $K$ sets a tradeoff between reconstruction fidelity and speed. $N_1$ should balance expressivity for multimodal supports against the computational burden of the convex weight optimization.
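
For orientation only, a plausible default configuration drawn from the ranges above could be collected as follows; the dictionary keys are hypothetical and do not correspond to an API from the source.

```python
tqf_config = {
    "B": 100,       # trees in the forest (50-200)
    "T": 4,         # random Fourier frequencies (0-10)
    "G": 10,        # sample augmentations: projection directions per record (5-20)
    "G_tilde": 10,  # feature augmentations: orthogonal rotations (5-20)
    "M": 30,        # quantile levels per direction (20-50)
    "K": 50,        # projection directions at prediction time (20-50, up to ~100)
    "N0": 10,       # initial QMEM support size
    "N1": 100,      # regular QMEM support size
    "E": 15,        # parallel QMEM ensemble runs (10-20)
}
```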

TQF presents a framework for nonparametric, parallelizable, and distribution-free multivariate conditional uncertainty estimation, suited primarily to tabular data settings and providing an unrestricted, data-adaptive characterization of predictive uncertainty (Kanazawa, 18 Dec 2025).
