Discretized NeuTucF Model for Sparse Turbulence Data
- The paper introduces a neural variant of Tucker factorization that discretizes continuous wind field data, achieving state-of-the-art performance in turbulence estimation.
- It employs quantile-based binning and a four-mode neural Tucker decomposition to embed discrete indices and capture complex spatio-temporal interactions.
- Empirical results show lower MAE and RMSE compared to baselines, demonstrating robust imputation and regression of atmospheric turbulence quantities.
The discretized NeuTucF model is a neural variant of Tucker factorization tailored to model sparse, multi-dimensional turbulence data using quantized, discrete input features. Developed to estimate turbulence quantities such as the Richardson number from wind profile radar data, NeuTucF achieves state-of-the-art performance on continuous yet sparse three-dimensional wind fields by embedding discretized indices into low-rank representations and capturing multiway spatio-temporal interactions via a four-mode neural Tucker decomposition. This methodology enables imputation and regression of missing entries in real-world atmospheric datasets, outperforming common baseline models (Fan et al., 5 Dec 2025).
1. Input Discretization and Quantization
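The model operates on four continuous variables: altitude $h$ and the three wind speed components $u$, $v$, and $w$. Writing $x^{(n)}$ for the $n$-th variable ($n = 1, \dots, 4$, in the order $h, u, v, w$), each one undergoes the following transformation pipeline:
- Standardization: Each feature is z-normalized, $z^{(n)} = (x^{(n)} - \mu_n)/\sigma_n$, where $\mu_n$ and $\sigma_n$ are the empirical mean and standard deviation of $x^{(n)}$.
- Quantile-based binning: Each standardized feature is partitioned into $K$ equally populated bins. The quantile boundaries are $b^{(n)}_k = Q_n(k/K)$ for $k = 0, 1, \dots, K$, where $Q_n$ denotes the empirical quantile function of $z^{(n)}$.
- Piecewise discretization: Each real-valued standardized input is mapped to the index of the bin that contains it, $i_n = k$ if $b^{(n)}_{k-1} \le z^{(n)} < b^{(n)}_k$ for $k = 1, \dots, K$ (the top bin also includes its upper boundary $b^{(n)}_K$).
Applying this scheme to all four features produces a discrete index tuple $(i_1, i_2, i_3, i_4) \in \{1, \dots, K\}^4$ for each observation. All quantized data are then aggregated into a four-way array $\mathcal{Y} \in \mathbb{R}^{K \times K \times K \times K}$, whose entry $\mathcal{Y}_{i_1 i_2 i_3 i_4}$ encodes the (possibly missing) Richardson number at the joint quantized coordinates.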
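The standardize, bin, and aggregate pipeline can be sketched in plain Python as follows, with $K$ bins per feature. This is a minimal illustration under stated assumptions, not the authors' code: bin indices are 0-based, the "empirical standard deviation" is taken as the population standard deviation, the cut points come from `statistics.quantiles` with `method="inclusive"`, and observations that fall into the same cell have their Richardson numbers averaged (the source says only that the data are "aggregated"). The names `fit_binning`, `to_bin_index`, and `build_sparse_tensor` are hypothetical.

```python
import bisect
import statistics


def fit_binning(values, num_bins):
    """Fit (mean, std, interior cut points) for one feature.

    The cut points are the empirical k/K quantiles (k = 1..K-1) of the
    standardized values, so each of the K bins holds about the same number
    of observations. Uses the population std (an assumption).
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    z = [(x - mu) / sigma for x in values]
    cuts = statistics.quantiles(z, n=num_bins, method="inclusive")
    return mu, sigma, cuts


def to_bin_index(x, mu, sigma, cuts):
    """Standardize one raw value and return its 0-based bin index."""
    z = (x - mu) / sigma
    # bisect_right sends a value equal to a cut point into the upper bin;
    # anything at or above the last cut (including the maximum) lands in bin K-1.
    return bisect.bisect_right(cuts, z)


def build_sparse_tensor(rows, targets, num_bins):
    """Quantize 4-feature rows and average targets that share a cell.

    rows: (h, u, v, w) tuples, i.e. altitude and the three wind components.
    targets: the matching Richardson numbers.
    Returns (per-feature binning params, {index tuple: mean target}).
    Cells absent from the dict are the missing entries of the tensor.
    """
    params = [fit_binning(list(col), num_bins) for col in zip(*rows)]
    sums, counts = {}, {}
    for row, y in zip(rows, targets):
        key = tuple(to_bin_index(x, *p) for x, p in zip(row, params))
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1
    return params, {key: sums[key] / counts[key] for key in sums}
```

Storing only the occupied cells in a dictionary keeps memory proportional to the number of observations rather than to the $K^4$ cells of the full array.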
2. Tucker-Based Neural Model Architecture
The NeuTucF model encodes interactions between quantized features using a four-mode neural Tucker factorization:
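- Factor matrices (embeddings): For each mode $n$ (corresponding to $h$, $u$, $v$, $w$), a learnable matrix $\mathbf{A}^{(n)} \in \mathbb{R}^{K \times R}$ provides an $R$-dimensional embedding for each discrete value. For instance, the row $\mathbf{a}^{(1)}_{i_1} = \mathbf{A}^{(1)}_{i_1,:}$ embeds bin index $i_1$ of height.
- Core tensor: A dense core tensor $\mathcal{G} \in \mathbb{R}^{R \times R \times R \times R}$ with entries $g_{r_1 r_2 r_3 r_4}$ encodes higher-order interactions between modes. All experiments use one fixed value of the Tucker rank $R$.
- Tucker reconstruction: Classically, the estimated value at $(i_1, i_2, i_3, i_4)$ is $\hat{y}_{i_1 i_2 i_3 i_4} = \sum_{r_1=1}^{R}\sum_{r_2=1}^{R}\sum_{r_3=1}^{R}\sum_{r_4=1}^{R} g_{r_1 r_2 r_3 r_4}\, a^{(1)}_{i_1 r_1} a^{(2)}_{i_2 r_2} a^{(3)}_{i_3 r_3} a^{(4)}_{i_4 r_4}$, i.e. the $(i_1, i_2, i_3, i_4)$ entry of $\mathcal{G} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \times_3 \mathbf{A}^{(3)} \times_4 \mathbf{A}^{(4)}$.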
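To make the classical reconstruction concrete, the sketch below applies the four mode-$n$ products to a small random core (Tucker rank $R$, $K$ bins per mode) and checks every cell against the explicit quadruple sum. It is a hypothetical pure-Python illustration with tensors stored as dictionaries keyed by index tuples; the helper names and the Gaussian initialization are assumptions, not part of the paper.

```python
import itertools
import random


def mode_n_product(tensor, shape, matrix, mode):
    """Compute tensor x_mode matrix for a dense tensor stored as a dict.

    tensor: {index tuple: value} covering every index of `shape`.
    matrix: list of rows with len(matrix[0]) == shape[mode].
    Returns (result dict, result shape) with shape[mode] -> len(matrix).
    """
    new_shape = list(shape)
    new_shape[mode] = len(matrix)
    out = {}
    for idx in itertools.product(*(range(s) for s in new_shape)):
        acc = 0.0
        for r in range(shape[mode]):
            src = idx[:mode] + (r,) + idx[mode + 1:]
            acc += tensor[src] * matrix[idx[mode]][r]
        out[idx] = acc
    return out, tuple(new_shape)


def tucker_reconstruct(core, R, factors):
    """Full reconstruction G x1 A1 x2 A2 x3 A3 x4 A4 (all K^4 cells)."""
    tensor, shape = core, (R, R, R, R)
    for mode, A in enumerate(factors):
        tensor, shape = mode_n_product(tensor, shape, A, mode)
    return tensor, shape


def tucker_entry(core, factors, idx):
    """Single entry via the explicit sum over (r1, r2, r3, r4)."""
    rows = [A[i] for A, i in zip(factors, idx)]
    return sum(g * rows[0][r[0]] * rows[1][r[1]] * rows[2][r[2]] * rows[3][r[3]]
               for r, g in core.items())


def random_tucker(K, R, seed=0):
    """Hypothetical random parameters: four K x R factors and an R^4 core."""
    rng = random.Random(seed)
    factors = [[[rng.gauss(0, 1) for _ in range(R)] for _ in range(K)]
               for _ in range(4)]
    core = {r: rng.gauss(0, 1) for r in itertools.product(range(R), repeat=4)}
    return core, factors
```

With $K = 3$ and $R = 2$, `random_tucker(3, 2)` creates $4KR + R^4 = 24 + 16 = 40$ parameters; the $R^4$ core term is the one that grows fastest with the rank.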
3. Neural Interaction Tensor Construction
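Rather than multiplying full mode-factor matrices as in classical Tucker, NeuTucF looks up the four embeddings of an observation and constructs the "interaction tensor" from their outer product: $\mathcal{X}_{i_1 i_2 i_3 i_4} = \mathbf{a}^{(1)}_{i_1} \circ \mathbf{a}^{(2)}_{i_2} \circ \mathbf{a}^{(3)}_{i_3} \circ \mathbf{a}^{(4)}_{i_4} \in \mathbb{R}^{R \times R \times R \times R}$, where $\circ$ denotes the outer product. Entrywise, $(\mathcal{X}_{i_1 i_2 i_3 i_4})_{r_1 r_2 r_3 r_4} = a^{(1)}_{i_1 r_1} a^{(2)}_{i_2 r_2} a^{(3)}_{i_3 r_3} a^{(4)}_{i_4 r_4}$.
The flattened interaction tensor is then linearly projected using the flattened core $\mathcal{G}$: $\hat{y}_{i_1 i_2 i_3 i_4} = \phi\big(\operatorname{vec}(\mathcal{G})^{\top} \operatorname{vec}(\mathcal{X}_{i_1 i_2 i_3 i_4})\big)$, where $\phi$ is a sigmoid or identity activation. With the identity activation this inner product equals the classical Tucker entry exactly; the sigmoid adds an output nonlinearity. In both cases each of the $R^4$ combinations of embedding coordinates across the four modes has its own learned weight in the core tensor, so all four-way interactions are modeled.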
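A minimal sketch of this per-observation forward pass follows, assuming the core is flattened in row-major order so that `vec(G)` and `vec(X)` line up entry by entry. The function names are illustrative; the paper's implementation is in PyTorch and is not reproduced here.

```python
import itertools
import math


def outer_flat(vectors):
    """Flattened outer product a1 o a2 o a3 o a4 in row-major (C) order."""
    flat = [1.0]
    for v in vectors:
        # Each pass multiplies every existing entry by every coordinate of v,
        # so the last vector's index varies fastest (row-major order).
        flat = [x * a for x in flat for a in v]
    return flat


def vec_core(core, R):
    """Flatten a core stored as {(r1, r2, r3, r4): g} in row-major order."""
    return [core[r] for r in itertools.product(range(R), repeat=4)]


def neutucf_predict(factors, core_flat, idx, activation="identity"):
    """y_hat = phi(vec(G)^T vec(X)), with X the interaction tensor of idx.

    factors: four K x R embedding tables (lists of rows).
    core_flat: vec(G) of length R^4, in the same order as outer_flat.
    """
    embeddings = [A[i] for A, i in zip(factors, idx)]  # embedding lookups
    x = outer_flat(embeddings)                          # vec(X), length R^4
    s = sum(g * xi for g, xi in zip(core_flat, x))      # linear projection
    if activation == "sigmoid":
        return 1.0 / (1.0 + math.exp(-s))
    return s
```

Because the outer product is formed only from the four looked-up rows, each prediction costs $O(R^4)$ operations regardless of the number of bins.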
4. Training Objective and Optimization
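Training minimizes the mean squared error (MSE) over all observed entries $\Omega$: $\mathcal{L} = \frac{1}{|\Omega|} \sum_{(i_1, i_2, i_3, i_4) \in \Omega} \big(y_{i_1 i_2 i_3 i_4} - \hat{y}_{i_1 i_2 i_3 i_4}\big)^2 + \lambda \lVert \Theta \rVert_2^2$, where $y_{i_1 i_2 i_3 i_4}$ are the ground-truth Richardson numbers, $\Theta = \{\mathbf{A}^{(1)}, \dots, \mathbf{A}^{(4)}, \mathcal{G}\}$ collects the model weights, and $\lambda$ controls weight regularization.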
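The implementation uses PyTorch and the Adam optimizer, training for approximately 100 epochs in minibatches. Because each prediction is computed from a single index tuple, a minibatch touches only observed entries, so the cost of an epoch grows with the number of observations rather than with the number of cells in the full four-way tensor.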
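The objective and optimizer can be sketched without PyTorch by writing the gradients out by hand. The class below is a hypothetical pure-Python stand-in, not the authors' implementation: it uses the identity activation, uniform initialization, an illustrative learning rate of 0.01 and batch size of 8 (the paper's values are not assumed), an L2 penalty $\lambda \lVert \Theta \rVert_2^2$ on all weights, and the standard Adam update with bias correction. Only observed cells ever enter a minibatch.

```python
import itertools
import math
import random


class NeuTucFSketch:
    """Identity-activation NeuTucF trained with Adam on observed cells only.

    Hypothetical pure-Python sketch (the paper uses PyTorch). All weights
    live in one flat list: four K x R embedding tables, then vec(G) (R^4).
    """

    def __init__(self, K, R, seed=0, scale=0.5):
        self.K, self.R = K, R
        self.combos = list(itertools.product(range(R), repeat=4))  # row-major
        self.c0 = 4 * K * R  # offset of vec(G) inside theta
        rng = random.Random(seed)
        self.theta = [rng.uniform(-scale, scale) for _ in range(self.c0 + R ** 4)]
        self.m = [0.0] * len(self.theta)  # Adam first-moment estimates
        self.v = [0.0] * len(self.theta)  # Adam second-moment estimates
        self.t = 0  # Adam step counter

    def _pos(self, n, i, r):
        """Position of entry (i, r) of factor matrix A^(n) inside theta."""
        return (n * self.K + i) * self.R + r

    def predict(self, idx):
        emb = [[self.theta[self._pos(n, i, r)] for r in range(self.R)]
               for n, i in enumerate(idx)]
        return sum(self.theta[self.c0 + j] * math.prod(emb[n][r[n]] for n in range(4))
                   for j, r in enumerate(self.combos))

    def loss(self, data, lam):
        """MSE over observed (idx, y) pairs plus lam * ||theta||^2."""
        mse = sum((y - self.predict(idx)) ** 2 for idx, y in data) / len(data)
        return mse + lam * sum(w * w for w in self.theta)

    def _gradient(self, batch, lam):
        grad = [2.0 * lam * w for w in self.theta]  # from lam * ||theta||^2
        for idx, y in batch:
            d = 2.0 * (self.predict(idx) - y) / len(batch)  # d(MSE)/d(y_hat)
            for j, r in enumerate(self.combos):
                pos = [self._pos(n, idx[n], r[n]) for n in range(4)]
                coords = [self.theta[p] for p in pos]
                grad[self.c0 + j] += d * math.prod(coords)
                for n in range(4):
                    # d(y_hat)/d(A^(n)[i_n, r_n]) = g_r * (other three coords)
                    others = math.prod(coords[:n] + coords[n + 1:])
                    grad[pos[n]] += d * self.theta[self.c0 + j] * others
        return grad

    def adam_step(self, batch, lam, lr, b1=0.9, b2=0.999, eps=1e-8):
        grad = self._gradient(batch, lam)
        self.t += 1
        for k, g in enumerate(grad):
            self.m[k] = b1 * self.m[k] + (1 - b1) * g
            self.v[k] = b2 * self.v[k] + (1 - b2) * g * g
            m_hat = self.m[k] / (1 - b1 ** self.t)  # bias correction
            v_hat = self.v[k] / (1 - b2 ** self.t)
            self.theta[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)

    def fit(self, data, epochs=100, batch_size=8, lam=1e-4, lr=0.01, seed=0):
        rng = random.Random(seed)
        data = list(data)
        for _ in range(epochs):
            rng.shuffle(data)  # fresh minibatch order each epoch
            for s in range(0, len(data), batch_size):
                self.adam_step(data[s:s + batch_size], lam, lr)
        return self
```

A finite-difference check of `_gradient` against `loss` is a cheap way to validate hand-written derivatives like these before trusting the training curve; in practice an autodiff framework such as PyTorch removes the need for them.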
5. Inference and Reconstruction
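After training, the model can impute or regress missing entries by passing their discrete indices through the same embedding, outer-product, and linear-projection pipeline. The inference formula is equivalent to Tucker-factor reconstruction followed by the output activation: $\hat{y}_{i_1 i_2 i_3 i_4} = \phi\big( (\mathcal{G} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \times_3 \mathbf{A}^{(3)} \times_4 \mathbf{A}^{(4)})_{i_1 i_2 i_3 i_4} \big)$. Because a prediction depends only on the four bin indices, any cell of the quantized input grid can be estimated on demand, without materializing the completed $K \times K \times K \times K$ tensor. A new continuous observation must first be standardized and binned with the means, standard deviations, and quantile boundaries computed on the training data, so that its indices match the learned embeddings.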
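Inference can be sketched as a few hypothetical helpers: one maps a raw reading of the four features (altitude and the three wind components) to 0-based bin indices using per-feature `(mean, std, interior cut points)` triples fitted on the training data, one evaluates the reconstruction formula for a single cell, and one loops over the $K^4$ grid to fill every unobserved cell. These names and the dictionary layout of the core are illustrative assumptions, not functions from the paper.

```python
import bisect
import itertools
import math


def raw_to_indices(raw, binning):
    """Map a raw 4-feature reading to 0-based bin indices.

    binning: per-feature (mean, std, interior cut points) fitted on the
    training data; reusing them keeps indices aligned with the embeddings.
    """
    return tuple(bisect.bisect_right(cuts, (x - mu) / sd)
                 for x, (mu, sd, cuts) in zip(raw, binning))


def predict_cell(factors, core, idx, activation="identity"):
    """phi(sum_r g_r * a1[i1, r1] * a2[i2, r2] * a3[i3, r3] * a4[i4, r4])."""
    rows = [A[i] for A, i in zip(factors, idx)]
    s = sum(g * rows[0][r[0]] * rows[1][r[1]] * rows[2][r[2]] * rows[3][r[3]]
            for r, g in core.items())
    return 1.0 / (1.0 + math.exp(-s)) if activation == "sigmoid" else s


def impute_missing(factors, core, observed, K, activation="identity"):
    """Predict every grid cell that has no observation, one cell at a time."""
    return {idx: predict_cell(factors, core, idx, activation)
            for idx in itertools.product(range(K), repeat=4)
            if idx not in observed}
```

For a single query, `predict_cell(factors, core, raw_to_indices(raw, binning))` is all that is needed; the full-grid loop is only worthwhile when a dense map of estimates is wanted.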
6. Empirical Results and Performance Metrics
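Model efficacy is evaluated via five-fold cross-validation using mean absolute error (MAE), root-mean-square error (RMSE), and the coefficient of determination ($R^2$). For $N$ held-out entries with targets $y_j$, predictions $\hat{y}_j$, and target mean $\bar{y}$, these are $\mathrm{MAE} = \frac{1}{N}\sum_{j} |y_j - \hat{y}_j|$, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{j} (y_j - \hat{y}_j)^2}$, and $R^2 = 1 - \sum_{j} (y_j - \hat{y}_j)^2 / \sum_{j} (y_j - \bar{y})^2$. Arrows in the table mark whether lower (↓) or higher (↑) is better:
| Model | MAE (↓) | RMSE (↓) | $R^2$ (↑) |
|---|---|---|---|
| M1 (NeuTucF) | Lowest | Lowest | Highest |
| Baselines (M2–M5) | Higher | Higher | Lower |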
All baseline models (M2–M5) exhibit higher MAE and RMSE and lower $R^2$ than NeuTucF under matched embedding size and Tucker rank $R$. The chosen discretization and low-rank parametric structure yield the best cross-validated error; no explicit ablation of the bin count $K$ or the Tucker rank $R$ is reported, but both hyperparameters are held fixed for a fair comparison (Fan et al., 5 Dec 2025).
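The evaluation protocol can be sketched with the standard MAE, RMSE, and $R^2$ definitions and a shuffled five-fold split. The `fit_predict` hook is a hypothetical interface that any of the compared models could be wrapped in; the fold assignment and the averaging of per-fold scores are assumptions, since the source does not specify them.

```python
import math
import random


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k shuffled, near-equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]


def cross_validate(fit_predict, X, y, k=5, seed=0):
    """Average (MAE, RMSE, R^2) over k folds.

    fit_predict(X_train, y_train, X_test) -> predictions for X_test
    (hypothetical hook wrapping NeuTucF or any baseline).
    """
    scores = []
    for train, test in kfold_indices(len(y), k, seed):
        pred = fit_predict([X[i] for i in train], [y[i] for i in train],
                           [X[i] for i in test])
        truth = [y[i] for i in test]
        scores.append((mae(truth, pred), rmse(truth, pred), r2(truth, pred)))
    return tuple(sum(s[j] for s in scores) / k for j in range(3))
```

Running every model through the same `cross_validate` call with the same seed keeps the folds identical across models, which is what makes the MAE, RMSE, and $R^2$ comparison like-for-like.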
7. Context, Significance, and Limitations
The discretized NeuTucF approach addresses the challenge of estimating turbulence quantities from sparse, incomplete, or irregularly sampled atmospheric measurements, particularly in settings where only wind profile radar is available. Discretization makes continuous inputs compatible with embedding-based neural tensor factorization models, while the full four-way Tucker core captures complex spatio-temporal dependencies. The method demonstrates robust imputation and regression accuracy across a variety of low-altitude turbulence datasets. A plausible implication is that similar discretized neural factorization architectures could be adapted for regression tasks involving other high-dimensional, spatio-temporal geophysical data.
However, the model's effectiveness is contingent on the appropriateness of the chosen binning strategy and Tucker rank. The paper reports no explicit sensitivity analysis on these hyperparameters, leaving open how choices such as the number of quantile bins $K$ affect generalization or bias (Fan et al., 5 Dec 2025).