Fuel Mix Predictor Methods

Updated 4 December 2025

Fuel Mix Predictor is a modeling framework that forecasts, optimizes, and infers fuel mixtures using techniques such as time-series analysis, chemometrics, and deep learning.
It leverages methods like hierarchical forecast reconciliation, compositional data analysis, and nonlinear optimization to accurately predict key fuel properties.
The approach spans applications from combustion engines to fusion, providing uncertainty-aware and data-driven predictions for practical engineering challenges.

A Fuel Mix Predictor is an algorithmic or modeling framework designed to forecast, optimize, or infer the composition and resulting properties or performance of fuel mixtures—whether for combustion engines, power generation, advanced propulsion, alternative fuels, or fusion applications. Such predictors leverage statistical, machine learning, kinetic, chemometric, or physical-surrogate techniques to relate input variables (e.g., mixture fractions, spectral data, operating conditions) to target outcomes (e.g., output properties, ignition parameters, combustion characteristics). The methodology and implementation details vary substantially depending on the domain, composition constraints, property domain, and the fidelity of available data and models.

1. Statistical Forecasting in Power Systems

Fuel Mix Predictors for electricity generation employ hierarchical time-series methods and compositional data analysis to forecast the share or absolute quantity of generation from each fuel class (e.g., coal, renewables, gas). The two principal frameworks are bottom-up hierarchical forecast reconciliation and log-ratio-transformed multivariate time-series analysis. The fundamental challenge is to enforce sum-to-total constraints and coherency across hierarchical partitions (e.g., region, fuel class).

Bottom-Up Hierarchical Forecast Reconciliation constructs base forecasts for disaggregated fuel-type series (e.g., $b_{t,i}$ for fuel $i$ at time $t$ ), aggregates to higher levels (total production), and applies a linear reconciliation operator $P$ to produce coherent forecasts:

$\hat{\mathbf{y}}^{\mathrm{rec}} = P\,\hat{\mathbf{y}}^{\mathrm{base}}$

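The MinT (minimum trace) reconciler uses the covariance matrix $W$ of recent base-forecast errors, with $S$ the summing matrix that maps the bottom-level series to every level of the hierarchy: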

$P_{\mathrm{MinT}} = S(S^\top W^{-1}S)^{-1}S^\top W^{-1}$
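As a minimal sketch of the reconciliation step, the NumPy snippet below builds the MinT projection for a toy hierarchy and applies it to incoherent base forecasts. The three fuel classes, the diagonal $W$, and all numbers are illustrative assumptions, not values from the cited work.

```python
import numpy as np

# Hypothetical two-level hierarchy: total = coal + gas + renewables.
# Each row of S expresses one series as a sum of the 3 bottom-level series.
S = np.array([
    [1.0, 1.0, 1.0],  # total
    [1.0, 0.0, 0.0],  # coal
    [0.0, 1.0, 0.0],  # gas
    [0.0, 0.0, 1.0],  # renewables
])


def mint_projection(S, W):
    """Return P = S (S^T W^-1 S)^-1 S^T W^-1."""
    W_inv = np.linalg.inv(W)
    G = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)
    return S @ G


# Made-up forecast-error covariance (diagonal, for illustration only).
W = np.diag([4.0, 1.0, 1.0, 3.0])

# Incoherent base forecasts: total is 100 but the parts sum to 95.
y_base = np.array([100.0, 40.0, 35.0, 20.0])

P = mint_projection(S, W)
y_rec = P @ y_base  # reconciled forecasts; total now equals the sum of parts
```

Because $P$ is a projection onto the column space of $S$, any already coherent forecast vector is left unchanged by the operator.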

Compositional Data Analysis (CoDa) transforms daily fuel mix vectors $\mathbf{d}_t$ from the simplex to Euclidean space via log-ratio transforms (alr, clr, ilr), fits vector-autoregressive or univariate models, forecasts in the transformed space, and inversely maps predictions to proportions. Coherence, closure, and compositional distance (Aitchison metric) are strictly preserved. Bottom-up methods exhibit lowest mean absolute scaled error (MASE) in operational settings when fossil fuel shares dominate; forecast errors increase with the share and variability of renewables (Shang et al., 29 Oct 2025).
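The transform, forecast, and back-transform cycle can be sketched with the centered log-ratio (clr). The fuel shares below are made up, and the "forecast" is a placeholder mean in transformed space standing in for a fitted time-series model.

```python
import numpy as np


def closure(x):
    """Rescale positive parts so each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centered log-ratio: log of each part minus the mean log of all parts."""
    log_x = np.log(closure(x))
    return log_x - log_x.mean(axis=-1, keepdims=True)


def clr_inv(z):
    """Map clr coordinates back to the simplex."""
    return closure(np.exp(z))


# Made-up daily fuel shares (coal, gas, renewables); all strictly positive.
d = np.array([
    [0.50, 0.30, 0.20],
    [0.48, 0.31, 0.21],
    [0.47, 0.30, 0.23],
])

z = clr(d)                # unconstrained coordinates, each row sums to 0
z_hat = z.mean(axis=0)    # placeholder forecast, NOT a fitted VAR model
d_hat = clr_inv(z_hat)    # forecast shares: positive and summing to 1
```

Whatever model produces `z_hat`, the inverse map guarantees that the forecast shares are positive and sum to one.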

2. Surrogate and Optimization-Based Prediction for Engine and Fuel Surrogates

For design and optimization of surrogate fuels or blends replicating real-fuel thermophysical and chemical kinetic properties, multivariate surrogate formulation algorithms combine chemometric property predictors with constrained nonlinear optimization.

Infrared Spectroscopy-Driven Octane Modeling uses principal component regression on ATR-FTIR spectra to predict octane numbers (RON, MON):

$\mathrm{ON} = \sum_\nu W_\nu A(\nu) + b$

Mixture spectra are constructed as mole-fraction-weighted sums of neat component spectra; the resulting absorbance vector $A(\nu)$ enters the linear model with regression weights $W_\nu$ and intercept $b$. Typical errors are $E_\mathrm{global} \simeq 1.2$ octane units, with leave-one-out RMSE of 0.8–1.2 units, competitive with or better than tailored empirical correlations. Additional mixture properties (density, H/C ratio, distillation cuts, C–C bond-type fractions) are predicted via EOS evaluation (e.g., REFPROP) or linear mixing (Daly et al., 2018).
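The spectrum-mixing and linear-prediction steps might look like the sketch below. The neat spectra, regression weights, and intercept are random placeholders; in practice the weights would come from principal component regression on measured ATR-FTIR spectra.

```python
import numpy as np

rng = np.random.default_rng(0)
n_wavenumbers = 200

# Synthetic stand-ins for the absorbance spectra of 3 neat components.
neat_spectra = rng.random((3, n_wavenumbers))


def mixture_spectrum(mole_fractions, neat_spectra):
    """Mole-fraction weighted sum of neat component spectra."""
    x = np.asarray(mole_fractions, dtype=float)
    if not np.isclose(x.sum(), 1.0):
        raise ValueError("mole fractions must sum to 1")
    return x @ neat_spectra


def predict_octane(absorbance, weights, intercept):
    """Linear model ON = sum_nu W_nu * A(nu) + b."""
    return float(absorbance @ weights + intercept)


# Placeholder coefficients (assumed, not fitted to any real data).
W_nu = rng.normal(scale=0.1, size=n_wavenumbers)
b = 90.0

A_mix = mixture_spectrum([0.5, 0.3, 0.2], neat_spectra)
on_mix = predict_octane(A_mix, W_nu, b)
```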

Multivariate Nonlinear Optimization targets minimal weighted deviation from specified target properties:

$\min_{x\in\mathbb{R}^N} f(x) = \sum_{j=1}^M \beta_j \frac{|P_j(x) - P_j^{\mathrm{tar}}|}{P_j^{\mathrm{tar}}}$

$\text{s.t.}\quad \sum_{i=1}^N x_i = 1,\ \quad 0 \le x_i \le u_i$

where $x_i$ are mole fractions, $P_j$ are property predictors, $P_j^{\mathrm{tar}}$ are the target property values, $u_i$ are per-species upper bounds, and $\beta_j$ are user-specified weights. Optimization employs L-BFGS-B with 100 multi-starts and auto-reduction (species with $x_i<0.04$ are pruned recursively). Full (8–13 species) and reduced (4–7 species) palettes achieve average absolute errors below 5% across all properties. The algorithm matches all target properties (octane, distillation, C–C, density, H/C) within target tolerances across standard FACE gasoline surrogates. With new spectral training data, property predictors, and optimization weights, the approach generalizes to arbitrary fuel families.
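To make the objective and the pruning rule concrete, the sketch below evaluates $f(x)$ for a hypothetical three-species palette. The linear-mixing predictors and all property values are illustrative assumptions; the optimizer itself (multi-start L-BFGS-B) is deliberately not shown.

```python
import numpy as np


def surrogate_objective(x, predictors, targets, weights):
    """f(x) = sum_j beta_j * |P_j(x) - P_j_tar| / P_j_tar."""
    return sum(
        beta * abs(P(x) - target) / target
        for P, target, beta in zip(predictors, targets, weights)
    )


def is_feasible(x, upper, tol=1e-9):
    """Check sum(x) = 1 and 0 <= x_i <= u_i within a tolerance."""
    x = np.asarray(x, dtype=float)
    upper = np.asarray(upper, dtype=float)
    sums_to_one = abs(x.sum() - 1.0) <= tol
    in_bounds = np.all(x >= -tol) and np.all(x <= upper + tol)
    return bool(sums_to_one and in_bounds)


def species_to_keep(x, threshold=0.04):
    """Indices of species that survive the x_i < threshold pruning rule."""
    return [i for i, xi in enumerate(x) if xi >= threshold]


# Hypothetical palette; values are illustrative, and linear mixing in
# mole fraction is a simplifying assumption for both properties.
neat_density = np.array([0.69, 0.87, 0.68])  # g/cm^3
neat_hc = np.array([2.25, 1.14, 2.29])       # H/C ratio
predictors = [
    lambda x: float(x @ neat_density),
    lambda x: float(x @ neat_hc),
]
targets = [0.74, 1.95]
betas = [1.0, 1.0]

x = np.array([0.70, 0.27, 0.03])
f_x = surrogate_objective(x, predictors, targets, betas)
keep = species_to_keep(x)  # the third species falls below 0.04
```

After pruning, the reduced palette would be re-optimized rather than simply renormalized, which is why only the index selection appears here.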

3. Machine Learning and Chemoinformatics-Enabled Surrogates

Machine learning models, especially neural-network surrogates, are applied to high-dimensional regression tasks involving complex compositional and operational descriptors.

Hybrid ML–Kinetics Frameworks employ feed-forward deep neural networks that take additive, physical, structural, and thermodynamic descriptors (e.g., 46 chemistry features + 4 operating conditions = 50 inputs) and predict ignition delay time (IDT) for blended fuel/additive scenarios. The training targets are IDT values obtained from chemical kinetic simulations. The networks are fully connected (e.g., layer widths [50,128,64,32,1] with ReLU activations) and are trained with the Adam optimizer, batch size 64, and early stopping (Rabbani, 2021).

In symbols, for input vector $x_i$, the network approximates $\tau_i \approx f(x_i; \Theta)$ and is trained with the mean-squared-error loss

$L(\Theta) = \frac{1}{N}\sum_{i=1}^N \left(\tau_i^\mathrm{true} - f(x_i; \Theta)\right)^2$
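A forward pass through a [50,128,64,32,1] ReLU network and the loss above can be sketched in plain NumPy. The weights are random and untrained, the data are synthetic, and the linear output layer is an assumption; the snippet only illustrates shapes and the loss computation, not the Adam training loop.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [50, 128, 64, 32, 1]

# Randomly initialised parameters Theta (He-style scaling), untrained.
params = [
    (rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]


def forward(x, params):
    """f(x; Theta): ReLU on hidden layers, linear output (assumed)."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return (h @ W + b).ravel()


def mse_loss(tau_true, tau_pred):
    """L(Theta) = (1/N) * sum_i (tau_true_i - f(x_i; Theta))^2."""
    return float(np.mean((tau_true - tau_pred) ** 2))


X = rng.normal(size=(64, 50))   # one batch of 64 synthetic feature vectors
tau_true = rng.random(64)       # synthetic IDT targets
tau_pred = forward(X, params)
loss = mse_loss(tau_true, tau_pred)
```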

On both "seen" and "unseen" additives the model achieves $R^2 > 0.97$ and RMSE below $6\times10^{-3}$ s, provided new additives remain within the convex hull of the training set. Extending the model to new chemistries requires recomputing the descriptors and, if necessary, retraining with additional kinetic simulations.

Gaussian Process and Probabilistic Generative Surrogates for fuel property prediction use GP priors (e.g., Matérn $3/2$ kernel for density vs. $[P, T, c]$ inputs), with uncertainty quantification enabled via closed-form posterior variances. Adversarial deep generative models approximate the conditional distribution $p(\gamma\mid x)$ for physical properties, supporting multi-fidelity training via data concatenation, nonlinear autoregressive stacking (NARGP), or conditional generators with high- and low-fidelity coupling. Key metrics are $L_2$ -MRE ( $\sim 10^{-2}$ for 50% training), $R^2>0.998$ ; data-fusion reduces transcritical errors below 5%. Uncertainty quantification informs sampling and process-integration strategies (Freitas et al., 2021).
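The closed-form posterior for a zero-mean GP with a Matérn $3/2$ kernel can be sketched as follows. The three inputs stand in for scaled $[P, T, c]$, the target function is a synthetic placeholder for density, and the hyperparameters are fixed by hand rather than fitted.

```python
import numpy as np


def matern32(A, B, length_scale=1.0, variance=1.0):
    """Matern 3/2 kernel: s^2 * (1 + sqrt(3) r / l) * exp(-sqrt(3) r / l)."""
    r = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    a = np.sqrt(3.0) * r / length_scale
    return variance * (1.0 + a) * np.exp(-a)


def gp_posterior(X_train, y_train, X_test, noise=1e-6, **kernel_args):
    """Posterior mean and pointwise variance for a zero-mean GP prior."""
    K = matern32(X_train, X_train, **kernel_args) + noise * np.eye(len(X_train))
    K_s = matern32(X_train, X_test, **kernel_args)
    K_ss = matern32(X_test, X_test, **kernel_args)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var


rng = np.random.default_rng(0)
X_train = rng.random((20, 3))  # synthetic, scaled [P, T, c] inputs
y_train = np.sin(3.0 * X_train[:, 0]) + X_train[:, 1] * X_train[:, 2]

# One test point coincides with a training point, one lies far outside.
X_test = np.vstack([X_train[:1], [[2.0, 2.0, 2.0]]])
mean, var = gp_posterior(X_train, y_train, X_test, length_scale=0.5)
```

The variance is near zero at the training point and close to the prior variance far from the data, which is the signal used to decide where additional samples are worth acquiring.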

4. FTIR/Machine Learning Model Structures for Property Mapping

Accurate and interpretable physicochemical property prediction for neat fuels, blends, and sustainable aviation fuel (SAF) candidates can be achieved from liquid-phase FTIR spectra using a staged NMF–ensemble learning framework (Comesana et al., 2 Aug 2024).

Data Processing and Feature Extraction: Spectra are binned (±0.5 cm $^{-1}$ ), smoothed (15-bin MA), baseline-corrected, clipped, and region-masked (2500–2000 cm $^{-1}$ ), then area-normalized. NMF decomposes the $m\times n$ input matrix $X$ (samples $\times$ wavenumbers) into nonnegative component weights $W$ and basis spectra $H$ ( $X\approx WH$ ). The number of retained components $r$ is tuned to minimize validation RMSE in property prediction.
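A simplified version of this pipeline is sketched below on synthetic spectra. It assumes 1 cm$^{-1}$ bins, treats masking as excluding the 2000–2500 cm$^{-1}$ region, omits baseline correction, and uses Lee-Seung multiplicative updates as a stand-in for whatever NMF solver the cited work used.

```python
import numpy as np


def preprocess(wavenumbers, absorbance, window=15, mask=(2000.0, 2500.0)):
    """Moving-average smooth, clip negatives, drop masked region, area-normalize."""
    kernel = np.ones(window) / window
    smoothed = np.convolve(absorbance, kernel, mode="same")
    clipped = np.clip(smoothed, 0.0, None)
    keep = (wavenumbers < mask[0]) | (wavenumbers > mask[1])
    y = clipped[keep]
    # On a uniform grid the sum is proportional to the area under the curve.
    return wavenumbers[keep], y / y.sum()


def nmf(X, r, n_iter=500, eps=1e-12, seed=0):
    """Multiplicative updates for X ~ W H with W >= 0 and H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic spectra: three Gaussian bands with random amplitudes plus noise.
rng = np.random.default_rng(1)
wn = np.arange(600.0, 4000.0, 1.0)
bands = np.vstack([np.exp(-0.5 * ((wn - c) / 30.0) ** 2) for c in (770.0, 1460.0, 2900.0)])
raw = (rng.random((12, 3)) + 0.1) @ bands + 0.001 * rng.normal(size=(12, wn.size))

X = np.vstack([preprocess(wn, s)[1] for s in raw])  # samples x wavenumbers
W_fit, H_fit = nmf(X, r=3)
```

In the full workflow, `r` would be swept over a range and chosen by the validation RMSE of the downstream property regressors rather than fixed at 3.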

Target Properties and Modeling: For each property (final boiling point $T_b$ , flash point $T_\mathrm{flash}$ , freezing point $T_\mathrm{freeze}$ , density at 15°C $\rho_{15}$ , kinematic viscosity at $-20^\circ$ C $\nu_{-20}$ ), tree-based ensemble regressors (ExtraTrees, RandomForest, GradientBoosting) are fit on the $r$ -dim NMF representation. 5-fold CV is used for hyperparameter tuning; ensemble importances enable assignment of NMF basis functions to chemical-group signatures (e.g., aromatic C–H bending at 770 cm $^{-1}$ , alkane C–H stretch at 2850–2960 cm $^{-1}$ ). Explanatory tools include impurity-based feature importance, partial dependence, and SHAP values.

Inference and Deployment: For a new spectrum, the same preprocessing is applied and the NMF weight vector $w$ (a new row of $W$, with the basis spectra $H$ held fixed) is extracted by non-negative least squares; the trained models then predict each property from $w$. Cloud/web deployment involves FTIR spectral upload, prediction API endpoints, and SHAP-enabled dashboard visualization (Comesana et al., 2 Aug 2024).
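The weight-extraction step for a new spectrum can be sketched with a multiplicative update that keeps the basis spectra fixed. This iterative scheme is a NumPy-only stand-in for a dedicated non-negative least-squares solver (`scipy.optimize.nnls` would be a standard alternative), and the Gaussian basis spectra are synthetic.

```python
import numpy as np


def project_onto_basis(x, H, n_iter=500, eps=1e-12):
    """Nonnegative weights w that approximately minimise ||x - w H||^2, H fixed."""
    w = np.full(H.shape[0], x.sum() / max(H.sum(), eps))
    HHt = H @ H.T
    xHt = x @ H.T
    for _ in range(n_iter):
        w *= xHt / (w @ HHt + eps)
    return w


# Synthetic basis spectra: Gaussian bands at illustrative positions.
wn = np.arange(600.0, 4000.0, 1.0)
H = np.vstack([np.exp(-0.5 * ((wn - c) / 40.0) ** 2) for c in (770.0, 2850.0, 2960.0)])

# A "new" spectrum built from known weights, so recovery can be checked.
w_true = np.array([0.2, 0.5, 0.3])
x_new = w_true @ H

w_hat = project_onto_basis(x_new, H)
```

The recovered `w_hat` is the feature vector that would be passed to the trained property regressors.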

5. Domain-Specific Applications: Combustion, Supersonic Flows, and Fusion

Combustion and Propulsion

In high-speed flow and propulsion applications, Large Eddy Simulation (LES)-resolved Fuel Mix Predictors correlate macroscopic mixing metrics (jet penetration, mixing-layer growth, efficiency, plume spreading angle) to fundamental fuel thermophysical properties (molecular weight $M_f$ , density ratio $\rho_r$ , specific-heat ratio $\gamma_f$ ).

Formulations include:

Near-field universal $J$ -scaling:

$\frac{h_j}{J D} \approx C_1 \left(\frac{x}{J D}\right)^{n_1}$

Far-field fuel-dependent penetration:

$\frac{h_j}{D} \approx A_f \left(\frac{x}{D}\right)^{n_f}$

with $A_f$ fitted as a linear combination of $M_f,\, \rho_r,\, \gamma_f$ .

Mixing-layer thickness:

$\delta_m / D \propto J^{-0.3} M_f^{-0.1} \rho_r^{-0.2}$

Empirical fits from LES resolve how lighter, high- $\gamma$ (e.g., H $_2$ ) fuels enhance jet entrainment by modulating shock, vortex, and turbulent structures, whereas higher-molecular-weight or denser blends suppress mixing-layer development (Boukharfane, 15 May 2025).
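Fitting the far-field penetration law to simulation output reduces to a straight-line fit in log-log space, as in the sketch below. The data are generated from assumed values of $A_f$ and $n_f$, and the mixing-layer helper returns only the proportionality factor because the scaling constant is not given here.

```python
import numpy as np


def penetration(x_over_D, A, n):
    """Power law h_j / D = A * (x / D)**n."""
    return A * x_over_D**n


def fit_power_law(x_over_D, h_over_D):
    """Least-squares fit of A and n in log-log space."""
    n, log_A = np.polyfit(np.log(x_over_D), np.log(h_over_D), 1)
    return float(np.exp(log_A)), float(n)


def mixing_layer_scale(J, M_f, rho_r):
    """Proportionality factor J^-0.3 * M_f^-0.1 * rho_r^-0.2 (constant omitted)."""
    return J**-0.3 * M_f**-0.1 * rho_r**-0.2


# Synthetic "LES" trajectory from assumed values A_f = 1.8, n_f = 0.3.
x_over_D = np.linspace(5.0, 40.0, 15)
h_over_D = penetration(x_over_D, 1.8, 0.3)
A_fit, n_fit = fit_power_law(x_over_D, h_over_D)

# Relative comparison at equal J: a light fuel versus a heavier, denser one
# (the molecular weights and density ratios are illustrative).
light = mixing_layer_scale(J=1.0, M_f=2.0, rho_r=0.1)
heavy = mixing_layer_scale(J=1.0, M_f=28.0, rho_r=1.0)
```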

Fusion

Vlasov–Fokker–Planck kinetic models treat ablator–hotspot mix effects via spatially resolved (e.g., Gaussian-profile) compositional inhomogeneities. The core equations evolve moments of the distribution function, handle anisotropic collision/Krook operators, and couple to an electron fluid and to radiative losses. Key diagnostic constructs:

Radiative loss:

$W(T_e) = 1.69 \times 10^{-32}\, n_e \sqrt{T_e}\, \sum_a n_a Z_a^2$

Fusion reactivity suppression:

$\Delta_r(x) = \frac{r(x)}{r_M(x)}$

with $r(x)$ the actual rate, $r_M$ Maxwellian-equivalent.

Localized mix (e.g., 1.9% C over 5 μm) increases $\alpha$ -stopping, radiative loss, and triggers contractive flows—leading to sub-ignition conditions even if the same total mix uniformly distributed would have permitted ignition (Sadler et al., 2019).

6. Model Evaluation, Performance, and Extension Guidelines

Forecast Model Accuracy is assessed via MASE, RMSE, CRPS, or compositional Aitchison distance; model selection and hyperparameters are guided by cross-validation. Best practice is to use bottom-up reconciliation for hierarchical data and the ilr transformation for compositional mix vectors; because log-ratios are undefined for zero shares, zeros must be replaced or otherwise handled before transforming.
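Two of these metrics are short enough to state directly in code. The sketch below uses the common definition of MASE (forecast MAE scaled by the in-sample MAE of a lag-$m$ naive forecast) and the Aitchison distance as the Euclidean distance between clr coordinates; the numbers are made up.

```python
import numpy as np


def mase(y_true, y_pred, y_train, m=1):
    """Forecast MAE divided by in-sample MAE of the lag-m naive forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return float(np.mean(np.abs(y_true - y_pred)) / scale)


def aitchison_distance(p, q):
    """Euclidean distance between the clr coordinates of two compositions."""
    lp = np.log(np.asarray(p, dtype=float))
    lq = np.log(np.asarray(q, dtype=float))
    return float(np.linalg.norm((lp - lp.mean()) - (lq - lq.mean())))


# Made-up generation history and a two-step forecast.
history = [10.0, 12.0, 11.0, 13.0, 12.0]
score = mase(y_true=[14.0, 13.0], y_pred=[13.0, 13.5], y_train=history)

# Made-up actual and forecast fuel shares.
dist = aitchison_distance([0.5, 0.3, 0.2], [0.45, 0.35, 0.20])
```

A MASE below 1 means the forecast beats the naive benchmark on average; the Aitchison distance is unchanged if either composition is rescaled by a constant.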

Physicochemical Surrogate Models are evaluated by cross-validated RMSE, mean relative error, and $R^2$. Practical guidance for extending these frameworks includes augmenting the spectrum-property training data (especially near nonlinear regimes or for new fuel classes), expanding optimization palettes, tuning the loss and property weights for the application, and integrating uncertainty-aware methods (e.g., GP posterior variance, Monte Carlo-generated predictive intervals).

Integration and Deployment recommendations include containerized APIs, web tools with real-time spectral upload and SHAP interpretability, and embedding surrogates in CFD/process simulators for rapid, uncertainty-aware property evaluations.

Each Fuel Mix Predictor must be specified, validated, and tuned in the context of the composition domain, available data (spectral, compositional, mechanistic), target properties, and operating regime. Together, the frameworks described above provide an extensible basis for prediction, optimization, and control of complex fuel mixtures across combustion, energy, and plasma domains (Daly et al., 2018, Comesana et al., 2 Aug 2024, Rabbani, 2021, Freitas et al., 2021, Sadler et al., 2019, Shang et al., 29 Oct 2025, Boukharfane, 15 May 2025).