ML-Based Soft Sensors
- ML-based soft sensors are inferential models that estimate hard-to-measure process variables using data-driven and physics-informed algorithms trained on historical and simulated data.
- They integrate methods such as PLS, MLP, LSTM, and adaptive learning paradigms with robust data preprocessing to achieve real-time, accurate predictions.
- Industries like chemicals, energy, semiconductors, and robotics deploy these sensors to improve process control, quality assurance, and predictive maintenance.
A machine learning-based soft sensor is an inferential model that estimates hard-to-measure or costly process variables in real time from easily measured physical sensor data by utilizing statistical or neural models learned from historical and/or simulated data. ML-based soft sensors are routinely deployed in industrial sectors such as chemicals, energy, semiconductors, robotics, and manufacturing for real-time estimation, control, and quality assurance, filling a critical gap where inline physical sensing is infeasible, too slow, or too expensive (Lawrence et al., 2024).
1. Core Concepts and Definitions
A soft sensor (“virtual sensor”) computes a function $\hat{y} = f(\mathbf{x}; \boldsymbol{\theta})$, mapping frequently available process measurements $\mathbf{x}$ (e.g., temperatures, pressures, flows) and possibly auxiliary signals (e.g., spectroscopic, image, or acoustic data) to an inferred value $\hat{y}$ of a process variable that is unavailable in real time. The parameter vector $\boldsymbol{\theta}$ is estimated from historical data.
Types of Soft Sensors
- Data-driven (Black-box): Models are trained purely from data, such as PLS, random forests, support vector regression (SVR), multilayer perceptrons (MLP), LSTMs, graph neural networks, and newer prompt-based LLM methods (Lawrence et al., 2024, Tong et al., 6 Jan 2025).
- Hybrid (Gray-box): Combine first-principles models (e.g., Kalman filters, dynamic simulators) with data-driven correction terms for extrapolation (Kubosawa et al., 2022).
- Physics-driven (White-box): Pure first-principles estimation, used when full mechanistic knowledge exists, but less common for complex or poorly characterized systems.
Soft sensors are essential for process control, predictive maintenance, digital twins, anomaly detection, and quality forecasting.
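As a minimal sketch of the data-driven case, a soft sensor reduces to fitting $f(\mathbf{x}; \boldsymbol{\theta})$ on historical (measurement, lab value) pairs and then evaluating it on live data. The synthetic process, variable names, and linear model below are illustrative assumptions, not a method from the cited papers:

```python
import numpy as np

# Minimal data-driven soft sensor: infer a hard-to-measure quality variable y
# from easy-to-measure process signals x (e.g., temperature, pressure, flow).
rng = np.random.default_rng(0)

# Synthetic historical data: 200 samples of 3 process measurements.
X = rng.normal(size=(200, 3))                      # temp, pressure, flow
true_theta = np.array([0.8, -0.5, 1.2])            # unknown process relation
y = X @ true_theta + 0.01 * rng.normal(size=200)   # lab-measured target

# Fit y_hat = f(x; theta) = x . theta by least squares (black-box model).
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Online inference: estimate y from a new measurement vector.
x_new = np.array([1.0, 0.5, -0.2])
y_hat = x_new @ theta
```

Real deployments replace the linear map with the model classes surveyed below, but the train-on-history, infer-in-real-time loop is the same.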
2. Mathematical and Architectural Foundations
Soft-sensing tasks are formalized as supervised learning—mapping inputs $\mathbf{x}_t$ to targets $y_t$ via a learned function $f_{\boldsymbol{\theta}}$ minimizing a risk function, typically mean squared error (MSE), mean absolute error (MAE), or root mean squared error (RMSE):

$$\min_{\boldsymbol{\theta}} \; \frac{1}{N} \sum_{t=1}^{N} \big( y_t - f_{\boldsymbol{\theta}}(\mathbf{x}_t) \big)^2$$
Representative Model Classes
| Model | Canonical Formula | Typical Use Case |
|---|---|---|
| PLS | $\mathbf{X} = \mathbf{T}\mathbf{P}^\top + \mathbf{E}$, $\mathbf{y} = \mathbf{T}\mathbf{q} + \mathbf{f}$ | Linear, collinear input domains (Urhan et al., 2019) |
| Lasso/RVM | $\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \rVert_2^2 + \lambda \lVert \boldsymbol{\beta} \rVert_1$ | Sparse/high-dimensional regimes |
| MLP | $\hat{y} = \mathbf{W}_L\,\sigma(\cdots \sigma(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1) \cdots) + \mathbf{b}_L$ | General regressors |
| LSTM | Gated recurrent cell (input, forget, output gates) | Temporal, sequential process data (Lawrence et al., 2024, Fernandes et al., 3 Feb 2026) |
| GPR | Bayesian regression with kernel $k(\mathbf{x}, \mathbf{x}')$, predictive mean/variance | Uncertainty, small data (Sofla et al., 2024) |
| GNN | Message-passing over process graph: $\mathbf{h}_v^{(k+1)} = \phi\big(\mathbf{h}_v^{(k)}, \bigoplus_{u \in \mathcal{N}(v)} \psi(\mathbf{h}_u^{(k)})\big)$ | Topology-aware, multi-plant (Theisen et al., 5 Feb 2025) |
| LLM-based/ICL | In-context prompt-based prediction | Zero/few-shot, interpretable (Tong et al., 6 Jan 2025) |
Architectural choices are dictated by process dynamics, variable types, data availability, input-output complexity, and need for extrapolation.
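For intuition on the PLS row above, a one-latent-variable PLS regression can be sketched in a few lines: center the data, extract a single score $t = \mathbf{X}\mathbf{w}$ along the direction of maximum covariance with $y$, then regress $y$ on $t$. This numpy sketch uses toy collinear data; production implementations extract multiple components:

```python
import numpy as np

# One-component PLS (NIPALS-style) on synthetic collinear data.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=100)    # deliberately collinear column
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.05 * rng.normal(size=100)

Xc, yc = X - X.mean(0), y - y.mean()
w = Xc.T @ yc
w /= np.linalg.norm(w)            # weight vector: direction of max covariance with y
t = Xc @ w                        # latent scores
q = (t @ yc) / (t @ t)            # regress y on the single score
y_hat = t * q + y.mean()          # in-sample prediction
```

Unlike ordinary least squares, the collinear column does not destabilize the fit, which is why PLS persists for high-dimensional, correlated process data.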
3. Data Handling, Training Regimes, and Losses
Data Preprocessing
Soft sensor quality is critically dependent on careful preprocessing:
- Outlier removal using IQR, PCA + Hotelling's $T^2$, or segmental mean-shift algorithms (Oster et al., 2021).
- Missing value imputation (median fill or zero-preference), feature selection (SHAP, domain retrieval, or ML-based ranking) (Tong et al., 6 Jan 2025).
- Feature scaling (zero-mean/unit-variance normalization, min-max scaling) and variable embedding for categorical inputs.
- Alignment of high-frequency sensor data with infrequent lab/inspection values: e.g., rolling window averaging prior to matching (Oster et al., 2021, Fan et al., 2023).
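Three of the steps above—IQR outlier masking, zero-mean/unit-variance scaling, and rolling-window alignment of a fast sensor stream to infrequent lab samples—can be sketched as follows (synthetic signal; the 100-sample lab interval is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=50.0, scale=2.0, size=1000)     # fast process signal
x[100] = 500.0                                     # injected gross outlier

# 1) IQR-based outlier removal.
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
mask = (x >= q1 - 1.5 * iqr) & (x <= q3 + 1.5 * iqr)
x_clean = x[mask]

# 2) Zero-mean / unit-variance scaling.
x_scaled = (x_clean - x_clean.mean()) / x_clean.std()

# 3) Align to a lab value taken every 100 samples: average the preceding window.
window = 100
aligned = x_clean[: len(x_clean) // window * window].reshape(-1, window).mean(axis=1)
```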
Model Training Paradigms
- Batch/Offline: Model is trained on a fixed historical dataset $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, hyperparameters selected via cross-validation.
- Online/Adaptive: Models are updated in moving windows (sliding or delayed), just-in-time learning, or recursive schemes to handle process drift (Kneale et al., 2017, Urhan et al., 2019).
- Semi-supervised: Incorporate abundant unlabeled data (confidence regularization, manifold/variance penalties) to improve performance with sparse labels (Esche et al., 2021, Cacciarelli et al., 2022).
- Active learning: Query by informativeness to minimize labeling cost (Cacciarelli et al., 2022).
- Transfer/Few-shot: Pretrain across multiple units/plants, calibrate per-unit using few labeled points or unsupervised adaptation (Grimstad et al., 2023, Grimstad et al., 2024, Farahani et al., 2020, Theisen et al., 5 Feb 2025).
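One classic realization of the recursive schemes above is recursive least squares (RLS) with a forgetting factor, which discounts old data and tracks process drift without full retraining. A hedged numpy sketch—the abrupt gain change is a synthetic assumption standing in for drift:

```python
import numpy as np

rng = np.random.default_rng(3)
n_features, lam = 2, 0.98                  # forgetting factor < 1 discounts old samples
theta = np.zeros(n_features)               # online parameter estimate
P = np.eye(n_features) * 1e3               # inverse-covariance estimate

gain = np.array([1.0, -0.5])               # true (unknown) process relation
for t in range(500):
    if t == 250:
        gain = np.array([2.0, 0.5])        # abrupt process drift at t = 250
    x = rng.normal(size=n_features)
    y = x @ gain + 0.01 * rng.normal()

    # RLS update: correct theta by the prediction error, then update P.
    k = P @ x / (lam + x @ P @ x)
    theta = theta + k * (y - x @ theta)
    P = (P - np.outer(k, x @ P)) / lam
```

After the drift, the estimate re-converges within roughly $1/(1-\lambda)$ samples, which is the practical appeal over batch retraining.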
Loss and Evaluation Metrics
- Regression: MSE, MAE, RMSE, $R^2$, piecewise error grouping for tolerance geometry in high-precision tasks (Fan et al., 2023).
- Classification/Detection: Accuracy, confusion matrix, early-warning recall, false-positive rate (Fan et al., 2023).
- Uncertainty Quantification: Predictive intervals via Bayesian/posterior variance or LLM-generated confidence intervals (Tong et al., 6 Jan 2025, Sofla et al., 2024).
- Model selection: Hyperparameter tuning via grid/random search for minimal test error; model interpretability by SHAP, feature attribution, or knowledge-discovery analysis.
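The regression metrics listed above follow directly from their definitions:

```python
import numpy as np

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    return float(np.sqrt(mse(y, y_hat)))

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

y     = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
```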
4. Model Types and Advanced Architectures
Linear and Classical Statistical Methods
Partial least squares (PLS) and principal component regression (PCR) remain foundational for high-dimensional, noisy process data with moderate nonlinearity (Urhan et al., 2019, Lawrence et al., 2024). Sparse regularization (Lasso, RVM) improves high-dimensional performance, particularly with lagged or collinear predictors (Urhan et al., 2019).
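The sparse-regularization effect can be sketched with Lasso via cyclic coordinate descent and soft-thresholding, which drives the coefficients of irrelevant predictors exactly to zero. Toy data and a hand-picked $\lambda$; this is an illustration of the mechanism, not the cited papers' implementation:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, lam = 200, 10, 0.1
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.5, -2.0]                 # only 2 of 10 predictors matter
y = X @ beta_true + 0.01 * rng.normal(size=n)

beta = np.zeros(p)
col_sq = (X ** 2).sum(axis=0)
for _ in range(100):                        # cyclic coordinate descent
    for j in range(p):
        r = y - X @ beta + X[:, j] * beta[j]            # partial residual
        rho = X[:, j] @ r
        # Soft-threshold: small correlations with the residual are zeroed out.
        beta[j] = np.sign(rho) * max(abs(rho) - lam * n, 0.0) / col_sq[j]
```

The surviving coefficients are shrunk slightly toward zero (the usual L1 bias), while the eight irrelevant ones vanish entirely.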
Deep Learning Architectures
- Dense/MLP: Effective for nonlinear, cross-sectional mappings; proven in VDU quality prediction and on-device regression (Oster et al., 2021, Ling et al., 2023).
- LSTM/RNN: Essential for time-sequenced or memory-driven processes, enabling accurate wafer-inspection prediction and well pressure estimation (Fan et al., 2023, Fernandes et al., 3 Feb 2026).
- GNN/Graph Attention: For topologically flexible, multi-sensor, and multi-plant scenarios, leveraging plant process connectivity (Theisen et al., 5 Feb 2025).
- Hierarchical/Multi-task/Latent Variable Models: Pool information across units, enable shared structure and few-shot transfer, yielding parameter adaptation per unit (Grimstad et al., 2023, Grimstad et al., 2024).
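The gated recurrence at the heart of the LSTM models above can be written out in plain numpy. Weights here are random placeholders—a real soft sensor would learn them from process history:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: x is the input, h the hidden state, c the cell state.
    W, U, b stack the input/forget/candidate/output gate parameters."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input/forget/output gates
    g = np.tanh(g)                                 # candidate cell update
    c_new = f * c + i * g                          # keep old memory, add new
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(5)
n_in, n_hid = 3, 8
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h = c = np.zeros(n_hid)
for x_t in rng.normal(size=(20, n_in)):            # roll over a 20-step sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The forget gate $f$ is what lets the model carry process memory across many timesteps, which dense/MLP regressors cannot do.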
Recent Paradigms
- In-context Learning (ICL) using LLMs: Replace parameter tuning with prompt-driven inference, supporting zero/few-shot regression, uncertainty-awareness, and self-explanation, all with no model retraining. Variable selection is performed by retrieval-augmented prompting with high selection consistency; predictions are justified by chain-of-thought reasoning and assigned empirical confidence intervals (Tong et al., 6 Jan 2025).
- RL-based dynamic models: Use a reinforcement-learning agent as a soft sensor for state estimation in closed-loop control, updating dynamic simulation parameters for robust extrapolation (Kubosawa et al., 2022).
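The ICL workflow above amounts to turning historical (measurement, lab value) pairs into few-shot demonstrations and appending the query row for the LLM to complete. The prompt format, variable names, and "purity" target below are illustrative assumptions, not the cited method's exact template:

```python
# Build a few-shot soft-sensing prompt from historical process rows.
demos = [
    ({"temp": 351.2, "pressure": 2.1, "flow": 14.0}, 0.87),
    ({"temp": 349.8, "pressure": 2.3, "flow": 13.5}, 0.91),
    ({"temp": 355.0, "pressure": 1.9, "flow": 14.8}, 0.79),
]
query = {"temp": 352.4, "pressure": 2.0, "flow": 14.2}

def build_prompt(demos, query):
    lines = ["Estimate the product purity from process measurements."]
    for x, y in demos:
        feats = ", ".join(f"{k}={v}" for k, v in x.items())
        lines.append(f"{feats} -> purity={y}")
    feats = ", ".join(f"{k}={v}" for k, v in query.items())
    lines.append(f"{feats} -> purity=")    # the LLM completes this line
    return "\n".join(lines)

prompt = build_prompt(demos, query)
```

No gradients are computed anywhere: adapting the "model" to a new unit means swapping the demonstration rows.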
5. Transfer Learning, Adaptability, and Robustness
Transfer and Multi-Unit Learning
Learning a soft-sensor mapping $f(\mathbf{x}; \mathbf{w}, \mathbf{c}_u)$, parametrized by shared weights $\mathbf{w}$ and unit-specific contexts $\mathbf{c}_u$, enables efficient adaptation to new units with minimal calibration data. Hierarchical Bayesian regularization encodes prior knowledge and enables few-shot per-unit adaptation with empirical MAPE dropping below 5% after 1–3 labeled points in petroleum well flowmeter modeling (Grimstad et al., 2023). Deep latent variable models further combine semi-supervised and multi-task learning; unlabeled data (often abundant) can be leveraged to improve accuracy under severe label scarcity, sometimes achieving near-asymptotic error with only 4–6 new labeled samples in real multiphase well data (Grimstad et al., 2024).
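The few-shot idea reduces, in its simplest form, to freezing the pretrained shared weights and calibrating only a small unit-specific context—here a scalar gain and offset—from a handful of labeled points on the new unit. A synthetic, noise-free sketch (the gain/offset parametrization is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(6)
w_shared = np.array([1.0, 0.5])              # pretrained across units, frozen

# The new unit behaves like the shared model with unknown gain a and offset b.
a_true, b_true = 1.3, -0.4
X_few = rng.normal(size=(3, 2))              # only 3 labeled calibration points
y_few = a_true * (X_few @ w_shared) + b_true

# Calibrate (a, b) by least squares on the few-shot set; w_shared stays fixed.
z = X_few @ w_shared                         # shared-model output on new unit
A = np.column_stack([z, np.ones_like(z)])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, y_few, rcond=None)
```

Only two scalars are fit, so three labeled points suffice—mirroring the 1–3-point calibration regime reported above.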
Topology-Aware and Domain-Invariant Methods
Graph neural networks that model plants as graphs of units and streams support flexible sensor layouts and topological heterogeneity, and facilitate transfer learning with minimal retraining. Zero-shot inference on a new plant topology achieves test RMSE competitive with fully retrained models; fine-tuning with a handful of samples further reduces RMSE by up to 24.2% (Theisen et al., 5 Feb 2025).
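What makes this transferable is that the learned weights act on per-unit features and are shared across nodes, so the same weights apply to any plant topology. One mean-aggregation message-passing step over a toy four-unit plant (graph, features, and weights are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
# Adjacency for 4 process units connected in a line (0-1-2-3), with self-loops.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
A_norm = A / A.sum(axis=1, keepdims=True)    # row-normalize: mean aggregation

H = rng.normal(size=(4, 5))                  # per-unit sensor feature vectors
W = rng.normal(scale=0.1, size=(5, 8))       # shared, topology-agnostic weights

# Each unit averages its neighbourhood, transforms it, and applies ReLU.
H_next = np.maximum(A_norm @ H @ W, 0.0)
```

Swapping in a different adjacency matrix—another plant's flowsheet—reuses W unchanged, which is the basis of the zero-shot transfer results above.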
Domain-adversarial neural regression (DANN-R) extends the capacity for transfer by enforcing domain invariance in hidden features, reducing cross-plant MSE by a factor of 2–3 under label-free adaptation (Farahani et al., 2020).
6. Practical Applications and Industry Case Studies
ML-based soft sensors are extensively validated across diverse industrial sectors:
| Sector | Variable Estimated | Modeling Approach | Error Metric / Result | Reference |
|---|---|---|---|---|
| Petroleum | BHP/flow/metrology | LSTM + transfer/few-shot | MAPE < 2%; competitive with PDG | (Fernandes et al., 3 Feb 2026, Grimstad et al., 2023) |
| Semiconductors | Wafer metrology | LSTM, piecewise error grouping | >82% in "decent" error bands | (Fan et al., 2023) |
| Chemical plants | Distillation curve | MLP, SHAP feature attribution | ANN MAE below SARIMA baselines | (Oster et al., 2021) |
| Robotics | Force, torque, tactile | SVM, random forest, CNN, GPR | Force RMSE < 40 mN, 93% accuracy | (Yang et al., 2021, Shen et al., 3 Feb 2026, Sofla et al., 2024) |
| Soft grippers | Curvature measurement | GPR, SVR | RMSE ~0.22 m⁻¹ | (Sofla et al., 2024) |
| Fluid process | Flow estimation (edge) | Quantized MLP on MCU/FPGA | Sub-millisecond inference; MSE below prior art | (Ling et al., 2023) |
| Power generation | Active power, LHV | DANN-R adversarial NN | 2–3× MSE reduction | (Farahani et al., 2020) |
In each application, real-time, robust, and accurate soft sensors replace unreliable, delayed, or cost-prohibitive hard measurements.
7. Limitations, Trends, and Future Directions
Current Limitations
- Limited interpretability and diagnosability of deep and black-box models; advances in explanation (SHAP, LLM self-explanation) are mitigating but not eliminating this issue (Tong et al., 6 Jan 2025, Oster et al., 2021).
- Dependence on quality of historical data—errors and bias due to systematic labeling errors, insufficient process coverage, or drift remain critical concerns (Fan et al., 2023, Urhan et al., 2019).
- Transfer learning success depends on the degree of process similarity and shared invariants across units/plants (Theisen et al., 5 Feb 2025).
Emerging Trends
- Few-shot/semi-supervised adaptation: Leveraging unlabeled process data and meta-learned model structure yields rapid deployment on new units—central to scalable, cross-plant soft sensing (Grimstad et al., 2024, Grimstad et al., 2023).
- Topology-aware representations: GNNs encode process flowsheets/industrial topologies, supporting sensor mismatch and robust transfer (Theisen et al., 5 Feb 2025) [KANS framework, (Tew et al., 2 Jan 2025)].
- Prompt-based and retrieval-augmented modeling: LLMs—when equipped with in-context demonstrations and domain retrieval—enable plug-and-play soft sensors with uncertainty quantification and natural language explanations, eliminating the need for gradient-based training and tuning (Tong et al., 6 Jan 2025).
- Edge/embedded deployment: Quantized, memory- and power-efficient models for inference on MCUs/FPGAs enable sub-millisecond, microjoule-level operation at scale (Ling et al., 2023).
- Physics-guided, RL-based, and hybrid models: RL agents as soft sensors and physics-informed nets are advancing extrapolation and robustness to unseen regimes (Kubosawa et al., 2022, Lawrence et al., 2024).
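The edge-deployment trend above rests on compressing trained models to fixed-point arithmetic. A sketch of symmetric int8 post-training quantization of a weight matrix—the single per-tensor scale and the layer shape are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(8)
W = rng.normal(scale=0.5, size=(16, 8)).astype(np.float32)   # trained weights

# Symmetric quantization: one scale maps the full range onto [-127, 127].
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)

# Dequantize to measure the reconstruction error the MCU/FPGA model incurs.
W_dq = W_q.astype(np.float32) * scale
max_err = float(np.abs(W - W_dq).max())
```

The int8 tensor is 4× smaller than float32 and admits integer-only multiply-accumulate on microcontrollers, at the cost of a bounded per-weight error of at most half a quantization step.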
Soft sensors are expected to further evolve toward data-efficient, explainable, and easily transferable systems, deeply integrated with domain knowledge, process topology, and human-in-the-loop workflows while meeting industrial constraints on reliability, security, and latency.