Pseudo-Sequential Data (PSD)
- Pseudo-Sequential Data (PSD) is a method that sorts unordered industrial data by a key feature, enabling recurrent modeling on non-sequential datasets.
- The RP-CATE framework integrates PSD with a recurrent perceptron and channel attention modules to boost predictive accuracy, interpretability, and computational efficiency.
- The approach has been validated in applications like chemical engineering, demonstrating significant improvements in error metrics and model improvement rate over traditional methods.
The Recurrent Perceptron-based Channel Attention Transformer Encoder (RP-CATE) is a hybrid machine learning architecture designed for industrial modeling applications where datasets often display non-sequential sample associations such as monotonicity and periodicity. Addressing two major limitations in previous industrial hybrid methods—restricted architectural scope and inadequate exploitation of sample-wise dependencies—RP-CATE introduces a novel combination of Transformer-inspired channel attention, a custom Recurrent Perceptron module, and specialized data structuring techniques. These innovations facilitate enhanced predictive accuracy, computational efficiency, and interpretability, yielding substantial improvement over both mechanistic and machine learning baselines (Yang et al., 22 Dec 2025).
1. Architectural Composition
RP-CATE is composed of modular, stackable encoding "blocks" with a five-stage functional pipeline:
- Pseudo-Sequential Data (PSD) Module: Converts raw, non-sequential input into a sorted pseudo-sequence by ordering rows by a chosen key feature.
- Recurrent Perceptron (RP) Module: Processes the pseudo-sequence with a recurrent feed-forward structure in which each sample's hidden state depends on the current row and the previous hidden state, exploiting underlying sample associations; the per-sample states are assembled into a hidden-state matrix for the attention stage (see the sketch after this list).
- Channel Attention Module: Applies channel-wise weighting through pseudo-image construction and attentive pooling:
- Generates PID via a cyclic sliding window (see Section 2).
- Computes feature attention via elementwise $\operatorname{softmax}(\sigma(H_1 + H_2))$, where $H_1$ and $H_2$ emerge from global max and average pooling across channels.
- Produces attentive features by elementwise weighting of the input with the attention vector.
- Feed-Forward Module: Nonlinear transformation via a 3-layer MLP.
- Prediction Module: Outputs the final per-sample regression prediction.
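The RP recurrence can be sketched minimally in NumPy, assuming a plain recurrent-perceptron cell of the form $h_i = \tanh(x_i W_x + h_{i-1} W_h + b)$ applied over PSD rows; the cell form, names, and dimensions here are illustrative, not the paper's exact parameterization.

```python
import numpy as np

def recurrent_perceptron(X_psd, W_x, W_h, b):
    """Run a simple recurrent-perceptron cell over a pseudo-sequence.

    X_psd : (N, d) rows sorted by the key feature (PSD, see Section 3)
    W_x   : (d, m) input weights; W_h : (m, m) hidden weights; b : (m,)
    Returns the stacked hidden states H of shape (N, m).
    """
    N, _ = X_psd.shape
    h = np.zeros(b.shape[0])              # initial hidden state
    H = np.empty((N, b.shape[0]))
    for i in range(N):                    # sample-wise recurrence over PSD rows
        h = np.tanh(X_psd[i] @ W_x + h @ W_h + b)
        H[i] = h
    return H

# Demo: 50 PSD rows with 3 features -> 8-dimensional hidden states.
rng = np.random.default_rng(0)
H = recurrent_perceptron(rng.random((50, 3)), rng.random((3, 8)),
                         0.1 * rng.random((8, 8)), np.zeros(8))
```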
Blocks may be repeated $L$ times with residual connections. Hyperparameters (window size $w$, loop count $L$, learning rate, and regularization weight) are selected by grid search, as sketched below; for the reported optima, see (Yang et al., 22 Dec 2025).
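A sketch of such a grid search; the ranges and the `validate` routine below are purely illustrative stand-ins (the actual grids and optima are given in the source).

```python
import itertools

def validate(window_w, loops_L, lr, l2):
    """Hypothetical stand-in: train RP-CATE with these settings and
    return a validation error. Replace with a real training run."""
    return abs(window_w - 4) + abs(loops_L - 2) + lr + l2  # dummy score

grid = {"window_w": [2, 4, 8], "loops_L": [1, 2, 3],
        "lr": [1e-3, 1e-4], "l2": [1e-4, 1e-5]}

best_cfg = min(itertools.product(*grid.values()),
               key=lambda cfg: validate(*cfg))
print(dict(zip(grid, best_cfg)))
```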
2. Pseudo-Image Data (PID) and Cyclic Sliding Window
The PID mechanism enables spatial pooling over non-image industrial data by reformatting sequential features.
- Given a pseudo-sequence $X \in \mathbb{R}^{N \times d}$ and window size $w$, the cyclic sliding window operator extracts $w$ consecutive sample rows starting at each index $i$, wrapping cyclically at dataset boundaries (indices taken modulo $N$). Each extracted slice is reshaped into a $w \times d$ pseudo-image channel; stacking all $N$ slices forms the PID tensor.
This enables standard CNN-inspired global max and average pooling over the channel axes, facilitating feature-wise attention operations for non-spatial data.
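Under this reconstruction, the cyclic window and the pooling-based attention it enables can be sketched in NumPy as follows; the modular indexing matches the boundary-wrapping description, while the sigmoid choice for $\sigma$ and the pooling axes are assumptions.

```python
import numpy as np

def build_pid(X, w):
    """Stack cyclic windows: slice i holds rows X[i], ..., X[i+w-1],
    with indices taken modulo N (wrapping at the dataset boundary)."""
    N, d = X.shape
    idx = (np.arange(N)[:, None] + np.arange(w)[None, :]) % N   # (N, w)
    return X[idx]                                               # (N, w, d)

def channel_attention(X, w):
    """Per-feature attention weights from global pooling over the PID."""
    pid = build_pid(X, w)                  # pseudo-image data
    h1 = pid.max(axis=(0, 1))              # global max pooling  -> (d,)
    h2 = pid.mean(axis=(0, 1))             # global avg pooling  -> (d,)
    z = 1.0 / (1.0 + np.exp(-(h1 + h2)))   # sigma: sigmoid (assumed)
    a = np.exp(z - z.max()); a /= a.sum()  # softmax over features
    return X * a                           # attentive features

X = np.random.rand(50, 3)                  # 50 samples, 3 features
print(channel_attention(X, w=4).shape)     # -> (50, 3)
```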
3. Pseudo-Sequential Data (PSD) Representation
RP-CATE operates on datasets not strictly ordered by time or process steps. To allow recurrent processing, it introduces a "pseudo-sequential" structure:
- Select a key feature (e.g., a domain-relevant variable).
- Sort the dataset so that rows ascend by the key feature, producing a pseudo-sequence in which neighboring rows are close in that feature's value.
This construction provides a basis for recurrent modeling of sample-to-sample dependencies (monotonicity, periodicity) otherwise obscured in conventional tabular organization.
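A minimal PSD construction consistent with this description sorts the design matrix ascending by a chosen key-feature column; `key_col` below is an illustrative parameter.

```python
import numpy as np

def to_psd(X, y, key_col):
    """Sort samples ascending by the key feature so that adjacent rows
    are neighbors in that feature's value (pseudo-sequential order)."""
    order = np.argsort(X[:, key_col], kind="stable")
    return X[order], y[order]
```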
4. Model Workflow and Training Protocol
The RP-CATE forward and training process follows:
- Build the PSD from the raw features using the chosen sorting column.
- Loop $L$ times (with residual addition after the initial pass):
- RP Module computes recurrent hidden and output states.
- Channel Attention constructs the PID via the cyclic sliding window, computes per-feature weights, and applies them to the RP output.
- Feed-Forward and Prediction stages map attentive outputs to final prediction.
- Objective function: MSE loss with a norm penalty (weighted by the regularization coefficient $\lambda$) over the model parameters $\theta$.
- Parameters updated via Adam optimizer.
The process is formalized in detailed pseudocode (see original source (Yang et al., 22 Dec 2025)).
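A PyTorch sketch of this protocol, mirroring the stated objective (MSE plus a weighted norm penalty) and Adam updates; the stand-in `model` and all sizes are illustrative rather than the paper's implementation.

```python
import torch

model = torch.nn.Sequential(               # stand-in for the RP-CATE network
    torch.nn.Linear(3, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 1e-4                                  # regularization weight (lambda)

X = torch.rand(50, 3)                       # PSD-ordered features (illustrative)
y = torch.rand(50, 1)                       # bias-correction targets

for _ in range(200):
    opt.zero_grad()
    mse = torch.nn.functional.mse_loss(model(X), y)
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    (mse + lam * l2).backward()             # MSE + weighted norm penalty
    opt.step()
```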
5. Experimental Evaluation and Comparative Performance
RP-CATE was empirically validated on a chemical engineering hybrid modeling task: correcting the Lee–Kesler acentric factor (AF) for petroleum mixtures. The inputs were petroleum-mixture samples described by a set of independent variables; the target was defined as the Lee–Kesler prediction bias (the residual between the measured AF and the mechanistic estimate).
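The hybrid scheme implied by this setup: the network learns the Lee–Kesler bias, and the corrected AF is the mechanistic estimate plus the predicted bias (assuming the bias is defined as measured minus mechanistic); `lee_kesler_af` and `model` are hypothetical stand-ins.

```python
def corrected_af(features, lee_kesler_af, model):
    """Hybrid prediction: mechanistic estimate plus learned correction.

    lee_kesler_af : Lee-Kesler acentric factor for the sample
    model         : maps features to the predicted Lee-Kesler bias
    """
    return lee_kesler_af + model(features)
```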
Model benchmarks included:
- Lee–Kesler (mechanistic)
- RBFNN (classical ML)
- GRU, LSTM (RNNs)
- Transformer, TCN (sequence models)
- TGCN-S, DGDL, RADA (graph-based)
Key metrics:
- MAE (mean absolute error)
- RMSE
- AER (%) (average relative error)
- Err < 1% (number of samples with relative error below 1%)
- Err > 5% (number of samples with relative error above 5%)
- MIR (%) (Model Improvement Rate over Lee–Kesler)
Summary results (median ± std, 5 runs):
| Model | MAE | RMSE | AER (%) | Err<1% | Err>5% | MIR (%) |
|---|---|---|---|---|---|---|
| Lee–Kesler | 0.0781 | 0.1534 | 5.35 | 17 | 17 | 0 |
| RBFNN | 0.0601 | 0.0714 | 12.14±5.96 | 4±4 | 35±10 | –3.6±13.9 |
| GRU | 0.0188 | 0.0326 | 1.34±0.51 | 34±7 | 3±3 | 57.8±23.7 |
| LSTM | 0.0104 | 0.0157 | 0.94±0.11 | 32±8 | 0±1 | 60.8±9.3 |
| Transformer | 0.0345 | 0.0566 | 3.94±1.85 | 15±6 | 17±8 | 25.6±7.2 |
| TCN | 0.0086 | 0.0153 | 0.86 | 46 | 0 | 78.2 |
| TGCN-S | 0.0192 | 0.0250 | 2.78±1.23 | 22±10 | 8±10 | 55.3±26.6 |
| DGDL | 0.0261 | 0.0323 | 2.15±2.97 | 13±7 | 4±8 | 45.0±29.2 |
| RADA | 0.0143 | 0.0194 | 1.86±0.45 | 25±4 | 1±2 | 56.4±8.7 |
| RP-CATE | 0.0084±0.0017 | 0.0132±0.0020 | 0.81±0.09 | 51±3 | 0±1 | 84.15±4.58 |
Ablation studies revealed a performance decrement upon removing either the RP or the Channel Attention module, underscoring the necessity of both components. Across all grid settings, RP-CATE consistently outperformed the vanilla Transformer.
Feature-wise attention maps demonstrated stable, mechanistically grounded feature attribution: feature 0's average weight rose from roughly 0.41 to 0.45 across passes, while features 1 and 2 drifted from approximately 0.29 to 0.28. This suggests the attention mechanism meaningfully reflects known process dependencies.
6. Significance and Implementation Implications
RP-CATE advances hybrid modeling in process industries in several respects:
- The RP Module effectively exploits non-sequential sample associations, a legacy challenge for conventional ML architectures applied to unordered tabular industrial datasets.
- The Channel Attention Module dynamically weighs input features according to empirical relevance rather than pre-assumed importance, enhancing model interpretability.
- The pseudo-image data and pooling machinery allow channel-wise attention to operate robustly even where natural spatial structure is absent.
- End-to-end, RP-CATE delivers substantial predictive improvement, evidenced by an 84.15% MIR versus the mechanistic baseline and superior accuracy across all comparative experiments.
A plausible implication is that RP-CATE’s architectural principles may be extendable to other domains where latent sample relationships exist but standard sequential or spatial ordering is unavailable.
7. Context within Industrial Hybrid Modeling
RP-CATE addresses persistent limitations in industrial hybrid modeling, combining mechanism-informed data selection (PSD) with robust deep learning modules (RP and channel attention) for low-cost and interpretable prediction enhancement. In contrast to prevailing single-method solutions, RP-CATE defines a comprehensive architecture applicable to a wider range of industrial scenarios, and its mechanism-aware attention offers improved insight into feature importance for process engineering tasks.
RP-CATE may shape future research in hybrid modeling, particularly where legacy mechanistic models require data-driven correction and industrial datasets challenge canonical neural architectures with non-standard sample associations (Yang et al., 22 Dec 2025).