RP-CATE: Recurrent Perceptron Channel Attention Transformer
- The paper introduces RP-CATE, a novel encoder replacing traditional self-attention with a Channel Attention Module alongside a Recurrent Perceptron to capture sample associations.
- It converts non-sequential industrial data into a pseudo-sequential format, enabling effective recurrent modeling through domain-informed sorting.
- Experimental results show RP-CATE outperforms conventional models in accuracy and interpretability, offering state-of-the-art performance in hybrid industrial modeling.
The Recurrent Perceptron-based Channel Attention Transformer Encoder (RP-CATE) is an encoder architecture for industrial hybrid modeling that systematically integrates mechanistic modeling and machine learning-based modeling techniques. RP-CATE addresses limitations in existing hybrid approaches by enabling nuanced sample-to-sample association learning on non-sequential industrial data and providing adaptive feature weighting via channel attention. The architecture is characterized by the novel replacement of transformer self-attention with a Channel Attention Module, the introduction of a Recurrent Perceptron (RP) Module, and custom preprocessing approaches that facilitate robust modeling of complex industrial scenarios (Yang et al., 22 Dec 2025).
1. Architectural Components and Workflow
RP-CATE comprises a modular block structure; each block contains, in sequence: the Pseudo-Sequential Data (PSD) Module, Recurrent Perceptron Module, Channel Attention Module, Feed-Forward Module, and Prediction Module. Blocks can be stacked or looped over multiple iterations with residual connections.
- The PSD Module converts the input data to PSD by sorting the rows according to a key feature selected using domain knowledge, producing the sorted pseudo-sequential matrix.
- The RP Module processes the pseudo-sequence sample by sample, combining each sample with the hidden state of the previous one through a two-layer perceptron to yield the recurrent output. This module extracts underlying monotonic or periodic sample associations (a minimal sketch follows at the end of this section).
- The Channel Attention Module transforms the RP output into Pseudo-Image Data (PID) using a cyclic sliding window, applies global max/average pooling, then fuses the pooled results through two MLPs to produce normalized attention weights. Its output is the attention-weighted RP representation, with channel attention performed over the feature dimensions.
- The Feed-Forward Module applies a 3-layer MLP to the attention-weighted representation, producing the feed-forward output.
- The Prediction Module linearly reads out predictions from the feed-forward output through a learned weight matrix and bias.
Residual connections are incorporated after each loop to stabilize representation learning. The end-to-end pipeline supports flexible hyperparameterization (window size, number of loops, Adam optimizer, learning rate selected by grid search).
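The paper does not include reference code; the following PyTorch-style sketch illustrates one way the RP Module's recurrence over the pseudo-sequence could be realized. The class name `RPModule`, the hidden dimension, and the ReLU nonlinearity are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class RPModule(nn.Module):
    """Illustrative Recurrent Perceptron: each pseudo-sequential sample is
    combined with the previous hidden state through a two-layer perceptron."""
    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.perceptron = nn.Sequential(
            nn.Linear(in_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, psd: torch.Tensor) -> torch.Tensor:
        # psd: (m, in_dim) rows already sorted by the key feature
        h = torch.zeros(self.hidden_dim)
        outputs = []
        for x_i in psd:                      # iterate over pseudo-sequential samples
            h = self.perceptron(torch.cat([x_i, h]))
            outputs.append(h)
        return torch.stack(outputs)          # (m, hidden_dim) recurrent output
```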
2. Pseudo-Sequential Data and Underlying Associations
Industrial datasets typically lack strict sequentiality but may exhibit monotonic or periodic associations not exploited by generic machine learning models. RP-CATE instantiates Pseudo-Sequential Data (PSD) by sorting the samples along a domain-selected key feature, transforming unordered data into a form amenable to recurrent modeling. This transformation enables the RP Module to leverage recurrent operations and capture sample-wise dependencies akin to those found in temporal domains. The approach is specifically tuned to industrial scenarios where domain-driven sequencing yields improved predictive context.
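As a concrete illustration, the PSD construction reduces to a single argsort over the domain-selected key column; the function name `to_psd` and the column-index argument below are illustrative.

```python
import torch

def to_psd(X: torch.Tensor, key_col: int) -> torch.Tensor:
    """Sort the rows of a non-sequential design matrix X (m, n) by the
    domain-selected key feature to obtain Pseudo-Sequential Data."""
    order = torch.argsort(X[:, key_col])
    return X[order]
```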
3. Pseudo-Image Data and Cyclic Sliding Window Operator
RP-CATE generalizes channel attention operations to non-image data by synthesizing Pseudo-Image Data (PID) via a cyclic sliding window operator. Given a window size w, the operator slides cyclically over the pseudo-sequence so that each position gathers the w subsequent rows, wrapping around at the end of the data; each window is reshaped into a patch, and the patches are concatenated to form the PID tensor. This enables the application of spatial pooling and channel attention mechanisms classically used in CNN-based models, permitting channel-wise feature weighting irrespective of the original data structure.
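The sketch below shows one plausible reading of the cyclic sliding window, in which each position gathers the next w rows (wrapping cyclically) and the windows are stacked into a pseudo-image tensor; the exact reshaping used in the paper may differ.

```python
import torch

def make_pid(y_rp: torch.Tensor, w: int) -> torch.Tensor:
    """Cyclic sliding window: stack w cyclically shifted copies of the RP output
    so that window i contains rows i .. i+w-1 (mod m). The result, of shape
    (m, w, d), is treated as pseudo-image data with d feature channels."""
    windows = [torch.roll(y_rp, shifts=-k, dims=0) for k in range(w)]
    return torch.stack(windows, dim=1)
```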
Global max-pooling and average-pooling are performed over the window (spatial) dimensions of the PID, generating two pooled feature descriptors. These are processed by paired 2-layer MLPs and fused via summation, a sigmoid, and a softmax (over the feature dimension) to generate adaptive attention masks reflecting feature relevance.
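A minimal sketch of such a channel attention head, assuming the pooled descriptors are fused by summation before the sigmoid and the feature-wise softmax; the MLP widths and reduction ratio are illustrative.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Illustrative channel attention over pseudo-image data of shape (m, w, d):
    pool over the window axis, pass both descriptors through small MLPs,
    fuse by summation, then apply sigmoid and softmax over the feature axis."""
    def __init__(self, d: int, reduction: int = 2):
        super().__init__()
        self.mlp_max = nn.Sequential(
            nn.Linear(d, d // reduction), nn.ReLU(), nn.Linear(d // reduction, d))
        self.mlp_avg = nn.Sequential(
            nn.Linear(d, d // reduction), nn.ReLU(), nn.Linear(d // reduction, d))

    def forward(self, pid: torch.Tensor) -> torch.Tensor:
        max_desc = pid.max(dim=1).values      # (m, d) global max pooling
        avg_desc = pid.mean(dim=1)            # (m, d) global average pooling
        fused = self.mlp_max(max_desc) + self.mlp_avg(avg_desc)
        return torch.softmax(torch.sigmoid(fused), dim=-1)  # feature-wise attention mask
```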
4. Training, Forward Pass, and Optimization
RP-CATE implements a forward pass as follows:
```
PSD ← sort_rows(X, by_column = x′)          # Pseudo-sequential transformation
I ← PSD                                     # Residual identity
for i = 1 … N:
    if i > 1: PSD ← PSD + I                 # Residual connection
    y_RP ← RPModule(PSD)                    # Recurrent Perceptron
    PID ← make_PID(y_RP, window = w)        # Cyclic sliding window
    Attentions ← ChannelAttention(PID)      # Channel-wise attention
    y_RP′ ← y_RP ⊙ Attentions
    y_FFM ← FFN³(y_RP′)                     # Nonlinear transform
Ŷ ← y_FFM W_L + b_L                         # Linear prediction
L = (1/m) ∑_{i=1}^m (Y_i − Ŷ_i)² + λ‖Θ‖₂    # Regularized MSE loss
update all parameters via Adam(lr)
```
Hyperparameters are set by grid search; the final selections of window size, number of loops, and learning rate for the chemical engineering application were fixed this way, together with a small regularization weight.
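A minimal PyTorch sketch of the optimization loop implied by the loss above follows; the function name `train`, the epoch count, and the use of a squared L2 penalty are assumptions for illustration.

```python
import torch

def train(model, X_psd, Y, lr=1e-3, lam=1e-4, epochs=500):
    """Optimize a model on pseudo-sequential inputs with Adam, minimizing
    mean squared error plus a small L2 penalty on all parameters."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        Y_hat = model(X_psd)
        mse = torch.mean((Y - Y_hat) ** 2)
        l2 = sum(p.pow(2).sum() for p in model.parameters())
        loss = mse + lam * l2
        loss.backward()
        opt.step()
    return model
```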
5. Experimental Validation and Comparative Results
RP-CATE was benchmarked against the Lee–Kesler mechanistic model and a suite of classical (RBFNN), recurrent (LSTM, GRU), sequence-based (Transformer, TCN), and graph-based (TGCN-S, DGDL, RADA) baselines. The task involved hybrid correction of the acentric factor (AF) for petroleum mixtures over a dataset of mixture samples and their independent variables.
Performance Table
| Model | MAE | RMSE | AER% | #Err < 1% | #Err > 5% | MIR% |
|---|---|---|---|---|---|---|
| Lee–Kesler | 0.0781 | 0.1534 | 5.35 | 17 | 17 | 0 |
| RBFNN | 0.0601± | 0.0714± | 12.14±5.96 | 4±4 | 35±10 | -3.6±13.9 |
| GRU | 0.0188± | 0.0326± | 1.34±0.51 | 34±7 | 3±3 | 57.8±23.7 |
| LSTM | 0.0104± | 0.0157± | 0.94±0.11 | 32±8 | 0±1 | 60.8±9.3 |
| Transformer | 0.0345± | 0.0566± | 3.94±1.85 | 15±6 | 17±8 | 25.6±7.2 |
| TCN | 0.0086 | 0.0153 | 0.86 | 46 | 0 | 78.2 |
| TGCN-S | 0.0192± | 0.0250± | 2.78±1.23 | 22±10 | 8±10 | 55.3±26.6 |
| DGDL | 0.0261± | 0.0323± | 2.15±2.97 | 13±7 | 4±8 | 45.0±29.2 |
| RADA | 0.0143± | 0.0194± | 1.86±0.45 | 25±4 | 1±2 | 56.4±8.7 |
| RP-CATE | 0.0084±0.0017 | 0.0132±0.0020 | 0.81±0.09 | 51±3 | 0±1 | 84.15±4.58 |
RP-CATE achieved a MIR of 84.15%, outperforming all baselines. Ablation studies confirmed that both the RP and Channel Attention modules contribute substantially to final accuracy, and RP-CATE consistently exceeded plain Transformer performance across all tested configurations.
Feature-wise attention maps were stable, with average weights of 0.41 (first pass) and 0.45 (second pass) for feature 0, and about 0.29 for features 1 and 2, reflecting their underlying mechanistic importance.
6. Significance and Interpretative Implications
The RP Module enables modeling of structured associations in non-sequential industrial datasets, supporting representation of monotonicity and periodicity. The Channel Attention Module adaptively weights channel inputs according to their measured influence, echoing mechanisms in physical models. The cyclic sliding window PID methodology extends the applicability of channel attention to generic data formats.
A plausible implication is that RP-CATE’s modular design, recurrent sample association modeling, and feature-wise adaptive weighting present a generalized recipe for hybrid industrial scenarios with non-strictly-sequential data, where mechanistic regularities exist but ordering is ambiguous. This suggests RP-CATE may be extensible to other domains such as process control, multivariate sensor fusion, or scientific calibration tasks.
End-to-end, RP-CATE delivers state-of-the-art performance on industrial hybrid modeling, substantiating its role as a comprehensive machine learning architecture for complex scenario representation and predictive accuracy (Yang et al., 22 Dec 2025).