RP-CATE: Recurrent Perceptron Channel Attention Transformer
- The paper introduces RP-CATE, a novel encoder replacing traditional self-attention with a Channel Attention Module alongside a Recurrent Perceptron to capture sample associations.
- It converts non-sequential industrial data into a pseudo-sequential format, enabling effective recurrent modeling through domain-informed sorting.
- Experimental results show RP-CATE outperforms conventional models in accuracy and interpretability, offering state-of-the-art performance in hybrid industrial modeling.
The Recurrent Perceptron-based Channel Attention Transformer Encoder (RP-CATE) is an encoder architecture for industrial hybrid modeling that systematically integrates mechanistic modeling and machine learning-based modeling techniques. RP-CATE addresses limitations in existing hybrid approaches by enabling nuanced sample-to-sample association learning on non-sequential industrial data and providing adaptive feature weighting via channel attention. The architecture is characterized by the novel replacement of transformer self-attention with a Channel Attention Module, the introduction of a Recurrent Perceptron (RP) Module, and custom preprocessing approaches that facilitate robust modeling of complex industrial scenarios (Yang et al., 22 Dec 2025).
1. Architectural Components and Workflow
RP-CATE comprises a modular block structure; each block contains, in sequence: the Pseudo-Sequential Data (PSD) Module, Recurrent Perceptron Module, Channel Attention Module, Feed-Forward Module, and Prediction Module. Blocks can be stacked or looped over multiple iterations with residual connections.
- The PSD Module converts the input data to PSD by sorting the rows according to a key feature selected using domain knowledge, producing the sorted pseudo-sequential matrix.
- The RP Module processes the pseudo-sequence sample by sample, combining each sample with the hidden state of the previous one through a two-layer perceptron to yield the recurrent output. This module extracts underlying monotonic or periodic sample associations (a minimal sketch follows at the end of this section).
- The Channel Attention Module transforms the RP output into Pseudo-Image Data (PID) using a cyclic sliding window, applies global max/average pooling, then fuses the pooled results through two MLPs to produce normalized attention weights. Its output is the attention-weighted RP representation, with channel attention performed over the feature dimensions.
- The Feed-Forward Module applies a 3-layer MLP to the attention-weighted representation, producing the feed-forward output.
- The Prediction Module linearly reads out predictions from the feed-forward output through a learned weight matrix and bias.
Residual connections are incorporated after each loop to stabilize representation learning. The end-to-end pipeline supports flexible hyperparameterization (window size, number of loops, Adam optimizer, learning rate selected by grid search).
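The paper does not include reference code; the following PyTorch-style sketch illustrates one way the RP Module's recurrence over the pseudo-sequence could be realized. The class name `RPModule`, the hidden dimension, and the ReLU nonlinearity are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class RPModule(nn.Module):
    """Illustrative Recurrent Perceptron: each pseudo-sequential sample is
    combined with the previous hidden state through a two-layer perceptron."""
    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.perceptron = nn.Sequential(
            nn.Linear(in_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, psd: torch.Tensor) -> torch.Tensor:
        # psd: (m, in_dim) rows already sorted by the key feature
        h = torch.zeros(self.hidden_dim)
        outputs = []
        for x_i in psd:                      # iterate over pseudo-sequential samples
            h = self.perceptron(torch.cat([x_i, h]))
            outputs.append(h)
        return torch.stack(outputs)          # (m, hidden_dim) recurrent output
```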
2. Pseudo-Sequential Data and Underlying Associations
Industrial datasets typically lack strict sequentiality but may exhibit monotonic or periodic associations not exploited by generic machine learning models. RP-CATE instantiates Pseudo-Sequential Data (PSD) by sorting the samples along a domain-selected key feature, transforming unordered data into a form amenable to recurrent modeling. This transformation enables the RP Module to leverage recurrent operations and capture sample-wise dependencies akin to those found in temporal domains. The approach is specifically tuned to industrial scenarios where domain-driven sequencing yields improved predictive context.
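As a concrete illustration, the PSD construction reduces to a single argsort over the domain-selected key column; the function name `to_psd` and the column-index argument below are illustrative.

```python
import torch

def to_psd(X: torch.Tensor, key_col: int) -> torch.Tensor:
    """Sort the rows of a non-sequential design matrix X (m, n) by the
    domain-selected key feature to obtain Pseudo-Sequential Data."""
    order = torch.argsort(X[:, key_col])
    return X[order]
```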
3. Pseudo-Image Data and Cyclic Sliding Window Operator
RP-CATE generalizes channel attention operations to non-image data by synthesizing Pseudo-Image Data (PID) via a cyclic sliding window operator. Given a window size w, the operator slides cyclically over the pseudo-sequence so that each position gathers the w subsequent rows, wrapping around at the end of the data; each window is reshaped into a patch, and the patches are concatenated to form the PID tensor. This enables the application of spatial pooling and channel attention mechanisms classically used in CNN-based models, permitting channel-wise feature weighting irrespective of the original data structure.
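The sketch below shows one plausible reading of the cyclic sliding window, in which each position gathers the next w rows (wrapping cyclically) and the windows are stacked into a pseudo-image tensor; the exact reshaping used in the paper may differ.

```python
import torch

def make_pid(y_rp: torch.Tensor, w: int) -> torch.Tensor:
    """Cyclic sliding window: stack w cyclically shifted copies of the RP output
    so that window i contains rows i .. i+w-1 (mod m). The result, of shape
    (m, w, d), is treated as pseudo-image data with d feature channels."""
    windows = [torch.roll(y_rp, shifts=-k, dims=0) for k in range(w)]
    return torch.stack(windows, dim=1)
```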
Global max-pooling and average-pooling are performed over the window (spatial) dimensions of the PID, generating two pooled feature descriptors. These are processed by paired 2-layer MLPs and fused via summation, a sigmoid, and a softmax (over the feature dimension) to generate adaptive attention masks reflecting feature relevance.
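A minimal sketch of such a channel attention head, assuming the pooled descriptors are fused by summation before the sigmoid and the feature-wise softmax; the MLP widths and reduction ratio are illustrative.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Illustrative channel attention over pseudo-image data of shape (m, w, d):
    pool over the window axis, pass both descriptors through small MLPs,
    fuse by summation, then apply sigmoid and softmax over the feature axis."""
    def __init__(self, d: int, reduction: int = 2):
        super().__init__()
        self.mlp_max = nn.Sequential(
            nn.Linear(d, d // reduction), nn.ReLU(), nn.Linear(d // reduction, d))
        self.mlp_avg = nn.Sequential(
            nn.Linear(d, d // reduction), nn.ReLU(), nn.Linear(d // reduction, d))

    def forward(self, pid: torch.Tensor) -> torch.Tensor:
        max_desc = pid.max(dim=1).values      # (m, d) global max pooling
        avg_desc = pid.mean(dim=1)            # (m, d) global average pooling
        fused = self.mlp_max(max_desc) + self.mlp_avg(avg_desc)
        return torch.softmax(torch.sigmoid(fused), dim=-1)  # feature-wise attention mask
```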
4. Training, Forward Pass, and Optimization
RP-CATE implements a forward pass as follows:
```
PSD ← sort_rows(X, by_column = x′)          # Pseudo-sequential transformation
I ← PSD                                     # Residual identity
for i = 1 … N:
    if i > 1: PSD ← PSD + I                 # Residual connection
    y_RP ← RPModule(PSD)                    # Recurrent Perceptron
    PID ← make_PID(y_RP, window = w)        # Cyclic sliding window
    Attentions ← ChannelAttention(PID)      # Channel-wise attention
    y_RP′ ← y_RP ⊙ Attentions
    y_FFM ← FFN³(y_RP′)                     # Nonlinear transform
Ŷ ← y_FFM W_L + b_L                         # Linear prediction
L = (1/m) ∑_{i=1}^m (Y_i − Ŷ_i)² + λ‖Θ‖₂    # Regularized MSE loss
update all parameters via Adam(lr)
```
Hyperparameters are set by grid search; the final selections of window size, number of loops, and learning rate for the chemical engineering application were fixed this way, together with a small regularization weight.
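A minimal PyTorch sketch of the optimization loop implied by the loss above follows; the function name `train`, the epoch count, and the use of a squared L2 penalty are assumptions for illustration.

```python
import torch

def train(model, X_psd, Y, lr=1e-3, lam=1e-4, epochs=500):
    """Optimize a model on pseudo-sequential inputs with Adam, minimizing
    mean squared error plus a small L2 penalty on all parameters."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        Y_hat = model(X_psd)
        mse = torch.mean((Y - Y_hat) ** 2)
        l2 = sum(p.pow(2).sum() for p in model.parameters())
        loss = mse + lam * l2
        loss.backward()
        opt.step()
    return model
```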
5. Experimental Validation and Comparative Results
RP-CATE was benchmarked against the Lee–Kesler mechanistic model and a suite of classical (RBFNN), recurrent (LSTM, GRU), sequence-based (Transformer, TCN), and graph-based (TGCN-S, DGDL, RADA) baselines. The task involved hybrid correction of the acentric factor (AF) for petroleum mixtures over a dataset of mixture samples and their independent variables.
Performance Table
| Model | MAE | RMSE | AER% | #Err < 1% | #Err > 5% | MIR% |
|---|---|---|---|---|---|---|
| Lee–Kesler | 0.0781 | 0.1534 | 5.35 | 17 | 17 | 0 |
| RBFNN | 0.0601± | 0.0714± | 12.14±5.96 | 4±4 | 35±10 | -3.6±13.9 |
| GRU | 0.0188± | 0.0326± | 1.34±0.51 | 34±7 | 3±3 | 57.8±23.7 |
| LSTM | 0.0104± | 0.0157± | 0.94±0.11 | 32±8 | 0±1 | 60.8±9.3 |
| Transformer | 0.0345± | 0.0566± | 3.94±1.85 | 15±6 | 17±8 | 25.6±7.2 |
| TCN | 0.0086 | 0.0153 | 0.86 | 46 | 0 | 78.2 |
| TGCN-S | 0.0192± | 0.0250± | 2.78±1.23 | 22±10 | 8±10 | 55.3±26.6 |
| DGDL | 0.0261± | 0.0323± | 2.15±2.97 | 13±7 | 4±8 | 45.0±29.2 |
| RADA | 0.0143± | 0.0194± | 1.86±0.45 | 25±4 | 1±2 | 56.4±8.7 |
| RP-CATE | 0.0084±0.0017 | 0.0132±0.0020 | 0.81±0.09 | 51±3 | 0±1 | 84.15±4.58 |
RP-CATE achieved a MIR of 84.15%, outperforming all baselines. Ablation studies confirmed that both the RP and Channel Attention modules contribute substantially to final accuracy, and RP-CATE consistently exceeded plain Transformer performance across all tested configurations.
Feature-wise attention maps were stable, with average weights of 0.41 (first pass) and 0.45 (second pass) for feature 0, and about 0.29 for features 1 and 2, reflecting their underlying mechanistic importance.
6. Significance and Interpretative Implications
The RP Module enables modeling of structured associations in non-sequential industrial datasets, supporting representation of monotonicity and periodicity. The Channel Attention Module adaptively weights channel inputs according to their measured influence, echoing mechanisms in physical models. The cyclic sliding window PID methodology extends the applicability of channel attention to generic data formats.
A plausible implication is that RP-CATE’s modular design, recurrent sample association modeling, and feature-wise adaptive weighting present a generalized recipe for hybrid industrial scenarios with non-strictly-sequential data, where mechanistic regularities exist but ordering is ambiguous. This suggests RP-CATE may be extensible to other domains such as process control, multivariate sensor fusion, or scientific calibration tasks.
End-to-end, RP-CATE delivers state-of-the-art performance on industrial hybrid modeling, substantiating its role as a comprehensive machine learning architecture for complex scenario representation and predictive accuracy (Yang et al., 22 Dec 2025).