
Pseudo-Sequential Data (PSD)

Updated 30 December 2025
  • Pseudo-Sequential Data (PSD) is a method that sorts unordered industrial data by a key feature, enabling recurrent modeling on non-sequential datasets.
  • The RP-CATE framework integrates PSD with a recurrent perceptron and channel attention modules to boost predictive accuracy, interpretability, and computational efficiency.
  • The approach has been validated in applications like chemical engineering, demonstrating significant improvements in error metrics and model improvement rate over traditional methods.

The Recurrent Perceptron-based Channel Attention Transformer Encoder (RP-CATE) is a hybrid machine learning architecture designed for industrial modeling applications where datasets often display non-sequential sample associations such as monotonicity and periodicity. Addressing two major limitations in previous industrial hybrid methods—restricted architectural scope and inadequate exploitation of sample-wise dependencies—RP-CATE introduces a novel combination of Transformer-inspired channel attention, a custom Recurrent Perceptron module, and specialized data structuring techniques. These innovations facilitate enhanced predictive accuracy, computational efficiency, and interpretability, yielding substantial improvement over both mechanistic and machine learning baselines (Yang et al., 22 Dec 2025).

1. Architectural Composition

RP-CATE is composed of modular, stackable encoding "blocks" with a five-stage functional pipeline:

  1. Pseudo-Sequential Data (PSD) Module: Converts raw, non-sequential input $X \in \mathbb{R}^{m \times n}$ into a sorted pseudo-sequence $\{\mathrm{PSD}_i\}$ by ordering rows by a key feature $x'$.
  2. Recurrent Perceptron (RP) Module: Processes $\mathrm{PSD}$ with a recurrent feed-forward structure, exploiting underlying sample associations (a minimal sketch appears after this list):

    $$h_i = \sigma(\mathrm{PSD}_i U + h_{i-1} W + b_{HL1}) \qquad (h_0 = 0),$$

    $$y_{RP,i} = \sigma\left(\sigma(h_i V + b_{HL2})\, W_{HL2} + b\right),$$

    assembling $y_{RP} \in \mathbb{R}^{m \times n}$.

  3. Channel Attention Module: Applies channel-wise weighting through pseudo-image construction and attentive pooling:
    • Generates PID via a cyclic sliding window (see Section 2).
    • Computes feature attention via elementwise $\mathrm{softmax}(\sigma(H_1 + H_2))$, where $H_1$ and $H_2$ emerge from global max and average pooling across channels.
    • Produces attentive features: $y_{RP}' = y_{RP} \odot \mathrm{Attentions}$.
  4. Feed-Forward Module: Nonlinear transformation via a 3-layer MLP: $y_{FFM} = \mathrm{FFN}^3(y_{RP}')$.
  5. Prediction Module: Outputs $\hat{Y} = y_{FFM} W_L + b_L$.
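
A minimal PyTorch sketch of the RP recurrence above. The class name, hidden width, and the decision to emit one output row per input row are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class RecurrentPerceptron(nn.Module):
    """Sketch of the RP module. Layers U, W, V, out mirror the parameters
    U, W, b_HL1, V, b_HL2, W_HL2, b in the equations above; the hidden
    width is an assumed hyperparameter."""
    def __init__(self, n_features: int, hidden: int = 16):
        super().__init__()
        self.hidden = hidden
        self.U = nn.Linear(n_features, hidden, bias=False)  # PSD_i U
        self.W = nn.Linear(hidden, hidden, bias=False)      # h_{i-1} W
        self.b_hl1 = nn.Parameter(torch.zeros(hidden))      # b_HL1
        self.V = nn.Linear(hidden, hidden)                  # h_i V + b_HL2
        self.out = nn.Linear(hidden, n_features)            # (.) W_HL2 + b
        self.act = nn.Sigmoid()

    def forward(self, psd: torch.Tensor) -> torch.Tensor:
        # psd: (m, n) rows already sorted by the key feature; h_0 = 0
        h = psd.new_zeros(self.hidden)
        ys = []
        for i in range(psd.shape[0]):
            h = self.act(self.U(psd[i]) + self.W(h) + self.b_hl1)
            ys.append(self.act(self.out(self.act(self.V(h)))))
        return torch.stack(ys)  # y_RP: (m, n)

# Usage: y_rp = RecurrentPerceptron(n_features=3)(torch.rand(60, 3))
```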

Blocks may be repeated $N$ times with residual connections. Hyperparameters (window $w$, loop count $N$, learning rate $lr$, regularization $\lambda$) are grid-searched; the reported optima are $w = 25$, $N = 2$, $lr = 0.001$, and a small $\lambda$. A grid-search sketch follows.
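
A sketch of the hyperparameter search loop. Apart from the reported optima ($w = 25$, $N = 2$, $lr = 0.001$), the candidate values and the `train_and_validate` helper are hypothetical placeholders:

```python
from itertools import product

def train_and_validate(w: int, N: int, lr: float, lam: float) -> float:
    """Hypothetical stand-in: train RP-CATE with these hyperparameters
    and return a validation error to minimize."""
    raise NotImplementedError

grid = {
    "w":   [16, 25, 36],        # window sizes, each a perfect square (w = k^2)
    "N":   [1, 2, 3],           # block loop count
    "lr":  [1e-2, 1e-3, 1e-4],  # learning rate
    "lam": [1e-3, 1e-4, 1e-5],  # l2 regularization weight
}

best_score, best_cfg = float("inf"), None
for w, N, lr, lam in product(*grid.values()):
    score = train_and_validate(w, N, lr, lam)
    if score < best_score:
        best_score, best_cfg = score, {"w": w, "N": N, "lr": lr, "lam": lam}
```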

2. Pseudo-Image Data (PID) and Cyclic Sliding Window

The PID mechanism enables spatial pooling over non-image industrial data by reformatting sequential features.

  • Given $y_{RP} \in \mathbb{R}^{m \times n}$ and window $w = k^2$, the cyclic sliding window operator

$$\mathrm{CSW}: \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times w \times n}$$

extracts $w$ consecutive sample rows for each index $i$, wrapping cyclically at dataset boundaries. Each slice $z_i \in \mathbb{R}^{w \times n}$ is reshaped into $k \times k \times n$, forming the PID tensor in $\mathbb{R}^{m \times k \times k \times n}$.

This enables standard CNN-inspired global max and average pooling over the $k \times k$ axes, facilitating feature-wise attention operations for non-spatial data.
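
A NumPy sketch of the CSW operator and PID construction. Whether each window starts at row $i$ or is centered on it, and the exact transforms producing $H_1$ and $H_2$ before the attention, are assumptions here:

```python
import numpy as np

def cyclic_sliding_window(y_rp: np.ndarray, w: int) -> np.ndarray:
    """CSW: (m, n) -> (m, w, n). For each row i, stack the w consecutive
    rows starting at i, wrapping cyclically at the dataset boundary."""
    m = y_rp.shape[0]
    idx = (np.arange(m)[:, None] + np.arange(w)[None, :]) % m
    return y_rp[idx]

def to_pid(y_rp: np.ndarray, w: int) -> np.ndarray:
    """Reshape each (w, n) window into (k, k, n) with w = k*k, giving
    pseudo-image data of shape (m, k, k, n)."""
    k = int(round(w ** 0.5))
    assert k * k == w, "window size must be a perfect square"
    m, n = y_rp.shape
    return cyclic_sliding_window(y_rp, w).reshape(m, k, k, n)

# With w = 25 (k = 5): pool over the two k x k axes, then weight features.
pid = to_pid(np.random.rand(60, 3), w=25)                 # (60, 5, 5, 3)
h1 = pid.max(axis=(1, 2))                                 # global max pooling
h2 = pid.mean(axis=(1, 2))                                # global average pooling
s = 1.0 / (1.0 + np.exp(-(h1 + h2)))                      # sigma(H1 + H2)
attn = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)   # softmax over features
```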

3. Pseudo-Sequential Data (PSD) Representation

RP-CATE operates on datasets $X \in \mathbb{R}^{m \times n}$ not strictly ordered by time or process steps. To allow recurrent processing, RP-CATE introduces a "pseudo-sequential" structure:

  • Select a key feature $x'$ (e.g., a domain-relevant variable).
  • Sort $X$ so that rows ascend by $x'$, producing:

$$\mathrm{PSD} = f_{\mathrm{PSD}}(X; x') = \mathtt{sort}(X; x')$$

This construction provides a basis for recurrent modeling of sample-to-sample dependencies (monotonicity, periodicity) otherwise obscured in conventional tabular organization.
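
As a concrete illustration, a NumPy sketch of this sorting step (the function name and the choice of key column are illustrative):

```python
import numpy as np

def make_psd(X: np.ndarray, key_col: int) -> np.ndarray:
    """Sort the rows of an unordered dataset X (m x n) ascending by the
    key feature x' in column `key_col`, yielding the pseudo-sequence."""
    return X[np.argsort(X[:, key_col], kind="stable")]

# Example: 60 samples, 3 features, feature 0 chosen as the key x'
PSD = make_psd(np.random.rand(60, 3), key_col=0)
assert np.all(np.diff(PSD[:, 0]) >= 0)  # rows now ascend by the key feature
```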

4. Model Workflow and Training Protocol

The RP-CATE forward pass and training procedure are as follows:

  1. Build $\mathrm{PSD}$ from raw features $X$ using the chosen sorting column $x'$.
  2. Loop $N$ times (with residual addition after the initial pass):
    • RP Module computes recurrent hidden and output states.
    • Channel Attention constructs PID via the cyclic sliding window, computes per-feature weights, and applies them to $y_{RP}$.
    • Feed-Forward and Prediction stages map attentive outputs to final prediction.
  3. Objective function: MSE loss with an $\ell_2$ penalty (weight $\lambda$) over parameters $\Theta$.
  4. Parameters are updated via the Adam optimizer.

The full procedure is formalized as detailed pseudocode in the original source (Yang et al., 22 Dec 2025); a minimal training-loop sketch follows.
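
A minimal training-loop sketch consistent with steps 3 and 4 above; the epoch count and $\lambda$ value are assumptions, and `model` stands in for the full stacked RP-CATE network:

```python
import torch

def train(model, X, Y, lr=1e-3, lam=1e-4, epochs=500):
    """MSE loss plus an l2 penalty over all parameters, optimized with Adam."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        mse = torch.mean((model(X) - Y) ** 2)                 # MSE term
        l2 = sum((p ** 2).sum() for p in model.parameters())  # ||Theta||_2^2
        loss = mse + lam * l2
        loss.backward()
        opt.step()
    return model
```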

5. Experimental Evaluation and Comparative Performance

RP-CATE was empirically validated on a chemical engineering hybrid modeling task: correcting the Lee–Kesler acentric factor (AF) for petroleum mixtures. Input consisted of $m = 60$ samples with $n = 3$ independent variables. The target was defined as the prediction bias $Y = Y_{\mathrm{true}} - Y_{\mathrm{LK}}$.

Model benchmarks included:

  • Lee–Kesler (mechanistic)
  • RBFNN (classical ML)
  • GRU, LSTM (RNNs)
  • Transformer, TCN (sequence models)
  • TGCN-S, DGDL, RADA (graph-based)

Key metrics:

  1. MAE (mean absolute error)
  2. RMSE
  3. AER (%) (average relative error)
  4. #Err < 1% (count of samples with relative error below 1%)
  5. #Err > 5% (count of samples with relative error above 5%)
  6. MIR (%) (Model Improvement Rate over Lee–Kesler)
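
A sketch of the six metrics. The exact MIR definition is not spelled out here, so taking it as the relative AER reduction versus the Lee–Kesler baseline is an assumption (it does reproduce MIR = 0 for Lee–Kesler itself):

```python
import numpy as np

def evaluate(y_true, y_pred, y_lk):
    """Compute the six reported metrics; assumes nonzero targets."""
    err = np.abs(y_pred - y_true)
    rel = 100 * err / np.abs(y_true)                       # relative error (%)
    aer = rel.mean()
    aer_lk = (100 * np.abs(y_lk - y_true) / np.abs(y_true)).mean()
    return {
        "MAE": err.mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "AER(%)": aer,
        "#Err<1%": int((rel < 1).sum()),
        "#Err>5%": int((rel > 5).sum()),
        "MIR(%)": 100 * (aer_lk - aer) / aer_lk,           # assumed definition
    }
```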

Summary results (median ± std, 5 runs):

| Model | MAE | RMSE | AER (%) | #Err < 1% | #Err > 5% | MIR (%) |
|---|---|---|---|---|---|---|
| Lee–Kesler | 0.0781 | 0.1534 | 5.35 | 17 | 17 | 0 |
| RBFNN | 0.0601± | 0.0714± | 12.14±5.96 | 4±4 | 35±10 | –3.6±13.9 |
| GRU | 0.0188± | 0.0326± | 1.34±0.51 | 34±7 | 3±3 | 57.8±23.7 |
| LSTM | 0.0104± | 0.0157± | 0.94±0.11 | 32±8 | 0+1 | 60.8±9.3 |
| Transformer | 0.0345± | 0.0566± | 3.94±1.85 | 15±6 | 17±8 | 25.6±7.2 |
| TCN | 0.0086 | 0.0153 | 0.86 | 46 | 0 | 78.2 |
| TGCN-S | 0.0192± | 0.0250± | 2.78±1.23 | 22±10 | 8±10 | 55.3±26.6 |
| DGDL | 0.0261± | 0.0323± | 2.15±2.97 | 13±7 | 4±8 | 45.0±29.2 |
| RADA | 0.0143± | 0.0194± | 1.86±0.45 | 25±4 | 1±2 | 56.4±8.7 |
| RP-CATE | 0.0084±0.0017 | 0.0132±0.0020 | 0.81±0.09 | 51±3 | 0+1 | 84.15±4.58 |

Ablation studies revealed a performance decrement when either the RP or the Channel Attention module was removed, underscoring both components' necessity. Across all $(w, N)$ grid settings, RP-CATE consistently outperformed the vanilla Transformer.

Feature-wise attention maps demonstrated stability and mechanistically grounded feature attribution: feature 0's average weight moved from 0.41 to 0.45 across passes, while features 1 and 2 held near 0.29 and 0.28, respectively. This suggests the attention mechanism meaningfully reflects known process dependencies.

6. Significance and Implementation Implications

RP-CATE advances hybrid modeling in process industries in several respects:

  • The RP Module effectively exploits non-sequential sample associations, a legacy challenge for conventional ML architectures applied to unordered tabular industrial datasets.
  • The Channel Attention Module dynamically weighs input features according to empirical relevance rather than pre-assumed importance, enhancing model interpretability.
  • The pseudo-image data and pooling machinery allow channel-wise attention to operate robustly even where natural spatial structure is absent.
  • End-to-end, RP-CATE delivers substantial predictive improvement, evidenced by an 84.15% MIR versus the mechanistic baseline and superior accuracy in all comparative experiments.

A plausible implication is that RP-CATE’s architectural principles may be extendable to other domains where latent sample relationships exist but standard sequential or spatial ordering is unavailable.

7. Context within Industrial Hybrid Modeling

RP-CATE addresses persistent limitations in industrial hybrid modeling, combining mechanism-informed data selection (PSD) with robust deep learning modules (RP and channel attention) for low-cost and interpretable prediction enhancement. In contrast to prevailing single-method solutions, RP-CATE defines a comprehensive architecture applicable to a wider range of industrial scenarios, and its mechanism-aware attention offers improved insight into feature importance for process engineering tasks.

RP-CATE may shape future research in hybrid modeling, particularly where legacy mechanistic models require data-driven correction and industrial datasets challenge canonical neural architectures with non-standard sample associations (Yang et al., 22 Dec 2025).
