wSTL-NN: Weighted Signal Temporal Logic Neural Networks

Updated 7 May 2026

wSTL-NN is a framework that blends weighted signal temporal logic with neural networks to offer interpretable and quantitatively robust analysis in time-series tasks.
It employs smooth, differentiable relaxations of logical and temporal operators, allowing for gradient-based learning and extraction of explicit, human-readable formulas.
Empirical evaluations demonstrate high classification accuracy, efficient fault diagnosis, and practical rule extraction in safety-critical and dynamic environments.

Weighted Signal Temporal Logic Neural Networks (wSTL-NN) combine the expressive temporal, Boolean, and quantitative semantics of weighted Signal Temporal Logic (wSTL) with the trainability and representational power of neural networks. By encoding wSTL subformulas as differentiable neurons and learning parameters via gradient-based optimization, wSTL-NN provides an interpretable, formally grounded, and computationally efficient framework for time-series classification and other temporal logic specification learning tasks. The approach is characterized by explicit importance weighting of logic and temporal operators, smooth differentiable relaxations of min/max semantics, and full end-to-end trainability, delivering models whose final output is an explicit, human-readable wSTL formula.

1. Weighted Signal Temporal Logic: Syntax and Quantitative Semantics

Weighted Signal Temporal Logic (wSTL) extends classical STL by assigning nonnegative importance weights to each Boolean and temporal operator in the formula. Let $s:\mathbb{N}\to\mathbb{R}^d$ denote a discrete $d$-dimensional signal.

A wSTL formula $\tilde{\varphi}$ admits the following recursive grammar (Yan et al., 2021, Chen et al., 2022, Li et al., 2022):

Atomic predicate: $a^\top s(k)\le c$
Negation: $\lnot\tilde{\varphi}$
Weighted conjunction/disjunction: ${}^{w_1}\!\tilde{\varphi}_1 \wedge {}^{w_2}\!\tilde{\varphi}_2$, ${}^{w_1}\!\tilde{\varphi}_1 \vee {}^{w_2}\!\tilde{\varphi}_2$
Weighted temporal operators:
- Always: $\mathbf{G}_{I}^{\mathbf{w}}\tilde{\varphi}$
- Eventually: $\mathbf{F}_{I}^{\mathbf{w}}\tilde{\varphi}$

Here $w_1,w_2>0$ are clause weights and $\mathbf{w}$ collects the interval weights, one for each time step in the time window $I$.
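The recursive grammar maps naturally onto a small tree data structure. The following is a minimal sketch in Python; the class names (`Predicate`, `Not`, `Binary`, `Temporal`), the `show` helper, and the textual rendering are illustrative assumptions and are not taken from any of the cited implementations.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Predicate:
    a: Tuple[float, ...]  # coefficient vector a
    c: float              # threshold c in a^T s(k) <= c


@dataclass(frozen=True)
class Not:
    child: "Formula"


@dataclass(frozen=True)
class Binary:
    op: str                                # "and" or "or"
    weights: Tuple[float, float]           # clause weights (w1, w2), both > 0
    children: Tuple["Formula", "Formula"]

    def __post_init__(self):
        if self.op not in ("and", "or"):
            raise ValueError("op must be 'and' or 'or'")
        if min(self.weights) <= 0:
            raise ValueError("clause weights must be positive")


@dataclass(frozen=True)
class Temporal:
    op: str                     # "G" (always) or "F" (eventually)
    interval: Tuple[int, int]   # inclusive window [lo, hi]
    weights: Tuple[float, ...]  # one positive weight per step in the window
    child: "Formula"

    def __post_init__(self):
        lo, hi = self.interval
        if self.op not in ("G", "F"):
            raise ValueError("op must be 'G' or 'F'")
        if len(self.weights) != hi - lo + 1 or min(self.weights) <= 0:
            raise ValueError("need one positive weight per time step")


Formula = Union[Predicate, Not, Binary, Temporal]


def show(f: Formula) -> str:
    """Render a formula tree as a plain-text string."""
    if isinstance(f, Predicate):
        return f"{list(f.a)}^T s(k) <= {f.c}"
    if isinstance(f, Not):
        return f"!({show(f.child)})"
    if isinstance(f, Binary):
        sym = "&" if f.op == "and" else "|"
        (w1, w2), (c1, c2) = f.weights, f.children
        return f"({w1}:{show(c1)} {sym} {w2}:{show(c2)})"
    lo, hi = f.interval
    return f"{f.op}[{lo},{hi}] {show(f.child)}"
```

Validating the weights in `__post_init__` keeps malformed formulas (for example, a window of three steps with a single weight) from being constructed at all.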

Quantitative semantics assign a real-valued robustness to each formula, with higher weights amplifying the quantitative satisfaction of their associated subformulae. Standard min/max are replaced by soft, differentiable aggregations respecting the weights:

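For conjunction: $r = \dfrac{\sum_i \bar{w}_i s_i r_i}{\sum_i \bar{w}_i s_i}$, where $\bar{w}_i = w_i / \sum_j w_j$ and $s_i = e^{-r_i/\sigma} / \sum_j e^{-r_j/\sigma}$, with $r_i$ the robustness of the $i$-th subformula and $\sigma>0$ a smoothing parameter
For temporal operators: Aggregation is performed over the window, with smooth weighting; as $\sigma\to 0$, soft operations converge to the true weighted min/max (Yan et al., 2021).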
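A small numerical sketch can make the soft aggregation concrete. It assumes a softmax-weighted average as the relaxation and a temperature called `sigma`; the function names and the exact normalization are assumptions for illustration and may differ from the cited papers. In this particular sketch the weights influence the result only at finite `sigma`; as `sigma` shrinks, the output approaches the plain minimum (or maximum) of the inputs.

```python
import numpy as np


def weighted_soft_min(r, w, sigma=0.1):
    """Softmax-weighted average that approaches min(r) as sigma -> 0."""
    r = np.asarray(r, dtype=float)
    w = np.asarray(w, dtype=float)
    w_bar = w / w.sum()                 # normalized importance weights
    z = -r / sigma
    s = np.exp(z - z.max())             # shift for numerical stability
    s /= s.sum()                        # softmax over -r / sigma
    return float(np.sum(w_bar * s * r) / np.sum(w_bar * s))


def weighted_soft_max(r, w, sigma=0.1):
    """Soft max obtained from the soft min by negating the inputs."""
    return -weighted_soft_min(-np.asarray(r, dtype=float), w, sigma)


r = [0.5, -0.2, 1.0]
sharp = weighted_soft_min(r, [1, 1, 1], sigma=1e-3)     # close to min(r)
smooth = weighted_soft_min(r, [1, 1, 1], sigma=1.0)     # between min and max
reweighted = weighted_soft_min(r, [1, 1, 5], sigma=1.0) # more mass on r[2]
```

Because the output is always a convex combination of the inputs, it stays between `min(r)` and `max(r)` for any positive weights and any `sigma`.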

2. Neural Network Architectures for wSTL-NN

Each wSTL formula can be compiled into a neural network in which each node corresponds to a wSTL subformula and each edge encodes the logical/temporal structure (Li et al., 2022, Yan et al., 2021, Yan et al., 2022, Chen et al., 2022). The general architectural principles are as follows:

Predicate Layer: Each atomic predicate $a^\top s(k)\le c$ becomes a neuron, with only the threshold $c$ (and optionally $a$) trainable (Li et al., 2022). Predicates are evaluated at each relevant time index.
Temporal Layer: Implements weighted “always”/“eventually” via smooth, differentiable aggregation functions. Temporal windows are parameterized by their endpoints and realized as soft selection vectors over the time indices, with a smoothing parameter controlling how sharply the selection falls off at the window boundaries (Li et al., 2022).

Boolean (Gate) Layers: Conjunction/disjunction are modeled as soft, sparse activations. Boolean gate matrices are learned from data, often with straight-through Bernoulli quantization for discrete selection (Li et al., 2022, Chen et al., 2022).
Output Layer: Aggregates the results via conjunction and disjunction layers (in fixed or learned normal forms) to produce the overall wSTL robustness.
Graph-based Extensions: For graph-temporal logic (wGSTL-NN), neurons compute over spatially connected nodes with per-neighbor and per-interval importance weights, leveraging the graph structure (e.g., region, neighbor) in both the input and operator semantics (Baharisangari et al., 2021).
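The layer structure above can be sketched as a forward pass for a single temporal neuron on top of a predicate neuron. This is a minimal illustration under stated assumptions: the sigmoid-product window in `soft_window`, the softmax-weighted aggregation in `soft_extremum`, and all function and parameter names (`t1`, `t2`, `eta`, `sigma`) are hypothetical choices, not the exact constructions of the cited papers.

```python
import numpy as np


def sigmoid(x):
    # tanh form avoids overflow for large |x|
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def predicate_layer(s, a, c):
    """Robustness of a^T s(k) <= c at every time step k: c - a^T s(k)."""
    return c - s @ a


def soft_window(T, t1, t2, eta=0.01):
    """Soft indicator of [t1, t2] over k = 0..T-1 (assumed sigmoid form)."""
    k = np.arange(T)
    return sigmoid((k - t1 + 0.5) / eta) * sigmoid((t2 - k + 0.5) / eta)


def soft_extremum(r, w, sigma, sign):
    """Weighted softmax average; sign=-1 gives a soft min, +1 a soft max."""
    z = np.where(w > 0, sign * r / sigma, -np.inf)  # ignore zero-weight steps
    q = w * np.exp(z - z.max())
    return float(np.sum(q * r) / np.sum(q))


def always(r, w, t1, t2, sigma=0.05):
    return soft_extremum(r, w * soft_window(len(r), t1, t2), sigma, -1.0)


def eventually(r, w, t1, t2, sigma=0.05):
    return soft_extremum(r, w * soft_window(len(r), t1, t2), sigma, +1.0)


# One-dimensional signal with T = 6 samples; predicate s(k) <= 1.0.
s = np.array([[0.9], [0.2], [0.4], [0.1], [0.9], [0.9]])
r = predicate_layer(s, a=np.array([1.0]), c=1.0)
w = np.ones(len(r))                     # uniform interval weights
rob_always = always(r, w, t1=1, t2=3)
rob_eventually = eventually(r, w, t1=1, t2=3)
label = 1 if rob_always > 0 else -1     # classify by the sign of the robustness
```

With a small `eta` the window is nearly hard. A larger `eta` makes the endpoints easier to optimize by gradient descent, but lets samples just outside the window leak into the aggregate.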

3. Differentiability and Smooth Relaxations

All key operators in wSTL-NN employ differentiable approximations of non-differentiable logical functions to facilitate gradient-based optimization. Sparse-softmax functions smoothly and soundly approximate min/max within the chosen temporal or logical aggregation, ensuring correct sign preservation for classification (Li et al., 2022).

For instance, the soft maximum over a weighted window is computed as a softmax-weighted average of the robustness values in the window, with a scaling parameter that controls how closely the result tracks the true maximum (Li et al., 2022).

Boolean combination weights and structural gating variables are often handled using straight-through estimators or soft quantization to maintain end-to-end differentiability while enforcing network sparsity and interpretability (Chen et al., 2022, Fronda et al., 2022).
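The straight-through idea can be sketched without an autograd library by writing the forward and backward passes by hand. The sketch below is one common variant, stated as an assumption: the forward pass emits a binary gate (by rounding or Bernoulli sampling), and the backward pass treats the quantization as the identity before chaining through the sigmoid. The names `gate_forward`, `gate_backward`, and `theta` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def gate_forward(theta, rng=None):
    """Return binary gates g and their probabilities p = sigmoid(theta)."""
    p = sigmoid(theta)
    if rng is None:
        g = (p > 0.5).astype(float)                   # deterministic rounding
    else:
        g = (rng.random(p.shape) < p).astype(float)   # Bernoulli sample
    return g, p


def gate_backward(grad_g, p):
    """Straight-through: treat dg/dp as 1, then chain through the sigmoid."""
    return grad_g * p * (1.0 - p)


theta = np.array([2.0, -1.0, 0.3])
g, p = gate_forward(theta)                            # g is [1., 0., 1.]
grad_theta = gate_backward(np.ones_like(theta), p)    # nonzero despite hard g
```

The forward output is exactly binary, so the extracted formula structure is discrete, while `grad_theta` remains nonzero and can drive a gradient update of `theta`.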

4. Training, Optimization, and Sparsification

All weights (predicate coefficients, threshold biases, time window endpoints, clause weights, temporal weights, and logic-gate parameters) are learned via back-propagation on a smooth loss function. Common choices are the exponential classification loss,

$\ell = \exp\bigl(-y\, r(s,\tilde{\varphi})\bigr)$

with $y\in\{-1,+1\}$ the class label and $r(s,\tilde{\varphi})$ the wSTL robustness (Li et al., 2022), or mean-squared-error for regression-style training (Chen et al., 2022).
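As a toy illustration of training with an exponential loss, the sketch below fits the threshold of a single scalar predicate $x \le c$ by gradient descent. It assumes labels in $\{-1,+1\}$, robustness $r = c - x$, and a mean over samples; the data, learning rate, and function name `exp_loss_and_grad` are made up for the example.

```python
import numpy as np


def exp_loss_and_grad(c, x, y):
    """Mean exponential loss exp(-y * r) with r = c - x, and its derivative in c."""
    r = c - x                       # robustness of the predicate x <= c
    e = np.exp(-y * r)
    return e.mean(), (-y * e).mean()   # dr/dc = 1


# Positive samples sit below the (unknown) threshold, negatives above it.
x = np.array([0.1, 0.3, 0.4, 0.8, 0.9, 1.2])
y = np.array([1, 1, 1, -1, -1, -1])

c = 0.0
for _ in range(200):
    loss, grad = exp_loss_and_grad(c, x, y)
    c -= 0.5 * grad                 # plain gradient descent step
```

The loss is convex in `c` for this one-parameter model, so the iteration settles at a threshold between the largest positive sample and the smallest negative one. The full network is non-convex and offers no such guarantee.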

Sparsification methods are deployed post-training or during training:

Thresholding: Small normalized weights are set to zero, either by absolute threshold or by retaining the top-$k$ largest weights (Yan et al., 2021).
Gate-variable Regularization: Each weight is multiplied by a stochastic gate; the network learns the gate parameters jointly with the weights and applies a sparsity penalty on the gates. This approach enables pruning with minimal accuracy loss (Yan et al., 2021).
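The thresholding variant is simple enough to sketch directly. The helper names `prune_by_threshold` and `prune_top_k`, and the choice to normalize weights by their sum, are assumptions for illustration; gate-variable regularization is not covered here.

```python
import numpy as np


def prune_by_threshold(w, tau):
    """Zero out weights whose normalized value w_i / sum(w) is below tau."""
    w = np.asarray(w, dtype=float)
    w_bar = w / w.sum()
    return np.where(w_bar >= tau, w, 0.0)


def prune_top_k(w, k):
    """Keep the k largest weights and zero out the rest."""
    w = np.asarray(w, dtype=float)
    keep = np.argsort(w)[::-1][:k]      # indices of the k largest weights
    out = np.zeros_like(w)
    out[keep] = w[keep]
    return out


w = [0.05, 2.0, 0.01, 1.0, 0.4]
by_threshold = prune_by_threshold(w, tau=0.1)   # keeps 2.0, 1.0 and 0.4
by_top_k = prune_top_k(w, k=2)                  # keeps 2.0 and 1.0
```

Operators or time steps whose weights are zeroed can then be dropped from the extracted formula.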

For structure learning, certain frameworks (e.g., Fronda et al., 2022) employ differentiable gating blocks at each possible logic/temporal branch, allowing the network to jointly infer formula structure and parameters. Quantized gating vectors determine the final logical structure after training.

5. Interpretation and Extraction of STL Formulas

A trained wSTL-NN can be read back as an explicit STL or wSTL formula. The final model directly corresponds to a formula in DNF (or other normal form as imposed by the architecture) whose parameters and structure are dictated by the learned weights and gate selections:

Predicate thresholds define atomic predicates.
Time-interval endpoints and temporal weights define the precise windows for $\mathbf{G}$ and $\mathbf{F}$.
Clause and gate weights specify which conjunctions and disjunctions are present and their importance.
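The read-out described above can be sketched as a function that turns learned parameters into a DNF string. The parameter layout is an assumption made for the example: one temporal operator and window per predicate, a binary matrix `and_gates` (clauses by predicates), and a binary vector `or_gates` selecting the active clauses. Importance weights are omitted for brevity, and the signal names and values are invented.

```python
def extract_dnf(names, thresholds, ops, windows, and_gates, or_gates):
    """Render learned parameters as a readable formula in disjunctive normal form."""
    atoms = []
    for name, c, op, (t1, t2) in zip(names, thresholds, ops, windows):
        # Learned endpoints are real-valued; round them to integer time indices.
        atoms.append(f"{op}[{round(t1)},{round(t2)}]({name} <= {c:.2f})")

    clauses = []
    for row, keep in zip(and_gates, or_gates):
        if not keep:
            continue                      # clause pruned by its disjunction gate
        members = [atoms[j] for j, g in enumerate(row) if g]
        if members:
            clauses.append("(" + " & ".join(members) + ")")

    # An empty disjunction is unsatisfiable.
    return " | ".join(clauses) if clauses else "False"


formula = extract_dnf(
    names=["temp", "co2"],
    thresholds=[22.513, 610.0],
    ops=["G", "F"],
    windows=[(0.2, 4.9), (2.0, 3.1)],
    and_gates=[[1, 1], [0, 1], [1, 0]],
    or_gates=[1, 0, 1],
)
```

Here the second clause is dropped because its disjunction gate is zero, so only two clauses appear in `formula`.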

Interpretability is supported both by the ability to read the formula and by the monotonic influence of each weight on formula satisfaction. Compactness is achieved by sparsification and grow-and-prune cycles in the network (Chen et al., 2022, Fronda et al., 2022, Li et al., 2022).

6. Representative Applications and Empirical Evaluation

wSTL-NN has been validated in a variety of temporal logic classification tasks:

Time-Series Classification: Models have achieved high classification accuracy (e.g., 99.46% in UCI occupancy detection) and test performance equivalent to or better than classical ML methods, while providing explicit temporal logic rules (Yan et al., 2021, Li et al., 2022).
Fault Diagnosis: TLNN (a wSTL-NN instantiation) enables interpretable and efficient diagnosis of bearing faults, with quantitative robustness scores and formula readability revealing physical mechanisms (Chen et al., 2022).
Time-Incremental Prediction: For tasks where signals are revealed incrementally, such as urban driving or surveillance, wSTL-NN achieves low incremental misclassification rates and fast runtime compared to non-weighted or brute-force baselines (Aasi et al., 2021).
Neuro-symbolic TSC: NSTSC leverages wSTL-NN within decision trees to solve multiclass classification on biosignals and UCR datasets, often within 1-2% of SOTA purely statistical methods, while maintaining full formula-based interpretability (Yan et al., 2022).
Graph-temporal Learning: wGSTL-NN captures spatial dependencies and regional heterogeneity in tasks such as rainfall and COVID-19 event prediction, yielding human-understandable structure and state-of-the-art accuracy (Baharisangari et al., 2021).

7. Key Advantages, Limitations, and Future Directions

Key advantages of wSTL-NN include:

End-to-end differentiability: All formula parameters, and in structure-learning variants the formula structure as well, are learned via standard gradient-based optimization.
Interpretability: The final model is a human-readable STL or wSTL formula, facilitating post-hoc analysis, validation, and deployment in safety-critical domains.
Soundness: Differentiable approximations retain the key semantics of STL operators, ensuring correct label sign for classification (Li et al., 2022).
Efficiency: Compact formulas and weight-sharing reduce the search space compared to template enumeration or combinatorial logic synthesis.

Notable limitations include:

For some variants, the formula structure must be specified a priori; fully structure-learned models are more computationally demanding (Yan et al., 2021, Fronda et al., 2022).
As the joint parameter space is highly non-convex, careful initialization and regularization are necessary to avoid degenerate solutions and excessive model complexity (Fronda et al., 2022).
For graph-based scenarios, scalability with respect to number of nodes and multi-hop dependencies remains a challenge (Baharisangari et al., 2021).

Future research directions include:

Automating structure search and formula architecture selection, potentially via reinforcement learning or hierarchical model selection.
Extending to multi-modal, multi-graph, and higher-order spatial-temporal logics.
Incorporating online, one-class, or positive-only learning paradigms.
Integrating learned wSTL-NN formulas into deep reinforcement learning agents for interpretability and constraint satisfaction.

References

(Li et al., 2022) Learning Signal Temporal Logic through Neural Network for Interpretable Classification
(Yan et al., 2021) Neural Network for Weighted Signal Temporal Logic
(Chen et al., 2022) Interpretable Fault Diagnosis of Rolling Element Bearings with Temporal Logic Neural Network
(Aasi et al., 2021) Time-Incremental Learning from Data Using Temporal Logics
(Fronda et al., 2022) Differentiable Inference of Temporal Logic Formulas
(Baharisangari et al., 2021) Weighted Graph-Based Signal Temporal Logic Inference Using Neural Networks
(Yan et al., 2022) Neuro-symbolic Models for Interpretable Time Series Classification using Temporal Logic Description