Distributed Seasonal Temporal Pattern Mining

Updated 22 November 2025

DSTPM is a distributed framework that efficiently mines frequent seasonal temporal patterns from large time series datasets characterized by periodic or bursty events.
It introduces novel seasonality-sensitive support measures and memory-efficient data structures to overcome the limitations of traditional frequent pattern mining approaches.
Empirical evaluations demonstrate 4–5× runtime reductions and near-linear scalability across clusters, highlighting its practical impact on high-volume time series analytics.

Distributed Seasonal Temporal Pattern Mining (DSTPM) is the first distributed framework for mining frequent seasonal temporal patterns (STPs) from massive time series datasets. STPs are temporally ordered patterns characterized by periodic or bursty re-occurrence, as seen across domains such as IoT sensor flows and epidemiological surveillance. Classic frequent pattern mining approaches are unsuitable for STPs, as measures like support and confidence cannot distinguish uniform from seasonal clustering, and anti-monotonicity does not hold. DSTPM introduces new formal definitions, memory-efficient distributed data structures, and theoretically sound pruning routines, providing significant efficiency and scalability gains over sequential baselines (Ho-Long et al., 15 Nov 2025).

1. Formal Framework and Problem Definitions

A time series $T$ over a domain $\mathcal T$ with granularity $G$ is modeled as a sequence $T = x_1, x_2, \dots, x_N$ , $x_i \in \Sigma_T$ , where $\Sigma_T$ is a finite alphabet. Each symbol $\omega$ forms a temporal event $E = (\omega, \{[t_s, t_e]\})$ marking instances where $T$ equals $\omega$ over interval $[t_s, t_e]$ .

Temporal patterns are formalized using Allen’s interval relations $\Re = \{\rightarrow, \succcurlyeq, \between\}$ (follows, contains, overlaps). A pattern $P$ of length $k$ is defined as: $P = \{ (r_{ij}, E_i, E_j) \mid 1 \le i < j \le k, r_{ij} \in \Re \}$

Classical support, $\mathrm{supp}(P,T)$ , quantifies occurrence frequency, but fails to distinguish concentrated, periodic (seasonal) occurrences from uniform ones. Additionally, anti-monotonicity—if $P$ is infrequent then super-patterns of $P$ are infrequent—does not hold for seasonal counts due to situations where subsets may have fewer detected seasons than supersets.

DSTPM defines a novel, seasonality-sensitive support:

The support set $\mathrm{SUP}^P = \{G_1^P, \dots, G_m^P\}$ is the granular timestamps with occurrences of $P$ .
A near-support set $\mathrm{NearSUP}^P$ is a maximal contiguous subsequence of granules where inter-gap does not exceed a threshold $\mathrm{maxPeriod}$ .
A season is a near-support set with density at least $\mathrm{minDensity}$ .
For $P$ to be frequent seasonal: $\mathrm{seasons}(P) \ge \mathrm{minSeason}$ , and inter-season intervals respect $[\mathrm{dist}_{\min}, \mathrm{dist}_{\max}]$ .

To permit effective pruning, DSTPM proposes the anti-monotonic proxy: $\maxSeason(P) = \frac{|\mathrm{SUP}^P|}{\mathrm{minDensity}}$ ensuring $\maxSeason(P') \ge \maxSeason(P)$ for $P' \subseteq P$ , so one can prune $P$ if $\maxSeason(P) < \mathrm{minSeason}$.

2. Distributed Architecture and Data Partitioning

DSTPM operates atop Spark or other MapReduce engines over a cluster of $n$ worker nodes. The input temporal sequence database $\mathcal D_{\mathrm{SEQ}}$ is partitioned by time-granule or event-symbol so that each worker stores a disjoint fragment of data. This enables linear scalability as each node processes only its local partition for candidate generation, support calculation, and pattern verification (Ho-Long et al., 15 Nov 2025).

The core distributed data structure is the Distributed Hierarchical Lookup Hash ( $\mathrm{DHLH}_k$ ), composed for pattern size $k$ as follows:

Table	Maps From	To
$EH_1$	$\omega$	$\mathrm{SUP}^{(\omega)}$
$GH_1$	$\mathrm{SUP}^{(\omega)}$	Instances of $\omega$
$EH_k$	$(E_1,\dots,E_k)$	$(\mathrm{SUP}^{(E_1,\dots,E_k)}, \{\text{candidates}\})$
$PH_k$	Pattern $P$	$\mathrm{SUP}^P$
$GH_k$	$\mathrm{SUP}^{P}$	Relation-supporting instances

For $k=1$ , $DHLH_k$ reduces to event and granule-instance tables. For $k \ge 2$ , three-level indirection supports both efficient candidate assembly and support set computation.

Each worker maintains only its assigned hash table fragments, reducing overhead and enabling parallel candidate support lookups.

3. Core Algorithms and Pruning Strategies

The DSTPM process operates as follows:

Function DSTPM(D_SEQ, maxPeriod, minDensity, distInterval, minSeason)
    (EH_1, GH_1) = MineSingleEvents(D_SEQ)
    For k = 2 to k_max
        (EH_k, PH_k, GH_k) = MineKPatterns(EH_{k-1}, EH_1)
    Return all { P | seasons(P) >= minSeason }
End

Single-Event Mining: Each $(\omega, [t_s, t_e])$ record emits its event key; a distributed ReduceByKey operation aggregates all instances and calculates $\maxSeason(\omega)$. Events passing the $\maxSeason$ threshold are stored. Survivors are post-filtered on seasonality criteria—a second pass builds near-support sets and filters by density and recurrence.

Pattern Mining (k > 1):

Candidates are built via the Cartesian product $\mathrm{EH}_{k-1} \times \mathrm{EH}_1$ , but only retained if their $\maxSeason$ proxy meets the minimum.
Pattern assembly requires, for each event group, assembling valid relation-sets by joining size- $(k-1)$ frequent patterns with the new event, relying on $GH_2$ for 2-event relation supports.
The support for each candidate $P$ is computed as an intersection of all $\mathrm{SUP}^{(E_i, E_j)}$ .
Pruning occurs as soon as the intermediate $\maxSeason$ drops below $\mathrm{minSeason}$ , eliminating infeasible super-patterns early.

Insertion, lookups, and pruning within $DHLH_k$ are designed for constant or $O(k)$ time.

4. Theoretical Foundations and Complexity Analysis

Let $N_T$ be the total number of granules, $M_1$ the number of unique event symbols, $C_k$ the count of candidate $k$ -event groups, $S_k$ the count of frequent $k$ -event patterns, $s = \mathrm{minSeason}$ , and $n$ the number of workers.

Time Complexity:
- Single-event mining takes $O(N_T \log M_1)$ .
- For $k$ -patterns, each of $C_k$ candidates requires:
- Set-intersection: $O(|\mathrm{SUP}^{(E_i, E_j)}|)$
- $\binom{k}{2}$ lookups in $GH_2$
- Support evaluation: $O(|\mathrm{SUP}^{P}|)$
- Final complexity across all workers:
$T(n) = \sum_{k=1}^{k_{\max}} \frac{1}{n} \left[ O(N_T \log M_1) + \sum_{P \in \text{candidates}_k} (k^2 + |\mathrm{SUP}^P|) \right]$
Space Complexity:
- Per worker $\sim O(\sum_{k=1}^{k_{\max}} [C_k + S_k])$
- Cluster-wide memory is linear in total candidate/support set sizes.
Anti-Monotonicity (Pruning):
- By using the $\maxSeason$ proxy, downward closure is restored: $P' \subseteq P \implies \maxSeason(P') \ge \maxSeason(P)$, so candidates can be eliminated safely.

5. Empirical Evaluation and Scalability

DSTPM has been tested on varied real-world datasets:

RE (Renewable Energy): 1,460 granules, 21 sensors, 102 symbols, monthly seasonality.
SC (Smart City Traffic): $5 \times 10^{5}$ granules, 30 streams, 150 symbols, daily/weekly seasonality.
INF (Influenza): 2,628 daily granules, 6 variables, 32 symbols.

Parameter sweeps included $\mathrm{maxPeriod} \in \{0.2\%, \dots, 1.0\%\}$ , $\mathrm{minDensity} \in \{0.5\%, \dots, 1.5\%\}$ , and $\mathrm{minSeason} \in \{4,8,12,16,20\}$ .

DSTPM was evaluated against an adapted sequential PS-growth (APS) baseline that mines itemsets and then assembles relations. Metrics included runtime (s), memory (MB), and speedup ( $\mathrm{APS_{time}}/\mathrm{DSTPM_{time}}$ ):

Dataset	DSTPM (time, mem)	APS (time, mem)	Speedup
RE	1,526 s, 5,500 MB	6,059 s, 11,595 MB	3.97×
SC	1,332 s, 3,832 MB	5,501 s, 8,183 MB	4.13×
INF	1,114 s, 3,210 MB	4,754 s, 7,102 MB	4.27×

DSTPM demonstrates 4–5× reduction in runtime and 2.3× reduction in peak memory on average.

Scalability experiments on synthetic datasets ( $10^6$ granules per series) show nearly linear speedup up to at least 15–20 cluster nodes, with 20 partitions effectively utilizing a 16-node cluster and providing a 12× runtime reduction over single-node execution.

6. Significance and Impact

DSTPM resolves central bottlenecks in large-scale seasonal temporal pattern mining by introducing a distributed, partitioned, and hash-based infrastructure, along with a mathematically justified, anti-monotonic, prunable seasonality proxy. This allows previously intractable problem sizes to be handled efficiently, both in memory and computation time. The framework supports flexible deployment on commodity MapReduce platforms and can handle the exponential combinatorial explosion typical in temporal pattern mining as dataset size and event vocabulary grow. The empirical demonstration of nearly linear scaling and significant resource reduction suggests broad applicability for domains reliant on high-bandwidth, seasonality-driven time series analytics (Ho-Long et al., 15 Nov 2025).

PDF Markdown Chat (Pro)

References (1)

Distributed Seasonal Temporal Pattern Mining (2025)

Whiteboard

Generate a whiteboard explanation of this topic.

Topic to Video (Beta)

Generate a video overview of this topic.

Follow Topic

Get notified by email when new papers are published related to Distributed Seasonal Temporal Pattern Mining (DSTPM).