
Distributed Seasonal Temporal Pattern Mining

Updated 22 November 2025
  • DSTPM is a distributed framework that efficiently mines frequent seasonal temporal patterns from large time series datasets characterized by periodic or bursty events.
  • It introduces novel seasonality-sensitive support measures and memory-efficient data structures to overcome the limitations of traditional frequent pattern mining approaches.
  • Empirical evaluations demonstrate 4–5Ɨ runtime reductions and near-linear scalability across clusters, highlighting its practical impact on high-volume time series analytics.

Distributed Seasonal Temporal Pattern Mining (DSTPM) is the first distributed framework for mining frequent seasonal temporal patterns (STPs) from massive time series datasets. STPs are temporally ordered patterns characterized by periodic or bursty re-occurrence, as seen across domains such as IoT sensor flows and epidemiological surveillance. Classic frequent pattern mining approaches are unsuitable for STPs, as measures like support and confidence cannot distinguish uniform from seasonal clustering, and anti-monotonicity does not hold. DSTPM introduces new formal definitions, memory-efficient distributed data structures, and theoretically sound pruning routines, providing significant efficiency and scalability gains over sequential baselines (Ho-Long et al., 15 Nov 2025).

1. Formal Framework and Problem Definitions

A time series $T$ over a domain $\mathcal{T}$ with granularity $G$ is modeled as a sequence $T = x_1, x_2, \dots, x_N$, $x_i \in \Sigma_T$, where $\Sigma_T$ is a finite alphabet. Each symbol $\omega$ forms a temporal event $E = (\omega, \{[t_s, t_e]\})$ marking the instances where $T$ equals $\omega$ over the interval $[t_s, t_e]$.

Temporal patterns are formalized using Allen's interval relations $\Re = \{\rightarrow, \succcurlyeq, \between\}$ (follows, contains, overlaps). A pattern $P$ of length $k$ is defined as:

$$P = \{ (r_{ij}, E_i, E_j) \mid 1 \le i < j \le k,\; r_{ij} \in \Re \}$$
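
This relation-triple formulation can be made concrete with a small data-structure sketch. The Python snippet below is purely illustrative (the names TemporalEvent, TemporalPattern, and the string relation labels are assumptions, not taken from the paper); it shows one way to encode events and a $k$-event pattern as the set of pairwise relation triples defined above.

from dataclasses import dataclass

@dataclass(frozen=True)
class TemporalEvent:
    symbol: str          # omega, a symbol from the alphabet Sigma_T
    intervals: tuple     # ((t_s, t_e), ...) occurrence intervals in granule time

@dataclass(frozen=True)
class TemporalPattern:
    events: tuple        # (E_1, ..., E_k)
    relations: tuple     # ((i, j, rel), ...), rel in {"follows", "contains", "overlaps"}

# Example: a 2-event pattern "A follows B"
A = TemporalEvent("A", ((1, 3),))
B = TemporalEvent("B", ((5, 6),))
P = TemporalPattern(events=(A, B), relations=((0, 1, "follows"),))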

Classical support, $\mathrm{supp}(P, T)$, quantifies occurrence frequency but fails to distinguish concentrated, periodic (seasonal) occurrences from uniform ones. Additionally, anti-monotonicity (if $P$ is infrequent, then every super-pattern of $P$ is infrequent) does not hold for seasonal counts: a sub-pattern occurs in at least as many granules as its super-patterns, so its occurrences can coalesce into fewer, larger dense runs and therefore fewer detected seasons.

DSTPM defines a novel, seasonality-sensitive support:

  • The support set $\mathrm{SUP}^P = \{G_1^P, \dots, G_m^P\}$ is the set of granular timestamps (granules) containing occurrences of $P$.
  • A near-support set $\mathrm{NearSUP}^P$ is a maximal contiguous subsequence of granules in which the gap between consecutive granules does not exceed a threshold $\mathrm{maxPeriod}$.
  • A season is a near-support set with density at least $\mathrm{minDensity}$.
  • For $P$ to be frequent seasonal: $\mathrm{seasons}(P) \ge \mathrm{minSeason}$, and inter-season intervals must respect $[\mathrm{dist}_{\min}, \mathrm{dist}_{\max}]$.

To permit effective pruning, DSTPM proposes the anti-monotonic proxy $\mathrm{maxSeason}(P) = \frac{|\mathrm{SUP}^P|}{\mathrm{minDensity}}$, which ensures $\mathrm{maxSeason}(P') \ge \mathrm{maxSeason}(P)$ for $P' \subseteq P$, so one can prune $P$ whenever $\mathrm{maxSeason}(P) < \mathrm{minSeason}$.
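
A compact, single-node sketch of these definitions is given below. It assumes that the density of a near-support set is its count of occurrence granules (consistent with the $\mathrm{maxSeason}$ formula) and omits the inter-season distance check; the names find_seasons and max_season are illustrative, not taken from the paper.

def find_seasons(sup, max_period, min_density):
    """Split a support set into near-support sets and keep the dense ones (seasons)."""
    sup = sorted(sup)
    near_sets, current = [], []
    for g in sup:
        if current and g - current[-1] > max_period:
            near_sets.append(current)   # gap too large: close the current near-support set
            current = []
        current.append(g)
    if current:
        near_sets.append(current)
    return [s for s in near_sets if len(s) >= min_density]

def max_season(sup, min_density):
    """Anti-monotonic upper bound on the number of seasons: |SUP^P| / minDensity."""
    return len(sup) / min_density

sup_p = [3, 4, 5, 40, 41, 42, 43, 80, 81]
print(find_seasons(sup_p, max_period=2, min_density=3))  # [[3, 4, 5], [40, 41, 42, 43]]
print(max_season(sup_p, min_density=3))                  # 3.0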

2. Distributed Architecture and Data Partitioning

DSTPM operates atop Spark or other MapReduce engines over a cluster of $n$ worker nodes. The input temporal sequence database $\mathcal{D}_{\mathrm{SEQ}}$ is partitioned by time-granule or event-symbol so that each worker stores a disjoint fragment of the data. This enables linear scalability, as each node processes only its local partition for candidate generation, support calculation, and pattern verification (Ho-Long et al., 15 Nov 2025).

The core distributed data structure is the Distributed Hierarchical Lookup Hash ($\mathrm{DHLH}_k$), composed for pattern size $k$ as follows:

| Table | Maps from | To |
|---|---|---|
| $EH_1$ | $\omega$ | $\mathrm{SUP}^{(\omega)}$ |
| $GH_1$ | $\mathrm{SUP}^{(\omega)}$ | Instances of $\omega$ |
| $EH_k$ | $(E_1,\dots,E_k)$ | $(\mathrm{SUP}^{(E_1,\dots,E_k)}, \{\text{candidates}\})$ |
| $PH_k$ | Pattern $P$ | $\mathrm{SUP}^P$ |
| $GH_k$ | $\mathrm{SUP}^{P}$ | Relation-supporting instances |

For $k=1$, $\mathrm{DHLH}_k$ reduces to the event and granule-instance tables. For $k \ge 2$, the three-level indirection supports both efficient candidate assembly and support-set computation.

Each worker maintains only its assigned hash table fragments, reducing overhead and enabling parallel candidate support lookups.
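
To make the table concrete, the snippet below sketches a single worker's fragment of $\mathrm{DHLH}_k$ as plain Python dictionaries. The keying shown (for example, indexing GH1 by a (symbol, granule) pair) is one plausible layout rather than the paper's exact encoding, and in practice the tables are partitioned across workers.

# One worker's fragment of DHLH_k, held as plain dictionaries (illustrative only).
dhlh = {
    "EH1": {},   # symbol omega                 -> SUP^(omega), set of granule ids
    "GH1": {},   # (omega, granule)             -> instances of omega, [(t_s, t_e), ...]
    "EHk": {},   # (E_1, ..., E_k) event group  -> (SUP^(group), candidate patterns)
    "PHk": {},   # pattern P                    -> SUP^P
    "GHk": {},   # (P, granule)                 -> relation-supporting instances
}

# Populating the level-1 tables from this worker's partition of the input:
partition = [("A", 1, 10, 12), ("A", 4, 40, 45), ("B", 1, 11, 13)]
for omega, granule, t_s, t_e in partition:
    dhlh["EH1"].setdefault(omega, set()).add(granule)
    dhlh["GH1"].setdefault((omega, granule), []).append((t_s, t_e))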

3. Core Algorithms and Pruning Strategies

The DSTPM process operates as follows:

Function DSTPM(D_SEQ, maxPeriod, minDensity, distInterval, minSeason)
    (EH_1, GH_1) = MineSingleEvents(D_SEQ)                    // distributed single-event mining
    For k = 2 to k_max
        (EH_k, PH_k, GH_k) = MineKPatterns(EH_{k-1}, EH_1)    // grow patterns by one event
    End For
    Return all { P in any PH_k | seasons(P) >= minSeason }
End Function

Single-Event Mining: Each $(\omega, [t_s, t_e])$ record emits its event key; a distributed ReduceByKey operation aggregates all instances and calculates $\mathrm{maxSeason}(\omega)$. Events passing the $\mathrm{maxSeason}$ threshold are stored. Survivors are post-filtered on the seasonality criteria: a second pass builds near-support sets and filters by density and recurrence.
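
As a rough, single-node approximation of this step (standing in for the distributed ReduceByKey with a local dictionary and reusing the find_seasons helper sketched in Section 1), frequent seasonal single events could be mined as follows; the record layout and function name are assumptions.

from collections import defaultdict

def mine_single_events(records, max_period, min_density, min_season):
    # Map phase: emit (symbol -> instance); Reduce phase: aggregate per symbol.
    by_symbol = defaultdict(list)
    for omega, granule, t_s, t_e in records:
        by_symbol[omega].append((granule, t_s, t_e))
    frequent = {}
    for omega, instances in by_symbol.items():
        sup = sorted({g for g, _, _ in instances})
        if len(sup) / min_density < min_season:        # maxSeason proxy fails: drop early
            continue
        # Second pass: build near-support sets and check the seasonality criteria.
        if len(find_seasons(sup, max_period, min_density)) >= min_season:
            frequent[omega] = instances
    return frequent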

Pattern Mining (k > 1):

  • Candidates are built via the Cartesian product $\mathrm{EH}_{k-1} \times \mathrm{EH}_1$, but retained only if their $\mathrm{maxSeason}$ proxy meets the minimum.
  • For each event group, pattern assembly joins size-$(k-1)$ frequent patterns with the new event to form valid relation-sets, relying on $GH_2$ for 2-event relation supports.
  • The support of each candidate $P$ is computed as the intersection of all $\mathrm{SUP}^{(E_i, E_j)}$ (see the sketch below).
  • Pruning occurs as soon as the intermediate $\mathrm{maxSeason}$ drops below $\mathrm{minSeason}$, eliminating infeasible super-patterns early.

Insertion, lookups, and pruning within $\mathrm{DHLH}_k$ are designed to run in constant or $O(k)$ time.
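
The support-intersection and early-pruning logic referenced above can be illustrated with a short sketch; the function candidate_support and its inputs are hypothetical stand-ins for lookups into $GH_2$.

def candidate_support(pairwise_sups, min_density, min_season):
    """pairwise_sups: one set of granule ids per (E_i, E_j) relation of the candidate."""
    sup = None
    for pair_sup in pairwise_sups:
        sup = set(pair_sup) if sup is None else (sup & pair_sup)
        if len(sup) / min_density < min_season:   # maxSeason proxy: prune early
            return None
    return sup

# Two pairwise supports whose intersection cannot yield 2 seasons of density 2:
print(candidate_support([{1, 2, 3, 9}, {2, 9}], min_density=2, min_season=2))  # None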

4. Theoretical Foundations and Complexity Analysis

Let $N_T$ be the total number of granules, $M_1$ the number of unique event symbols, $C_k$ the count of candidate $k$-event groups, $S_k$ the count of frequent $k$-event patterns, $s = \mathrm{minSeason}$, and $n$ the number of workers.

  • Time Complexity:
    • Single-event mining takes $O(N_T \log M_1)$.
    • For $k$-patterns, each of the $C_k$ candidates requires:
      • set intersections of cost $O(|\mathrm{SUP}^{(E_i, E_j)}|)$,
      • $\binom{k}{2}$ lookups in $GH_2$,
      • support evaluation in $O(|\mathrm{SUP}^{P}|)$.
    • The final complexity across all $n$ workers is:

$$T(n) = \sum_{k=1}^{k_{\max}} \frac{1}{n} \left[ O(N_T \log M_1) + \sum_{P \in \mathrm{candidates}_k} \left( k^2 + |\mathrm{SUP}^P| \right) \right]$$

  • Space Complexity:
    • Per worker: $\sim O\left(\sum_{k=1}^{k_{\max}} [C_k + S_k]\right)$
    • Cluster-wide memory is linear in total candidate/support set sizes.
  • Anti-Monotonicity (Pruning):
    • By using the $\mathrm{maxSeason}$ proxy, downward closure is restored: $P' \subseteq P \implies \mathrm{maxSeason}(P') \ge \mathrm{maxSeason}(P)$, so candidates can be eliminated safely.
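
As a concrete illustration of the proxy (with numbers chosen for exposition rather than taken from the paper): if a candidate $P$ occurs in $|\mathrm{SUP}^P| = 35$ granules and $\mathrm{minDensity} = 5$, then $\mathrm{maxSeason}(P) = 35/5 = 7$. With $\mathrm{minSeason} = 10$, $P$ is pruned without any season construction, and every super-pattern $P''$ of $P$ can be pruned as well, since $\mathrm{SUP}^{P''} \subseteq \mathrm{SUP}^P$ implies $\mathrm{maxSeason}(P'') \le 7$.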

5. Empirical Evaluation and Scalability

DSTPM has been tested on varied real-world datasets:

  • RE (Renewable Energy): 1,460 granules, 21 sensors, 102 symbols, monthly seasonality.
  • SC (Smart City Traffic): $5 \times 10^{5}$ granules, 30 streams, 150 symbols, daily/weekly seasonality.
  • INF (Influenza): 2,628 daily granules, 6 variables, 32 symbols.

Parameter sweeps included $\mathrm{maxPeriod} \in \{0.2\%, \dots, 1.0\%\}$, $\mathrm{minDensity} \in \{0.5\%, \dots, 1.5\%\}$, and $\mathrm{minSeason} \in \{4, 8, 12, 16, 20\}$.

DSTPM was evaluated against an adapted sequential PS-growth (APS) baseline that mines itemsets and then assembles relations. Metrics included runtime (s), memory (MB), and speedup ($\mathrm{APS}_{\mathrm{time}} / \mathrm{DSTPM}_{\mathrm{time}}$):

| Dataset | DSTPM (time, memory) | APS (time, memory) | Speedup |
|---|---|---|---|
| RE | 1,526 s, 5,500 MB | 6,059 s, 11,595 MB | 3.97Ɨ |
| SC | 1,332 s, 3,832 MB | 5,501 s, 8,183 MB | 4.13Ɨ |
| INF | 1,114 s, 3,210 MB | 4,754 s, 7,102 MB | 4.27Ɨ |

DSTPM demonstrates 4–5Ɨ reduction in runtime and 2.3Ɨ reduction in peak memory on average.

Scalability experiments on synthetic datasets ($10^6$ granules per series) show nearly linear speedup up to at least 15–20 cluster nodes, with 20 partitions effectively utilizing a 16-node cluster and providing a 12Ɨ runtime reduction over single-node execution.

6. Significance and Impact

DSTPM resolves central bottlenecks in large-scale seasonal temporal pattern mining by introducing a distributed, partitioned, and hash-based infrastructure, along with a mathematically justified, anti-monotonic, prunable seasonality proxy. This allows previously intractable problem sizes to be handled efficiently, both in memory and computation time. The framework supports flexible deployment on commodity MapReduce platforms and can handle the exponential combinatorial explosion typical in temporal pattern mining as dataset size and event vocabulary grow. The empirical demonstration of nearly linear scaling and significant resource reduction suggests broad applicability for domains reliant on high-bandwidth, seasonality-driven time series analytics (Ho-Long et al., 15 Nov 2025).
