
Selective Temporal Hamming Distance (STH)

Updated 8 December 2025
  • Selective Temporal Hamming Distance (STH) is a metric for discrete event systems that compares state-transition event timeseries while emphasizing specific states and preserving temporal details.
  • It computes a duration-weighted similarity over the intervals in which at least one series is in a state of interest, thereby generalizing the classical Hamming and Jaccard metrics to continuous time.
  • Empirical evaluations report speedups of 3.5× to 14× over resampling-based Hamming distance (up to 4950× at fine resampling periods) and clearer clustering in real-world scenarios such as weather event analysis and sleep-stage annotation.

Selective Temporal Hamming Distance (STH) is a metric for comparing time series generated by discrete event systems, with a focus on state transitions and event timing. Standard methods either lose temporal precision through costly resampling or treat event sequences without accounting for state durations. STH instead operates directly on state-transition event timeseries (STE-ts) and can emphasize selected subsets of states without the distortion and inefficiency of resampling.

1. Foundations and Notation

STH is formulated for discrete event systems (DES) whose state space $S = \{s_1, \ldots, s_{|S|}\}$ contains all possible system states, with $T \subseteq S \times S$ enumerating the allowed state-changing transitions. A state-transition event timeseries (STE-ts) is a sequence $((t_0, o_0), (t_1, o_1), \ldots, (t_n, o_n))$ in which each state $o_k \in S$ holds over the interval $[t_k, t_{k+1})$ (the final state $o_n$ holding until the end of the span), transitions are instantaneous, and consecutive states differ ($o_k \ne o_{k+1}$). When two STE-ts $S_i, S_j$ spanning $[t_0, t_{end}]$ are compared, the merged set of their change-points partitions the span into consecutive, disjoint intervals $J_{ij} = \{I_1, I_2, \ldots, I_M\}$ with $I_k = [\tau_k, \tau_{k+1})$ and duration $\Delta_k = \tau_{k+1} - \tau_k$; $o^i_k$ and $o^j_k$ denote the states of $S_i$ and $S_j$ over $I_k$.
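The merged partition can be built with a single two-pointer pass. The sketch below is a minimal illustration, not the authors' code: it assumes each STE-ts is stored as a list of `(t_k, o_k)` pairs sorted by time, that both series share the same $t_0$, and that every change time lies strictly before $t_{end}$. The function name `merged_intervals` is hypothetical.

```python
from typing import Hashable, List, Tuple

Event = Tuple[float, Hashable]                      # (t_k, o_k): o_k holds on [t_k, t_{k+1})
Interval = Tuple[float, float, Hashable, Hashable]  # (tau_k, tau_{k+1}, o^i_k, o^j_k)


def merged_intervals(si: List[Event], sj: List[Event], t_end: float) -> List[Interval]:
    """Partition [t0, t_end) at the union of the change-points of si and sj.

    Assumes both series start at the same t0 and every change time is < t_end.
    """
    i = j = 0
    t = si[0][0]
    out: List[Interval] = []
    while t < t_end:
        # Next change-point of each series (t_end once a series has no more events).
        ni = si[i + 1][0] if i + 1 < len(si) else t_end
        nj = sj[j + 1][0] if j + 1 < len(sj) else t_end
        nxt = min(ni, nj)
        out.append((t, nxt, si[i][1], sj[j][1]))
        # Advance whichever series changes at nxt (both if the change-points coincide).
        if ni == nxt:
            i += 1
        if nj == nxt:
            j += 1
        t = nxt
    return out


si = [(0, "A"), (1, "B"), (3, "A")]
sj = [(0, "A"), (2, "C"), (4, "A")]
for interval in merged_intervals(si, sj, t_end=5):
    print(interval)
```

For the two series of the worked example in Section 6 this yields the five intervals $[0,1), \ldots, [4,5)$ with their state pairs; each duration $\Delta_k$ is simply `end - start`.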

The states are further partitioned into $S_I$ ("states of interest"), $S_O$ ("other" states), and $S_E$ ("excluded" or ambiguous states). A state-similarity function $\text{sim}: S \times S \rightarrow [0,1]$ is used for interval-wise comparison; by default it is the identity indicator, $\text{sim}(a,b) = 1$ if $a = b$ and $0$ otherwise.

2. Formal Definition and Properties

2.1 STH Similarity and Distance

STH restricts attention to intervals where at least one series is in $S_I$ and neither is in $S_E$. For each interval $I_k$:

  • $w^{den}_k = 1$ if $o^i_k \notin S_E$, $o^j_k \notin S_E$, and $(o^i_k \in S_I \vee o^j_k \in S_I)$; otherwise $w^{den}_k = 0$.
  • $w^{num}_k = \text{sim}(o^i_k, o^j_k) \cdot w^{den}_k$.
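The two weights translate directly into code. This is a minimal sketch, assuming states are hashable labels and the state sets are Python sets; `identity_sim`, `w_den` and `w_num` are illustrative names, not an API from the paper.

```python
from typing import Callable, Hashable, Set

Sim = Callable[[Hashable, Hashable], float]


def identity_sim(a: Hashable, b: Hashable) -> float:
    """Default state similarity: 1.0 for equal states, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


def w_den(oi: Hashable, oj: Hashable, s_interest: Set, s_excluded: Set) -> int:
    """Denominator weight: 1 iff neither state is excluded and at least one is of interest."""
    if oi in s_excluded or oj in s_excluded:
        return 0
    return 1 if (oi in s_interest or oj in s_interest) else 0


def w_num(oi: Hashable, oj: Hashable, s_interest: Set, s_excluded: Set,
          sim: Sim = identity_sim) -> float:
    """Numerator weight: state similarity, masked by the denominator weight."""
    return sim(oi, oj) * w_den(oi, oj, s_interest, s_excluded)


S_I, S_E = {"A"}, {"?"}
print(w_den("B", "A", S_I, S_E), w_num("B", "A", S_I, S_E))  # counted, but a mismatch
print(w_den("B", "C", S_I, S_E))                             # neither state is of interest
print(w_den("A", "?", S_I, S_E))                             # ambiguous state is excluded
```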

Similarity is calculated as:

$$\text{STH}_{S_I, S_O}(S_i, S_j) = \frac{\sum_{k=1}^{M} w^{num}_k \cdot \Delta_k}{\sum_{k=1}^{M} w^{den}_k \cdot \Delta_k}$$

If $\sum_{k=1}^{M} w^{den}_k \Delta_k = 0$, STH is undefined or set to $0$ by application-dependent convention.

The associated distance is:

$$\text{STHD}_{S_I, S_O}(S_i, S_j) = 1 - \text{STH}_{S_I, S_O}(S_i, S_j)$$

2.2 Relationship to Hamming and Jaccard Metrics

Setting $S_I = S$ and $S_E = \emptyset$ makes every interval count, so STHD reduces to the normalized temporal Hamming distance (nTHD), the fraction of the span during which the two series disagree. In binary state systems with $S_I = \{1\}$, $S_O = \{0\}$, $S_E = \emptyset$, and $\text{sim}$ the identity, STH and STHD coincide with temporal Jaccard similarity and distance, respectively, generalizing the static Jaccard index to continuous time (Marié et al., 1 Dec 2025).
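The binary reduction can be checked numerically. This hedged sketch computes STH with $S_I = \{1\}$ on random binary series and compares it with the set Jaccard index of the unit time slots spent in state 1. Integer change times are an assumption made only so that the set Jaccard index is exact; `sth`, `ones_on_grid` and `random_binary` are hypothetical helper names.

```python
import random
from typing import Hashable, List, Optional, Set, Tuple

Event = Tuple[int, Hashable]


def sth(si: List[Event], sj: List[Event], t_end: int, s_interest: Set) -> Optional[float]:
    """STH with identity similarity and S_E = {} via a sweep over merged change-points."""
    i = j = 0
    t, num, den = si[0][0], 0, 0
    while t < t_end:
        ni = si[i + 1][0] if i + 1 < len(si) else t_end
        nj = sj[j + 1][0] if j + 1 < len(sj) else t_end
        nxt = min(ni, nj)
        oi, oj = si[i][1], sj[j][1]
        if oi in s_interest or oj in s_interest:
            den += nxt - t
            if oi == oj:
                num += nxt - t
        if ni == nxt:
            i += 1
        if nj == nxt:
            j += 1
        t = nxt
    return None if den == 0 else num / den


def ones_on_grid(series: List[Event], t_end: int) -> Set[int]:
    """Unit slots [k, k+1) in which a binary series with integer change times is in state 1."""
    slots, idx = set(), 0
    for k in range(t_end):
        while idx + 1 < len(series) and series[idx + 1][0] <= k:
            idx += 1
        if series[idx][1] == 1:
            slots.add(k)
    return slots


def random_binary(t_end: int, rng: random.Random) -> List[Event]:
    """Random binary STE-ts starting at 0 with up to 5 integer change times in (0, t_end)."""
    state = rng.randint(0, 1)
    events = [(0, state)]
    for t in sorted(rng.sample(range(1, t_end), rng.randint(0, 5))):
        state = 1 - state
        events.append((t, state))
    return events


rng = random.Random(0)
checked = 0
for _ in range(1000):
    x, y = random_binary(12, rng), random_binary(12, rng)
    a, b = ones_on_grid(x, 12), ones_on_grid(y, 12)
    if not (a | b):
        continue  # both series all-zero: STH undefined, Jaccard is 0/0
    assert abs(sth(x, y, 12, {1}) - len(a & b) / len(a | b)) < 1e-12
    checked += 1
print(f"STH with S_I={{1}} matched the set Jaccard index in {checked} random pairs")
```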

3. Algorithmic Structure and Computational Complexity

The algorithm for STH iterates over all intervals induced by the union of change-points in $S_i, S_j$, accumulating the numerator and denominator according to the definitions above. Key steps:

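  • Merge the change-points of both series (each list is already sorted, so a linear-time merge suffices), add $t_0$ and $t_{end}$, drop duplicates, and produce $\tau[0..M]$.
  • For $k = 0 \ldots M-1$, determine $o^i_k$ and $o^j_k$ for the interval $[\tau_k, \tau_{k+1})$, and update the numerator and denominator subject to the $S_I$, $S_E$ filters.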
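The steps above fit in one linear pass. This minimal sketch assumes each series is a time-sorted list of `(t_k, o_k)` pairs, that both share the same $t_0$, and that all change times lie before $t_{end}$. Two pointers walk the already-sorted event lists, so the merge costs $O(n_i + n_j)$ with no separate sort. `sth_sweep` and `hypothetical_sim` are illustrative names, not taken from the paper.

```python
from typing import Callable, Hashable, List, Optional, Set, Tuple

Event = Tuple[float, Hashable]  # (t_k, o_k), sorted by t_k


def _identity(a: Hashable, b: Hashable) -> float:
    return 1.0 if a == b else 0.0


def sth_sweep(si: List[Event], sj: List[Event], t_end: float, s_interest: Set,
              s_excluded: Set = frozenset(),
              sim: Callable[[Hashable, Hashable], float] = _identity) -> Optional[float]:
    """STH in O(n_i + n_j) via a two-pointer walk over the merged change-points.

    Returns None when the denominator is zero (STH undefined).
    """
    i = j = 0
    t = si[0][0]                      # assumes si and sj share the same t0
    num = den = 0.0
    while t < t_end:
        ni = si[i + 1][0] if i + 1 < len(si) else t_end
        nj = sj[j + 1][0] if j + 1 < len(sj) else t_end
        tau_next = min(ni, nj)        # right end of the current interval
        oi, oj = si[i][1], sj[j][1]
        if (oi not in s_excluded and oj not in s_excluded
                and (oi in s_interest or oj in s_interest)):
            delta = tau_next - t
            den += delta
            num += sim(oi, oj) * delta
        if ni == tau_next:            # advance each pointer whose series changes here
            i += 1
        if nj == tau_next:
            j += 1
        t = tau_next
    return None if den == 0 else num / den


def hypothetical_sim(a: Hashable, b: Hashable) -> float:
    """Illustrative graded similarity: B and C count as half-similar."""
    if a == b:
        return 1.0
    return 0.5 if {a, b} == {"B", "C"} else 0.0


si = [(0, "A"), (1, "B"), (3, "A")]
sj = [(0, "A"), (2, "C"), (4, "A")]
print(sth_sweep(si, sj, 5, {"A", "B", "C"}))                        # identity similarity
print(sth_sweep(si, sj, 5, {"A"}))                                  # only A of interest
print(sth_sweep(si, sj, 5, {"A", "B", "C"}, sim=hypothetical_sim))  # graded similarity
```

Accumulating on the fly avoids materializing the interval list, so memory stays constant beyond the inputs.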

This yields an overall time complexity of $O(n_i + n_j)$, where $n_i, n_j$ are the event counts in $S_i, S_j$. In comparison, uniform resampling methods run in $O(T \cdot F + n_i + n_j)$ ($T$ = span, $F$ = sampling frequency), incurring a rate-dependent precision/speed trade-off and often distortion (Marié et al., 1 Dec 2025).
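For contrast, here is a uniform-resampling baseline, sketched as an assumption about how such methods typically work: forward-fill each series every `period` time units, then count mismatching samples. The sample count is $T / \text{period} = T \cdot F$ regardless of how few events the series contain. The names `resample` and `resampled_nthd` are hypothetical.

```python
import math
from typing import Hashable, List, Tuple

Event = Tuple[float, Hashable]


def resample(series: List[Event], t0: float, t_end: float, period: float) -> List[Hashable]:
    """Forward-filled states at t0, t0 + period, ... (< t_end): O(T*F + n) work."""
    n_samples = math.ceil((t_end - t0) / period - 1e-9)  # tolerance absorbs float round-off
    out: List[Hashable] = []
    k = 0
    for m in range(n_samples):
        t = t0 + m * period
        while k + 1 < len(series) and series[k + 1][0] <= t:
            k += 1                                       # last event at or before t
        out.append(series[k][1])
    return out


def resampled_nthd(si: List[Event], sj: List[Event], t_end: float, period: float) -> float:
    """Normalized Hamming distance between the two resampled sequences (S_I = all states)."""
    a = resample(si, si[0][0], t_end, period)
    b = resample(sj, sj[0][0], t_end, period)
    return sum(x != y for x, y in zip(a, b)) / len(a)


si = [(0, "A"), (1, "B"), (3, "A")]
sj = [(0, "A"), (2, "C"), (4, "A")]
for period in (2.0, 1.0, 0.1):
    n = len(resample(si, 0, 5, period))
    d = resampled_nthd(si, sj, 5, period)
    print(f"period={period}: {n} samples per series, distance={d:.4f}")
```

On the Section 6 example, whose exact distance with $S_I = S$ is $0.6$, a period of 1 recovers $0.6$ because every change-point falls on a sample, while a period of 2 misses the $[1,2)$ and $[3,4)$ mismatches and reports $1/3$.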

4. Theoretical Properties

STHD satisfies the metric properties under mild conditions ($S_E = \emptyset$, $|S_O| \leq 1$):

  • Non-negativity: $\text{STHD} \geq 0$, since $\text{sim} \leq 1$ implies $w^{num}_k \leq w^{den}_k$ and hence $\text{STH} \leq 1$.
  • Symmetry: $\text{STHD}(S_i, S_j) = \text{STHD}(S_j, S_i)$.
  • Identity of indiscernibles: $\text{STHD} = 0 \iff$ all qualifying intervals have equal states.
  • Triangle inequality: satisfied under the stated conditions (proof in [36]).
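The properties can be spot-checked numerically. This randomized test is a sanity check, not a proof: it draws random series over hypothetical states `A`, `B` (of interest) and `O` (the single other state, so $|S_O| = 1$ and $S_E = \emptyset$), uses identity similarity, and starts every series in `A` so that every STH is defined. The helpers `sthd` and `random_series` are illustrative names.

```python
import random
from typing import Hashable, List, Optional, Set, Tuple

Event = Tuple[int, Hashable]


def sthd(si: List[Event], sj: List[Event], t_end: int, s_interest: Set) -> Optional[float]:
    """STHD with identity similarity and S_E = {} via a sweep over merged change-points."""
    i = j = 0
    t, num, den = si[0][0], 0, 0
    while t < t_end:
        ni = si[i + 1][0] if i + 1 < len(si) else t_end
        nj = sj[j + 1][0] if j + 1 < len(sj) else t_end
        nxt = min(ni, nj)
        oi, oj = si[i][1], sj[j][1]
        if oi in s_interest or oj in s_interest:
            den += nxt - t
            if oi == oj:
                num += nxt - t
        if ni == nxt:
            i += 1
        if nj == nxt:
            j += 1
        t = nxt
    return None if den == 0 else 1 - num / den


def random_series(t_end: int, states: List[str], rng: random.Random) -> List[Event]:
    """Random STE-ts starting in "A", with integer change times and no self-transitions."""
    events: List[Event] = [(0, "A")]
    for t in sorted(rng.sample(range(1, t_end), rng.randint(0, 6))):
        events.append((t, rng.choice([s for s in states if s != events[-1][1]])))
    return events


rng = random.Random(42)
S_I, STATES, T = {"A", "B"}, ["A", "B", "O"], 15   # S_O = {"O"}, so |S_O| = 1
for _ in range(2000):
    x, y, z = (random_series(T, STATES, rng) for _ in range(3))
    d_xy, d_yz, d_xz = sthd(x, y, T, S_I), sthd(y, z, T, S_I), sthd(x, z, T, S_I)
    assert d_xy >= 0                          # non-negativity
    assert d_xy == sthd(y, x, T, S_I)         # symmetry
    assert sthd(x, x, T, S_I) == 0            # a series is at distance 0 from itself
    assert d_xz <= d_xy + d_yz + 1e-12        # triangle inequality
print("no violations in 2000 random triples")
```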

STH generalizes classical Hamming and Jaccard: with $S_I = S$ (all states of interest), STHD is the continuous-time analogue of normalized Hamming distance, obtained in the limit of infinitesimal resampling; for binary systems with $S_I = \{1\}$, it recovers temporal Jaccard distance.

5. Practical Application and Empirical Validation

Empirical results demonstrate substantial advantages of STH for pattern mining and clustering in large-scale, non-uniform time series data from diverse domains (Marié et al., 1 Dec 2025).

5.1 Computational Performance

STH achieves speedups of $3.5\times$ to $14\times$ over 5-minute resampled normalized Hamming distance, and up to $4950\times$ as the resampling period decreases (30-day random binary series). This follows directly from its linear complexity and its avoidance of temporal discretization.

5.2 Avoidance of Temporal Distortion

On periodic series with a phase shift, STHD returns the exact normalized distance (e.g., $0.6667$ for a shift of $1/3$ of the period), whereas resampled metrics can be severely biased, including failing to detect the distortion at certain sampling rates.
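The phase-shift effect can be reproduced under one assumption the text does not state: a binary square wave with a 50% duty cycle, for which a shift of one third of the period gives exactly $2/3$ disagreement. The sketch compares the exact sweep (with $S_I = S$, i.e. nTHD) against forward-filled resampling at several periods; `square_wave`, `exact_sthd` and `resampled_nthd` are hypothetical names.

```python
import math
from typing import Hashable, List, Tuple

Event = Tuple[float, Hashable]


def square_wave(period: float, high: float, shift: float, t_end: float) -> List[Event]:
    """Binary STE-ts on [0, t_end): state 1 on [shift + k*period, shift + k*period + high).

    Assumes 0 <= shift < period.
    """
    def state(t: float) -> int:
        return 1 if (t - shift) % period < high else 0

    times = {0.0}
    k = -1
    while shift + k * period < t_end:
        for c in (shift + k * period, shift + k * period + high):
            if 0 < c < t_end:
                times.add(c)
        k += 1
    events: List[Event] = []
    for t in sorted(times):
        if not events or events[-1][1] != state(t):  # keep only real state changes
            events.append((t, state(t)))
    return events


def exact_sthd(si: List[Event], sj: List[Event], t_end: float) -> float:
    """STHD with S_I = all states (nTHD): fraction of the span where the series disagree."""
    i = j = 0
    t, mismatch = si[0][0], 0.0
    while t < t_end:
        ni = si[i + 1][0] if i + 1 < len(si) else t_end
        nj = sj[j + 1][0] if j + 1 < len(sj) else t_end
        nxt = min(ni, nj)
        if si[i][1] != sj[j][1]:
            mismatch += nxt - t
        if ni == nxt:
            i += 1
        if nj == nxt:
            j += 1
        t = nxt
    return mismatch / (t_end - si[0][0])


def resampled_nthd(si: List[Event], sj: List[Event], t_end: float, period: float) -> float:
    """Mismatch fraction between forward-filled samples taken every `period`."""
    n = math.ceil((t_end - si[0][0]) / period - 1e-9)

    def sample(series: List[Event]) -> List[Hashable]:
        out, k = [], 0
        for m in range(n):
            t = series[0][0] + m * period
            while k + 1 < len(series) and series[k + 1][0] <= t:
                k += 1
            out.append(series[k][1])
        return out

    return sum(a != b for a, b in zip(sample(si), sample(sj))) / n


T = 30.0
x = square_wave(3.0, 1.5, 0.0, T)  # 50% duty cycle, period 3
y = square_wave(3.0, 1.5, 1.0, T)  # same wave shifted by 1/3 of the period
print(f"exact STHD:            {exact_sthd(x, y, T):.4f}")
for p in (3.0, 1.5, 1.0):
    print(f"resampled, period {p}: {resampled_nthd(x, y, T, p):.4f}")
```

In this setup a sampling period of 1 happens to recover $0.6667$, while sampling once or twice per wave period reports $1.0$, because every sample lands in a stretch where the two waves disagree.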

5.3 Clustering and State Selection

Applications include weather event clustering (US dataset, 8 states) and sleep-stage annotation (W,1,2,3,R,?,E):

  • STH with $S_I = S$ yields large, undifferentiated clusters.
  • Excluding "Normal" from $S_I$ brings out geographical variation in abnormal events.
  • Focusing on $S_I = \{\text{Snow}, \text{Hail}\}$ isolates winter-heavy regions.
  • For sleep-stage data, STH enables cleaner patient clustering by ignoring ambiguous intervals ($S_E = \{?, E\}$), outperforming temporal Jaccard in handling missing data.
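The effect of $S_E$ can be illustrated on two hypothetical toy hypnograms (made-up data, not from the sleep-stage dataset) in which one annotation contains an unscored `?` stretch. The sketch computes STH with identity similarity once with `{"?", "E"}` excluded and once with those labels treated as ordinary states of interest; the function name `sth` is illustrative.

```python
from typing import Hashable, List, Optional, Set, Tuple

Event = Tuple[float, Hashable]


def sth(si: List[Event], sj: List[Event], t_end: float, s_interest: Set,
        s_excluded: Set = frozenset()) -> Optional[float]:
    """STH with identity similarity via a two-pointer sweep over merged change-points."""
    i = j = 0
    t, num, den = si[0][0], 0.0, 0.0
    while t < t_end:
        ni = si[i + 1][0] if i + 1 < len(si) else t_end
        nj = sj[j + 1][0] if j + 1 < len(sj) else t_end
        nxt = min(ni, nj)
        oi, oj = si[i][1], sj[j][1]
        if (oi not in s_excluded and oj not in s_excluded
                and (oi in s_interest or oj in s_interest)):
            den += nxt - t
            if oi == oj:
                num += nxt - t
        if ni == nxt:
            i += 1
        if nj == nxt:
            j += 1
        t = nxt
    return None if den == 0 else num / den


# Hypothetical toy hypnograms (times in arbitrary epochs); "?" marks an unscored stretch.
patient_a = [(0, "W"), (1, "1"), (2, "?"), (4, "2"), (6, "R")]
patient_b = [(0, "W"), (1, "1"), (2, "2"), (6, "R")]
stages = {"W", "1", "2", "3", "R"}

with_exclusion = sth(patient_a, patient_b, 8, stages, s_excluded={"?", "E"})
as_ordinary = sth(patient_a, patient_b, 8, stages | {"?", "E"})
print(f"STHD, ambiguous states excluded:       {1 - with_exclusion:.2f}")
print(f"STHD, ambiguous states as real states: {1 - as_ordinary:.2f}")
```

Excluding `?` yields an STHD of $0$ because the two toy patients agree wherever both are scored; treating `?` as a real state charges them $0.25$ for a stretch that is merely missing.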

6. Example Computation

Consider two STE-ts over $[0,5]$:

  • $S_i$: $(0,A) \xrightarrow{t=1} B \xrightarrow{t=3} A$, resulting intervals $[0,1): A;\ [1,3): B;\ [3,5): A$.
  • $S_j$: $(0,A) \xrightarrow{t=2} C \xrightarrow{t=4} A$, intervals $[0,2): A;\ [2,4): C;\ [4,5): A$.

Merged change-points produce the intervals $[0,1), [1,2), [2,3), [3,4), [4,5)$.

Case A ($S_I = \{A,B,C\}$, $S_E = \emptyset$):

  • Matches on $[0,1)$ and $[4,5)$ (both in $A$); mismatches elsewhere.
  • STH = $2/5 = 0.4$, STHD = $0.6$.

Case B ($S_I = \{A\}$, $S_E = \emptyset$):

  • Only intervals with $A$ in at least one series are considered: $[0,1), [1,2), [3,4), [4,5)$.
  • STH = $2/4 = 0.5$, STHD = $0.5$.
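The two cases can be written out interval by interval, listing the terms $w^{num}_k \cdot \Delta_k$ (numerator) and $w^{den}_k \cdot \Delta_k$ (denominator) for $[0,1), [1,2), [2,3), [3,4), [4,5)$ in order, with every $\Delta_k = 1$ and identity similarity:

$$\text{Case A: } \text{STH} = \frac{1 + 0 + 0 + 0 + 1}{1 + 1 + 1 + 1 + 1} = \frac{2}{5} = 0.4, \qquad \text{STHD} = 1 - 0.4 = 0.6$$

$$\text{Case B: } \text{STH} = \frac{1 + 0 + 0 + 0 + 1}{1 + 1 + 0 + 1 + 1} = \frac{2}{4} = 0.5, \qquad \text{STHD} = 1 - 0.5 = 0.5$$

In Case B the interval $[2,3)$ drops out of the denominator because neither $B$ nor $C$ is in $S_I$, so $w^{den}_3 = 0$.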

7. Utility and Broader Significance

Selective Temporal Hamming Distance supports mathematically rigorous comparison of state-transition event timeseries without resampling, accounting for full state durations and allowing selective focus on relevant states or exclusion of ambiguous ones. Its generalization of the classical Hamming and Jaccard metrics to continuous time, together with metric properties proven under the conditions of Section 4, enables direct use in clustering, kernel methods, and scalable nearest-neighbor search. Empirical validation shows improved precision and efficiency, facilitating large-scale analysis in domains such as weather events and clinical annotations (Marié et al., 1 Dec 2025).
