Selective Temporal Hamming Distance (STH)
- Selective Temporal Hamming Distance (STH) is a metric for discrete event systems that compares state-transition event timeseries while emphasizing specific states and preserving temporal details.
- It computes similarity by weighting intervals where at least one series is in a state of interest, thereby generalizing classical Hamming and Jaccard metrics to continuous time.
- Empirical evaluations demonstrate that STH achieves significant speedups and clearer clustering in real-world scenarios such as weather event analysis and sleep-stage annotation.
Selective Temporal Hamming Distance (STH) is a metric designed to compare time series generated by discrete event systems, with a focus on state transitions and event timing. Unlike standard methods that either reduce temporal information via costly resampling or treat event sequences without accounting for state durations, STH operates directly on state-transition event timeseries (STE-ts) and enables selective emphasis on subsets of states while avoiding distortion and inefficiency.
1. Foundations and Notation
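STH is formulated for discrete event systems (DES), in which a state space $\mathcal{S}$ defines all possible system states and a transition set enumerates the allowed state-changing transitions. A state-transition event timeseries (STE-ts) is represented as a time-ordered sequence of events $(t_i, s_i)$, where each state $s_i \in \mathcal{S}$ is maintained over the interval $[t_i, t_{i+1})$ and transitions are instantaneous, with the condition $t_i < t_{i+1}$. When comparing two STE-ts $x$ and $y$ spanning a common time range, the merged set of change-points produces a partition into $K$ consecutive, disjoint intervals $I_k$ with $k = 1, \dots, K$, duration $d_k$, and states $x_k$ and $y_k$ active over $I_k$ for $x$ and $y$ respectively.
States are further partitioned as follows: $\mathcal{S}_I$ ("states of interest"), $\mathcal{S}_O$ ("other" states), and $\mathcal{S}_X$ ("excluded"/ambiguous states). A state-similarity function $\mathrm{sim}(\cdot, \cdot)$ (defaulting to identity, i.e. $1$ for equal states and $0$ otherwise) is used for interval-wise comparison.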
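The interval partition described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the list-of-`(time, state)` representation, the helper names `state_at` and `merged_intervals`, and the sample series are all assumptions made for the example.

```python
from bisect import bisect_right

def state_at(series, t):
    """State of an STE-ts at time t; series is a time-sorted list of (time, state)."""
    times = [ts for ts, _ in series]  # rebuilt on each call; fine for a sketch
    return series[bisect_right(times, t) - 1][1]

def merged_intervals(x, y, t_end):
    """Partition [t_start, t_end) at every change-point of either series.

    Returns (start, end, state_x, state_y) tuples. Both series are assumed
    to have their first event at the same start time.
    """
    points = sorted({t for t, _ in x} | {t for t, _ in y} | {t_end})
    points = [t for t in points if t <= t_end]
    return [(lo, hi, state_at(x, lo), state_at(y, lo))
            for lo, hi in zip(points, points[1:])]

# Hypothetical series: each state holds until the next event (or t_end).
x = [(0, "A"), (2, "B"), (4, "A")]
y = [(0, "A"), (1, "B"), (3, "C")]
for interval in merged_intervals(x, y, t_end=5):
    print(interval)
```

Each tuple carries the interval bounds and the state active in each series, which is all the interval-wise comparison needs.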
2. Formal Definition and Properties
2.1 STH Similarity and Distance
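STH restricts attention to intervals where at least one series is in a state of interest ($\mathcal{S}_I$) and neither is in an excluded state ($\mathcal{S}_X$). For each merged interval $k$, with duration $d_k$ and active states $x_k$ and $y_k$ in the two series:
- $w_k = 1$ if ($x_k \in \mathcal{S}_I$ or $y_k \in \mathcal{S}_I$) and $x_k \notin \mathcal{S}_X$ and $y_k \notin \mathcal{S}_X$, else $0$.
- $m_k = w_k \cdot \mathrm{sim}(x_k, y_k)$.
Similarity is calculated as:
$$\mathrm{STH}(x, y) = \frac{\sum_k m_k \, d_k}{\sum_k w_k \, d_k}$$
If $\sum_k w_k \, d_k = 0$, STH is undefined or set to $0$ by application-dependent convention.
The associated distance is:
$$\mathrm{STHD}(x, y) = 1 - \mathrm{STH}(x, y)$$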
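As a rough sketch of this definition, the following Python computes the similarity and distance from already-merged intervals. The function names, the `(duration, state_x, state_y)` triple format, and the sample data are hypothetical. Returning `None` for a zero denominator is one possible convention among those the text allows.

```python
def sth_similarity(intervals, interest, excluded=frozenset(), sim=None):
    """STH similarity from (duration, state_x, state_y) triples.

    Returns None when no interval qualifies (zero denominator).
    """
    if sim is None:
        sim = lambda a, b: 1.0 if a == b else 0.0  # identity similarity
    num = den = 0.0
    for duration, sx, sy in intervals:
        if sx in excluded or sy in excluded:
            continue  # neither series may be in an excluded state
        if sx not in interest and sy not in interest:
            continue  # at least one series must be in a state of interest
        num += sim(sx, sy) * duration
        den += duration
    return num / den if den > 0 else None

def sth_distance(intervals, interest, excluded=frozenset(), sim=None):
    s = sth_similarity(intervals, interest, excluded, sim)
    return None if s is None else 1.0 - s

# Hypothetical merged intervals: (duration, state in x, state in y).
intervals = [(2, "A", "A"), (1, "A", "B"), (3, "B", "B"), (1, "B", "C"), (1, "A", "C")]
print(sth_similarity(intervals, interest={"A", "B", "C"}))        # 0.625
print(sth_similarity(intervals, interest={"A"}))                  # 0.5
print(sth_distance(intervals, interest={"A"}, excluded={"C"}))
```

Narrowing `interest` or adding to `excluded` only changes which intervals enter the sums, so the same pass serves every state selection.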
2.2 Relationship to Hamming and Jaccard Metrics
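Setting the states of interest to the full state space ($\mathcal{S}_I = \mathcal{S}$) and the excluded set to empty ($\mathcal{S}_X = \emptyset$), STH reduces to the normalized temporal Hamming distance (nTHD). In binary state systems with a single state of interest ($\mathcal{S}_I = \{1\}$), no excluded states ($\mathcal{S}_X = \emptyset$), and $\mathrm{sim}$ as the identity, STH coincides with temporal Jaccard similarity/distance, generalizing the static Jaccard index to continuous time (Marié et al., 1 Dec 2025).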
3. Algorithmic Structure and Computational Complexity
The algorithm for STH iterates over all intervals induced by the union of change-points in the two series, computing the numerator and denominator accumulators according to the definitions above. Key steps:
- Merge change-points, sort, and produce the interval partition.
- For each interval $k$, determine the states $x_k$ and $y_k$ active in the current interval, then update the numerator and denominator subject to the interest and exclusion filters.
This yields an overall time complexity of $O(n_x + n_y)$, where $n_x$ and $n_y$ are the event counts in the two series. In comparison, uniform resampling methods operate at $O(T \cdot f)$ ($T$ = span, $f$ = frequency), incurring a rate-dependent precision/speed trade-off and often distortion (Marié et al., 1 Dec 2025).
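One way to reach linear time is a two-pointer sweep over the two event lists, sketched below. It assumes both inputs are already time-sorted, start at the same time, and use the identity similarity. The function name `sth_sweep` and its parameters are illustrative and not taken from the source.

```python
def sth_sweep(x, y, t_end, interest, excluded=frozenset()):
    """Single-pass STH similarity over two time-sorted STE-ts.

    x and y are lists of (time, state) sharing the same start time; each
    state holds until the next event (or t_end). Returns None if no
    interval qualifies.
    """
    i = j = 0
    t = x[0][0]
    num = den = 0.0
    while t < t_end:
        sx, sy = x[i][1], y[j][1]
        next_x = x[i + 1][0] if i + 1 < len(x) else t_end
        next_y = y[j + 1][0] if j + 1 < len(y) else t_end
        t_next = min(next_x, next_y, t_end)
        duration = t_next - t
        qualifies = (sx not in excluded and sy not in excluded
                     and (sx in interest or sy in interest))
        if qualifies:
            den += duration
            if sx == sy:
                num += duration
        # Advance whichever series changes state at t_next (possibly both).
        if next_x == t_next and i + 1 < len(x):
            i += 1
        if next_y == t_next and j + 1 < len(y):
            j += 1
        t = t_next
    return num / den if den > 0 else None

# Hypothetical series for illustration.
x = [(0, "A"), (3, "B"), (7, "A")]
y = [(0, "A"), (2, "B"), (5, "C")]
print(sth_sweep(x, y, t_end=8, interest={"A", "B", "C"}))  # 0.5
print(sth_sweep(x, y, t_end=8, interest={"B"}))
```

Every iteration either consumes one event from at least one series or reaches `t_end`, so the loop runs at most once per event plus one.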
4. Theoretical Properties
STH shows desirable metric qualities under mild conditions (, ):
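- Non-negativity: $\mathrm{STHD}(x, y) \ge 0$ since $\mathrm{STH}(x, y) \le 1$.
- Symmetry: $\mathrm{STHD}(x, y) = \mathrm{STHD}(y, x)$.
- Identity of indiscernibles: $\mathrm{STHD}(x, y) = 0$ if and only if all qualifying intervals have equal states.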
- Triangle inequality: Satisfied under the stated conditions (proof in [36]).
STH generalizes classical Hamming and Jaccard: with every state treated as a state of interest ($\mathcal{S}_I = \mathcal{S}$), STHD is the continuous-time analogue of normalized Hamming distance in the limit of infinitesimal resampling; for binary systems with restricted states of interest, it recovers temporal Jaccard distance.
5. Practical Application and Empirical Validation
Empirical results demonstrate substantial advantages of STH for pattern mining and clustering in large-scale, non-uniform time series data from diverse domains (Marié et al., 1 Dec 2025).
5.1 Computational Performance
STH achieves speedups from to over 5-minute resampled normalized Hamming, and up to with decreasing resampling periods (30-day random binary series). This is directly attributed to the linear complexity and avoidance of temporal discretization.
5.2 Avoidance of Temporal Distortion
On periodic series with phase shift, STHD returns the exact normalized distance (e.g., $0.6667$ for $1/3$ period shift), whereas resampled metrics may exhibit severe bias, including failure to detect distortion at certain rates.
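The $0.6667$ figure can be reproduced with a toy setup. The sketch below assumes a binary square wave with a 50% duty cycle, period 6, delayed by one third of a period, with all states of interest and no exclusions. These parameters and helper names are chosen for illustration and are not the paper's experimental configuration.

```python
from bisect import bisect_right

def square_wave(period, shift, t_end):
    """Binary STE-ts with 50% duty cycle, delayed by `shift` (0 <= shift <= period/2)."""
    half = period / 2
    events = [] if shift == 0 else [(0, 0)]  # delayed wave starts in state 0
    t, state = shift, 1
    while t < t_end:
        events.append((t, state))
        t, state = t + half, 1 - state
    return events

def state_at(series, t):
    times = [ts for ts, _ in series]
    return series[bisect_right(times, t) - 1][1]

def temporal_hamming(x, y, t_end):
    """Exact mismatched duration divided by total duration (no resampling)."""
    points = sorted({t for t, _ in x} | {t for t, _ in y} | {t_end})
    mismatch = sum(hi - lo for lo, hi in zip(points, points[1:])
                   if state_at(x, lo) != state_at(y, lo))
    return mismatch / (t_end - points[0])

def resampled_hamming(x, y, t_end, step):
    """Fraction of mismatched samples on a uniform grid with spacing `step`."""
    samples = [k * step for k in range(int(t_end / step))]
    return sum(state_at(x, t) != state_at(y, t) for t in samples) / len(samples)

x = square_wave(period=6, shift=0, t_end=18)
y = square_wave(period=6, shift=2, t_end=18)  # one third of a period later
print(round(temporal_hamming(x, y, 18), 4))    # 0.6667
print(resampled_hamming(x, y, 18, step=6))     # 1.0
```

In this toy case, sampling once per period always lands on a mismatching phase and reports a distance of 1.0, while the exact interval-based computation returns 2/3.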
5.3 Clustering and State Selection
Applications include weather event clustering (US dataset, 8 states) and sleep-stage annotation (W,1,2,3,R,?,E):
- STH with yields large, undifferentiated clusters.
- Excluding "Normal" clarifies geographical variation in abnormal events.
- Focusing isolates winter-heavy regions.
- For sleep-stage annotation, STH enables cleaner patient clustering by placing ambiguous states in the excluded set so that the corresponding intervals are ignored, outperforming temporal Jaccard in handling missing data.
6. Example Computation
Consider two STE-ts over :
- : , resulting intervals .
- : , intervals .
Merged change-points produce intervals: .
Case A ():
- Matches on and (both in A); mismatch elsewhere.
- STH = $2/5 = 0.4$, STHD = $0.6$.
Case B ():
- Only intervals with in at least one series considered: .
- STH = $2/4 = 0.5$, STHD = $0.5$.
7. Utility and Broader Significance
Selective Temporal Hamming Distance supports mathematically rigorous comparison of state-transition event timeseries without resampling, accounting for full state durations and allowing selective focus on relevant states or exclusion of others. Its generalization of the classical Hamming and Jaccard metrics to continuous time, together with its proven metric properties, enables direct use in clustering, kernel methods, and scalable nearest-neighbor search. Empirical validation indicates improved precision and efficiency, facilitating large-scale analysis across domains such as weather events and clinical annotations (Marié et al., 1 Dec 2025).