Hidden Semi-Markov Models (HSMMs)

Updated 8 June 2026

Hidden Semi-Markov Models (HSMMs) are probabilistic models that explicitly model how long the hidden process stays in each state, removing the geometric duration constraint implicit in traditional HMMs.
Inference relies on algorithms adapted to explicit durations, such as generalized forward-backward recursions, spectral methods, and sequential Monte Carlo.
HSMMs have broad applications ranging from time series segmentation and survival analysis to regime-switching and robotics, offering practical insights into non-Markovian dynamics.

A hidden semi-Markov model (HSMM) is a probabilistic model for sequential data in which the latent process is a semi-Markov chain: the process remains in each hidden state for a random, explicitly modeled duration before transitioning, and emits possibly complex, state-dependent observations. Explicit duration modeling lifts the geometric-duration constraint of classical hidden Markov models (HMMs), yielding substantially greater flexibility for systems whose state persistence depends on how long a state has already been occupied. HSMMs have become foundational in time series segmentation, regime-switching modeling, survival analysis, and structured sequence learning, and they have seen technical developments in inference, representation, covariate conditioning, and nonparametric modeling.

1. Formal Model Structure and Duration Modeling

In an HSMM, the latent process $S_t$ (discrete time) or $X(t)$ (continuous time) resides in some state $i\in\{1,\dots,K\}$ for a random, state-dependent dwell time (sojourn) whose distribution is specified a priori or learned from data. At the end of the sojourn, the chain makes an instantaneous transition to a new state $j\ne i$; in the basic model, the transition probabilities do not depend on the sojourn duration. The key distinction from an HMM is that the dwell-time distribution $d_i(\cdot)$ of each state $i$ is arbitrary rather than constrained to be geometric (discrete time) or exponential (continuous time). As a result, the state process alone is no longer Markov: the probability of leaving a state depends not only on the current state but also on how long the chain has occupied it.
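For comparison, an HMM leaves state $i$ at each step with probability $1-a_{ii}$, where $a_{ii}$ is the self-transition probability, so the sojourn length $U_i$ is necessarily geometric:

$P(U_i = u) = a_{ii}^{\,u-1}(1-a_{ii}), \quad u = 1, 2, \dots, \qquad \mathbb{E}[U_i] = \frac{1}{1-a_{ii}}.$

This pmf is decreasing in $u$, so its mode is always $u = 1$; an HSMM instead sets $a_{ii} = 0$ and draws $U_i$ from an arbitrary $d_i(\cdot)$.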

The dwell-time distribution is often specified parametrically (e.g., shifted Poisson, negative binomial, or Weibull) or nonparametrically (as in the empirical-pmf or HDP-HSMM settings). In covariate-extended (nonhomogeneous) HSMMs, the hazard or duration law may be linked to time-varying covariates, for instance via a proportional-hazards form:

$g\bigl(q_i(d|X_t)\bigr) = \eta_{0,i}(d) + \beta_i^{\top} X_t,$

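where $q_i(d\mid X_t)$ is the hazard, i.e., the probability that a sojourn in state $i$ ends at duration $d$ given that it has not ended before $d$; $g$ is typically the complementary log-log link; $\eta_{0,i}(d)$ is a baseline linear predictor; and $X_t$ is a vector of time-varying covariates with state-specific coefficients $\beta_i$ (Lagona et al., 2023).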
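To make the hazard formulation concrete, the sketch below converts complementary log-log hazards into a dwell-time pmf through $d_i(u) = q_i(u)\prod_{v<u}\bigl(1-q_i(v)\bigr)$. It is a minimal illustration under stated assumptions: the covariate vector is held fixed over the sojourn, the support is truncated at $D$ by setting the last hazard to one, and the baseline values, the coefficient, and the function names are hypothetical.

```python
import math


def inv_cloglog(eta):
    """Inverse complementary log-log link: q = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


def dwell_pmf_cloglog(eta0, beta, x):
    """Dwell-time pmf d(u), u = 1..D, implied by cloglog hazards.

    eta0 : baseline linear predictors eta_0(u) for u = 1..D (hypothetical)
    beta : regression coefficients; x : covariate vector (held fixed here)
    The hazard at u = D is set to 1, truncating the support at D.
    """
    lin = sum(b * xv for b, xv in zip(beta, x))
    D = len(eta0)
    pmf, surv = [], 1.0  # surv = P(sojourn lasts at least u steps)
    for u in range(1, D + 1):
        q = 1.0 if u == D else inv_cloglog(eta0[u - 1] + lin)
        pmf.append(surv * q)  # sojourn ends exactly at u
        surv *= 1.0 - q
    return pmf


def mean_dwell(pmf):
    return sum(u * p for u, p in enumerate(pmf, start=1))


# Hypothetical baseline for one state and a single covariate with beta = 0.8.
ETA0 = [-2.0, -1.5, -1.0, -0.5, 0.0]
low = dwell_pmf_cloglog(ETA0, beta=[0.8], x=[0.0])
high = dwell_pmf_cloglog(ETA0, beta=[0.8], x=[1.0])
print(mean_dwell(low), mean_dwell(high))  # larger covariate -> shorter sojourns
```

Because a positive $\beta_i^{\top}X_t$ raises every hazard, it shortens the expected sojourn, which is how covariates shift regime persistence in this formulation.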

Emissions depend on the current regime (and, for autoregressive emissions, on recent observations), and their likelihoods may be structured or high-dimensional, such as VAR processes, toroidal densities, or mixtures. The complete-data likelihood for an observation sequence $y_{1:T}$ factors into initial-state, transition/dwell, and emission components.

2. Inference Algorithms and Computational Representation

Likelihood-based inference in HSMMs requires summing over all segmentations of the observed sequence, so forward-backward procedures must be generalized to explicit durations. The forward (and backward) variables index both the latent state and the age (or remaining duration) of the current sojourn. In discrete time, the cost of the recursions at each time step is quadratic in the number of states $K$ and linear in the maximum duration $D$ of the dwell-time support:

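$\alpha_t(j) = \sum_{u=1}^{\min(D,t)} \Bigl[\sum_{i\ne j} \alpha_{t-u}(i)\, a_{ij}\Bigr]\, d_j(u) \prod_{s=t-u+1}^{t} b_j(y_s),$

where $\alpha_t(j)$ is the joint probability of $y_{1:t}$ and a sojourn in state $j$ ending at time $t$, $a_{ij}$ are the transition probabilities, $d_j(u)$ is the dwell-time probability, $b_j(\cdot)$ is the emission density, and the bracketed term is replaced by the initial probability $\pi_j$ when $t-u=0$. Backward recursions are analogous (Narimatsu et al., 2016).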
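The explicit-duration forward recursion can be sketched in plain Python for discrete emissions. This is a minimal, unscaled illustration under stated assumptions: self-transitions are excluded ($a_{ii}=0$), dwell times are truncated at $D$, and the final sojourn is treated as right-censored, so the last segment is weighted by the probability that the sojourn lasts at least $u$ steps rather than by $d_j(u)$. The function name and parameter layout are hypothetical; a practical implementation would work in log space or rescale at each step to avoid underflow on long sequences.

```python
from itertools import product


def hsmm_likelihood(y, pi, A, dur, B):
    """P(y_1..y_T) for a discrete-emission explicit-duration HSMM.

    pi  : initial state probabilities, length K
    A   : K x K transition matrix with zero diagonal (no self-transitions)
    dur : dur[j][u-1] = d_j(u), probability that a sojourn in j lasts u steps
    B   : B[j][v] = probability of emitting symbol v in state j
    The final segment is right-censored: it may continue beyond time T.
    """
    T, K, D = len(y), len(pi), len(dur[0])
    # alpha[t][j]: P(y_1..y_t, a segment in state j ends exactly at t)
    alpha = [[0.0] * K for _ in range(T + 1)]
    # enter[s][j]: P(y_1..y_s, a segment in state j starts at s + 1)
    enter = [[0.0] * K for _ in range(T + 1)]
    enter[0] = list(pi)
    # surv[j][u-1]: P(sojourn in j lasts at least u steps)
    surv = [[sum(dur[j][u:]) for u in range(D)] for j in range(K)]
    lik = 0.0
    for t in range(1, T + 1):
        for j in range(K):
            emit = 1.0  # product of b_j(y_s) over the candidate segment
            for u in range(1, min(D, t) + 1):
                emit *= B[j][y[t - u]]  # extend segment back to time t-u+1
                alpha[t][j] += enter[t - u][j] * dur[j][u - 1] * emit
                if t == T:  # censored last segment uses the survivor function
                    lik += enter[t - u][j] * surv[j][u - 1] * emit
        for j in range(K):  # O(K^2) hand-off to the next segment
            enter[t][j] = sum(alpha[t][i] * A[i][j] for i in range(K))
    return lik


PI = [0.6, 0.4]
A = [[0.0, 1.0], [1.0, 0.0]]
DUR = [[0.2, 0.5, 0.3], [0.6, 0.4, 0.0]]
B = [[0.9, 0.1], [0.3, 0.7]]

print(hsmm_likelihood([0, 0, 1, 1], PI, A, DUR, B))
# Sanity check: probabilities of all length-4 binary sequences sum to one.
print(sum(hsmm_likelihood(list(ys), PI, A, DUR, B) for ys in product([0, 1], repeat=4)))
```

Summing the likelihood over every binary sequence of a fixed length returns one (up to rounding), a convenient check that the censored last segment is handled consistently.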

The EM algorithm is the canonical approach for frequentist learning:

E-step: compute expectations over the latent (augmented) state/duration indicators using the generalized forward-backward recursions.
M-step: maximize the expected complete-data log-likelihood; updates for transition probabilities, duration distribution parameters, and emission parameters are typically closed form or obtained by weighted regression and density estimation, e.g., via GLMs for proportional hazards models (Lagona et al., 2023) or mixture models for emissions.
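When a state's dwell distribution is left nonparametric on $\{1,\dots,D\}$, the duration part of the M-step is closed form. The sketch below assumes the E-step has already produced expected counts of completed and right-censored sojourns (the function names and count layout are hypothetical) and computes each discrete hazard as expected exits divided by expected sojourns at risk. This life-table estimate maximizes a weighted binomial log-likelihood when every duration has its own parameter and there are no covariates.

```python
def lifetable_hazards(exits, censored):
    """Closed-form dwell-time hazards for one state from E-step counts.

    exits[u-1]    : expected number of completed sojourns lasting exactly u steps
    censored[u-1] : expected number of sojourns observed for u steps and still
                    running at the end of the data (right-censored)
    A completed sojourn of length v is at risk at each age r <= v and exits at v;
    a censored one of length v counts as a non-exit only at ages r < v.
    """
    D = len(exits)
    hazards = []
    for r in range(1, D + 1):
        at_risk = sum(exits[r - 1:]) + sum(censored[r:])
        # Convention assumed here: an empty risk set gets hazard 1.
        hazards.append(exits[r - 1] / at_risk if at_risk > 0 else 1.0)
    hazards[-1] = 1.0  # support truncated at D
    return hazards


def hazards_to_pmf(hazards):
    """d(u) = q(u) * prod_{r<u} (1 - q(r))."""
    pmf, surv = [], 1.0
    for q in hazards:
        pmf.append(surv * q)
        surv *= 1.0 - q
    return pmf


# Hypothetical expected counts for one state with D = 3.
q_plain = lifetable_hazards([2.0, 3.0, 5.0], [0.0, 0.0, 0.0])
print(q_plain, hazards_to_pmf(q_plain))  # pmf = normalized counts here
q_cens = lifetable_hazards([2.0, 3.0, 5.0], [0.0, 4.0, 0.0])
print(q_cens, hazards_to_pmf(q_cens))
```

Without censored sojourns the resulting pmf is just the normalized expected counts; censoring only enlarges the risk sets at earlier ages.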

For models with covariate-dependent duration laws, the hazard-regression part of the M-step maximizes a weighted binomial log-likelihood, so generalized linear models can be used, possibly with a complementary log-log link (Lagona et al., 2023). In the Bayesian paradigm, fully specified hierarchical models admit posterior sampling by Gibbs or Metropolis-within-Gibbs schemes, with the latent state sequence and durations sampled via block updates or via dynamic-programming recoding and Viterbi-like recursions (Rojas-Salazar et al., 2020, Rojas-Salazar et al., 2021).
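Viterbi-type recursions share the structure of the forward recursion, with sums over predecessors replaced by maximizations. The sketch below is a plain log-space Viterbi for the MAP segmentation of a discrete-emission HSMM, not the cited authors' sampler. It assumes no self-transitions, dwell times truncated at $D$, a final segment that ends exactly at $T$, and at least one segmentation with nonzero probability; `hsmm_viterbi` and the parameter layout are hypothetical.

```python
import math


def hsmm_viterbi(y, pi, A, dur, B):
    """MAP segmentation of a discrete-emission explicit-duration HSMM.

    pi: initial probabilities; A: transitions with zero diagonal;
    dur[j][u-1] = d_j(u); B[j][v] = P(symbol v | state j).
    Returns (state, start, end) triples with 1-based, inclusive times.
    """
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    T, K, D = len(y), len(pi), len(dur[0])
    # delta[t][j]: best log-probability of y_1..y_t with a segment in j ending at t
    delta = [[-math.inf] * K for _ in range(T + 1)]
    back = [[None] * K for _ in range(T + 1)]  # (previous state, duration u)
    for t in range(1, T + 1):
        for j in range(K):
            emit = 0.0  # running sum of log b_j(y_s) over the candidate segment
            for u in range(1, min(D, t) + 1):
                emit += log(B[j][y[t - u]])
                s = t - u  # the segment covers times s+1..t
                if s == 0:
                    best, prev = log(pi[j]), None
                else:
                    best, prev = max((delta[s][i] + log(A[i][j]), i) for i in range(K))
                score = best + log(dur[j][u - 1]) + emit
                if score > delta[t][j]:
                    delta[t][j], back[t][j] = score, (prev, u)
    # Trace the best path backwards, one segment at a time.
    j, t, segments = max(range(K), key=lambda k: delta[T][k]), T, []
    while t > 0:
        prev, u = back[t][j]
        segments.append((j, t - u + 1, t))
        t, j = t - u, prev
    return segments[::-1]


PI = [0.5, 0.5]
A = [[0.0, 1.0], [1.0, 0.0]]
DUR = [[1 / 3] * 3, [1 / 3] * 3]
B = [[0.99, 0.01], [0.01, 0.99]]
print(hsmm_viterbi([0, 0, 1, 1, 1, 0], PI, A, DUR, B))
```

Handling a right-censored final segment would replace $d_j(u)$ by the probability that the sojourn lasts at least $u$ steps for segments ending at $T$.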

In high-frequency data scenarios exhibiting residual dependence, subsampling in the emission/observation parameter update step has been shown to effectively control estimator variance (Rojas-Salazar et al., 2021).

3. Model Extensions: Nonhomogeneous, Interval, and Nonparametric HSMMs

HSMMs have been generalized along several axes:

Covariate-modulated (nonhomogeneous) HSMMs: Duration distribution parameters (hazards or rates) can depend on covariates, e.g., via proportional hazard or log-linear models. Both parametric and nonparametric specifications exist, with efficient EM or Bayesian inference adapted for covariate-linked hazards (Lagona et al., 2023, Rojas-Salazar et al., 2020, Rojas-Salazar et al., 2021, Koslik, 2024).
Interval modeling: The interval-state HSMM (IS-HSMM) introduces explicit "interval" states for periods of inactivity or missing data, while the interval-length probability HSMM (ILP-HSMM) separately models the distribution of gap lengths, allowing richer representation of silence or inactivity (Narimatsu et al., 2016).
Bayesian nonparametric HSMMs: The HDP-HSMM allows an unbounded state space with a hierarchical Dirichlet process prior over transition measures, integrating explicit-duration modeling with nonparametric complexity control and MCMC algorithms for efficient posterior inference (Johnson et al., 2012).
Higher-order HSMMs: Extensions in which the latent sequence is a higher-order Markov process, permitting multi-step dependencies in state transitions and complex, non-renewal patterns in hidden labels (Liao et al., 2020).

4. Applications and Empirical Results

HSMMs have been applied to diverse empirical problems:

Physical and environmental time series: Covariate-modulated duration modeling enables detailed segmentation of physiological or environmental signals, such as heart rate of athletes and phycocyanin concentration in lakes, with substantial improvements in interpretability and detection of external control factors (Rojas-Salazar et al., 2020, Rojas-Salazar et al., 2021).
Movement and behavior analysis: Explicit-duration HSMMs and their autoregressive variants improve segmentation accuracy in human gesture modeling, animal behavior classification, and wearable device streams, particularly when dwell time distributions are non-geometric or emissions are highly overlapping (Hadj-Amar et al., 2023, Ruiz-Suarez et al., 2021).
Speech and sequence segmentation: In synthetic and real datasets including Morse code clustering and speaker diarization, HDP-HSMMs have shown substantial gains in correct state recovery and segmentation by using explicit dwell-time modeling (Johnson et al., 2012).
Regime-switching in financial time series: HSMMs with AR emission structures and flexible dwell times outperform HMMs for volatility clustering and regime detection, with particle filter-based sequential Bayesian algorithms enabling tractable inference in long sequences (Aschermayr et al., 2023).
Robotics and imitation learning: HSMMs support segmentation of complex manipulation demonstrations, with task-parameterized and latent-space variants encoding geometric and subspace invariances central to few-shot robot learning (Tanwani et al., 2018).

5. Computational Strategies and Algorithmic Considerations

The primary computational bottleneck in HSMMs is the explicit duration indexing in forward-backward and Viterbi-type recursions, which multiplies both run time and memory by the maximum dwell time $D$. Practical implementations use:

State-space expansion: Truncating the dwell-time support at a maximum $D$ set comfortably above the longest sojourn expected in the data keeps marginalization accurate while controlling computational cost. Representing the semi-Markov chain as a Markov chain on an augmented (micro-state) space then enables efficient HMM-style algorithms, with block-matrix structure exploited for sparse or structured representation (Lagona et al., 2023, Hadj-Amar et al., 2020, Koslik, 2024).
Spectral inference: Moment-based spectral algorithms can recover likelihoods and estimate model parameters efficiently in the fully discrete case by leveraging joint counts and low-rank tensor structures, with computational complexity logarithmic in maximum state persistence (Melnyk et al., 2014).
Sequential Monte Carlo: For Bayesian inference with long or streaming data, particle filter-based SMC and Particle Gibbs methods provide unbiased inference and are often orders of magnitude faster than full-recursion-based approaches, at a cost of Monte Carlo error (Aschermayr et al., 2023).
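The state-space expansion can be sketched with one age-based construction (several are in use): micro-state $(i, r)$ records the current state $i$ and the age $r$ of its sojourn, the chain leaves $i$ at age $r$ with the dwell-time hazard $d_i(r)/P(\text{sojourn} \ge r)$, and otherwise ages to $(i, r+1)$. The sketch assumes no self-transitions and dwell pmfs supported on $\{1,\dots,D\}$, in which case the expansion reproduces the dwell distributions exactly; `expand_hsmm` and `micro_index` are hypothetical names.

```python
def micro_index(i, r, D):
    """Row/column of micro-state (state i, sojourn age r), r = 1..D."""
    return i * D + (r - 1)


def expand_hsmm(A, dur):
    """KD x KD transition matrix of an HMM equivalent to a truncated HSMM.

    A   : K x K transition matrix with zero diagonal
    dur : dur[i][r-1] = d_i(r), a pmf on 1..D for each state
    """
    K, D = len(A), len(dur[0])
    P = [[0.0] * (K * D) for _ in range(K * D)]
    for i in range(K):
        for r in range(1, D + 1):
            surv = sum(dur[i][r - 1:])  # P(sojourn in i lasts >= r)
            # Hazard of leaving i at age r; forced to 1 at the truncation point D.
            c = 1.0 if r == D or surv <= 0 else dur[i][r - 1] / surv
            row = micro_index(i, r, D)
            for j in range(K):
                if j != i:  # leave i and enter state j at age 1
                    P[row][micro_index(j, 1, D)] += c * A[i][j]
            if r < D:  # stay in i, aging by one step
                P[row][micro_index(i, r + 1, D)] += 1.0 - c
    return P


A = [[0.0, 0.7, 0.3], [0.5, 0.0, 0.5], [1.0, 0.0, 0.0]]
DUR = [[0.1, 0.4, 0.5], [0.3, 0.3, 0.4], [1.0, 0.0, 0.0]]
P = expand_hsmm(A, DUR)
print([round(sum(row), 12) for row in P])  # every row should sum to one
```

Each micro-state $(i, r)$ emits from state $i$'s emission distribution and the initial distribution places mass $\pi_i$ on $(i, 1)$, so standard HMM forward-backward code runs unchanged on the $KD \times KD$ matrix; its sparsity (at most $K$ nonzeros per row) is what block-structured implementations exploit.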

In continuous time, semi-Markov processes generalize the discrete formulation, and recent work extends forward-backward and Viterbi methods to fully continuous settings via integro-differential equations and probability current representations (Engelmann et al., 2022).

6. Theoretical Properties, Model Identifiability, and Information Complexity

The statistical structure of HSMMs enables modeling of non-memoryless dwell times, but introduces identifiability challenges relating to the joint estimation of emission, transition, and duration distributions. Bayesian regularization, priors, and non-local constraints (such as those enforcing separation between geometric and non-geometric dwell families) are central to robust inference and model selection (Hadj-Amar et al., 2023, Hadj-Amar et al., 2020).

The information-theoretic properties of HSMMs have been formalized in the context of causal-state representations: the minimal maximally predictive process representations ("ε-machines") for unifilar HSMMs yield closed-form expressions for entropy rate, statistical complexity, excess entropy, and differential information anatomy rates, elucidating the memory and predictability structure of continuous-time and renewal HSMMs (Marzen et al., 2016).

Model selection—e.g., via Bayesian marginal likelihood/bridge sampling or AIC/BIC—can reliably distinguish duration families and allocate necessary state complexity (Hadj-Amar et al., 2020, Johnson et al., 2012, Koslik, 2024).
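As a rough illustration of how information criteria trade fit against dwell-family complexity, the sketch below counts free parameters for a homogeneous HSMM without self-transitions and evaluates AIC and BIC. The Gaussian emissions, the truncation at $D = 20$, the sample size, and all function names are assumptions for illustration; no fitted log-likelihood values are implied.

```python
import math


def hsmm_num_params(K, dwell_per_state, emission_per_state):
    """Free parameters of a homogeneous HSMM with no self-transitions.

    Initial distribution: K - 1. Each transition row has a zero diagonal and
    sums to one, leaving K - 2 free entries. Dwell and emission parameters are
    counted per state.
    """
    return (K - 1) + K * (K - 2) + K * (dwell_per_state + emission_per_state)


def aic(loglik, k):
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik


# Hypothetical comparison: K = 3 Gaussian states (2 emission parameters each),
# shifted-Poisson dwell (1 parameter) vs a nonparametric pmf on 1..20 (19).
k_pois = hsmm_num_params(3, 1, 2)
k_np = hsmm_num_params(3, 19, 2)
n = 1000
# Log-likelihood gain the nonparametric model needs before BIC prefers it.
print(k_pois, k_np, (k_np - k_pois) * math.log(n) / 2)
```

With these counts and $n = 1000$, BIC prefers the nonparametric dwell model only if it raises the maximized log-likelihood by more than about 186.5.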

7. Extensions and Practical Modeling Considerations

Numerous structural and algorithmic innovations have been proposed within the HSMM framework:

Nonhomogeneous/covariate-dependent dwell modeling (Lagona et al., 2023, Rojas-Salazar et al., 2021, Koslik, 2024),
Interval modeling and state gaps (Narimatsu et al., 2016),
Bayesian nonparametrics/HDP-HSMMs (Johnson et al., 2012),
Higher-order transition chains and latent class decompositions (Liao et al., 2020),
Online/streaming algorithms and sequential Bayesian estimation (Aschermayr et al., 2023),
Hybrid Markov/semi-Markov topologies for complex state networks (Amini et al., 2021),
Nonparametric emission modeling (e.g., B-splines) and penalized regressions (Amini et al., 2021).

These advances enable HSMMs to accommodate absorbing/macro-states, irregular sampling, informative censoring, and regime switching in domains ranging from environmental forecasting and robotics to health prognostics and energy systems.

References

(Lagona et al., 2023): Nonhomogeneous hidden semi-Markov models for toroidal data
(Narimatsu et al., 2016): State Duration and Interval Modeling in Hidden Semi-Markov Model for Sequential Data Analysis
(Hadj-Amar et al., 2023): Bayesian Sparse Vector Autoregressive Switching Models with Application to Human Gesture Phase Segmentation
(Rojas-Salazar et al., 2021, Rojas-Salazar et al., 2020): Covariate-dependent duration HSMMs in high-frequency data
(Johnson et al., 2012): The Hierarchical Dirichlet Process Hidden Semi-Markov Model
(Hadj-Amar et al., 2020): Bayesian Approximations to Hidden Semi-Markov Models
(Melnyk et al., 2014): A Spectral Algorithm for Inference in Hidden Semi-Markov Models
(Koslik, 2024): Inhomogeneous HSMMs with periodic dwell-time structure
(Engelmann et al., 2022): Forward-Backward Latent State Inference for Hidden Continuous-Time semi-Markov Chains
(Marzen et al., 2016): Informational and Causal Architecture of Continuous-time Renewal and Hidden Semi-Markov Processes
(Tanwani et al., 2018): Generalizing Robot Imitation Learning with Invariant Hidden Semi-Markov Models
(Liao et al., 2020): Health Assessment and Prognostics Based on Higher Order Hidden Semi-Markov Models
(Amini et al., 2021): Hhsmm: An R package for hidden hybrid Markov/semi-Markov models
(Aschermayr et al., 2023): Sequential Bayesian Learning for Hidden Semi-Markov Models
(Durante et al., 2017): Backward Approximate Dynamic Programming with Hidden Semi-Markov Stochastic Models in Energy Storage Optimization
(Ruiz-Suarez et al., 2021): Hidden Markov and semi-Markov models: When and why are these models useful for classifying states in time series data