
Neural and Mixture CTMCs

Updated 28 May 2026
  • Neural and Mixture CTMCs are nonparametric extensions of classical CTMCs that use neural networks to learn complex, nonlinear transition rate functions.
  • They replace fixed parametric models with softplus-activated neural mappings, ensuring nonnegativity while capturing arbitrary state and covariate dependencies.
  • On synthetic benchmarks (a birth–death process, population dynamics, and a chemical reaction network), Neural CTMCs achieve substantially lower rate-estimation error (MAE, MSE) than log-linear GLM baselines.

Neural and Mixture Continuous-Time Markov Chains (CTMCs) generalize classical CTMC frameworks by parameterizing the transition rate functions nonparametrically, which allows nonlinear dependence on the current discrete state and on covariates. Neural CTMCs (N-CTMCs) use neural networks to learn the propensity (transition-rate) functions directly from fully observed sample paths, with the goal of inferring and accurately reconstructing the stochastic dynamics of systems such as chemical reaction networks, population dynamics, and gene regulatory networks (Reeves et al., 2022).

1. Definition and Mathematical Setup

A CTMC is defined over a (potentially infinite) discrete state space $\Omega$ with piecewise-constant sample paths $X(t) \in \Omega$ and transitions occurring at random times. The primary object is the transition rate $\lambda_{ij}(x, c)$, the instantaneous rate at which the process jumps from state $i$ to state $j$ given covariates $c$, where $x$ collects state and covariate information. All possible transitions are assumed to correspond to a finite set of known reactions $\rho = 1, \ldots, r$ with prescribed stoichiometric changes $\Delta S_\rho$:

$$\lambda_{i,\,i+\Delta S_\rho}(x, c) = \alpha_\rho(x, c), \qquad \lambda_{ij}(x, c) = 0 \text{ otherwise}.$$

The total exit rate from state $x$ is $\lambda(x, c) = \sum_{\rho=1}^{r} \alpha_\rho(x, c)$, and the sojourn time in each state is exponentially distributed with this rate. The process is constructed so that, for infinitesimal $h > 0$,

$$P\big(X(t+h) = x + \Delta S_\rho \mid X(t) = x\big) = \alpha_\rho(x, c)\,h + o(h).$$

The state vector may comprise multiple species counts $s = (s_1, \ldots, s_d)$ and continuous covariates $c$, concatenated as $x = (s, c)$.
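The jump-and-hold construction above can be sketched as a Gillespie-style simulator that produces exactly the kind of fully observed path the method trains on. This is a minimal illustration, not code from Reeves et al. (2022): the function names, the `(x_k, rho_k, tau_k)` tuple format, and the birth–death rates in the usage lines are hypothetical choices. Each sojourn is drawn from an exponential distribution with the total exit rate, and the reaction that ends it is chosen with probability proportional to its rate.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Transition = Tuple[State, int, float]  # (x_k, rho_k, tau_k)


def gillespie(
    x0: State,
    stoich: Sequence[State],
    rates: Callable[[State], List[float]],
    t_max: float,
    rng: random.Random,
) -> List[Transition]:
    """Simulate a reaction-based CTMC on [0, t_max] and return its (x_k, rho_k, tau_k) tuples.

    Each sojourn is exponential with the total exit rate lambda(x) = sum_rho alpha_rho(x);
    the reaction that ends it is chosen with probability alpha_rho(x) / lambda(x).
    The final, censored sojourn (no reaction before t_max) is not recorded.
    """
    x, t = tuple(x0), 0.0
    path: List[Transition] = []
    while True:
        alphas = rates(x)
        total = sum(alphas)
        if total <= 0.0:  # absorbing state: no reaction can fire
            break
        tau = rng.expovariate(total)
        if t + tau > t_max:
            break
        u = rng.random() * total
        rho, acc = 0, alphas[0]
        while acc <= u and rho < len(alphas) - 1:  # inverse-CDF choice of the reaction
            rho += 1
            acc += alphas[rho]
        path.append((x, rho, tau))
        x = tuple(xi + di for xi, di in zip(x, stoich[rho]))  # apply Delta S_rho
        t += tau
    return path


def birth_death_rates(x: State) -> List[float]:
    # Hypothetical example: birth at constant rate 2.0, death at rate 0.1 per individual.
    return [2.0, 0.1 * x[0]]


stoich = [(1,), (-1,)]
path = gillespie((5,), stoich, birth_death_rates, t_max=50.0, rng=random.Random(0))
```

Each consecutive pair of tuples in `path` differs by the stoichiometric change of the recorded reaction, which is the structure the likelihood in Section 4 exploits.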

2. Neural Parameterization of Rate Functions

Unlike mass-action or log-linear forms, Neural CTMCs impose no fixed parametric dependency. Each rate $\alpha_\rho(x, c)$ is parameterized as an output of a neural network, enabling arbitrary nonlinear dependence on both the discrete state and the covariates. Collectively,

$$\boldsymbol{\alpha}(x, c) = \big(\alpha_1(x, c), \ldots, \alpha_r(x, c)\big),$$

and each component is modeled as a neural function with a nonlinearity and an output constraint (a softplus) that guarantees nonnegativity:

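  • In mass-action kinetics: $\alpha_\rho(x) = k_\rho \prod_i \binom{x_i}{\nu_{i\rho}}$, with rate constant $k_\rho$ and $\nu_{i\rho}$ molecules of species $i$ consumed by reaction $\rho$.
  • In Neural CTMCs: $\alpha_\rho(x, c) = \sigma\big(f_{\theta,\rho}(x, c)\big)$, with $f_\theta = (f_{\theta,1}, \ldots, f_{\theta,r})$ a neural network with parameters $\theta$ and $\sigma(z) = \log(1 + e^{z})$ the softplus.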

This methodological extension enables the model to represent non-mass-action kinetics, encompassing a broader class of stochastic processes observed in empirical domains (Reeves et al., 2022).
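To contrast the two rate forms, here is a sketch of a stochastic mass-action propensity next to a small neural rate function with a softplus output. The layer sizes, the Gaussian initialization with standard deviation `1/sqrt(n_in)`, and all function names are illustrative assumptions (the source only states that framework defaults were used); SELU is used for the hidden layers, matching the architectures described in this article.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def softplus(z: float) -> float:
    # Numerically stable log(1 + e^z); nonnegative for every z.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selu(z: float) -> float:
    return SELU_SCALE * (z if z > 0 else SELU_ALPHA * math.expm1(z))


def mass_action(x: Sequence[int], k: float, nu: Sequence[int]) -> float:
    """Stochastic mass-action propensity k * prod_i C(x_i, nu_i)."""
    out = k
    for xi, ni in zip(x, nu):
        out *= math.comb(xi, ni)  # math.comb returns 0 when ni > xi
    return out


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    std = 1.0 / math.sqrt(n_in)  # hypothetical init; the source uses framework defaults
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer: Layer, v: Sequence[float]) -> List[float]:
    weights, biases = layer
    return [sum(w * vi for w, vi in zip(row, v)) + b for row, b in zip(weights, biases)]


def neural_rates(layers: List[Layer], x: Sequence[float]) -> List[float]:
    """alpha(x) = softplus(f_theta(x)): SELU hidden layers, one softplus output per reaction."""
    h = [float(xi) for xi in x]
    for layer in layers[:-1]:
        h = [selu(z) for z in dense(layer, h)]
    return [softplus(z) for z in dense(layers[-1], h)]


rng = random.Random(0)
layers = [init_layer(2, 16, rng), init_layer(16, 16, rng), init_layer(16, 3, rng)]
x = (10, 4)
alphas = neural_rates(layers, x)                 # three nonnegative rates, one per reaction
bimolecular = mass_action(x, k=0.01, nu=(1, 1))  # 0.01 * 10 * 4 = 0.4
```

The mass-action rate is fixed once `k` and `nu` are chosen, whereas the neural rates change with the learned weights; the softplus is the only ingredient that enforces nonnegativity.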

3. Neural Network Architectures and Implementation

Two principal network configurations are described for the rate network:

  • Population Dynamics Example (Lotka–Volterra type, no covariates):
    • Input: the vector of species counts.
    • Fully connected neural network, 5 layers of 128 neurons with SELU activation, followed by an output layer with one neuron per reaction and a softplus.
  • Chemical Reaction Network Example (2 species plus a temperature covariate, 4 reactions):
    • Input passed to a dense layer (3 → 96), reshaped to a 3 × 32 array.
    • Batch-wise 1D convolution across rows (10 filters of size 4), SELU activation.
    • Flattened and passed through dense layers (290 → 32 → 32), then an output layer (32 → 4) with a softplus.

No explicit regularization (e.g., dropout or weight decay) was used, and weights were set with the framework's default initialization. The layer widths and depths are intended to give the model enough representational capacity to learn complex propensities from observed trajectories.
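The chemical-reaction architecture's shape bookkeeping (3 → 96 → 3 × 32 → 290 → 32 → 32 → 4) can be checked with a dependency-free forward pass. This is a sketch under stated assumptions, not the authors' implementation: it treats the three rows of the 3 × 32 array as input channels of a stride-1, unpadded convolution (which yields 10 × 29 = 290 features), applies SELU after every hidden layer, and uses a hypothetical Gaussian initialization; names such as `init_chemnet` are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Dense = Tuple[List[List[float]], List[float]]

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(z: float) -> float:
    return SELU_SCALE * (z if z > 0 else SELU_ALPHA * math.expm1(z))


def softplus(z: float) -> float:
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dense_params(n_in: int, n_out: int, rng: random.Random) -> Dense:
    std = 1.0 / math.sqrt(n_in)  # hypothetical init
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


def dense(p: Dense, v: Sequence[float]) -> List[float]:
    weights, biases = p
    return [sum(w * vi for w, vi in zip(row, v)) + b for row, b in zip(weights, biases)]


def conv1d_valid(channels: List[List[float]], filters: List[List[List[float]]],
                 biases: List[float]) -> List[List[float]]:
    """channels: C x L, filters: F x C x K -> F x (L - K + 1); stride 1, no padding."""
    k = len(filters[0][0])
    length = len(channels[0])
    return [
        [
            bias + sum(f[c][j] * channels[c][s + j]
                       for c in range(len(channels)) for j in range(k))
            for s in range(length - k + 1)
        ]
        for f, bias in zip(filters, biases)
    ]


def init_chemnet(rng, n_in=3, rows=3, width=32, n_filters=10, k=4, hidden=32, n_rxn=4):
    std = 1.0 / math.sqrt(rows * k)
    filters = [[[rng.gauss(0.0, std) for _ in range(k)] for _ in range(rows)]
               for _ in range(n_filters)]
    flat = n_filters * (width - k + 1)  # 10 * 29 = 290 with the defaults
    return {
        "shape": (rows, width),
        "in": dense_params(n_in, rows * width, rng),  # 3 -> 96
        "conv": (filters, [0.0] * n_filters),          # 10 filters of size 4
        "h1": dense_params(flat, hidden, rng),          # 290 -> 32
        "h2": dense_params(hidden, hidden, rng),        # 32 -> 32
        "out": dense_params(hidden, n_rxn, rng),        # 32 -> 4
    }


def chemnet_rates(p, x: Sequence[float]) -> List[float]:
    rows, width = p["shape"]
    h = [selu(z) for z in dense(p["in"], x)]
    grid = [h[r * width:(r + 1) * width] for r in range(rows)]  # reshape 96 -> 3 x 32
    feats = [selu(z) for row in conv1d_valid(grid, *p["conv"]) for z in row]  # flatten -> 290
    h = [selu(z) for z in dense(p["h1"], feats)]
    h = [selu(z) for z in dense(p["h2"], h)]
    return [softplus(z) for z in dense(p["out"], h)]  # one nonnegative rate per reaction


params = init_chemnet(random.Random(0))
alphas = chemnet_rates(params, [20.0, 5.0, 0.3])  # hypothetical (species 1, species 2, temperature)
```

If the rows were instead convolved independently with single-channel filters, the flattened size would be 3 × 10 × 29 = 870 rather than 290, which is why the channel interpretation is assumed here.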

4. Likelihood-Based Training and Loss Function

Given fully observed trajectories, the negative log-likelihood (NLL) is derived exactly from the CTMC process:

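  • Let $\tau_k$ denote the dwell time in state $x_k$ (which includes any covariates) before the $k$th reaction $\rho_k$, for $k = 1, \ldots, N$.
  • The single-step likelihood is

$$p(\tau_k, \rho_k \mid x_k) = \alpha_{\rho_k}(x_k)\, \exp\Big(-\tau_k \sum_{\rho=1}^{r} \alpha_\rho(x_k)\Big).$$

  • The trajectory-wise NLL becomes

$$\mathrm{NLL} = \sum_{k=1}^{N} \Big[\tau_k \sum_{\rho=1}^{r} \alpha_\rho(x_k) - \log \alpha_{\rho_k}(x_k)\Big]$$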

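or, regrouped by reaction $\rho$,

$$\mathrm{NLL} = \sum_{\rho=1}^{r} \Big[\sum_{k=1}^{N} \tau_k\, \alpha_\rho(x_k) - \sum_{m=1}^{n_\rho} \log \alpha_\rho\big(x^{(\rho)}_m\big)\Big],$$

where $n_\rho$ is the number of occurrences of reaction $\rho$ and $x^{(\rho)}_1, \ldots, x^{(\rho)}_{n_\rho}$ are the corresponding pre-reaction states.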
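The two NLL expressions are algebraically identical (the regrouping only swaps the order of summation), which a short numerical check makes concrete. In this sketch the birth–death rate function and the four transition tuples are hypothetical, and the function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Transition = Tuple[State, int, float]   # (x_k, rho_k, tau_k)
Rates = Callable[[State], List[float]]  # x -> [alpha_1(x), ..., alpha_r(x)]


def nll_by_transition(data: Sequence[Transition], rates: Rates) -> float:
    """sum_k [ tau_k * sum_rho alpha_rho(x_k) - log alpha_{rho_k}(x_k) ]."""
    total = 0.0
    for x, rho_k, tau in data:
        a = rates(x)
        total += tau * sum(a) - math.log(a[rho_k])
    return total


def nll_by_reaction(data: Sequence[Transition], rates: Rates, r: int) -> float:
    """sum_rho [ sum_k tau_k alpha_rho(x_k) - sum_{k: rho_k = rho} log alpha_rho(x_k) ]."""
    total = 0.0
    for rho in range(r):
        exposure = sum(tau * rates(x)[rho] for x, _, tau in data)
        log_term = sum(math.log(rates(x)[rho]) for x, rho_k, _ in data if rho_k == rho)
        total += exposure - log_term
    return total


def birth_death(x: State) -> List[float]:
    # Hypothetical rates: birth at 2.0, death at 0.5 per individual.
    return [2.0, 0.5 * x[0]]


data = [((3,), 0, 0.4), ((4,), 1, 0.2), ((3,), 1, 0.5), ((2,), 0, 1.1)]
nll_a = nll_by_transition(data, birth_death)
nll_b = nll_by_reaction(data, birth_death, r=2)
```

Both evaluate to 7.25 − log 12 ≈ 4.765: the exposure terms sum to 7.25 and the observed-reaction rates are 2, 2, 1.5, and 2, whose product is 12.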

Gradient-based optimization (e.g., Adam) is then performed on this NLL. Each iteration costs time linear in the number of observed transitions $N$, a critical advantage for systems with large or infinite state spaces because no explicit $|\Omega| \times |\Omega|$ rate matrix is constructed (Reeves et al., 2022).
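Differentiating the trajectory-wise NLL term by term shows why no matrix over $\Omega$ is needed: the gradient involves only the $r$ rate outputs at the $N$ visited states, each weighted by a simple scalar. The derivation below is a direct application of the chain rule to that NLL, with $\mathbb{1}[\cdot]$ the indicator function:

```latex
\frac{\partial\,\mathrm{NLL}}{\partial \alpha_\rho(x_k)}
  = \tau_k - \frac{\mathbb{1}[\rho_k = \rho]}{\alpha_\rho(x_k)},
\qquad
\nabla_\theta \mathrm{NLL}
  = \sum_{k=1}^{N} \sum_{\rho=1}^{r}
    \Big(\tau_k - \frac{\mathbb{1}[\rho_k = \rho]}{\alpha_\rho(x_k)}\Big)\,
    \nabla_\theta\, \alpha_\rho(x_k).
```

With a softplus output $\alpha_\rho = \sigma(z_\rho)$, backpropagation contributes one further factor $\sigma'(z_\rho) = 1/(1 + e^{-z_\rho})$ before reaching the network weights.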

5. Training Workflow and Computational Aspects

The supervised estimation procedure for N-CTMC can be summarized as:

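  1. Extract all tuples $(x_k, \rho_k, \tau_k)$ from the observed trajectories.
  2. Group by reaction $\rho$ to build state matrices $X_\rho$, whose rows $x^{(\rho)}_1, \ldots, x^{(\rho)}_{n_\rho}$ are the pre-reaction states of the $n_\rho$ occurrences of $\rho$; the exposure term also uses the matrix $X$ of all $N$ visited states and the matching dwell-time vector $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_N)$.
  3. Initialize neural parameters $\theta$ with the framework's default initialization.
  4. Iterate until convergence:
    • Evaluate the networks $\alpha_\rho(\cdot\,; \theta)$ for all states and reactions.
    • Compute the loss per reaction as $\mathcal{L}_\rho(\theta) = \boldsymbol{\tau}^\top \alpha_\rho(X) - \sum_{m=1}^{n_\rho} \log \alpha_\rho\big(x^{(\rho)}_m\big)$, where $\alpha_\rho(X)$ applies $\alpha_\rho$ to each row of $X$.
    • Sum the per-reaction losses to obtain the total NLL.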
    • Use automatic differentiation to obtain gradients, and update parameters.
    • Monitor loss change or gradient norm for stopping.

Because the loss is a sum of per-reaction terms, automatic differentiation propagates gradients through every network output and summation without special handling, and vectorizing the computation by reaction type lets the procedure scale to complex reaction networks.
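The workflow can be illustrated end to end without an autodiff framework by swapping in a deliberately simple model: $\alpha_\rho(x) = \mathrm{softplus}(w_\rho x / 10 + b_\rho)$, a single softplus-linear unit per reaction rather than a deep network, with gradients derived by hand and plain gradient descent in place of Adam. The birth–death data, the scaling constant, the step size, and the iteration count are hypothetical choices; the objective is the NLL above averaged over transitions, and the sigmoid appears because it is the derivative of softplus.

```python
import math
import random
from typing import List, Tuple

Transition = Tuple[int, int, float]  # (x_k, rho_k, tau_k); rho 0 = birth, 1 = death


def softplus(z: float) -> float:
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z: float) -> float:
    # d/dz softplus(z), computed in a numerically stable way.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_birth_death(birth: float, death: float, x0: int, t_max: float,
                         rng: random.Random) -> List[Transition]:
    """Gillespie simulation: constant birth rate, death rate death * x."""
    x, t, data = x0, 0.0, []
    while True:
        a = [birth, death * x]
        tau = rng.expovariate(sum(a))
        if t + tau > t_max:
            return data
        rho = 0 if rng.random() * sum(a) < a[0] else 1
        data.append((x, rho, tau))
        x += 1 if rho == 0 else -1
        t += tau


def loss_and_grads(params: List[List[float]], data: List[Transition], scale: float):
    """Mean NLL and its gradient for alpha_rho(x) = softplus(w_rho * x / scale + b_rho)."""
    n = len(data)
    loss, grads = 0.0, [[0.0, 0.0] for _ in params]  # [dL/dw, dL/db] per reaction
    for x, rho_k, tau in data:
        u = x / scale
        for rho, (w, b) in enumerate(params):
            z = w * u + b
            a, s = softplus(z), sigmoid(z)
            loss += tau * a            # exposure term tau_k * alpha_rho(x_k)
            dz = tau * s
            if rho == rho_k:           # observed-reaction term -log alpha_{rho_k}(x_k)
                loss -= math.log(a)
                dz -= s / a
            grads[rho][0] += dz * u / n
            grads[rho][1] += dz / n
    return loss / n, grads


rng = random.Random(1)
data = simulate_birth_death(birth=2.0, death=0.2, x0=10, t_max=100.0, rng=rng)
params = [[0.0, 0.0], [0.0, 0.0]]  # (w, b) for birth and death, initialized at zero
losses = []
for _ in range(300):
    loss, grads = loss_and_grads(params, data, scale=10.0)
    losses.append(loss)
    for p, g in zip(params, grads):  # plain gradient-descent update
        p[0] -= 0.2 * g[0]
        p[1] -= 0.2 * g[1]
```

Replacing the single unit with a deep network changes only how `z` and its gradient are computed; the exposure and observed-reaction terms of the loss stay the same.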

6. Empirical Benchmarks and Comparative Results

N-CTMCs were empirically evaluated against log-linear generalized linear model (GLM) baselines of the form $\alpha_\rho(x) = \exp\big(\beta_{\rho,0} + \boldsymbol{\beta}_\rho^\top x\big)$ on multiple synthetic systems:

  • Birth–Death Process with Covariate: Across the range of dataset sizes tested (measured in observed transitions), the N-CTMC mean absolute error (N-MAE) matches that of a counting-based MLE baseline, and both outperform the GLM approach for sufficiently large datasets.
  • Population Dynamics (Lotka–Volterra Type, 9 reactions): N-CTMC error (MAE, MSE) decreases as the number of trajectories grows, and at the largest dataset size it is more than an order of magnitude below that of the log-linear GLM baseline.
  • Chemical Reaction Network with Temperature Covariate: Across the dataset sizes tested, N-CTMC achieves low MAE/W-MAE and MSE/W-MSE, while GLM baselines exhibit large bias, especially on states rarely visited in training.
Setting | N-CTMC MAE | GLM MAE | N-CTMC MSE | GLM MSE
Birth–Death (500k transitions) | … | … | |
Population (10,000 trajectories, 9 reactions) | … | … | … | …
ChemNet (500 trajectories, 4 reactions) | … | … | … | …

In all cases, N-CTMC estimates converge toward the ground-truth rates as data grow, at a rate comparable to counting-based MLE (which is only feasible when the state space is not prohibitively large), and improve substantially over log-linear models, particularly in densely visited regions of the state space and in settings with nonlinear effects or covariates (Reeves et al., 2022).
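Assuming "counting-based MLE" refers to the standard per-state estimator, it has a closed form: giving every (state, reaction) pair its own free rate makes each term $T(x)\,\alpha - n_\rho(x)\log\alpha$ of the NLL independent, and setting its derivative to zero gives $\hat\alpha_\rho(x) = n_\rho(x)/T(x)$, the number of times reaction $\rho$ fired from $x$ divided by the total time spent in $x$. The sketch below (function name and data hypothetical, same `(x_k, rho_k, tau_k)` tuples as above) shows why this baseline needs a manageable state space: it produces estimates only for states that were actually visited.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def counting_mle(data: Sequence[Tuple[Hashable, int, float]]) -> Dict[Tuple[Hashable, int], float]:
    """alpha_hat_rho(x) = n_rho(x) / T(x) for every (state, reaction) pair observed in the data.

    Pairs that never fired from a visited state are implicitly estimated as 0,
    and states that were never visited get no estimate at all.
    """
    time_in = defaultdict(float)  # T(x): total dwell time in state x
    counts = defaultdict(int)     # n_rho(x): number of times rho fired from x
    for x, rho, tau in data:
        time_in[x] += tau
        counts[(x, rho)] += 1
    return {(x, rho): n / time_in[x] for (x, rho), n in counts.items()}


# Hypothetical birth-death tuples (x_k, rho_k, tau_k); rho 0 = birth, 1 = death.
data = [(3, 0, 0.4), (4, 1, 0.2), (3, 1, 0.5), (2, 0, 1.1), (3, 0, 0.1)]
estimates = counting_mle(data)
```

Here state 3 accumulates one unit of dwell time with two births and one death, so its estimates are 2.0 and 1.0; state 4 has a single death after 0.2 time units, giving 5.0, and no birth estimate at all.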

7. Significance and Generalization

By parameterizing CTMC transition rates with neural functions, N-CTMCs replace the explicit $|\Omega| \times |\Omega|$ rate matrix with a shared, state-dependent mapping $x \mapsto \boldsymbol{\alpha}(x, c)$. This parameterization enables the learning of arbitrarily complex, nonlinear, and covariate-dependent transition structures. The framework generalizes both mass-action and generalized linear models and applies naturally to birth–death processes, open-loop CTMC control, chemical kinetics with non-mass-action propensities, and temperature- or context-dependent population models. Because such models can be fit directly to fully observed trajectories by exact likelihood-based gradient training, and perform well empirically across diverse regimes, N-CTMCs are a scalable and expressive alternative to earlier parametric approaches (Reeves et al., 2022).
