Neural and Mixture CTMCs
- Neural and Mixture CTMCs are nonparametric extensions of classical CTMCs that use neural networks to learn complex, nonlinear transition rate functions.
- They replace fixed parametric models with softplus-activated neural mappings, ensuring nonnegativity while capturing arbitrary state and covariate dependencies.
- On synthetic benchmarks (a birth–death process, Lotka–Volterra-type population dynamics, and a chemical reaction network with a temperature covariate), Neural CTMCs achieve substantially lower estimation error than log-linear GLM baselines.
Neural and Mixture Continuous-Time Markov Chains (CTMCs) generalize classical CTMC frameworks by representing the transition rate functions flexibly (nonparametrically) rather than with a fixed parametric form. This allows nonlinear dependence on the current discrete state and on covariates. Neural CTMCs use neural networks to learn the propensity (transition-rate) functions directly from fully observed sample paths. The aim is to infer and accurately reconstruct the complex stochastic dynamics of systems such as chemical reaction networks, population dynamics, and gene regulatory networks (Reeves et al., 2022).
1. Definition and Mathematical Setup
A CTMC is defined over a (potentially infinite) discrete state space $\mathcal{X}$, with piecewise-constant sample paths and transitions occurring at random times. The primary object is the transition rate $q(x, x' \mid u)$, the instantaneous rate at which the process jumps from state $x$ to state $x'$ given covariates $u$. The vector $z = (x, u)$ collects state and covariate information. All possible transitions are assumed to correspond to a finite set of $M$ known reactions with prescribed stoichiometric changes $\nu_1, \dots, \nu_M$:
$$q(x, x + \nu_j \mid u) = \lambda_j(z), \qquad j = 1, \dots, M.$$
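The total exit rate from state $x$ is $\Lambda(z) = \sum_{j=1}^{M} \lambda_j(z)$, and the sojourn time in each state is exponentially distributed with this rate. The process is constructed such that, for infinitesimal $h > 0$,
$$\Pr\big(X(t+h) = x + \nu_j \mid X(t) = x\big) = \lambda_j(z)\,h + o(h).$$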
The state vector may comprise multiple species counts $x = (x_1, \dots, x_S) \in \mathbb{N}^{S}$ and continuous covariates $u \in \mathbb{R}^{d}$, concatenated as $z = (x, u) \in \mathbb{R}^{S+d}$.
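To make the construction concrete, here is a minimal Python sketch of simulating such a reaction CTMC with Gillespie's stochastic simulation algorithm. It draws an exponential sojourn whose rate is the total exit rate, then picks reaction $j$ with probability proportional to $\lambda_j$. The helper name `gillespie`, its signature, and the birth–death rates are illustrative assumptions. Covariates are omitted, so the rates depend on $x$ only.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Step = Tuple[float, State, int]  # (dwell time, pre-reaction state, reaction index)


def gillespie(
    x0: State,
    stoich: Sequence[State],
    rates: Callable[[State], List[float]],
    t_max: float,
    rng: random.Random,
) -> List[Step]:
    """Simulate a reaction-network CTMC up to time t_max (hypothetical helper)."""
    t, x = 0.0, tuple(x0)
    path: List[Step] = []
    while True:
        lam = rates(x)                # lambda_j(x) for every reaction j
        total = sum(lam)              # total exit rate from x
        if total <= 0.0:              # absorbing state: no reaction can fire
            break
        tau = rng.expovariate(total)  # sojourn time ~ Exponential(total)
        if t + tau > t_max:           # next jump falls past the horizon
            break
        # choose reaction j with probability lambda_j / total
        j = rng.choices(range(len(lam)), weights=lam)[0]
        path.append((tau, x, j))
        x = tuple(xi + d for xi, d in zip(x, stoich[j]))  # apply nu_j
        t += tau
    return path


# Birth-death example (assumed rates): reaction 0 adds one individual at
# rate 2, reaction 1 removes one at rate 0.2 * x.
rng = random.Random(0)
path = gillespie((5,), [(1,), (-1,)], lambda x: [2.0, 0.2 * x[0]], 50.0, rng)
```

Each recorded tuple is exactly the (pre-reaction state, dwell time, reaction) information that the likelihood in Section 4 consumes. The final, censored sojourn at `t_max` is dropped.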
2. Neural Parameterization of Rate Functions
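Unlike mass-action or log-linear forms, Neural CTMCs do not impose any fixed parametric dependency. Each $\lambda_j$ is parameterized as an output of a neural network, enabling arbitrary nonlinear dependence on both the discrete state and the covariates. Collectively,
$$\boldsymbol{\lambda}(z) = \big(\lambda_1(z), \dots, \lambda_M(z)\big) = \sigma\big(f_\theta(z)\big).$$
Each component is a neural function whose output passes through a softplus, $\sigma(a) = \log(1 + e^{a})$, applied elementwise to guarantee nonnegativity:
- In mass-action kinetics: $\lambda_j(x) = c_j \prod_{s} \binom{x_s}{\nu^{-}_{sj}}$, with rate constant $c_j$ and $\nu^{-}_{sj}$ the number of molecules of species $s$ consumed by reaction $j$.
- In Neural CTMCs: $\lambda_j(z) = \sigma\big(f_\theta(z)\big)_j$, with $f_\theta$ a neural network mapping $z$ to one real output per reaction and $\sigma$ the softplus.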
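The contrast between fixed and learned rate forms can be illustrated in a few lines of Python. The snippet below is a minimal illustration, not the authors' code. It evaluates three rate forms:

- a stochastic mass-action propensity,
- a log-linear (GLM-style) rate,
- a softplus-output network.

The one-hidden-layer network, its random weights, and all helper names are assumptions. They are chosen only to show that the softplus output is nonnegative for any input.

```python
import math
import random
from typing import List, Sequence


def softplus(a: float) -> float:
    """Numerically stable log(1 + e^a); never negative."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def selu(a: float) -> float:
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * (a if a > 0 else alpha * math.expm1(a))


def mass_action_rate(x: Sequence[int], c: float, reactants: Sequence[int]) -> float:
    """c_j * prod_s binom(x_s, nu_sj^-); math.comb returns 0 when nu > x."""
    rate = c
    for x_s, nu_s in zip(x, reactants):
        rate *= math.comb(x_s, nu_s)
    return rate


def log_linear_rate(z: Sequence[float], beta: Sequence[float], b: float) -> float:
    """GLM-style baseline exp(beta^T z + b)."""
    return math.exp(sum(bi * zi for bi, zi in zip(beta, z)) + b)


def neural_rates(z: Sequence[float], W1, b1, W2, b2) -> List[float]:
    """softplus(f_theta(z)) for a toy one-hidden-layer f_theta (hypothetical)."""
    h = [selu(sum(w * zi for w, zi in zip(row, z)) + bias) for row, bias in zip(W1, b1)]
    return [softplus(sum(w * hi for w, hi in zip(row, h)) + bias) for row, bias in zip(W2, b2)]


rng = random.Random(0)
n_in, n_hidden, n_out = 3, 8, 4  # assumed: z = (x_1, x_2, T) and M = 4 reactions
W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out

print(mass_action_rate((4,), c=0.5, reactants=(2,)))  # 0.5 * C(4, 2) -> 3.0
print(neural_rates((10.0, 3.0, 0.5), W1, b1, W2, b2))
```

Only the last function can bend arbitrarily with $z$. Its nonnegativity comes from the softplus rather than from the functional form.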
This methodological extension enables the model to represent non-mass-action kinetics, encompassing a broader class of stochastic processes observed in empirical domains (Reeves et al., 2022).
3. Neural Network Architectures and Implementation
Two principal network configurations are described for $f_\theta$:
- Population dynamics example (Lotka–Volterra type, $M = 9$ reactions, no covariates):
  - Input: the species-count vector $z = x$.
  - Fully connected network with 5 hidden layers of 128 neurons and SELU activation, followed by an output layer of $M = 9$ neurons (one per reaction) with softplus.
- Chemical reaction network example (2 species plus a temperature covariate $T$, $M = 4$ reactions):
  - The input is passed to a dense layer ($3 \to 96$) and reshaped to a $3 \times 32$ array.
  - A batch-wise 1D convolution across rows (10 filters of size 4) is applied, with SELU activation.
  - The result is flattened and passed through dense layers ($290 \to 32 \to 32$), then through an output layer ($32 \to 4$, one unit per reaction) with softplus.
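A minimal NumPy sketch of the two forward passes follows, with randomly initialized weights and no training. Several details are assumptions not stated above:

- SELU follows every hidden dense layer.
- LeCun-normal initialization stands in for the framework default.
- The population model takes a two-species input.
- The convolution treats the 3 rows of the reshaped array as input channels.

That last choice is one reading of "batch-wise 1D convolution across rows"; it reproduces the 290-unit flattened size (10 filters × 29 valid positions).

```python
import numpy as np

rng = np.random.default_rng(0)


def selu(a):
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * np.where(a > 0, a, alpha * np.expm1(np.minimum(a, 0.0)))


def softplus(a):
    return np.logaddexp(0.0, a)  # log(1 + e^a), elementwise and overflow-safe


def dense(n_in, n_out):
    # LeCun-normal weights (assumption; the source used framework defaults), zero biases
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out)


# Population model (assumed 2 species): 2 -> 5 x [128, SELU] -> M = 9, softplus
lv_layers = [dense(2, 128)] + [dense(128, 128) for _ in range(4)] + [dense(128, 9)]


def lv_forward(z):
    h = z
    for W, b in lv_layers[:-1]:
        h = selu(h @ W + b)
    W, b = lv_layers[-1]
    return softplus(h @ W + b)  # (batch, 9) nonnegative rates


# Chemical network: (x1, x2, T) -> 96 -> (3, 32) -> conv -> 290 -> 32 -> 32 -> 4
crn_in = dense(3, 96)
crn_kernel = rng.normal(0.0, 1.0 / np.sqrt(3 * 4), size=(10, 3, 4))  # (filters, channels, width)
crn_conv_bias = np.zeros(10)
crn_mlp = [dense(290, 32), dense(32, 32)]
crn_out = dense(32, 4)


def conv1d_valid(h, kernel, bias):
    # h: (batch, channels, length) -> (batch, filters, length - width + 1)
    width = kernel.shape[-1]
    n_pos = h.shape[-1] - width + 1
    windows = np.stack([h[:, :, i:i + width] for i in range(n_pos)], axis=2)  # (B, C, P, W)
    return np.einsum("bcpw,fcw->bfp", windows, kernel) + bias[None, :, None]


def crn_forward(z):
    W, b = crn_in
    h = selu(z @ W + b).reshape(-1, 3, 32)                 # (batch, 3, 32)
    h = selu(conv1d_valid(h, crn_kernel, crn_conv_bias))  # (batch, 10, 29)
    h = h.reshape(len(z), -1)                              # (batch, 290)
    for W, b in crn_mlp:
        h = selu(h @ W + b)
    W, b = crn_out
    return softplus(h @ W + b)                             # (batch, 4) nonnegative rates


# Inputs are assumed to be pre-scaled to order-one values.
rates_lv = lv_forward(np.array([[0.5, 0.2], [0.1, 0.05]]))
rates_crn = crn_forward(np.array([[0.3, 0.12, 1.0]]))
```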
No explicit regularization (e.g., dropout or weight decay) was used, and network weights were initialized with the framework's default scheme. These choices are meant to give the model enough representational capacity to learn complex propensities from observed trajectories.
4. Likelihood-Based Training and Loss Function
Given fully observed trajectories, the negative log-likelihood (NLL) is derived exactly from the CTMC process:
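- Let $\tau_k$ denote the dwell time in state $x_{k-1}$ (with input $z_{k-1}$) before the $k$-th reaction $r_k \in \{1, \dots, M\}$, so that $x_k = x_{k-1} + \nu_{r_k}$. Write $\Lambda(z) = \sum_{j=1}^{M} \lambda_j(z)$ for the total exit rate.
- The single-step likelihood is
$$p_k = \lambda_{r_k}(z_{k-1}) \exp\big(-\Lambda(z_{k-1})\,\tau_k\big).$$
- The trajectory-wise NLL over $K$ transitions becomes
$$\mathcal{L}(\theta) = \sum_{k=1}^{K} \Big[\Lambda(z_{k-1})\,\tau_k - \log \lambda_{r_k}(z_{k-1})\Big],$$
or, regrouped by reaction $j$,
$$\mathcal{L}(\theta) = \sum_{j=1}^{M} \sum_{i=1}^{n_j} \Big[\Lambda\big(z^{(j)}_i\big)\,\tau^{(j)}_i - \log \lambda_j\big(z^{(j)}_i\big)\Big],$$
where $n_j$ is the number of occurrences of reaction $j$, $z^{(j)}_1, \dots, z^{(j)}_{n_j}$ are the pre-reaction states, and $\tau^{(j)}_i$ are the corresponding dwell times.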
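The two NLL forms can be checked against each other numerically. The following minimal Python sketch makes two assumptions:

- `rates(z)` is a hypothetical function returning $(\lambda_1(z), \dots, \lambda_M(z))$.
- Steps are stored as $(z_{k-1}, \tau_k, r_k)$ tuples with 0-based reaction indices.

It computes the stepwise sum and the regrouped sum on a tiny hand-made birth–death trajectory.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Step = Tuple[Sequence[float], float, int]  # (pre-reaction z, dwell time tau, reaction r)
RateFn = Callable[[Sequence[float]], List[float]]


def nll_stepwise(steps: Sequence[Step], rates: RateFn) -> float:
    """sum_k [Lambda(z_{k-1}) * tau_k - log lambda_{r_k}(z_{k-1})]."""
    total = 0.0
    for z, tau, r in steps:
        lam = rates(z)
        total += sum(lam) * tau - math.log(lam[r])
    return total


def nll_grouped(steps: Sequence[Step], rates: RateFn, n_reactions: int) -> float:
    """Same NLL, accumulated reaction by reaction over the grouped pre-reaction states."""
    groups: Dict[int, List[Tuple[Sequence[float], float]]] = defaultdict(list)
    for z, tau, r in steps:
        groups[r].append((z, tau))
    total = 0.0
    for j in range(n_reactions):
        for z, tau in groups[j]:
            lam = rates(z)
            total += sum(lam) * tau - math.log(lam[j])
    return total


def rates(z: Sequence[float]) -> List[float]:
    # toy birth-death rates (assumed): lambda_0 = 2 (birth), lambda_1 = 0.5 * x (death)
    return [2.0, 0.5 * z[0]]


# trajectory 4 -> 5 -> 4 -> 3: birth, death, death
steps = [((4.0,), 0.25, 0), ((5.0,), 0.1, 1), ((4.0,), 0.4, 1)]
print(nll_stepwise(steps, rates))  # 3.05 - log(10), about 0.7474
print(nll_grouped(steps, rates, 2))
```

The exposure terms sum to $4(0.25) + 4.5(0.1) + 4(0.4) = 3.05$, and the log terms to $\log(2 \cdot 2.5 \cdot 2) = \log 10$. Only states actually visited are ever evaluated.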
Gradient-based optimization (e.g., with Adam) is then performed on this NLL. Each iteration costs time linear in the number of observed transitions $K$, one network evaluation per visited state. This is a critical advantage for systems with large or infinite state spaces, since no explicit generator matrix $Q$ is constructed (Reeves et al., 2022).
5. Training Workflow and Computational Aspects
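The supervised estimation procedure for a Neural CTMC (N-CTMC) can be summarized as:
- Extract all tuples $(z_{k-1}, \tau_k, r_k)$ from the observed trajectories.
- Group them by reaction $j$ to build state matrices $Z_j$ (one row per occurrence of reaction $j$) and dwell-time vectors $\boldsymbol{\tau}_j$.
- Initialize the neural parameters $\theta$ with the framework's default initialization.
- Iterate until convergence:
  - Evaluate the network $f_\theta$ on all grouped states, obtaining the rate of every reaction.
  - Compute the loss for each reaction group as $\mathcal{L}_j(\theta) = \boldsymbol{\tau}_j^{\top} \Lambda(Z_j) - \mathbf{1}^{\top} \log \lambda_j(Z_j)$. Here $\Lambda = \sum_{j'} \lambda_{j'}$ is the total exit rate, and $\Lambda$ and $\lambda_j$ are applied row-wise.
  - Sum the group losses to obtain the total loss $\mathcal{L}(\theta) = \sum_j \mathcal{L}_j(\theta)$.
  - Obtain gradients by automatic differentiation and update the parameters.
  - Stop when the loss change or the gradient norm falls below a tolerance.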
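The workflow can be sketched end to end in NumPy. This sketch swaps in simplifications to stay short and free of an autodiff dependency:

- A single linear layer with softplus output, $\lambda(z) = \sigma(zW + b)$, replaces the deep network, so the NLL gradient can be written by hand.
- Plain gradient descent replaces Adam.
- The simulated birth–death data (birth rate 2, death rate $0.2x$), the feature scaling, and the learning rate are illustrative assumptions.

```python
import numpy as np


def softplus(a):
    return np.logaddexp(0.0, a)


def sigmoid(a):
    return 0.5 * (1.0 + np.tanh(0.5 * a))  # derivative of softplus, overflow-safe


def simulate_birth_death(rng, x0, n_steps, birth=2.0, death=0.2):
    """Return pre-reaction states, dwell times, and reaction ids (0 = birth, 1 = death)."""
    xs, taus, rs, x = [], [], [], x0
    for _ in range(n_steps):
        lam = np.array([birth, death * x])
        total = lam.sum()
        xs.append(x)
        taus.append(rng.exponential(1.0 / total))
        rs.append(rng.choice(2, p=lam / total))
        x += 1 if rs[-1] == 0 else -1
    return np.array(xs, dtype=float)[:, None], np.array(taus), np.array(rs)


def loss_and_grads(W, b, groups, n_total):
    """Mean NLL over transitions and its gradient, accumulated group by group."""
    loss, dW, db = 0.0, np.zeros_like(W), np.zeros_like(b)
    for j, (Zj, tj) in groups.items():
        A = Zj @ W + b                       # (n_j, M) pre-activations
        lam, s = softplus(A), sigmoid(A)     # rates and d(rate)/dA
        loss += tj @ lam.sum(axis=1) - np.log(lam[:, j]).sum()
        dA = tj[:, None] * s                 # from the tau * Lambda term
        dA[:, j] -= s[:, j] / lam[:, j]      # from the -log lambda_j term
        dW += Zj.T @ dA
        db += dA.sum(axis=0)
    return loss / n_total, dW / n_total, db / n_total


rng = np.random.default_rng(0)
X, taus, rs = simulate_birth_death(rng, x0=5, n_steps=2000)
Z = X / 10.0                                                  # crude feature scaling
M = 2
groups = {j: (Z[rs == j], taus[rs == j]) for j in range(M)}  # Z_j and tau_j

W, b = rng.normal(0.0, 0.1, size=(Z.shape[1], M)), np.zeros(M)
lr, losses = 0.1, []
for it in range(2000):
    loss, dW, db = loss_and_grads(W, b, groups, len(rs))
    losses.append(loss)
    if it > 0 and abs(losses[-2] - losses[-1]) < 1e-8:      # stopping rule
        break
    W -= lr * dW
    b -= lr * db

print(len(losses), losses[0], losses[-1])
print(softplus(np.array([[1.0]]) @ W + b))                   # fitted rates at x = 10
```

Swapping `loss_and_grads` for an autodiff framework and the linear layer for one of the networks in Section 3 recovers the structure described above. The grouping step and the per-group loss are unchanged.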
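Because the loss is a plain sum over transitions, gradients flow through every network output and every summation. Batching each reaction group's states into a single matrix lets the network be evaluated in vectorized form, which keeps training scalable for complex reaction networks.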
6. Empirical Benchmarks and Comparative Results
N-CTMCs were evaluated empirically against log-linear generalized linear model (GLM) baselines of the form $\lambda_j(z) = \exp(\beta_j^{\top} z + b_j)$ on several synthetic systems:
- Birth–death process with covariate: across the range of transition counts tested, the N-CTMC mean absolute error (N-MAE) matches that of a counting-based MLE baseline. Both outperform the GLM approach for sufficiently large datasets.
- Population dynamics (Lotka–Volterra type, $M = 9$ reactions): N-CTMC error (MAE, MSE) decreases as the number of training trajectories grows. With enough trajectories it outperforms the log-linear GLM baselines by over an order of magnitude.
- Chemical reaction network with temperature covariate: across the trajectory counts tested, N-CTMC achieves low MAE/W-MAE and MSE/W-MSE. The GLM baselines, by contrast, exhibit large bias, especially on states rarely visited in training.
| Setting | N-CTMC MAE | GLM MAE | N-CTMC MSE | GLM MSE |
|---|---|---|---|---|
| Birth–Death (500k transitions) | … | — | … | — |
| Population (10,000 trj., 9 rxn) | … | … | … | … |
| ChemNet (500 trj., 4 rxn) | … | … | … | … |
In all cases, the N-CTMC converges to the ground truth at a rate comparable to counting-based MLE (which is feasible only when the state space is not prohibitively large). It also improves substantially on log-linear models, particularly in densely visited regions of the state space and in settings with nonlinear effects or covariates (Reeves et al., 2022).
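For comparison, a counting-based MLE for a CTMC with a finite, enumerable state space can be sketched as follows. The sketch assumes the baseline is the standard per-state estimator $\hat{\lambda}_j(x) = N_j(x) / T(x)$. Here $N_j(x)$ counts occurrences of reaction $j$ from state $x$, and $T(x)$ is the total time spent in $x$. The function name and input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def counting_mle(
    steps: Sequence[Tuple[Hashable, float, int]],
) -> Dict[Tuple[Hashable, int], float]:
    """Per-state rate estimates N_j(x) / T(x) from (pre-state, dwell time, reaction) tuples."""
    time_in: Dict[Hashable, float] = defaultdict(float)         # T(x): total dwell time in x
    counts: Dict[Tuple[Hashable, int], int] = defaultdict(int)  # N_j(x)
    for x, tau, j in steps:
        time_in[x] += tau
        counts[(x, j)] += 1
    # unseen (x, j) pairs are implicitly estimated as 0; unvisited x get no estimate at all
    return {(x, j): n / time_in[x] for (x, j), n in counts.items()}


# trajectory 3 -> 4 -> 3 -> 4 -> 3 -> 2 (reactions: 0 = birth, 1 = death)
steps = [((3,), 0.5, 0), ((4,), 1.0, 1), ((3,), 1.5, 0), ((4,), 2.0, 1), ((3,), 2.0, 1)]
print(counting_mle(steps))
# {((3,), 0): 0.5, ((4,), 1): 0.6666666666666666, ((3,), 1): 0.25}
```

The estimator stores one free parameter per observed (state, reaction) pair and says nothing about unvisited states. This is why it scales poorly to large state spaces, where a shared network $f_\theta$ can still generalize.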
7. Significance and Generalization
By parameterizing CTMC transition rates with neural functions, N-CTMCs replace the explicit state-by-state rate matrix with a shared, state-dependent mapping $z \mapsto (\lambda_1(z), \dots, \lambda_M(z))$ computed by a single network. This parameterization enables the learning of arbitrarily complex, nonlinear, and covariate-dependent transition structures. The framework generalizes both mass-action and generalized linear models. It applies naturally to birth–death processes, open-loop CTMC control, chemical kinetics with non-mass-action propensities, and temperature- or context-dependent population models. N-CTMCs can be fit directly from fully observed trajectories using likelihood-based gradient learning, and they perform well empirically across diverse regimes. Together, these properties position them as a scalable and expressive alternative to earlier parametric approaches (Reeves et al., 2022).