Papers
Topics
Authors
Recent
2000 character limit reached

RNA Velocity: Modeling Gene Expression Dynamics

Updated 10 January 2026
  • RNA velocity is a quantitative model that estimates transcriptional dynamics and cell fate transitions by analyzing unspliced and spliced RNA counts.
  • It integrates kinetic modeling, Bayesian inference, and network-based approaches to overcome challenges like latent time resolution and computational scalability.
  • Applications include mapping differentiation trajectories in development, disease modeling, and designing targeted interventions using control-theoretic principles.

RNA velocity is a quantitative model for gene expression dynamics that leverages single-cell RNA sequencing (scRNA-seq) data to infer the direction and rate of cell-state transitions. By distinguishing between unspliced (nascent) and spliced (mature) transcript counts at the single-cell level, it reconstructs underlying kinetic parameters and vector flows in gene expression space, serving as a foundational framework for mapping cell fate, lineage transitions, and cycling dynamics. Recent methodological advances address limitations of classical and widely-adopted approaches, incorporating rigorous Bayesian inference, network-theoretic modeling, and mathematically principled landscape construction.

1. Differential Kinetic Models and RNA Velocity Estimation

RNA velocity quantifies transcriptional dynamics by modeling the rates of transcription, splicing, and degradation for each gene. For cell ii and gene gg, spliced (sigs_i^g) and unspliced (uigu_i^g) counts are observed. The canonical kinetic ODEs are: dudt=αβu,dsdt=βuγs,\frac{du}{dt} = \alpha - \beta u, \qquad \frac{ds}{dt} = \beta u - \gamma s, with α\alpha as the transcription rate, β\beta the splicing rate, γ\gamma the degradation rate, and the instantaneous RNA velocity given by vigβuigγsigv_i^g \approx \beta u_i^g - \gamma s_i^g (Jia et al., 2023).

Bayesian frameworks, such as BayVel, further refine this estimation by modeling raw count data via a hierarchical Negative-Binomial structure, incorporating latent group and subgroup variables and fixing global scaling invariances (e.g., βg=1\beta_g=1, average capture efficiency λc=1\lambda_c=1). Velocity is derived for each posterior sample as vk,r,g=βguk,r,gγgsk,r,gv_{k,r,g} = \beta_g u_{k,r,g}^\sim - \gamma_g s_{k,r,g}^\sim (Sabbioni et al., 6 May 2025).

2. Vector Field Construction and Natural Helmholtz–Hodge Decomposition

Discrete cell-wise velocities vi{v_i} are mapped onto a continuous vector field F(x):ΩRnRnF(x): \Omega \subset \mathbb{R}^n \rightarrow \mathbb{R}^n (after projection via dimension reduction such as UMAP). The Natural Helmholtz–Hodge Decomposition (nHHD) theorem guarantees a unique partition for sufficiently smooth F(x)F(x) within a bounded domain: F(x)=Fg(x)+Fc(x)+Fh(x),F(x) = F_g(x) + F_c(x) + F_h(x), where Fg=U(x)F_g = -\nabla U(x) (gradient, curl-free), Fc=×A(x)F_c = \nabla \times A(x) (rotational, divergence-free), and FhF_h harmonic (ΔFh=0\Delta F_h = 0). This decomposition enables explicit distinction of cell differentiation (gradient descent in U(x)U(x)) and cell cycling (rotation in A(x)A(x)) in the Waddington landscape metaphor (Jia et al., 2023). Computationally, U(x)U(x) and A(x)A(x) are identified by solving Poisson-type PDEs or by line integration and convolution with Green's functions on appropriate grids.

3. Network-Based Extensions: Intracellular GRNs and Intercellular Coupling

Recent developments generalize RNA velocity models to account for regulatory context. In a multi-gene, multi-cell system, transcription rates are modulated by an intracellular gene regulatory network (GRN) specified by activating (W+W^+) and repressing (WW^-) matrices. The regulatory input for gene gg is

Rg(s)=κ+q=1ngWgq+sqκ+q=1ngWgqsq,R_g(s) = \frac{\kappa + \sum_{q=1}^{n_g} W_{gq}^{+} s^q}{\kappa + \sum_{q=1}^{n_g} W_{gq}^{-} s^q},

where κ\kappa is a regularization constant. The full spatially-coupled dynamic system incorporates cell–cell adjacency AA and a coupling coefficient cc: duigdt=αigRg(si)βiguig,dsigdt=βiguigγigsig+cj=1ncAij(sjgsig)\frac{du_i^g}{dt} = \alpha_i^g R_g(s_i) - \beta_i^g u_i^g, \quad \frac{ds_i^g}{dt} = \beta_i^g u_i^g - \gamma_i^g s_i^g + c \sum_{j=1}^{n_c} A_{ij}(s_j^g - s_i^g) Equilibrium existence and global stability can be established via spectral radius criteria and Lyapunov functions. The algebraic connectivity λ2(L)\lambda_2(L) of the Laplacian L=DAL = D - A determines consensus strength across the cell population (Hou et al., 3 Jan 2026).

4. Algorithmic and Statistical Identification: Time-Scale Fixation and Uncertainty

RNA velocity is subject to nonidentifiability under time-rescaling (tκtt \to \kappa t, αα/κ\alpha \to \alpha/\kappa, etc.), demanding post-hoc resolution of latent time parameters. Rational fixation recovers global latent times tct_c and true splicing rates βg\beta_g via constrained maximum likelihood, using either multiplicative (tcgβg1=tc+ϵcgt_{cg} \beta_g^{-1} = t_c + \epsilon_{cg}) or additive (tcg=tcβg+ϵcgt_{cg} = t_c \beta_g + \epsilon_{cg}) noise models. Solutions involve Perron eigenvector computation in gene-time matrices (Li et al., 2023).

Bayesian methods further provide principled uncertainty quantification for kinetic rates, latent times, and velocities, propagating posterior samples to credible intervals and posterior predictive diagnostics (Sabbioni et al., 6 May 2025). EM-based frequentist approaches use observed and missing information matrices, expressed via the SEM (Supplemented EM) decomposition, to obtain asymptotic covariance estimates.

5. Downstream Analysis: Transition Kernels, Bandwidth Selection, and Pseudotemporal Distances

To infer developmental trajectories, velocity-induced random walks are constructed via Gaussian-cosine kernels in spliced-count space. The Markov chain transition kernel Pε,nP_{\varepsilon,n} is tuned for operator convergence; optimal kernel bandwidth is determined by balancing variance (O(n1/2εd/4)O(n^{-1/2} \varepsilon^{-d/4})) and bias (O(ε)O(\sqrt{\varepsilon})), yielding εopt=n2/(d+2)\varepsilon_{\rm opt} = n^{-2/(d+2)} (Li et al., 2023).

Pseudotemporal ordering employs mean first-hitting times in these Markov chains. For cell ii and target cluster AA,

kiA=E[min{n0:XnA}X0=i]k_i^A = \mathbb{E}[\min \{ n \ge 0: X_n \in A \} | X_0 = i]

is computed recursively or by linear solvers. For bifurcating or cyclic systems, taboo sets HH enforce trajectory restriction, adjusting hitting time computation for lineage-specific transitions.

6. Applications, Comparative Evaluation, and Control-Theoretic Implications

RNA velocity approaches have been validated on synthetic and real scRNA-seq datasets, including dentate gyrus and hematopoiesis (gradient-dominated flows, minimal cycling), and pancreatic development (simultaneous gradient and rotational flows in cycling progenitors) (Jia et al., 2023, Sabbioni et al., 6 May 2025). Bayesian methods, notably BayVel, outperform deterministic ODE-fitting methods (scVelo) in statistical accuracy and uncertainty quantification, especially under simulation.

Control-theoretic formulations allow for targeted intervention in single-cell GRNs and multicellular systems. Drug-regimen optimization, expressible as minimum-time Hamiltonian problems, yields explicit bang–bang control policies, leveraging system reachability as determined via Lie bracket analysis. These frameworks suggest robust estimation and rational perturbation design for spatial transcriptomics and disease-modeling contexts (Hou et al., 3 Jan 2026).

7. Limitations, Practical Workflow, and Outlook

Although recent RNA velocity methodologies provide rigorous frameworks for kinetic inference and landscape construction, they pose significant computational burden (e.g., MCMC convergence over thousands of genes/cells), identifiability challenges (latent time determination, subgrouping), and dependence on model assumptions (e.g., piecewise-constant ON/OFF transcription, kernel choice). Practical workflows proceed from raw count acquisition, grouping/subgrouping, prior specification, batch inference, and downstream visualization in embedding space. Model selection is performed via WAIC and careful evaluation of clustering granularity.

These advances offer enhanced quantification of developmental potency and cycling strength, accurate mapping of bifurcations and cell-fate transitions, and a testbed for perturbation-optimized control in biological systems. The unified dual-landscape and network-consensus frameworks situate RNA velocity as an essential analytic tool for dynamic, high-dimensional transcriptomic systems.

Whiteboard

Topic to Video (Beta)

Follow Topic

Get notified by email when new papers are published related to RNA Velocity Approach.