
Nonlinear memory in cell division dynamics across species (2408.14564v1)

Published 26 Aug 2024 in q-bio.QM and physics.bio-ph

Abstract: Regulation of cell growth and division is essential to achieve cell-size homeostasis. Recent advances in imaging technologies, such as "mother machines" for bacteria or yeast, have allowed long-term tracking of cell-size dynamics across many generations, and thus have brought major insights into the mechanisms underlying cell-size control. However, understanding the governing rules of cell growth and division within a quantitative dynamical-systems framework remains a major challenge. Here, we implement and apply a framework that makes it possible to infer stochastic differential equation (SDE) models with Poisson noise directly from experimentally measured time series for cellular growth and divisions. To account for potential nonlinear memory effects, we parameterize the Poisson intensity of stochastic cell division events in terms of both the cell's current size and its ancestral history. By applying the algorithm to experimentally measured cell size trajectories, we are able to quantitatively evaluate the linear one-step memory hypothesis underlying the popular "sizer", "adder", and "timer" models of cell homeostasis. For Escherichia coli and Bacillus subtilis bacteria, Schizosaccharomyces pombe yeast and Dictyostelium discoideum amoebae, we find that in many cases the inferred stochastic models have a substantial nonlinear memory component. This suggests a need to reevaluate and generalize the currently prevailing linear-memory paradigm of cell homeostasis. More broadly, the underlying inference framework is directly applicable to identify quantitative models for stochastic jump processes in a wide range of scientific disciplines.

Summary

  • The paper introduces a framework to infer stochastic differential equation (SDE) models from time-series data of jump processes, designed to detect nonlinear memory in systems like cell division.
  • Applying the framework reveals significant nonlinear memory of mother size in species like E. coli and Dictyostelium discoideum, which goes beyond conventional linear-memory cell division models.
  • This data-driven inference approach is broadly applicable to model diverse stochastic jump processes in various fields, demonstrated with examples from healthcare data and online activity.

This paper (2408.14564) introduces a practical framework for inferring stochastic differential equation (SDE) models with inhomogeneous Poisson noise directly from time-series data of discontinuous jump processes. The primary motivation is to analyze cell growth and division dynamics, which involve continuous growth interrupted by discrete division events. The framework is specifically designed to detect and quantify nonlinear memory effects, which are often not captured by conventional cell homeostasis models like "sizer", "adder", and "timer".

The core idea is to model cell size dynamics $s_t$ using an SDE:

$$ds_t = g(s_t)\,dt - h(s_{t^-})\,dN(t)$$

where:

  • $g(s_t)$ is the deterministic growth rate function.
  • $h(s_{t^-})$ is the deterministic cut size upon division, representing the size decrease of the mother cell.
  • $dN(t)$ is a Poisson counting process that signals division events. The rate of this process, $\lambda$, is history-dependent, $\lambda(s_t, s_t^*, s_t^{**}, \dots)$, where $s_t$ is the current size, $s_t^*$ is the mother cell size (size at the last division), $s_t^{**}$ is the grandmother size, and so on, capturing potential memory effects across generations.
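This generative model can be simulated directly with an Euler scheme plus Poisson thinning for the division events. In the sketch below, the functional forms (exponential growth, symmetric division, a log-linear rate in $s_t$ and $s_t^*$) and all parameter values are illustrative assumptions, not the paper's fitted results:

```python
import numpy as np

rng = np.random.default_rng(0)

def g(s):             # growth rate; exponential growth (assumed form)
    return 0.03 * s

def h(s):             # cut size; symmetric division removes half the cell
    return 0.5 * s

def lam(s, s_star):   # division rate with one-step memory (assumed form)
    return np.exp(-6.0 + 2.0 * s + 0.5 * s_star)

dt, T = 0.01, 500.0
s, s_star = 1.0, 1.0  # current size and mother size at the last division
t, n_div = 0.0, 0
times, sizes = [0.0], [s]
while t < T:
    # Poisson thinning: a division fires in [t, t+dt) with prob ~ lam*dt
    if rng.random() < lam(s, s_star) * dt:
        s_star = s        # record the mother size, then apply the cut
        s -= h(s)
        n_div += 1
    s += g(s) * dt        # deterministic growth between divisions
    t += dt
    times.append(t)
    sizes.append(s)
```

Trajectories generated this way are also how an inferred model can later be checked against the experimental statistics.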

The framework infers the functional forms of $g$, $h$, and $\lambda$ from experimental cell-size trajectories.

  1. Inference of $g(s)$ and $h(s)$: These are inferred using standard linear regression on the continuous growth phases and the size jumps at division, respectively. For biological data, a linear growth rate $g(s) = g_0 + g_1 s$ and a linear cut size $h(s) = h_0 + h_1 s$ were found sufficient in many cases.
  2. Inference of $\lambda$: This is the central contribution. To model the potentially nonlinear and memory-dependent division rate $\lambda(s_t, s_t^*)$, the paper proposes inferring $\ln \lambda$ by expanding it in a set of orthogonal basis functions $\theta_{ij}(s_t, s_t^*) = \phi_i(s_t)\,\psi_j(s_t^*)$, where $\phi_i$ and $\psi_j$ are orthogonal polynomials constructed directly from the empirical distributions of $s_t$ and $s_t^*$. The coefficients $w_{ij}$ of this expansion:

    $\ln\lambda(s_t, s_t^*) = \sum_{i,j} w_{ij}\,\phi_i(s_t)\,\psi_j(s_t^*)$

    are inferred using sparse Bayesian inference. This involves maximizing the posterior probability $P(w \mid \text{data})$, which combines a likelihood function derived from the Poisson process and a sparsity-promoting Gaussian prior on the weights $w_{ij}$. An Expectation-Maximization (EM) algorithm is used to iteratively estimate the weights and the prior variances.

  3. Model Selection: To avoid overfitting and select the most parsimonious model, a modified Bayesian Information Criterion (BIC) is used. Models with fewer, but more informative, terms in the basis expansion of $\ln\lambda$ are favored.
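For a discretized trajectory, the Poisson path log-likelihood driving step 2 is $\sum_{\text{divisions}} \ln\lambda - \int \lambda\,dt$. A minimal sketch of maximizing it with L-BFGS (the optimizer the paper mentions) is given below; it uses a plain polynomial basis and synthetic data in place of the paper's data-adapted orthogonal polynomials, sparsity prior, and EM loop:

```python
import numpy as np
from scipy.optimize import minimize

def features(s, s_star):
    """Simple polynomial basis in (s, s*); the paper instead uses
    orthogonal polynomials built from the empirical distributions."""
    return np.stack([np.ones_like(s), s, s_star, s * s_star], axis=-1)

def neg_log_lik(w, s, s_star, div, dt):
    """Negative Poisson path log-likelihood:
    -sum over divisions of ln(lambda) + Riemann sum of lambda*dt."""
    log_lam = features(s, s_star) @ w
    return -(log_lam[div.astype(bool)].sum() - np.exp(log_lam).sum() * dt)

# toy data: synthetic sizes, mother sizes, and division indicators
rng = np.random.default_rng(1)
s = rng.uniform(1.0, 4.0, 2000)
s_star = rng.uniform(1.0, 4.0, 2000)
true_w = np.array([-4.0, 1.0, 0.3, 0.0])
dt = 0.01
div = rng.random(2000) < np.exp(features(s, s_star) @ true_w) * dt

res = minimize(neg_log_lik, x0=np.zeros(4), args=(s, s_star, div, dt),
               method="L-BFGS-B")
w_hat = res.x  # estimated expansion coefficients
```

Because the log-likelihood is concave in $w$ for a log-linear rate, this optimization is well behaved; the Bayesian machinery in the paper additionally shrinks uninformative coefficients toward zero.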

Applying this framework to mother-machine data for various species:

  • Escherichia coli [10]: Found exponential growth and symmetric division. The inferred division rate $\lambda(s_t, s_t^*)$ shows a significant nonlinear dependence on the mother size $s_t^*$. This indicates that cells with smaller mother sizes tend to divide faster relative to their size than predicted by linear-memory models, suggesting a mechanism that corrects size deviations more aggressively for smaller cells. Analysis of the joint distribution of mother and grandmother sizes reveals correlations beyond one generation, although adding two-generation memory did not significantly improve the model's BIC score, suggesting one-generation memory suffices to describe the observed dynamics.
  • Schizosaccharomyces pombe [13]: Found linear growth and symmetric division. The inferred $\lambda(s_t, s_t^*)$ shows a weaker, more nearly linear dependence on $s_t^*$, consistent with previous studies suggesting a sizer-like mechanism with weak memory. Singular value decomposition of the mother-grandmother size distribution indicates much weaker memory compared to E. coli.
  • Dictyostelium discoideum [11]: Similar to E. coli, shows strong nonlinear memory of the mother size.
  • Bacillus subtilis [12]: Exhibits weaker nonlinear memory, similar to S. pombe.

The paper proposes quantifying the degree of nonlinear memory by fitting a quadratic curve $s(s^*) \sim \alpha_1 s^* + \alpha_2 (s^*)^2$ to the boundary in the $(s, s^*)$ plane where the division rate $\lambda$ transitions from low (growth) to high (division). Linear-memory models correspond to $\alpha_2 = 0$. By plotting $(\alpha_1, \alpha_2)$ for different species, they show that E. coli and D. discoideum fall outside the region of conventional linear models (sizer, adder, timer), highlighting their substantial nonlinear memory.
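Once boundary points are extracted, estimating $(\alpha_1, \alpha_2)$ reduces to ordinary least squares. A sketch with hypothetical boundary points generated from a known quadratic (real points would come from thresholding the inferred $\lambda$):

```python
import numpy as np

# hypothetical boundary points (s*, s) where the inferred division rate
# crosses a fixed threshold; here generated from a known quadratic
s_star = np.linspace(1.0, 3.0, 50)
s_boundary = 1.2 * s_star + 0.4 * s_star**2   # alpha1 = 1.2, alpha2 = 0.4

# fit s(s*) = alpha1*s* + alpha2*(s*)^2 (no intercept, as in the text)
X = np.stack([s_star, s_star**2], axis=1)
alpha1, alpha2 = np.linalg.lstsq(X, s_boundary, rcond=None)[0]
# linear-memory models (sizer/adder/timer) correspond to alpha2 close to 0
```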

Practical Applications and Implementation:

  • General Model Discovery: The core inference framework is generic and not limited to cell division. It can be applied to any system generating stochastic time series with discrete jump events where the rate of jumps may depend on past states.
  • Examples Beyond Biology (SI Appendix): The framework is demonstrated on:
    • Stack Overflow badge acquisition history (user activity data).
    • Clinical visit history in an ICU (healthcare data).
    • Earthquake occurrences (geoscience data).
    • For these examples, the jump rate (badge acquisition rate, visit rate, earthquake rate) is modeled as depending on the waiting time since the last event and the waiting time between the two previous events. The framework successfully identifies sparse models that capture the statistics of these discrete events.
  • Implementation Details:
    • The approach relies on constructing orthogonal basis functions from the specific dataset's empirical distributions, which helps in capturing the relevant dynamics efficiently.
    • Sparse Bayesian inference provides a principled way to handle noisy data and avoid overfitting by penalizing model complexity (number of terms in the basis expansion).
    • The use of the EM algorithm for hyperparameter inference and L-BFGS for optimization are standard numerical techniques.
    • Model selection using a modified BIC allows for automated discovery of the best-fitting parsimonious model.
    • The framework is shown to be robust to the choice of basis functions and regularization methods (Lasso, Ridge, Elastic Net).
    • The paper mentions that the framework can be integrated with deep learning techniques (e.g., using neural networks to approximate $\ln\lambda$ and the Adam optimizer), providing flexibility and the potential to leverage established machine learning libraries.
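In the non-biological examples above, the state entering the jump rate is a pair of waiting times rather than cell sizes. A minimal simulation of such a process, with an assumed log-linear rate in the current and previous waiting times (the functional form and coefficients are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def rate(tau, tau_prev):
    """Jump rate depending on the waiting time tau since the last event
    and the previous inter-event interval tau_prev (assumed form)."""
    return np.exp(-1.0 + 0.5 * tau - 0.1 * tau_prev)

dt, T = 0.01, 1000.0
t, last_event, tau_prev = 0.0, 0.0, 1.0
events = []
while t < T:
    tau = t - last_event
    if rng.random() < rate(tau, tau_prev) * dt:  # thinning step
        events.append(t)
        tau_prev, last_event = tau, t
    t += dt
```

The same basis-expansion and sparse-inference machinery then applies unchanged, with $(\tau, \tau_{\text{prev}})$ playing the role of $(s_t, s_t^*)$.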

Implementation Considerations:

  • Data Requirements: The framework requires high-resolution time-series data tracking the relevant state variables (e.g., cell size) and clearly identifiable jump events (e.g., divisions). Enough data is needed to reliably construct empirical distributions and orthogonal basis functions.
  • Preprocessing: Data needs preprocessing to filter out anomalies (like chaining in cell data) and identify jump times and associated pre-jump states (mother size, etc.).
  • Computational Cost: While sparse Bayesian inference and EM are generally efficient for moderate numbers of basis functions, the computational cost can increase with the complexity of the chosen basis and the size of the dataset. Scaling to very high-dimensional state spaces or extremely large datasets might require leveraging GPU acceleration or distributed computing, particularly if integrating with deep learning.
  • Basis Function Choice: While the paper shows robustness to different basis types, selecting appropriate basis functions or their functional form (e.g., polynomial degree, kernel width) might require some domain knowledge or empirical tuning.
  • Validation: The paper emphasizes validating the learned model by simulating it and comparing its statistics (e.g., division size distribution, generation time distribution, memory correlations) against the original experimental data.
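The distributional comparisons in the validation step can be automated with standard two-sample tests, e.g., a Kolmogorov-Smirnov test on division sizes (the arrays below are placeholders for real pipeline output, both drawn from the same toy distribution):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)

# placeholder division-size samples: experimental vs. simulated from
# the inferred model
s_div_experiment = rng.normal(loc=3.0, scale=0.3, size=500)
s_div_simulated = rng.normal(loc=3.0, scale=0.3, size=500)

stat, pval = ks_2samp(s_div_experiment, s_div_simulated)
# a small KS statistic / large p-value means the test cannot distinguish
# the two distributions; repeat for generation times and correlations
```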

In summary, this paper provides a powerful and flexible data-driven approach for discovering governing SDEs for stochastic jump processes. Its application to cell division data reveals previously underappreciated nonlinear memory effects across species, offering a richer understanding of cell size control. The generic formulation and demonstrated applicability to diverse real-world datasets highlight its broad potential beyond biological research.
