Conservative Q-learning Framework (CQL)

Updated 23 December 2025
  • CQL is a reinforcement learning algorithm that employs conservative value estimates to reduce overestimation bias in offline settings.
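  • It adds a regularizer to the Q-learning objective that pushes down Q-values of actions not supported by the dataset, keeping value estimates conservative and improving decision-making when the learned policy queries out-of-distribution actions.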
  • Benchmark tests show that CQL enhances stability and reliability in offline RL tasks, making it valuable for safety-critical applications.

In our investigation of the pulsational stability of quasi‐stars—the massive, radiation‐pressure supported envelopes that surround rapidly accreting black‐hole seeds—we have identified a well‐defined “Quasi‐Star Instability Strip” in the Hertzsprung–Russell (HR) diagram. This instability domain is the locus of envelope models that excite global radial oscillations, and it offers a natural explanation for the century‐scale, hysteretic variability recently observed in the strongly lensed Little Red Dot R2211-RX1 (hereafter RX1), while leaving hotter objects such as R2211-RX2 (RX2) quiescent.

  1. Definition of the Instability Strip

We computed evolutionary sequences of non-rotating quasi-stars with total masses $M_\star = 10^4,\ 2\times10^4,\ 5\times10^4,\ 10^5$, and $2\times10^5\,M_\odot$ using a modified version of MESA, and traced their loci in the luminosity–temperature plane (Figure 1). Along each track, we performed a non-adiabatic linear stability analysis with GYRE, searching for unstable radial modes ($\ell=0$). We find that all models with effective temperatures

$$\log_{10}\bigl(T_{\rm eff}/{\rm K}\bigr)\gtrsim 3.71\quad\bigl(T_{\rm eff}\gtrsim 5000\!-\!5200\ {\rm K}\bigr)$$

exhibit only damped oscillations ($\Im(\omega)<0$), while cooler models become unstable. We therefore define the blue edge of the Quasi-Star Instability Strip at

$$T_{\rm eff,\ blue\,edge}\simeq 5000\!-\!5200\ {\rm K},$$

essentially independent of $M_\star$ in the range $10^4$–$10^5\,M_\odot$. As the envelope cools along the Hayashi-like portion of its track, it enters the colored region of Figure 2, where the fundamental radial mode ($n_p=1$) is driven. A red edge, beyond which envelopes become so cool and extended that our envelope models terminate numerically, lies further to the right in Figure 2. However, quasistatic models spend only a short time there before dynamical effects set in.
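The blue-edge location is quoted both as a logarithm and as a temperature bracket. The short sketch below converts between the two and applies the blue-edge cut. The helper name `redward_of_blue_edge` and the use of a single threshold at 3.71 are illustrative assumptions. They are not part of the MESA/GYRE pipeline described above.

```python
import math

# Blue-edge location quoted in the text: log10(Teff/K) ~ 3.71 (Teff ~ 5000-5200 K).
LOG_TEFF_BLUE_EDGE = 3.71


def teff_from_log(log_teff: float) -> float:
    """Convert log10(Teff/K) to Teff in kelvin."""
    return 10.0 ** log_teff


def log_from_teff(teff_k: float) -> float:
    """Convert Teff in kelvin to log10(Teff/K)."""
    return math.log10(teff_k)


def redward_of_blue_edge(teff_k: float, log_edge: float = LOG_TEFF_BLUE_EDGE) -> bool:
    """Hypothetical helper: True if a model is cooler than the blue edge.

    Only the blue-edge criterion is applied; the red edge and the weak
    mass dependence described in the text are ignored.
    """
    return log_from_teff(teff_k) < log_edge


if __name__ == "__main__":
    print(f"log Teff = 3.71  ->  Teff = {teff_from_log(3.71):.0f} K")
    for teff in (5000.0, 5200.0):
        print(f"Teff = {teff:.0f} K  ->  log Teff = {log_from_teff(teff):.3f}")
    print("4700 K model redward of blue edge:", redward_of_blue_edge(4700.0))
```

Here $10^{3.71}\approx5130$ K, while $\log_{10}5000\approx3.699$ and $\log_{10}5200\approx3.716$. The quoted 5000–5200 K bracket therefore corresponds to $\log T_{\rm eff}\approx3.70$–$3.72$, with 3.71 near its middle.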

  2. Driving Physics: the κ-Mechanism in the H and He Ionization Zones

The root of the instability is the classical κ-mechanism. Take the perturbations to vary as $e^{-i\omega t}$, so that $\Im(\omega)>0$ denotes growth. The work done per unit mass by a layer over one pulsation cycle is then

$$\frac{dW}{dm} = \pi\,\Im\!\Bigl[\delta p^{*}\,\delta\!\Bigl(\frac{1}{\rho}\Bigr)\Bigr].$$

Integrating this over mass from the inner boundary to the surface yields the total work $W$. When $W>0$, the envelope gains pulsational energy each cycle. Heuristically, a layer contributes driving where the temperature rise during compression increases the opacity. The layer then traps heat near maximum compression and releases it during expansion:

$$\Bigl(\frac{\partial\ln\kappa}{\partial\ln T}\Bigr)_\rho > 0 \quad\Longrightarrow\quad \frac{dW}{dm}>0.$$

In quasi-stars, we find two ionization zones of importance:

  • The hydrogen recombination zone around $T\sim10^4$ K produces a sharp local peak in $dW/dm$ (Figure 3). This peak is driven by the steep opacity cliff associated with H I recombination.
  • The deeper He II ionization zone at $T\sim4\times10^4$ K provides the majority of the net positive work ($W>0$). This layer spans a larger fraction of the envelope mass and is less subject to non-adiabatic damping near the surface.

Consequently, quasi-stars behave as supermassive analogues of Cepheids. Their driving is dominated by the κ-mechanism in the He II zone, rather than by the iron opacity bump that is important in classical massive stars.
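The sign logic of the work integral can be illustrated numerically. The sketch below assumes perturbations $\propto e^{-i\omega t}$ with nearly real $\omega$, for which the cycle-integrated work is $dW/dm=\pi\,\Im[\delta p^{*}\,\delta(1/\rho)]$. It integrates this expression over a mass coordinate with the trapezoidal rule. The eigenfunctions are hypothetical toys, not GYRE output: a uniform weak damping term plus an optional Gaussian "driving zone" near $m=0.7$, loosely standing in for an ionization zone.

```python
import numpy as np


def work_integrand(delta_p: np.ndarray, delta_v: np.ndarray) -> np.ndarray:
    """Work per unit mass over one cycle: dW/dm = pi * Im(conj(dp) * dV).

    Assumes Lagrangian perturbations proportional to exp(-i*omega*t) with
    (nearly) real omega; delta_v is the perturbation of specific volume 1/rho.
    """
    return np.pi * np.imag(np.conj(delta_p) * delta_v)


def total_work(m: np.ndarray, delta_p: np.ndarray, delta_v: np.ndarray) -> float:
    """Integrate dW/dm over the mass coordinate with the trapezoidal rule."""
    dw_dm = work_integrand(delta_p, delta_v)
    return float(np.sum(0.5 * (dw_dm[1:] + dw_dm[:-1]) * np.diff(m)))


def toy_eigenfunctions(m: np.ndarray, bump: float) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical eigenfunctions: delta_v = -delta_p plus a small quadrature term.

    The in-phase (adiabatic) part does no net work. A positive quadrature term
    makes the pressure peak lag maximum compression (driving); here a Gaussian
    of height `bump` near m = 0.7 sits on uniform weak damping (-0.02).
    """
    delta_p = np.ones_like(m, dtype=complex)
    lag = bump * np.exp(-(((m - 0.7) / 0.05) ** 2)) - 0.02
    delta_v = -delta_p + 1j * lag
    return delta_p, delta_v


if __name__ == "__main__":
    m = np.linspace(0.0, 1.0, 2001)
    for bump in (0.0, 0.3):
        dp, dv = toy_eigenfunctions(m, bump)
        w = total_work(m, dp, dv)
        print(f"bump={bump}: W = {w:+.4f} ({'driven' if w > 0 else 'damped'})")
```

Without the bump, the toy envelope is net damped ($W=-0.02\pi$). With it, the localized driving outweighs the distributed damping, mirroring the competition between the ionization zones and the rest of the envelope.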

  3. Non-Adiabatic Stability Analysis with GYRE and MESA

We extracted hydrostatic snapshots from MESA at various points along each evolutionary track and imported them into GYRE v8.1. We adopted the “Coughlin” inner boundary radius $R_i$, the base of the hydrostatic envelope where convection becomes sonic. We then solved the fully non-adiabatic pulsation equations for radial ($\ell=0$) modes. In dimensionless form, the eigenfrequency $\omega$ satisfies coupled equations for the radial displacement $\xi_r(r)$ and the perturbations to pressure $\delta p$, density $\delta\rho$, and luminosity $\delta L$. These are solved together with appropriate mechanical and thermal boundary conditions. The imaginary part of the eigenfrequency determines whether a mode grows or decays:

$$\omega = \Re(\omega) + i\,\Im(\omega),\qquad \Im(\omega)>0\;\Longrightarrow\;\text{unstable}.$$

Equivalently, one may define the dimensionless growth rate

$$\eta = \frac{\Im(\omega)}{\Re(\omega)}.$$

Alternatively, with $\omega$ measured in units of $\tau_{\rm dyn}^{-1}$ (the dynamical timescale defined below), one may convert to a physical growth timescale $\tau_{\rm growth}\sim\tau_{\rm dyn}/\Im(\omega)$. We focus on the two lowest-order p-modes: the fundamental ($n_p=1$) and the first overtone ($n_p=2$).
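The conversions above can be collected in one small helper. The sketch below takes a dimensionless eigenfrequency (in units of $1/\tau_{\rm dyn}$) and a dynamical time, and returns the stability verdict, $\eta$, the period, and the e-folding time. It follows the text's convention that $\Im(\omega)>0$ means growth. The names `ModeSummary` and `summarize_mode` are hypothetical, and $\tau_{\rm dyn}=10$ yr in the usage line is an assumed round value.

```python
import math
from dataclasses import dataclass


@dataclass
class ModeSummary:
    unstable: bool      # Im(omega) > 0, following the text's sign convention
    eta: float          # Im(omega) / Re(omega)
    period: float       # 2*pi*tau_dyn / Re(omega), in the units of tau_dyn
    e_fold_time: float  # tau_dyn / |Im(omega)|: growth time if unstable, decay time if not


def summarize_mode(omega: complex, tau_dyn: float) -> ModeSummary:
    """Summarize a dimensionless eigenfrequency omega, measured in units of 1/tau_dyn."""
    if omega.real <= 0.0:
        raise ValueError("expected Re(omega) > 0 for an oscillatory mode")
    e_fold = tau_dyn / abs(omega.imag) if omega.imag != 0.0 else math.inf
    return ModeSummary(
        unstable=omega.imag > 0.0,
        eta=omega.imag / omega.real,
        period=2.0 * math.pi * tau_dyn / omega.real,
        e_fold_time=e_fold,
    )


if __name__ == "__main__":
    # Fundamental-mode omega quoted in the next section; tau_dyn = 10 yr is an assumed round value.
    print(summarize_mode(0.91 + 0.039j, tau_dyn=10.0))
```

Keeping $\tau_{\rm dyn}$ as an explicit argument separates the dimensionless GYRE output from the physical scaling. The same $\omega$ can then be reused across models of different radius and mass.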

  4. Quantitative Mode Properties Across the Strip

Because quasi-stars have enormous radii, $R_\star\sim10^3$ AU, and masses $M_\star\sim10^4$–$10^5\,M_\odot$, their dynamical timescales are

$$\tau_{\rm dyn} = \sqrt{\frac{R_\star^3}{G M_\star}} \sim 10\text{–}20\ {\rm yr}.$$

Correspondingly, for models near the center of the strip (e.g. $M_\star=10^5\,M_\odot$, $T_{\rm eff}\simeq4700$ K), GYRE yields:

  • Fundamental mode: $\Re(\omega)\approx0.91$, $\Im(\omega)\approx+0.039$, giving a period
$$P_1 \approx \frac{2\pi\,\tau_{\rm dyn}}{\Re(\omega)} \simeq 73\ {\rm yr}$$
and a growth timescale
$$\tau_{\rm growth}\approx \frac{\tau_{\rm dyn}}{\Im(\omega)}\simeq 273\ {\rm yr}.$$
  • First overtone: $\Re(\omega)\approx3.10$, $\Im(\omega)\approx-0.021$, giving a period
$$P_2 \approx \frac{2\pi\,\tau_{\rm dyn}}{\Re(\omega)} \simeq 21.5\ {\rm yr}.$$
The overtone is therefore marginally damped in linear theory. Nevertheless, nonlinear hydrodynamic runs in MESA find the overtone to be excited, with $P_2\simeq23.5$ yr.
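The quoted numbers can be cross-checked with a few lines of arithmetic. The sketch below does two things. First, it evaluates $\tau_{\rm dyn}=\sqrt{R_\star^3/GM_\star}$ in AU–yr–$M_\odot$ units, using Kepler's third law ($GM_\odot=4\pi^2\ {\rm AU^3\,yr^{-2}}$), for the representative $R_\star=10^3$ AU and $M_\star=10^5\,M_\odot$. Second, it backs out the dynamical time implied by the quoted fundamental period, $\tau_{\rm dyn}=P_1\Re(\omega_1)/2\pi$, and uses it to predict the other two quoted timescales. The representative radius and mass are assumptions for illustration. The text does not give the exact radius of the $4700$ K model.

```python
import math

# G in units of AU^3 / (Msun * yr^2): Kepler's third law gives G*Msun = 4*pi^2 AU^3/yr^2.
G_AU_YR_MSUN = 4.0 * math.pi ** 2


def tau_dyn_years(radius_au: float, mass_msun: float) -> float:
    """Dynamical time sqrt(R^3 / (G M)) in years."""
    return math.sqrt(radius_au ** 3 / (G_AU_YR_MSUN * mass_msun))


# Order-of-magnitude check for the representative R ~ 1e3 AU, M ~ 1e5 Msun.
print(f"tau_dyn(1e3 AU, 1e5 Msun) = {tau_dyn_years(1e3, 1e5):.1f} yr")

# Consistency check of the quoted M = 1e5 Msun, Teff ~ 4700 K model:
# back out tau_dyn from P1 = 2*pi*tau_dyn / Re(omega1), then predict the rest.
P1, re1, im1 = 73.0, 0.91, 0.039  # quoted fundamental-mode values
re2 = 3.10                        # quoted first-overtone Re(omega)
tau = P1 * re1 / (2.0 * math.pi)
print(f"implied tau_dyn   = {tau:.2f} yr")
print(f"tau_growth (fund) = {tau / im1:.0f} yr")
print(f"P2 (overtone)     = {2.0 * math.pi * tau / re2:.1f} yr")
```

The implied $\tau_{\rm dyn}\approx10.6$ yr gives $\tau_{\rm growth}\approx271$ yr and $P_2\approx21.4$ yr. These agree with the quoted 273 yr and 21.5 yr to within the rounding of the two-significant-figure $\omega$ values. The representative $R_\star$ and $M_\star$ give $\tau_{\rm dyn}\approx16$ yr, inside the quoted 10–20 yr range.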

Across the full mass grid, the fundamental-mode period ranges from $\sim20$ yr at $M_\star=10^4\,M_\odot$ to $\gtrsim180$ yr at $M_\star=2\times10^5\,M_\odot$ (Figure 2, color-coded contours). The first overtone generally becomes unstable only at $T_{\rm eff}$ slightly below the blue edge, with periods of $\sim10$–$30$ yr (Figure 4, bottom panel).

  5. Mapping the Instability Strip on the HR Diagram

Figure 2 presents the HR diagram with evolutionary tracks for $M_\star=10^4$–$2\times10^5\,M_\odot$ and the instability strip overlaid (shaded region). The blue edge at $\log T_{\rm eff}\simeq3.71$ is shown as a thick blue curve. To its right, models in the unstable domain carry colored symbols (symbol shape indicates mass), whereas stable models at higher $T_{\rm eff}$ appear as open white circles. Solid black lines are iso-period contours (labeled in years). We also mark two observational points:
    • R2211-RX2 at $T_{\rm eff}\approx5000$ K, $L\simeq3\times10^7\,L_\odot$, lying just on the stable side of the blue edge (no variability);
    • R2211-RX1 at $T_{\rm eff}\approx4000$ K, $L\simeq5\times10^7\,L_\odot$, deep inside the unstable strip (exhibiting century-scale pulsations).

Hydrodynamic MESA runs (star symbols) confirm that a model just hotter than the blue edge remains stable over centuries. By contrast, a model cooler by $\sim300$ K develops large-amplitude pulsations with periods matching the linear predictions (Figure 5).

  6. Astrophysical Implications: Pulsation-Driven Superwinds

The discovery of pulsational instability has two major consequences for quasi-star evolution. First, the violent, global radial motions inevitably launch shocks into the envelope, especially once surface velocities reach Mach numbers $\gtrsim1$. These shocks are expected to couple efficiently to the outer layers, driving a “superwind” analogous to those in mass-losing super-AGB stars and red supergiants. Preliminary estimates suggest mass-loss rates of $\dot M_{\rm wind}\gtrsim10^{-2}$–$1\,M_\odot\,{\rm yr}^{-1}$, potentially comparable to or exceeding the infall rate from the host galaxy.

Second, this pulsation‐driven feedback provides a natural regulator of the quasi‐star phase and the final mass of the black‐hole seed. If the wind overcomes the accretion supply, it will strip the envelope on timescales much shorter than the nominal thermal timescale, truncating black‐hole growth. Conversely, very high infall rates may allow the quasi‐star to remain cool and deep in the instability strip, leading to even stronger pulsations and winds.
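The competition between wind and infall can be made concrete with a toy mass budget. The sketch below assumes constant rates and no feedback of the wind on the infall or on the envelope structure. It returns the time for the net outflow to remove the envelope, or infinity when infall keeps pace. The envelope mass ($10^5\,M_\odot$) and infall rate ($0.1\,M_\odot\,{\rm yr}^{-1}$) are hypothetical placeholders chosen only for illustration. The wind rates span the estimate quoted above.

```python
import math


def envelope_stripping_time(m_env: float, mdot_wind: float, mdot_infall: float) -> float:
    """Toy time (yr) for a net outflow to remove an envelope of mass m_env (Msun).

    Assumes constant rates in Msun/yr: t = m_env / (mdot_wind - mdot_infall).
    Returns math.inf when infall keeps pace with or exceeds the wind.
    """
    net_loss = mdot_wind - mdot_infall
    if net_loss <= 0.0:
        return math.inf
    return m_env / net_loss


if __name__ == "__main__":
    m_env = 1e5        # hypothetical envelope mass, Msun
    mdot_infall = 0.1  # hypothetical infall rate, Msun/yr
    for mdot_wind in (1e-2, 1e-1, 1.0):  # spans the quoted wind-rate estimate
        t = envelope_stripping_time(m_env, mdot_wind, mdot_infall)
        print(f"Mdot_wind = {mdot_wind:g} Msun/yr -> t_strip = {t:.3g} yr")
```

A realistic treatment would couple the wind rate to the pulsation amplitude and hence to $T_{\rm eff}$. This would capture the feedback described above, in which stronger infall keeps the envelope cool and deepens the instability. The constant-rate form only shows the threshold behavior.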

Finally, the absence of short-period (months to years) stochastic flickering in Little Red Dots follows from their enormous radii ($R_\star\sim10^3$ AU) compared with their small pressure scale heights ($H_P\sim1$ AU). Local convective or eruptive cells dominate variability in red supergiants. If such cells have sizes of order $H_P$, the surface holds $N\sim(R_\star/H_P)^2$ roughly independent cells. Their incoherent fluctuations then average down in disk-integrated light by a factor $\sim N^{-1/2}\sim H_P/R_\star\ll1$, leaving only coherent, global $\ell=0$ pulsations visible.
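The $N^{-1/2}$ suppression can be checked with a small Monte Carlo. The sketch below assumes equal-weight, uncorrelated cells with unit-variance Gaussian brightness fluctuations. It takes $N=(R_\star/H_P)^2$, drops order-unity geometric factors, and ignores limb darkening and projection. It then compares the scatter of the disk average with $H_P/R_\star$. Small ratios ($R_\star/H_P=5$–$30$) keep the run cheap; the text's $R_\star/H_P\sim10^3$ would imply suppression by $\sim10^{-3}$.

```python
import numpy as np


def disk_integrated_rms(n_cells: int, n_trials: int = 2000, seed: int = 0) -> float:
    """RMS of the disk-averaged signal from n_cells independent cells.

    Toy assumptions: equal-weight, uncorrelated cells, each with unit-variance
    Gaussian brightness fluctuations; limb darkening and projection are ignored.
    """
    rng = np.random.default_rng(seed)
    cells = rng.standard_normal((n_trials, n_cells))
    return float(cells.mean(axis=1).std())


if __name__ == "__main__":
    for r_over_hp in (5, 10, 30):
        n_cells = r_over_hp ** 2  # assumed cell count ~ (R/H_P)^2, order-unity factors dropped
        rms = disk_integrated_rms(n_cells)
        print(f"R/H_P = {r_over_hp:3d}: simulated rms = {rms:.4f}, H_P/R = {1 / r_over_hp:.4f}")
```

Correlations between neighboring cells would raise the effective cell size and weaken the suppression. The uncorrelated case shown here is therefore the most favorable limit for the argument.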

In summary, the Quasi-Star Instability Strip provides a unified framework for interpreting the variability, or lack thereof, of Little Red Dots. Its blue edge at $T_{\rm eff}\simeq5000$–$5200$ K explains why RX2 is stable, while the redward extension of the strip accounts for RX1's $\sim30$-year cycle. The κ-mechanism that drives these super-Cepheid oscillations also promises to be a potent source of envelope mass loss. It would thereby regulate the lifetime of the quasi-star phase and the ultimate mass of the first supermassive black-hole seeds.
