Conservative Q-learning Framework (CQL)
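- CQL is an offline reinforcement learning algorithm that learns conservative (lower-bound) Q-value estimates. This counters the overestimation of values for actions that are poorly covered by the dataset.
- It adds a regularizer to the usual Bellman error. The regularizer pushes down Q-values of actions chosen by the learned policy and pushes up Q-values of actions that appear in the dataset, so the policy is discouraged from exploiting spuriously high values of out-of-distribution actions.
- Benchmark results indicate that CQL improves stability and reliability in offline RL tasks. This makes it attractive for safety-critical applications where online data collection is costly or unsafe.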
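The conservative regularizer can be illustrated for discrete actions with the CQL(H) form of the objective:

$\alpha\,\mathbb{E}_s\big[\log\sum_a \exp Q(s,a) - \mathbb{E}_{a\sim\mathcal{D}}Q(s,a)\big] + \tfrac12\,\mathbb{E}\big[(Q - \hat{\mathcal{B}}Q)^2\big]$

The NumPy sketch below assumes precomputed Bellman targets. The name `cql_loss` and its default `alpha` are illustrative and not tied to any particular library.

```python
import numpy as np


def logsumexp(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable log(sum(exp(x))) along `axis`."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def cql_loss(q_values, actions, td_targets, alpha=1.0):
    """CQL(H)-style loss for a batch of discrete-action transitions.

    q_values:   (batch, n_actions) array of Q(s, a) for every action a
    actions:    (batch,) integer actions stored in the offline dataset
    td_targets: (batch,) precomputed Bellman targets, e.g.
                r + gamma * max_a' Q_target(s', a')
    alpha:      weight of the conservative penalty (illustrative default)
    Returns (total_loss, conservative_penalty, bellman_error).
    """
    q_values = np.asarray(q_values, dtype=float)
    actions = np.asarray(actions, dtype=int)
    td_targets = np.asarray(td_targets, dtype=float)

    # Q-value of the action actually taken in the dataset.
    q_data = q_values[np.arange(len(actions)), actions]

    # Conservative term: soft maximum over all actions minus the dataset
    # action's value. It is always >= 0; minimizing it lowers Q on actions
    # absent from the data relative to the dataset action.
    penalty = np.mean(logsumexp(q_values, axis=1) - q_data)

    # Ordinary squared Bellman error on dataset transitions.
    bellman = 0.5 * np.mean((q_data - td_targets) ** 2)
    return alpha * penalty + bellman, penalty, bellman
```

Larger `alpha` makes the learned Q-function more pessimistic about actions the dataset does not support. `alpha = 0` recovers plain fitted Q-learning.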
In our investigation of the pulsational stability of quasi‐stars—the massive, radiation‐pressure supported envelopes that surround rapidly accreting black‐hole seeds—we have identified a well‐defined “Quasi‐Star Instability Strip” in the Hertzsprung–Russell (HR) diagram. This instability domain is the locus of envelope models that excite global radial oscillations, and it offers a natural explanation for the century‐scale, hysteretic variability recently observed in the strongly lensed Little Red Dot R2211-RX1 (hereafter RX1), while leaving hotter objects such as R2211-RX2 (RX2) quiescent.
- Definition of the Instability Strip
We computed evolutionary sequences of non-rotating quasi-stars over a grid of total masses with a modified version of MESA, and traced their loci in the luminosity–temperature plane (Figure 1). Along each track, we performed a non-adiabatic linear stability analysis with GYRE, searching for unstable radial ($\ell = 0$) modes.

All models hotter than a critical effective temperature exhibit only damped oscillations, while cooler models become unstable. We therefore define this critical temperature as the blue edge of the Quasi-Star Instability Strip. It is essentially independent of mass over the range we computed.

As the envelope cools along the Hayashi-like portion of the track, it enters the colored region of Figure 2, where the fundamental radial mode is driven. A red edge lies farther to the right in Figure 2. Beyond it, envelopes become so cool and extended that our envelope models terminate numerically, but quasi-static models spend only a short time there before dynamical effects set in.
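The blue-edge definition above amounts to finding where the fundamental-mode growth rate changes sign along a sequence of models. A minimal sketch of that bookkeeping follows. The function name `locate_blue_edge` and the linear interpolation between neighbouring models are illustrative assumptions, not necessarily the procedure behind Figure 2.

```python
import numpy as np


def locate_blue_edge(teff, growth_rate):
    """Estimate the blue-edge temperature along a sequence of models.

    teff:        effective temperatures of the models (K)
    growth_rate: fundamental-mode growth rate of each model, with
                 > 0 meaning driven (unstable) and <= 0 damped (stable)

    Returns the temperature where the growth rate crosses zero, linearly
    interpolated between the hottest unstable model and the next hotter
    (stable) model, or None if no such crossing is bracketed.
    """
    teff = np.asarray(teff, dtype=float)
    rate = np.asarray(growth_rate, dtype=float)
    order = np.argsort(teff)                 # sort models from cool to hot
    t, g = teff[order], rate[order]

    unstable = np.flatnonzero(g > 0)
    if unstable.size == 0 or unstable[-1] == t.size - 1:
        return None                          # crossing not bracketed
    i = unstable[-1]                         # hottest unstable model
    t0, t1, g0, g1 = t[i], t[i + 1], g[i], g[i + 1]
    # Zero of the straight line through (t0, g0) and (t1, g1).
    return float(t0 + (t1 - t0) * g0 / (g0 - g1))
```

If the growth rate changes sign more than once along a track, this helper reports only the hottest crossing, which is the one that defines a blue edge.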
- Driving Physics: the κ-Mechanism in H and He Ionization Zones
The root of the instability is the classical κ-mechanism. At each layer in the envelope, one can define the differential work per unit mass, $dW/dm$. Integrating it from the inner boundary to the surface yields the total work $W$, and a mode is driven when $W > 0$. Positive contributions (driving) occur in zones where the temperature rise during compression raises the opacity. Such a layer traps radiative flux near maximum compression and releases it during expansion.
In quasi-stars, we find two ionization zones of importance:
- The hydrogen recombination zone produces a sharp, localized driving peak (Figure 3), caused by the steep opacity cliff associated with H I recombination.
- The deeper He II ionization zone provides most of the net positive work. This layer spans a larger fraction of the envelope mass and is less subject to non-adiabatic damping than the layers near the surface.

Consequently, quasi-stars behave as supermassive analogues of Cepheids. Their driving is dominated by the κ-mechanism in the He II zone rather than by the iron opacity bump that drives pulsations in massive main-sequence stars.
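The requirement that compression raise the opacity can be written explicitly with the standard textbook argument. This is offered as a hedged illustration, not necessarily the exact form used in our GYRE analysis. The symbols $\kappa_T$, $\kappa_\rho$ and $\Gamma_3$ are introduced here for this purpose.

```latex
% Logarithmic opacity derivatives and the adiabatic exponent
\kappa_T \equiv \left(\frac{\partial\ln\kappa}{\partial\ln T}\right)_{\rho},
\qquad
\kappa_\rho \equiv \left(\frac{\partial\ln\kappa}{\partial\ln\rho}\right)_{T},
\qquad
\Gamma_3 - 1 \equiv \left(\frac{\partial\ln T}{\partial\ln\rho}\right)_{\rm ad}

% Opacity response to an adiabatic compression, using
% \delta\rho/\rho = (\delta T/T)/(\Gamma_3 - 1)
\frac{\delta\kappa}{\kappa}
  = \kappa_T\,\frac{\delta T}{T} + \kappa_\rho\,\frac{\delta\rho}{\rho}
  = \left[\kappa_T + \frac{\kappa_\rho}{\Gamma_3 - 1}\right]\frac{\delta T}{T}

% Local driving criterion (quasi-adiabatic picture) and global instability
\frac{d}{dr}\left[\kappa_T + \frac{\kappa_\rho}{\Gamma_3 - 1}\right] > 0,
\qquad
W = \int_{\rm inner}^{\rm surface} \frac{dW}{dm}\,dm > 0
```

Consider a Kramers-like opacity, $\kappa \propto \rho T^{-7/2}$, in a fully ionized ideal gas with $\Gamma_3 = 5/3$. The bracket is then $-7/2 + 1/(2/3) = -2$, so compression lowers the opacity and the layer damps the mode. In an ionization zone, $\Gamma_3 - 1$ falls well below $2/3$ and $\kappa_T$ can turn positive. The bracket then rises steeply outward, which is what makes the H and He II zones drivers.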
- Non–Adiabatic Stability Analysis with GYRE and MESA
We extracted hydrostatic snapshots from MESA at various points along each evolutionary track and imported them into GYRE v8.1. We adopted the “Coughlin” inner boundary radius, i.e. the base of the hydrostatic envelope where convection becomes sonic. With this boundary, we solved the fully non-adiabatic pulsation equations for radial ($\ell = 0$) modes.

In dimensionless form, these equations couple the radial displacement to the perturbations in pressure, density, and luminosity. Together with the mechanical and thermal boundary conditions, they define an eigenvalue problem for the complex eigenfrequency. The growth (or damping) rate of each mode is given by the imaginary part of this eigenfrequency. Equivalently, one may quote a dimensionless growth rate or convert it to a physical growth timescale. We focus on the two lowest-order p-modes: the fundamental and the first overtone.
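The relation between the complex eigenfrequency, the dimensionless growth rate and the physical timescales can be sketched as follows. The sketch rests on two conventions, both common but treated here as assumptions:
- a time dependence $\exp(-i\sigma t)$;
- a dimensionless frequency $\omega = \sigma\, t_{\rm dyn}$, with $t_{\rm dyn} = \sqrt{R^3/(GM)}$.

The helper `mode_diagnostics` is hypothetical.

```python
import math


def mode_diagnostics(omega: complex, t_dyn: float) -> dict:
    """Growth diagnostics for a dimensionless complex eigenfrequency.

    Assumed conventions (check them against your solver):
      * perturbations vary as exp(-i * sigma * t), so Im(sigma) > 0 is growth;
      * sigma = omega / t_dyn, with t_dyn = sqrt(R**3 / (G * M)).
    Results are in the same time unit as t_dyn.
    """
    if omega.real <= 0:
        raise ValueError("expected a positive oscillation frequency Re(omega)")
    sigma_r = omega.real / t_dyn      # angular oscillation frequency
    sigma_i = omega.imag / t_dyn      # growth (> 0) or damping (< 0) rate
    return {
        "period": 2.0 * math.pi / sigma_r,
        "unstable": sigma_i > 0,
        # e-folding time of the amplitude; infinite for a neutral mode
        "e_fold_time": math.inf if sigma_i == 0 else 1.0 / abs(sigma_i),
        # dimensionless growth: e-folds gained (or lost) per pulsation cycle
        "efolds_per_cycle": 2.0 * math.pi * omega.imag / omega.real,
    }
```

Flipping the sign convention only changes which sign of $\mathrm{Im}\,\omega$ counts as growth. The period and the e-folding time are unaffected.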
- Quantitative Mode Properties Across the Strip
Because quasi-stars have enormous radii, measured in AU, their dynamical timescales, $t_{\rm dyn} \sim \sqrt{R^3/(GM)}$, are long even for their large masses.
Correspondingly, for models near the center of the strip, GYRE yields:
- Fundamental mode: driven (positive growth rate).
- First overtone: marginally damped in linear theory. However, non-linear hydrodynamic runs in MESA show that the overtone is driven as well.
Across the full mass grid, the fundamental-mode period varies with mass (Figure 2, color-coded contours). The first overtone generally becomes unstable only at effective temperatures slightly below the blue edge, with periods of up to about 30 yr (Figure 4, bottom panel).
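To see how radii measured in AU translate into long periods, the dynamical timescale can be evaluated directly in astronomical units. There, Kepler's third law gives $GM_\odot \approx 4\pi^2\ \mathrm{AU^3\,yr^{-2}}$. The helpers below are a hypothetical sketch. They assume the same normalization $\sigma = \omega / t_{\rm dyn}$ as the dimensionless frequencies discussed above.

```python
import math

# In units of AU, years and solar masses, Kepler's third law gives
# G * M_sun ~= 4 * pi**2 AU**3 / yr**2 (approximation).
G_MSUN_AU3_PER_YR2 = 4.0 * math.pi ** 2


def dynamical_time_yr(radius_au: float, mass_msun: float) -> float:
    """t_dyn = sqrt(R**3 / (G * M)) in years."""
    return math.sqrt(radius_au ** 3 / (G_MSUN_AU3_PER_YR2 * mass_msun))


def period_yr(omega_real: float, radius_au: float, mass_msun: float) -> float:
    """Pulsation period 2*pi / sigma in years for a dimensionless
    frequency omega, assuming sigma = omega / t_dyn."""
    return 2.0 * math.pi * dynamical_time_yr(radius_au, mass_msun) / omega_real
```

As a sanity check, `period_yr(1.0, 1.0, 1.0)` returns about one year: the orbital period at 1 AU around one solar mass. At fixed mass and dimensionless frequency, the period scales as $R^{3/2}$.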
- Mapping the Instability Strip on the HR Diagram
Figure 2 presents the HR diagram with the evolutionary tracks for the computed masses and the overlaid instability strip (shaded region). The blue edge is shown as a thick blue curve. Cooler models to its right, in the unstable domain, carry colored symbols (symbol shape indicates mass). Stable models at higher effective temperature appear as open white circles. Solid black lines are iso-period contours (labeled in years). We also mark two observed objects:
- R2211-RX2, which lies just on the stable side of the blue edge (no variability);
- R2211-RX1, which lies deep inside the unstable strip (century-scale pulsations).
Hydrodynamic MESA runs (star symbols) confirm that a model just hotter than the blue edge remains stable over centuries. A model cooler than the edge, by contrast, develops large-amplitude pulsations with periods matching the linear predictions (Figure 5).
- Astrophysical Implications: Pulsation-Driven Superwinds
The discovery of pulsational instability has two major consequences for quasi-star evolution. First, the violent, global radial motions inevitably launch shocks into the envelope, especially once the surface velocities reach high Mach numbers. These shocks are expected to couple efficiently to the outer layers. They drive a “superwind” analogous to those in mass-losing super-AGB stars and red supergiants. Preliminary estimates suggest mass-loss rates potentially comparable to, or exceeding, the infall rate from the host galaxy.
Second, this pulsation‐driven feedback provides a natural regulator of the quasi‐star phase and the final mass of the black‐hole seed. If the wind overcomes the accretion supply, it will strip the envelope on timescales much shorter than the nominal thermal timescale, truncating black‐hole growth. Conversely, very high infall rates may allow the quasi‐star to remain cool and deep in the instability strip, leading to even stronger pulsations and winds.
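The competition between wind and accretion described above can be made concrete with simple mass bookkeeping. This assumes constant rates, a strong simplification, since both rates vary with the pulsation amplitude and the envelope structure. Under that assumption, the envelope survives for $M_{\rm env}/(\dot M_{\rm wind} - \dot M_{\rm infall})$ when the wind wins. The function name and units are illustrative.

```python
import math


def envelope_lifetime_yr(m_env_msun: float, mdot_wind: float, mdot_infall: float) -> float:
    """Time for a constant net outflow to remove the envelope.

    m_env_msun:  envelope mass (M_sun)
    mdot_wind:   pulsation-driven mass-loss rate (M_sun / yr)
    mdot_infall: accretion (infall) rate onto the envelope (M_sun / yr)

    Returns math.inf when infall balances or exceeds the wind, i.e. when
    the wind cannot strip the envelope in this constant-rate picture.
    """
    net_loss = mdot_wind - mdot_infall
    if net_loss <= 0.0:
        return math.inf
    return m_env_msun / net_loss
```

Comparing this time with the envelope's thermal timescale indicates whether the wind truncates black-hole growth in the sense described above.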
Finally, Little Red Dots show no short-period (months to years) stochastic flickering. This absence can be understood as a consequence of their enormous radius and comparatively small pressure scale height. Local convective or eruptive cells dominate the variability of red supergiants. In disk-integrated light, however, many independent cells average out, so their contribution is strongly suppressed and only coherent, global pulsations remain visible.
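The averaging argument can be quantified with a toy model; this is an assumption, not a result from our simulations. Suppose the visible disk contains $N \sim (R/H_P)^2$ independent cells, each fluctuating with the same random amplitude. Their contributions then add in quadrature, and the disk-integrated relative amplitude scales as $N^{-1/2} \sim H_P/R$. The sketch compares this estimate with a quick Monte Carlo; all names are hypothetical.

```python
import math

import numpy as np


def incoherent_suppression(radius: float, scale_height: float) -> float:
    """Analytic estimate: N ~ (R / H_P)**2 cells averaged incoherently
    reduce the relative amplitude by 1 / sqrt(N) ~ H_P / R."""
    n_cells = (radius / scale_height) ** 2
    return 1.0 / math.sqrt(n_cells)


def simulated_suppression(n_cells: int, n_epochs: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check. Each cell has mean flux 1 and independent
    unit-variance fluctuations, so a single cell has relative amplitude 1.
    Returns std(total flux) / mean(total flux) over many epochs."""
    rng = np.random.default_rng(seed)
    fluxes = 1.0 + rng.standard_normal((n_epochs, n_cells))
    total = fluxes.sum(axis=1)
    return float(total.std() / total.mean())
```

Coherent global modes are not averaged down this way, because all cells move in phase. This is why they survive in disk-integrated light.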
In summary, the Quasi-Star Instability Strip provides a unified framework for interpreting the variability, or lack thereof, of Little Red Dots. Its blue edge explains why RX2 is stable, while the redward extension of the strip accounts for RX1’s century-scale cycle. The κ-mechanism drives what are effectively super-Cepheid oscillations. These promise to be a potent source of envelope mass loss, thereby regulating the lifetime of the quasi-star phase and the ultimate mass of the first supermassive black-hole seeds.