Segment Hysteresis RL for ISAC Systems

Updated 29 January 2026
  • Segment Hysteresis Based Reinforcement Learning is a novel algorithm that integrates a hysteresis mechanism to stabilize segment assignments in SWAN-ISAC systems.
  • It employs an Advantage Actor-Critic framework within an MDP formulation to jointly optimize transmit beamforming, segment selection, and antenna positioning.
  • The hysteresis mechanism mitigates rapid assignment oscillations and non-stationarity, yielding up to 70% higher reward ceilings compared to fixed-update strategies.

Segment Hysteresis Based Reinforcement Learning (SHRL) is a reinforcement learning algorithm specifically designed for integrated sensing and communication (ISAC) optimization within segmented waveguide-enabled pinching-antenna array (SWAN) systems. SHRL addresses the challenge of dynamic segment selection—critical for both communication throughput and sensing performance—by introducing a hysteresis mechanism that governs when segment allocations are updated, mitigating instability and improving overall system reward. The algorithm is formalized within a Markov decision process (MDP) framework, jointly optimizing transmit beamforming, segment selection, and antenna positioning (Gao et al., 28 Jan 2026).

1. Markov Decision Process Formulation

SHRL frames the SWAN-ISAC control problem as an MDP comprising the following components:

  • State Space: At timestep $t$, the state is $s_t = [h_{k_c,t},\, h_{k_s,t},\, \psi_{t-1},\, \phi_{t-1}]$, where $h_{k_c,t}$ and $h_{k_s,t}$ denote instantaneous channel state information (CSI) for all $K_c$ communication users and $K_s$ sensing targets, $\psi_{t-1}$ is the previous pinching-antenna pose vector, and $\phi_{t-1}$ encodes the last segment selection assignments.
  • Action Space: The agent outputs a tuple $a_t = [\psi_t,\, \phi_t,\, W_t]$, with $\psi_t \in \mathbb{R}^{M\times N\times 2}$ (antenna positions), $\phi_t \in \mathbb{R}^M$ (raw segment-selection logits), and $W_t \in \mathbb{C}^{K_c\times M}$ (beamforming weights). Segment assignments are executed as $\tilde\phi_t$ after processing through the hysteresis gate.
  • Transition Dynamics: The system executes $(\psi_t,\, \tilde\phi_t,\, W_t)$, yielding new CSI, illumination, and data-rate metrics, then transitions to $s_{t+1}$. Channel evolution follows deterministic (e.g., line-of-sight) or stochastic models.
  • Reward Function:

$$r_t = \sum_{k_c=1}^{K_c} R_{k_c,t} - \sum_{k_s=1}^{K_s} \mathbb{I}\{\Gamma_{k_s,t} < \tilde\Gamma\}$$

where $R_{k_c,t} = B\log_2\!\left(1 + \frac{\sum_m |h_{k_c,m}^T s_m|^2}{\sum_{m\ne m'} |h_{k_c,m}^T s_{m'}|^2 + \sigma^2}\right)$ is the communication rate for user $k_c$, and $\Gamma_{k_s,t}$ is the expected illumination power for sensing target $k_s$; $\tilde\Gamma$ is a threshold, and violations incur a unit penalty per target.
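As a concrete illustration, the reward above can be evaluated numerically. The following is a minimal sketch assuming real-valued numpy arrays; the function name and the array layout (`h` of shape `(K_c, M, D)`, `s` of shape `(M, D)`) are illustrative choices, not from the paper.

```python
import numpy as np

def shrl_reward(h, s, sigma2, bandwidth, illum_power, illum_threshold):
    """Sum-rate over K_c communication users minus one unit penalty per
    sensing target whose illumination power falls below the threshold.

    h: (K_c, M, D) per-user channel vectors, one per waveguide
    s: (M, D) transmitted signal per waveguide
    illum_power: (K_s,) expected illumination power per sensing target
    """
    K_c = h.shape[0]
    rate = 0.0
    for kc in range(K_c):
        # g[m, m'] = |h_{kc,m}^T s_{m'}|^2 for all waveguide pairs
        g = np.abs(np.einsum('md,nd->mn', h[kc], s)) ** 2
        signal = np.trace(g)             # matched terms, m' = m
        interference = g.sum() - signal  # all cross terms, m != m'
        rate += bandwidth * np.log2(1.0 + signal / (interference + sigma2))
    penalty = np.count_nonzero(illum_power < illum_threshold)
    return rate - penalty
```

The indicator sum is implemented as a simple count of threshold violations, matching the unit-penalty-per-target description above.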

2. Segment Hysteresis Mechanism

The defining feature of SHRL is the segment hysteresis mechanism. Rather than immediately reassigning segments upon every change in the policy network’s output logits $\phi_t$, SHRL applies a probabilistic or threshold-based gate. This mechanism stabilizes assignments and filters out minor or spurious fluctuations, thus suppressing rapid remapping:

  • Probabilistic Gate: For each segment $m$:

$$\tilde\phi_{m,t} = \begin{cases} \phi_{m,t}, & \text{with probability } p_{\text{update}} \\ \tilde\phi_{m,t-1}, & \text{otherwise} \end{cases}$$

where $p_{\text{update}}\in(0,1)$ regulates update aggressiveness. Smaller values yield more persistent segment allocations.
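One timestep of the probabilistic gate can be sketched as follows, assuming numpy and a seeded random generator for reproducibility; the function name is illustrative.

```python
import numpy as np

def hysteresis_gate_prob(phi_t, phi_prev, p_update, rng):
    """Per segment, adopt the new logit phi_t[m] with probability
    p_update; otherwise keep the previously executed phi_prev[m]."""
    mask = rng.random(phi_t.shape) < p_update
    return np.where(mask, phi_t, phi_prev)
```

With `p_update = 0` every assignment is frozen, and with `p_update = 1` the gate reduces to ordinary immediate updating.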

  • Threshold (Dead-band) Variant:

$$\tilde\phi_{m,t} = \begin{cases} \phi_{m,t}, & |\phi_{m,t} - \tilde\phi_{m,t-1}| \ge \delta_{\text{hys}} \\ \tilde\phi_{m,t-1}, & \text{otherwise} \end{cases}$$

This enforces assignment updates only for changes exceeding a hysteresis gap $\delta_{\text{hys}}$.
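The dead-band variant is a one-line elementwise comparison; this sketch assumes numpy, with an illustrative function name.

```python
import numpy as np

def hysteresis_gate_deadband(phi_t, phi_prev, delta_hys):
    """Adopt phi_t[m] only when it has moved at least delta_hys away
    from the last executed value; otherwise hold phi_prev[m]."""
    return np.where(np.abs(phi_t - phi_prev) >= delta_hys, phi_t, phi_prev)
```

Unlike the probabilistic gate, this variant is deterministic: small logit jitter below $\delta_{\text{hys}}$ is filtered out entirely rather than merely made less likely to propagate.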

This mechanism avoids structural oscillation in segment-to-antenna mapping, which alleviates non-stationarity in the environment as perceived by the RL agent.

3. Learning Algorithm and Update Rules

SHRL employs the Advantage Actor–Critic (A2C) framework:

  • Actor ($\pi_\theta(a|s)$): Parameterized by $\theta$; maps states to actions $(\psi_t,\, \phi_t,\, W_t)$.
  • Critic ($V_\psi(s_t)$): Parameterized by $\psi$ (the symbol is overloaded with the antenna pose vector); estimates the state-value function.

Update rules are as follows:

  • Critic Loss:

$$\mathcal{L}_V = \big(V_\psi(s_t) - \hat{R}_t\big)^2, \qquad \hat{R}_t = r_t + \gamma V_\psi(s_{t+1})$$

  • Actor Loss:

$$\mathcal{L}_\pi = -\log \pi_\theta(a_t|s_t)\, \hat{A}_t, \qquad \hat{A}_t = \hat{R}_t - V_\psi(s_t)$$

Gradients with respect to θ\theta and ψ\psi update the respective networks toward higher-reward policies.
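The target, advantage, and both loss terms can be checked with a small numerical sketch; scalar value estimates are assumed, and `log_prob` stands in for $\log\pi_\theta(a_t|s_t)$.

```python
def a2c_losses(r_t, v_t, v_next, log_prob, gamma=0.99):
    """Bootstrapped target, advantage, and the two A2C loss terms."""
    target = r_t + gamma * v_next        # hat R_t
    advantage = target - v_t             # hat A_t
    critic_loss = (v_t - target) ** 2    # L_V
    actor_loss = -log_prob * advantage   # L_pi
    return target, advantage, critic_loss, actor_loss
```

In a full implementation the advantage is treated as a constant (detached) in the actor loss, so that only $\theta$ receives its gradient.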

Pseudocode for main steps (abridged):

Initialize θ, ψ; set hysteresis memory φ̃ (e.g., uniform)
for episode = 1 … N_episodes:
  reset environment, observe s₀
  for t = 0 … T_max:
    aₜ = π_θ(sₜ)
    extract raw logits φₜ and (ψₜ, Wₜ) from aₜ
    for each segment m:
      with probability p_update:
        φ̃_{m,t} = φ_{m,t}
      else:
        φ̃_{m,t} = φ̃_{m,t-1}
    execute (ψₜ, φ̃ₜ, Wₜ) in environment
    observe rₜ, s_{t+1}
    compute R̂ₜ = rₜ + γ V_ψ(s_{t+1})
    compute Âₜ = R̂ₜ − V_ψ(sₜ)
    update critic via ∇_ψ (V_ψ(sₜ) − R̂ₜ)²
    update actor via ∇_θ [ −log π_θ(aₜ|sₜ) · Âₜ ]
  end for
end for
return θ*, ψ*

4. Hyperparameters and Training Protocol

SHRL is trained using the following key hyperparameters and procedural setup:

| Parameter | Value/Range | Description |
| --- | --- | --- |
| Actor LR ($\alpha_{\text{actor}}$) | $1\times10^{-4}$ | Step size for actor network |
| Critic LR ($\alpha_{\text{critic}}$) | $1\times10^{-3}$ | Step size for critic network |
| Discount ($\gamma$) | 0.99 | Return discount factor |
| Hysteresis $p_{\text{update}}$ | {0.05, 0.1, 0.2} | Grid searched |
| Exploration | Additive Gaussian noise | Applied to outputs during training |
| Episode length | $T_{\max} = 100$ | Per episode or until fixed-geometry evaluation |
| Convergence detection | $< 10^{-3}$ reward improvement per 200 episodes | Via moving average (window = 50) |

Episodes are initialized with randomized user and target positions. Convergence is declared when the moving-average reward plateaus under a specified threshold for a sustained interval.
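The convergence rule described above can be sketched as a small plateau detector: declare convergence once the window-50 moving average of the per-episode reward has improved by less than $10^{-3}$ over 200 episodes. Edge handling and the exact comparison are assumptions.

```python
import numpy as np

def has_converged(rewards, window=50, patience=200, tol=1e-3):
    """rewards: per-episode rewards observed so far."""
    r = np.asarray(rewards, dtype=float)
    if len(r) < window + patience:
        return False  # not enough history yet
    # trailing moving average of the reward curve
    ma = np.convolve(r, np.ones(window) / window, mode='valid')
    # improvement of the moving average over the last `patience` episodes
    return bool(ma[-1] - ma[-1 - patience] < tol)
```

A flat reward curve triggers the detector, while a curve that is still climbing does not.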

5. Comparative Performance and Ablation

The effectiveness of SHRL is evaluated against several baselines: standard A2C (without hysteresis), SPRL (periodic segment-update every 5 steps), PPO (proximal policy optimization), and a random action agent.

Key evaluation highlights:

  • SHRL achieved the highest final reward (≈13.2), outperforming A2C (≈9.9), SPRL (≈7.7), PPO (≈6.4), and random policies (≈3.0).
  • SHRL converged within approximately 4000 episodes, exhibiting the smallest reward variance.
  • The hysteresis mechanism prevented the reward collapses and non-stationarity seen in standard A2C/PPO.
  • For waveguide configuration $M=3$, $L_{wg}=40\,\mathrm{m}$, SHRL yielded 13.15 bps/Hz for rate and $1.09\times10^{-5}$ for illumination, outperforming all tested alternatives. For increased $L_{wg}$, SHRL continued to improve the rate–illumination trade-off.
  • Compared to A2C, SHRL achieved approximately 30% higher final rate and required about 50% fewer episodes to converge.
  • Compared to SPRL with fixed update intervals, SHRL’s adaptive hysteresis delivered a ~70% higher reward ceiling and smoother training.

6. Significance and Implications

Segment Hysteresis Based Reinforcement Learning introduces a principled module for stabilizing assignment dynamics in RL-based resource control for ISAC, particularly when assignment policies are sensitive to minor environmental or agent perturbations. The probabilistic hysteresis gate, governed by $p_{\text{update}}$, effectively reduces environmental non-stationarity as perceived by the RL agent. This yields efficient convergence and robust performance, particularly relevant in complex multi-objective wireless systems where aggressive segment remapping is inefficient or detrimental (Gao et al., 28 Jan 2026).

A plausible implication is that similar hysteresis primitives may generalize to other RL-controlled resource allocation problems plagued by instability or oscillatory structural decisions.

SHRL represents a step beyond baseline A2C and PPO approaches by incorporating memory into the segment assignment policy via hysteresis. This approach is specifically tailored for SWAN-ISAC system design but may be extensible to other domains requiring persistent control actions or delayed adaptation. Future research may investigate the generalization of hysteresis-based RL modules to broader classes of resource allocation, the exploration of adaptive or learned hysteresis parameters, and integration with more advanced actor-critic architectures.

References (1)

  1. Gao et al., 28 Jan 2026.
