SCLAR Framework: Cross-Layer Wireless Networks
- The SCLAR framework defines a rigorous cross-layer metric that combines physical-layer (SINR and capacity) and MAC-layer (collision and jamming) outcomes in wireless networks.
- It employs a ResNet-based deep Q-network to autonomously learn optimal transmission policies, achieving significant performance gains and rapid convergence in multi-cell uplink scenarios.
- The design couples channel quality and transmission success, so spectral efficiency is rewarded only for packets actually delivered at the MAC layer; remaining challenges include substantial offline training requirements and limited real-time adaptability.
The Sum Cross-Layer Achievable Rate (SCLAR) framework provides a rigorous, cross-layer metric and methodology for managing channel access in time-slotted uplink wireless networks subject to stochastic transmission schedules and adversarial jamming. Targeting multi-cell and multi-user environments, SCLAR explicitly couples physical-layer modulation, coding, and signal-to-interference-plus-noise ratio (SINR) with medium access control (MAC)-layer outcomes such as successful transmission, collision, and interference. Leveraging deep reinforcement learning (DRL), specifically residual network-based Deep Q-Networks (ResDQN), SCLAR enables intelligent user equipment to autonomously learn transmission policies that maximize network-wide throughput under hostile and uncertain conditions (Basit et al., 20 Jan 2025, Basit et al., 2024).
1. Mathematical Definition of SCLAR
At its core, SCLAR is defined in terms of physical-layer channel capacity, MAC-layer success, and a summation across users and time slots. For user equipment (UE) $u$ in cell $c$ during slot $t$:
- The instantaneous post-SIC SINR is
$$\gamma_{c,u}[t] = \frac{p_u \, |\mathbf{w}_{c,u}^{H} \mathbf{h}_{c,u}|^2}{\sum_{u' \neq u} p_{u'} \, |\mathbf{w}_{c,u}^{H} \mathbf{h}_{c,u'}|^2 + \sum_{j} p_j^{\mathrm{jam}} \, |\mathbf{w}_{c,u}^{H} \mathbf{g}_{c,j}|^2 + \sigma^2 \|\mathbf{w}_{c,u}\|^2},$$
where $\mathbf{h}_{c,u}$ and $\mathbf{g}_{c,j}$ are the UE and jammer channel vectors, $\mathbf{w}_{c,u}$ is the linear receive filter, and the first sum runs over interfering UEs not yet cancelled by SIC.
- The physical-layer capacity (in bits/s/Hz) is $C_{c,u}[t] = \log_2\!\left(1 + \gamma_{c,u}[t]\right)$.
- The MAC-layer success rate $\eta_{c,u}[t]$ is the probability that UE $u$'s packet is successfully received in slot $t$ (i.e., no collision, no jamming).
- The cross-layer achievable rate (CLAR) is $R_{c,u}[t] = \eta_{c,u}[t] \, C_{c,u}[t]$.
- The sum cross-layer achievable rate (SCLAR) for cell $c$ over a frame of $T$ slots, aggregated across the $U$ UEs, is $R_c^{\mathrm{SCLAR}} = \sum_{u=1}^{U} \sum_{t=1}^{T} R_{c,u}[t]$.
This composite figure of merit incentivizes physical-layer spectral efficiency only if transmission is MAC-successful; packets lost to collisions or jamming contribute zero.
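As a concrete illustration, the composite metric can be computed directly from per-slot SINR values and MAC success indicators. This is a minimal sketch; the function names `clar` and `sclar` and the toy numbers are illustrative, not from the papers:

```python
import numpy as np

def clar(sinr, success_prob):
    """Cross-layer achievable rate: PHY capacity gated by MAC success."""
    return success_prob * np.log2(1.0 + sinr)

def sclar(sinr, success_prob):
    """Sum CLAR over all UEs (rows) and slots (columns) of one frame."""
    return float(np.sum(clar(sinr, success_prob)))

# Toy frame: 3 UEs x 4 slots. A collided or jammed slot has success
# probability 0 and contributes nothing, regardless of its SINR.
sinr = np.array([[3.0, 3.0, 3.0, 3.0],
                 [1.0, 1.0, 1.0, 1.0],
                 [7.0, 7.0, 7.0, 7.0]])
succ = np.array([[1, 1, 0, 1],
                 [1, 0, 0, 1],
                 [1, 1, 1, 0]])
print(sclar(sinr, succ))  # prints 17.0
```

Note how the second UE's high-SINR slots are worthless whenever `succ` is zero, which is exactly the incentive structure the metric is designed to create.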
2. System and Channel Model
SCLAR is conceived for a multi-cell, time-slotted uplink network with the following structure:
- Each of the $C$ cells contains a cluster head (CH) equipped with $M$ antennas and $U$ single-antenna user equipments (UEs)—one intelligent UE (iUE) and $U-1$ predefined UEs (pUEs).
- Transmissions are organized into frames indexed by $f$, each with $T$ time slots $t \in \{1, \dots, T\}$.
- pUE transmission schedules are i.i.d. Bernoulli random variables with parameter $p$: $\Pr[a_u[t] = 1] = p$.
- Malicious jammers ($J$ in each cell) follow periodic on/off activation over slots, independent of UE scheduling.
- Channel vectors for UEs and jammers experience small-scale Rayleigh fading and quasi-static spatial variation, with no large-scale fading modeled.
- The received signal at the CH is
$$\mathbf{y}_c[t] = \sum_{u=1}^{U} a_u[t] \sqrt{p_u} \, \mathbf{h}_{c,u} s_u[t] + \sum_{j=1}^{J} b_j[t] \sqrt{p_j^{\mathrm{jam}}} \, \mathbf{g}_{c,j} z_j[t] + \mathbf{n}_c[t],$$
where $a_u[t], b_j[t] \in \{0, 1\}$ are the UE and jammer activity indicators, $s_u[t]$ and $z_j[t]$ are unit-power transmit symbols, and $\mathbf{n}_c[t]$ is additive white Gaussian noise.
- Detection at the CH uses linear receivers, followed by matched-filter successive interference cancellation (SIC).
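The signal model above can be sketched numerically. The dimensions, BPSK symbols, noise level, and jammer period below are illustrative toy assumptions, not the papers' simulation settings:

```python
import numpy as np

rng = np.random.default_rng(0)
M, U, J, T = 4, 3, 1, 10   # antennas, UEs, jammers, slots (toy sizes)
p_tx = 0.5                  # Bernoulli transmit probability of the pUEs

# Quasi-static Rayleigh fading: one CN(0, 1) vector per UE and jammer.
H = (rng.standard_normal((M, U)) + 1j * rng.standard_normal((M, U))) / np.sqrt(2)
G = (rng.standard_normal((M, J)) + 1j * rng.standard_normal((M, J))) / np.sqrt(2)

a = rng.binomial(1, p_tx, size=(U, T))        # i.i.d. Bernoulli schedules
b = np.tile([1, 0], T)[:T].reshape(1, T)      # periodic on/off jammer
s = np.sign(rng.standard_normal((U, T)))      # UE symbols (BPSK toy)
z = np.sign(rng.standard_normal((J, T)))      # jammer symbols
n = (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) * 0.1

# Received signal at the cluster head, all slots at once:
Y = H @ (a * s) + G @ (b * z) + n
print(Y.shape)  # prints (4, 10)
```

Each column of `Y` is one slot's superposition of active UEs, the jammer when it is on, and noise; a linear receiver plus SIC would then be applied column by column.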
3. SCLAR Maximization as a POMDP/MDP
The SCLAR maximization problem seeks the optimal transmit policies (binary slot allocation) for all UEs, maximizing
$$\sum_{u=1}^{U} \sum_{t=1}^{T} R_{c,u}[t]$$
over the schedules $\{a_u[t]\}$, subject to $a_u[t] \in \{0, 1\}$ and unknown jammer actions $b_j[t]$. Here $\mathbf{R}_{c,u} = (R_{c,u}[1], \dots, R_{c,u}[T])$ is the CLAR vector of UE $u$ over the $T$ slots.
This dynamic scenario is naturally formulated as a partially observed Markov decision process (POMDP):
- Agent: The iUE in each cell.
- Action space: $\mathcal{A} = \{\text{wait}, \text{transmit}\}$ for each slot.
- Observation: After each slot, a 6-element one-hot encoding signals "idle," "busy," "success," "collision," "jammed-UE," or "jammed-pUE."
- State: Concatenation, over all UEs (pUEs + iUE), of their one-hot observation vectors.
- Reward: Network-level,
$$r[t] = \sum_{u} \kappa_u \, \tilde{R}_{c,u}[t],$$
where $\tilde{R}_{c,u}[t]$ are scaled CLARs, and $\kappa_u$ sets reward magnitudes based on outcomes ("good", "excellent", "bad", "worst").
The transition model, incorporating all pUE and jammer schedules, is unknown; learning proceeds from observed transitions and rewards.
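A minimal sketch of the observation encoding and the outcome-scaled reward follows. The reward magnitudes in `KAPPA` are hypothetical placeholders for the papers' "good"/"excellent"/"bad"/"worst" levels, whose exact values are not reproduced here:

```python
import numpy as np

# The 6-element one-hot observation alphabet from the POMDP formulation.
OUTCOMES = ["idle", "busy", "success", "collision", "jammed-UE", "jammed-pUE"]

def encode(outcome):
    """One-hot observation vector the iUE receives after each slot."""
    v = np.zeros(len(OUTCOMES))
    v[OUTCOMES.index(outcome)] = 1.0
    return v

# Hypothetical outcome-dependent reward magnitudes (illustrative only).
KAPPA = {"success": 2.0, "idle": 0.5, "busy": 0.0,
         "collision": -1.0, "jammed-UE": -2.0, "jammed-pUE": -0.5}

def reward(outcome, scaled_clar):
    """Outcome-scaled CLAR contribution to the network-level reward."""
    return KAPPA[outcome] * scaled_clar

print(encode("collision"), reward("success", 1.5))
```

A successful slot contributes a positive multiple of its scaled CLAR, while collisions and jamming are penalized, matching the incentive structure described above.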
4. Deep Reinforcement Learning: ResNet-Based Q-Learning
A ResNet-based DQN (“ResDQN”) is used to approximate the optimal action-value function $Q^*(s, a)$:
- Input: State vector ($n$-dimensional, e.g., $n = 39$).
- Architecture: Five residual blocks in series, each with two fully-connected 32-unit ReLU layers and skip connections. The output is passed to two 128-unit dense ReLU layers, then to a two-unit output for -values (one per action).
- Loss function: Mean-squared Bellman temporal-difference (TD) error, $\mathcal{L}(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}\big[\big(r + \gamma \max_{a'} Q_{\theta^{-}}(s', a') - Q_{\theta}(s, a)\big)^2\big]$.
- Training protocol:
- Choose action $a$ via $\epsilon$-greedy on $Q_\theta(s, \cdot)$.
- Apply $a$; observe the reward $r$ and next state $s'$.
- Store the transition $(s, a, r, s')$ in the replay buffer $\mathcal{D}$.
- Sample a minibatch from $\mathcal{D}$; update $\theta$ via gradient descent on the TD loss.
- Update the target network via the soft update $\theta^{-} \leftarrow \tau \theta + (1 - \tau)\theta^{-}$.
- Decay $\epsilon$ as learning proceeds.
- Return the greedy policy $\pi(s) = \arg\max_a Q_\theta(s, a)$ at convergence.
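The protocol above can be condensed into a runnable sketch. A linear Q-function and a trivially rewarding toy environment stand in for the ResDQN and the wireless simulator; the discount factor, minibatch size, and soft-update form follow the text, while the other values are illustrative:

```python
import random
from collections import deque
import numpy as np

rng = np.random.default_rng(2)
n_state, n_action = 6, 2
theta = rng.standard_normal((n_state, n_action)) * 0.01  # online Q weights
theta_tgt = theta.copy()                                  # target network
D = deque(maxlen=10_000)                                  # replay buffer
alpha, gamma, tau, eps = 0.01, 0.99, 0.005, 1.0

Q = lambda w, s: s @ w  # linear stand-in for the Q-network

for step in range(500):
    s = rng.standard_normal(n_state)
    a = random.randrange(n_action) if random.random() < eps \
        else int(np.argmax(Q(theta, s)))                 # epsilon-greedy
    r, s_next = float(a), rng.standard_normal(n_state)   # toy: action 1 pays
    D.append((s, a, r, s_next))                          # store transition
    for (si, ai, ri, sn) in random.sample(list(D), min(64, len(D))):
        td_target = ri + gamma * np.max(Q(theta_tgt, sn))
        td_err = td_target - Q(theta, si)[ai]
        theta[:, ai] += alpha * td_err * si              # SGD on TD error
    theta_tgt = tau * theta + (1 - tau) * theta_tgt      # soft update
    eps = max(0.05, eps * 0.99)                          # decay epsilon

print(round(eps, 3))  # prints 0.05
```

The soft update keeps the TD targets slowly moving, and the epsilon floor preserves a small amount of exploration even after convergence.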
Key DRL hyperparameters include the learning rate, a discount factor of $0.99$, the replay buffer size, a minibatch size of $64$, and the soft-update factor $\tau$ (Basit et al., 20 Jan 2025, Basit et al., 2024).
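The residual architecture described above can be sketched as a pure-NumPy forward pass. The initial projection of the 39-dimensional state to width 32 (needed to make the skip connections dimensionally consistent) and the He-style initialization are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda x: np.maximum(x, 0.0)

def dense(n_in, n_out):
    """He-initialized weight matrix and zero bias (assumed initialization)."""
    return (rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in),
            np.zeros(n_out))

# Sizes as described: 5 residual blocks of two 32-unit ReLU layers,
# then two 128-unit dense ReLU layers, then a 2-unit Q-value head.
n_state = 39
proj = dense(n_state, 32)
blocks = [(dense(32, 32), dense(32, 32)) for _ in range(5)]
head1, head2, out = dense(32, 128), dense(128, 128), dense(128, 2)

def q_values(s):
    x = relu(s @ proj[0] + proj[1])
    for (W1, b1), (W2, b2) in blocks:
        h = relu(relu(x @ W1 + b1) @ W2 + b2)
        x = x + h                          # skip connection
    x = relu(relu(x @ head1[0] + head1[1]) @ head2[0] + head2[1])
    return x @ out[0] + out[1]             # one Q-value per action

q = q_values(rng.standard_normal(n_state))
print(q.shape)  # prints (2,)
```

The `x = x + h` line is the identity shortcut that lets gradients bypass each block, which is the property credited with the faster, more stable convergence reported for ResDQN.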
5. Performance Evaluation and Comparative Results
SCLAR framework performance has been benchmarked using:
- Simulation settings: Multi-cell, multi-UE (1 iUE + 10–35 pUEs per cell), 2–5 jammers, frame sizes 5–30 slots.
- Baselines: Fully connected DQN (FC-DQN), GRU-DQN, network-aware UE (omniscient optimum), tabular Q-learning (Basit et al., 2024).
- Key metrics: Instantaneous and average SCLAR, cumulative reward, convergence rate, and training loss.
Observed outcomes include:
- ResDQN achieves within a few percent of the omniscient optimum SCLAR across all frame sizes.
- Outperforms FC-DQN and GRU-DQN by 15–25% in final SCLAR and by 20–35% in convergence speed. For example, in (Basit et al., 2024), average SCLAR (bits/s/Hz) over slots 21–100: Tabular Q = 5.2, FC-DNN DQN = 11.8, ResDNN DQN = 18.7.
- Demonstrated robustness to increased numbers of pUEs and jammers; the agent converges to the optimal reward after relatively few training episodes.
- Training loss declines monotonically; learning produces judicious action patterns that avoid collision and jamming.
6. Insights, Limitations, and Prospects
The SCLAR framework exhibits the following properties:
- Cross-layer reward formulation directly couples physical-layer channel quality (via SINR) and MAC-layer success, enabling agents to adapt without explicit coordination for coexistence.
- ResNet skip connections in the Q-network benefit training by facilitating identity mappings and more stable policy updates in partially observed MDPs.
- Principal limitations are the need for substantial offline training, high memory requirements due to large replay buffers, and limited adaptability to rapid online environment changes due to partial observability.
- Current design assumes a fixed number of iUEs per cell and a static frame structure; scenario generalization requires further development.
Extension avenues include multi-agent DRL for multiple iUEs and inter-cell coordination, continuous action spaces for joint time-slot and power allocations, transfer learning for mobility or variable topologies, and integration with reconfigurable intelligent surfaces for enhanced anti-jamming capabilities (Basit et al., 20 Jan 2025).
7. Relationship to Related Work
The SCLAR framework unifies and advances DRL-based channel access initiatives by providing a mathematically rigorous, physically grounded, and simulation-validated cross-layer performance metric. Direct comparison with DRL alternatives, including tabular Q-learning and fully-connected DNNs, demonstrates that the addition of residual connectivity in the Q-network architecture is crucial for robust and rapid learning in jam-prone, partially observed settings (Basit et al., 20 Jan 2025, Basit et al., 2024). SCLAR's general methodology supports diverse wireless scenarios subject to adversarial interference, limited feedback, and stringent coexistence requirements.