Weighted MMSE-DDPG for PA Placement & Beamforming
- The paper presents a hybrid framework combining the classical WMMSE method with DDPG to jointly optimize PA placement and beamforming in blockage-rich environments.
- It leverages deterministic obstacle modeling and a black-box DDPG actor to adaptively deploy pinching antennas while satisfying rate and power constraints.
- Empirical results demonstrate rapid convergence and improved sum-rate throughput, highlighting the framework's potential for advanced indoor wireless network design.
Weighted Minimum Mean Square Error Integrated Deep Deterministic Policy Gradient (WMMSE-DDPG) is a hybrid optimization framework designed for the joint placement and beamforming of pinching-antenna (PA) systems in indoor wireless environments that feature line-of-sight (LoS) blockages. The approach leverages the deterministic modeling of obstacles and integrates the classical Weighted Minimum Mean Square Error (WMMSE) methodology with the Deep Deterministic Policy Gradient (DDPG) algorithm from deep reinforcement learning, effectively addressing the non-smooth transitions induced by binary blockage conditions. The interplay allows for adaptive, blockage-aware deployment of PAs and beam patterns that maximize throughput while respecting rate and power constraints (Xie et al., 3 Jan 2026).
1. Problem Formulation and Reformulation
The primary objective is the maximization of the aggregate user sum-rate under physical and quality-of-service (QoS) constraints:

$$\max_{\mathbf{x}, \mathbf{W}} \; \sum_{k=1}^{K} R_k(\mathbf{x}, \mathbf{W})$$

subject to $R_k \geq R_{\min}$ for all $k$, $\sum_k \|\mathbf{w}_k\|^2 \leq P_{\max}$, and $x_n \in [0, L]$, where $\mathbf{x}$ denotes the horizontal positions of the PAs along the waveguide and $\mathbf{W} = [\mathbf{w}_1, \dots, \mathbf{w}_K]$ the beamformers.
This is reformulated as a WMMSE minimization:

$$\min_{\mathbf{x}, \mathbf{W}, \{u_k\}, \{\lambda_k\}} \; \sum_{k=1}^{K} \left( \lambda_k e_k - \log \lambda_k \right)$$

subject to the same power and placement constraints. The mean-square error per user $e_k$ captures the PA-dependent channel characteristics:

$$e_k = \left|1 - u_k^{*}\, \mathbf{h}_k^{H}(\mathbf{x})\, \mathbf{w}_k\right|^2 + \sum_{j \neq k} \left|u_k^{*}\, \mathbf{h}_k^{H}(\mathbf{x})\, \mathbf{w}_j\right|^2 + \sigma^2 |u_k|^2,$$

where $\mathbf{h}_k(\mathbf{x})$ encodes the blockage-aware channel structure through the deterministic LoS-blockage indicator $b_k(\mathbf{x}) \in \{0, 1\}$.
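To make the deterministic blockage model concrete, the sketch below builds a blockage-aware channel vector for one user from candidate PA positions along a waveguide. It is a minimal illustration only: the free-space phase/attenuation form, the segment-circle blockage test, and all numerical parameters (wavelength, mounting height) are assumptions, not values from the paper.

```python
import numpy as np

def los_blocked(pa_xy, user_xy, centers, radii):
    """Deterministic LoS indicator: 1 if the PA-user segment misses
    every circular obstacle, else 0."""
    p, u = np.asarray(pa_xy, float), np.asarray(user_xy, float)
    d = u - p
    for c, r in zip(centers, radii):
        c = np.asarray(c, float)
        # Closest point on the segment p -> u to the obstacle center c.
        t = np.clip(np.dot(c - p, d) / np.dot(d, d), 0.0, 1.0)
        if np.linalg.norm(p + t * d - c) < r:
            return 0
    return 1

def channel(pa_positions, user_xy, centers, radii,
            wavelength=0.01, height=3.0):
    """Blockage-aware LoS channel vector h_k(x); wavelength and
    ceiling height are illustrative assumptions."""
    h = np.zeros(len(pa_positions), dtype=complex)
    for n, x in enumerate(pa_positions):
        pa_xy = np.array([x, 0.0])       # PA on a waveguide along the x-axis
        b = los_blocked(pa_xy, user_xy, centers, radii)
        dist = np.sqrt(np.sum((pa_xy - np.asarray(user_xy, float)) ** 2)
                       + height ** 2)
        h[n] = b * np.exp(-2j * np.pi * dist / wavelength) / dist
    return h
```

An obstacle sitting on the PA-user segment zeroes the corresponding channel entry, which is exactly the discontinuity that motivates the DDPG treatment below.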
2. Integration of DDPG for Non-Smooth Placement Optimization
The non-smooth, discontinuous dependence of $\mathbf{h}_k(\mathbf{x})$ on LoS connectivity (due to binary blockages) renders gradient-based policy optimization ineffective. DDPG, a model-free off-policy actor-critic algorithm for continuous action spaces, is used to treat the PA placement as a black-box control task. The DDPG module defines:
- State Space $\mathcal{S}$: $\mathbf{s} = [\text{user coordinates}, \text{obstacle centers}, \text{obstacle radii}]$, optionally augmented with the previous PA positions $\mathbf{x}_{\text{prev}}$.
- Action Space $\mathcal{A}$: $\mathbf{a} = \mathbf{x} \in [0, L]^{N}$, with the actor outputting continuous waveguide positions for each PA.
- Reward Function:
$$r(\mathbf{s}, \mathbf{a}) = \sum_{k} R_k\big(\mathbf{x}, \mathbf{W}^{\star}(\mathbf{x})\big) - \beta \sum_{k} \big[R_{\min} - R_k\big]_{+},$$
where the hinge term applies a soft penalty, weighted by $\beta > 0$, for sub-threshold rates.
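The soft QoS penalty can be sketched as a hinge on the per-user rate shortfall; `beta` here is an assumed penalty weight (the source anneals it over training):

```python
import numpy as np

def qos_reward(rates, r_min, beta=1.0):
    """Sum-rate reward minus a soft hinge penalty on QoS violations.
    beta is an assumed penalty weight, annealed during training."""
    rates = np.asarray(rates, dtype=float)
    shortfall = np.maximum(r_min - rates, 0.0)   # [R_min - R_k]_+
    return float(rates.sum() - beta * shortfall.sum())
```

Because the penalty is soft rather than a hard constraint, the reward stays finite for infeasible placements, which keeps the critic's regression target well behaved.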
3. WMMSE Algorithm as a Beamforming Subroutine
For any fixed PA configuration $\mathbf{x}$, beamforming is solved via standard WMMSE iterations internal to each DDPG step:
- Equalizer Update:
$$u_k = \frac{\mathbf{h}_k^{H} \mathbf{w}_k}{\sum_j |\mathbf{h}_k^{H} \mathbf{w}_j|^2 + \sigma^2}$$
- Weight Update:
$$\lambda_k = e_k^{-1}, \quad \text{with } e_k = 1 - \operatorname{Re}\!\left(u_k^{*}\, \mathbf{h}_k^{H} \mathbf{w}_k\right)$$
- Beamformer Update:
$$\mathbf{w}_k = \lambda_k u_k \left( \sum_j \lambda_j |u_j|^2\, \mathbf{h}_j \mathbf{h}_j^{H} + \mu \mathbf{I} \right)^{-1} \mathbf{h}_k,$$
subject to dual-variable updates enforcing the constraints: $\mu \geq 0$ is chosen (e.g., by bisection) so that $\sum_k \|\mathbf{w}_k\|^2 \leq P_{\max}$ holds with complementary slackness, and per-user multipliers enforce the rate constraints $R_k \geq R_{\min}$.
Iterations continue until convergence of the weighted-MSE objective and the rates $R_k$.
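One such iteration can be sketched in numpy for a MISO downlink with equal user priorities. This is a textbook WMMSE step under those assumptions, not the paper's exact implementation; the power-constraint dual variable is bracketed and then bisected.

```python
import numpy as np

def wmmse_step(H, W, sigma2, p_max):
    """One WMMSE iteration for a MISO downlink with equal priorities.
    H: (K, N) matrix whose k-th row is h_k^H; W: (N, K) beamformers
    w_k as columns. The dual variable mu for the sum-power constraint
    sum_k ||w_k||^2 <= p_max is found by bisection."""
    K, N = H.shape
    G = H @ W                                    # G[k, j] = h_k^H w_j
    # Equalizer: u_k = h_k^H w_k / (sum_j |h_k^H w_j|^2 + sigma2)
    u = np.diag(G) / (np.sum(np.abs(G) ** 2, axis=1) + sigma2)
    # Weight: lambda_k = 1/e_k with MMSE e_k = 1 - Re(u_k^* h_k^H w_k)
    lam = 1.0 / (1.0 - np.real(np.conj(u) * np.diag(G)))
    # Beamformer: w_k = lam_k u_k (A + mu I)^{-1} h_k
    A = (H.conj().T * (lam * np.abs(u) ** 2)) @ H
    def solve(mu):
        return np.linalg.inv(A + mu * np.eye(N)) @ (H.conj().T * (lam * u))
    hi = 1.0
    while np.sum(np.abs(solve(hi)) ** 2) > p_max:  # bracket the dual variable
        hi *= 2.0
    lo = 0.0
    for _ in range(50):                            # bisection on mu
        mid = 0.5 * (lo + hi)
        if np.sum(np.abs(solve(mid)) ** 2) > p_max:
            lo = mid
        else:
            hi = mid
    return solve(hi)
```

Repeating `wmmse_step` until the weighted-MSE objective stabilizes yields the beamformers $\mathbf{W}^{\star}(\mathbf{x})$ that feed the DDPG reward.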
4. DDPG Network Architecture and Training Procedure
The WMMSE-DDPG scheme employs neural networks for both actor and critic:
- Actor $\pi_{\theta}(\mathbf{s})$:
  - Input size $\dim(\mathbf{s})$
  - Two hidden layers (256 ReLU units each)
  - Output: $N$ continuous values, mapped to PA positions in $[0, L]$ via a bounded activation (e.g., scaled $\tanh$)
- Critic $Q_{\phi}(\mathbf{s}, \mathbf{a})$:
  - Input: concatenated $(\mathbf{s}, \mathbf{a})$
  - Two hidden layers (256 ReLU units each)
  - Output: scalar Q-value
Training per DDPG step:
- Critic update: minimize the squared error against the one-step target,
$$\mathcal{L}(\phi) = \mathbb{E}\left[\left(Q_{\phi}(\mathbf{s}, \mathbf{a}) - y\right)^{2}\right], \qquad y = r,$$
where the target reduces to the immediate reward in the single-step setting.
- Actor update: ascend the deterministic policy gradient,
$$\nabla_{\theta} J = \mathbb{E}\left[\left.\nabla_{\mathbf{a}} Q_{\phi}(\mathbf{s}, \mathbf{a})\right|_{\mathbf{a} = \pi_{\theta}(\mathbf{s})} \nabla_{\theta} \pi_{\theta}(\mathbf{s})\right].$$
Gaussian or Ornstein–Uhlenbeck noise is added for exploration. The reward is episodic and essentially single-step (contextual bandit), with or without target networks.
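The actor's forward pass under the layer sizes above can be sketched in numpy. The scaled-tanh mapping to the waveguide interval and all dimensions here are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """He-initialized weight/bias pairs for a ReLU MLP."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def actor_forward(params, s, waveguide_len):
    """pi_theta(s): two 256-unit ReLU layers; a scaled-tanh head
    (an assumption) maps the output to PA positions in (0, L)."""
    h = np.asarray(s, dtype=float)
    for Wt, b in params[:-1]:
        h = np.maximum(h @ Wt + b, 0.0)        # ReLU hidden layers
    Wt, b = params[-1]
    return 0.5 * waveguide_len * (np.tanh(h @ Wt + b) + 1.0)

STATE_DIM, N_PAS, L_WG = 12, 4, 10.0           # illustrative dimensions
actor = init_mlp([STATE_DIM, 256, 256, N_PAS])
a = actor_forward(actor, rng.normal(size=STATE_DIM), L_WG)
```

The bounded head guarantees feasible actions by construction, so no projection step is needed after exploration noise is clipped.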
5. Algorithm Pseudocode and Workflow
The high-level workflow proceeds as follows:
- Initialize actor $\pi_{\theta}$, critic $Q_{\phi}$, and replay buffer $\mathcal{D}$.
- For each episode:
  - Observe the obstacle and user layout (state $\mathbf{s}$).
  - Actor outputs a noisy position proposal $\mathbf{a} = \pi_{\theta}(\mathbf{s}) + \mathbf{n}$.
  - Construct the blockage-aware channels $\mathbf{h}_k(\mathbf{a})$.
  - Run WMMSE to solve for the beamformers $\mathbf{W}^{\star}(\mathbf{a})$.
  - Compute and store the one-step reward $r$.
  - Sample a minibatch from $\mathcal{D}$ and update the critic and actor networks.
  - Optionally update target networks.
- Repeat for the prescribed number of episodes.
Parameter settings from the source fix the learning rates and minibatch size, with a bounded output activation enforcing the spatial constraints $x_n \in [0, L]$.
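The single-step episode structure above can be sketched as follows. The state, action dimension, and reward here are stand-ins (the quadratic `evaluate` replaces the actual WMMSE solve), purely to show the collect-and-replay pattern:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(1)

def run_episode(policy, evaluate, buffer, noise_std=0.5, l_wg=10.0):
    """One single-step (contextual-bandit) episode: observe a layout,
    propose noisy PA positions, score them, store the transition."""
    s = rng.normal(size=6)                        # user/obstacle layout (stub)
    a = np.clip(policy(s) + rng.normal(0.0, noise_std, 2), 0.0, l_wg)
    r = evaluate(s, a)                            # stands in for the WMMSE solve
    buffer.append((s, a, r))
    return r

replay = deque(maxlen=10_000)                     # replay buffer D
policy = lambda s: np.full(2, 5.0)                # placeholder actor
evaluate = lambda s, a: -float(np.sum((a - 4.0) ** 2))  # stand-in reward
for _ in range(32):
    run_episode(policy, evaluate, replay)
```

Because each episode is one step, transitions are (state, action, reward) triples with no bootstrapped next-state term, which is what lets the critic target collapse to the immediate reward.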
6. Empirical Convergence and Implementation Considerations
Simulation experiments demonstrate rapid convergence of both actor and critic losses, with the sum-rate reward stabilizing after several thousand gradient steps. Key practical strategies include:
- Pre-computation of obstacle blocking maps for candidate PA positions to accelerate the channel model.
- Warm-starting the WMMSE beamforming subroutine from the previous solution to reduce computation.
- Annealing the penalty weight $\beta$ in the reward for stricter QoS enforcement over training.
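The blocking-map pre-computation can be sketched as a one-time sweep over candidate positions, after which the channel model reduces to a table lookup. The grid, geometry, and segment-circle test are illustrative assumptions:

```python
import numpy as np

def blockage_map(grid_x, user_xy, centers, radii):
    """LoS indicator b(x) in {0, 1} for each candidate PA position x
    on the grid (PA assumed on a waveguide along the x-axis)."""
    u = np.asarray(user_xy, float)
    out = np.ones(len(grid_x), dtype=int)
    for i, x in enumerate(grid_x):
        p = np.array([x, 0.0])
        d = u - p
        for c, r in zip(centers, radii):
            c = np.asarray(c, float)
            # Distance from obstacle center to the PA-user segment.
            t = np.clip(np.dot(c - p, d) / np.dot(d, d), 0.0, 1.0)
            if np.linalg.norm(p + t * d - c) < r:
                out[i] = 0
                break
    return out

grid = np.linspace(0.0, 10.0, 101)
bmap = blockage_map(grid, [5.0, 4.0], [[5.0, 2.0]], [0.4])
```

Since the obstacle layout is fixed within an episode, this map is computed once per layout and reused across every reward evaluation.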
Handling abrupt changes in LoS connectivity, where small positional shifts toggle blockage, relies on the black-box nature of the WMMSE-DDPG alternation: the DDPG actor learns non-smooth placement policies through experience replay and bounded activation squashing.
7. Context and Significance
WMMSE-DDPG transforms PA deployment in blockage-rich indoor wireless environments. By encapsulating the rapidly convergent WMMSE beamforming subroutine within a DDPG agent, the methodology enables direct learning of PA placement policies that exploit the deterministic obstacle layout. Notably, simulation results in the referenced work (Xie et al., 3 Jan 2026) indicate significant improvements in system throughput and LoS connectivity over baseline approaches. Additionally, pinching-antenna systems can harness physical obstacles to attenuate co-channel interference, thus converting blockages from liabilities into strategic assets. The framework addresses the core optimization challenge of jointly allocating spatial and signal-processing resources in the presence of discrete, combinatorial environmental effects. A plausible implication is its applicability to other blockage-aware network design problems where gradient-based methods are ineffective due to non-smooth physical constraints.