Weighted MMSE-DDPG for PA Placement & Beamforming

Updated 10 January 2026
  • The paper presents a hybrid framework combining the classical WMMSE method with DDPG to jointly optimize PA placement and beamforming in blockage-rich environments.
  • It leverages deterministic obstacle modeling and a black-box DDPG actor to adaptively deploy pinching antennas while satisfying rate and power constraints.
  • Empirical results demonstrate rapid convergence and improved sum-rate throughput, highlighting the framework's potential for advanced indoor wireless network design.

Weighted Minimum Mean Square Error Integrated Deep Deterministic Policy Gradient (WMMSE-DDPG) is a hybrid optimization framework designed for the joint placement and beamforming of pinching-antenna (PA) systems in indoor wireless environments that feature line-of-sight (LoS) blockages. The approach leverages the deterministic modeling of obstacles and integrates the classical Weighted Minimum Mean Square Error (WMMSE) methodology with the Deep Deterministic Policy Gradient (DDPG) algorithm from deep reinforcement learning, effectively addressing the non-smooth transitions induced by binary blockage conditions. The interplay allows for adaptive, blockage-aware deployment of PAs and beam patterns that maximize throughput while respecting rate and power constraints (Xie et al., 3 Jan 2026).

1. Problem Formulation and Reformulation

The primary objective is the maximization of aggregate user sum-rate under physical and quality-of-service (QoS) constraints, specifically:

\max_{\Psi, P} \sum_{m=1}^{M} \log_2\left(1 + \mathrm{SINR}_m(\Psi, P)\right)

subject to $R_m \geq R_t\ \forall m$, $\sum_m \|p_m\|^2 \leq P_t$, and $\Psi \in [0, L_x]^K$, where $\Psi$ denotes the horizontal positions of the $K$ PAs and $P = [p_1, \ldots, p_M]$ the beamformers.

This is reformulated as a WMMSE minimization:

\min_{P,u,w} J(P, u, w; \Psi) \equiv \sum_{m=1}^{M} \left[ w_m e_m(P; \Psi, u_m) - \log w_m \right]

subject to $e_m \leq \delta_m = 2^{-R_t}$ and $\sum_m \|p_m\|^2 \leq P_t$. The mean-square error per user $e_m$ captures the PA-dependent channel characteristics:

e_m = \left| u_m^{*}\, h_m^H(\Psi)\, p_m - 1 \right|^2 + \sum_{j \neq m} \left| u_m^{*}\, h_m^H(\Psi)\, p_j \right|^2 + \sigma^2 |u_m|^2

where $h_m(\Psi)$ encodes the blockage-aware channel structure through a deterministic binary LoS-blockage indicator.
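The blockage-aware channel can be sketched numerically. The following NumPy illustration is an assumed stand-in, not the paper's exact model: obstacles are circles, the LoS indicator tests segment–circle intersection, and the gain follows free-space decay (the pathloss constant `beta` and the geometry are illustrative).

```python
import numpy as np

def seg_point_dist(a, b, c):
    """Distance from point c to the segment a-b (all 2-D arrays)."""
    ab, ac = b - a, c - a
    t = np.clip(ac @ ab / (ab @ ab + 1e-12), 0.0, 1.0)
    return np.linalg.norm(a + t * ab - c)

def los_indicator(pa, user, centers, radii):
    """1 if the PA-user segment clears every circular obstacle, else 0."""
    for c, r in zip(centers, radii):
        if seg_point_dist(pa, user, np.asarray(c)) <= r:
            return 0
    return 1

def channel(pa, user, centers, radii, beta=1.0):
    """Blockage-aware LoS gain: zero when blocked, free-space decay otherwise."""
    d = np.linalg.norm(pa - user) + 1e-9
    return los_indicator(pa, user, centers, radii) * np.sqrt(beta) / d

pa, user = np.array([0.0, 0.0]), np.array([4.0, 0.0])
print(channel(pa, user, [np.array([2.0, 0.0])], [0.5]))         # 0.0 (obstacle on the path)
print(round(channel(pa, user, [np.array([2.0, 3.0])], [0.5]), 3))  # 0.25 (path is clear)
```

The binary indicator is what makes the objective discontinuous in the PA positions.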

2. Integration of DDPG for Non-Smooth Placement Optimization

The non-smooth, discontinuous dependence of $h_m(\Psi)$ on LoS connectivity (due to binary blockages) renders gradient-based placement optimization ineffective. DDPG, a model-free, off-policy actor-critic algorithm for continuous action spaces, instead treats PA placement as a black-box control task. The DDPG module defines:

  • State Space $s$: the user coordinates, obstacle centers, obstacle radii, and optionally the previous PA positions.
  • Action Space $a$: $\Psi \in [0, L_x]^K$, with the actor outputting continuous waveguide positions for each PA.
  • Reward Function:

r = \sum_{m=1}^{M} R_m - \lambda \sum_{m=1}^{M} \max\left(0,\, R_t - R_m\right),

where the penalty term $\lambda \max(0, R_t - R_m)$ applies a soft penalty for sub-threshold rates.
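A reward of this shape can be sketched directly; the hinge form and the coefficient `lam` are illustrative assumptions (the source states only that sub-threshold rates incur a soft penalty):

```python
import numpy as np

def reward(rates, r_t, lam=5.0):
    """Sum-rate minus a hinge penalty on users below the rate threshold r_t.

    lam and the hinge form are assumptions for illustration; the paper only
    says that sub-threshold rates incur a soft penalty.
    """
    rates = np.asarray(rates, dtype=float)
    return rates.sum() - lam * np.maximum(r_t - rates, 0.0).sum()

print(reward([2.0, 3.0], r_t=1.0))   # all users feasible: plain sum-rate 5.0
print(reward([0.5, 3.0], r_t=1.0))   # one violation: 3.5 - 5*0.5 = 1.0
```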

3. WMMSE Algorithm as a Beamforming Subroutine

For any fixed PA configuration $\Psi$, beamforming is solved via standard WMMSE iterations internal to each DDPG step:

  • Equalizer Update:

u_m = \frac{h_m^H(\Psi)\, p_m}{\sum_{j=1}^{M} \left| h_m^H(\Psi)\, p_j \right|^2 + \sigma^2}

  • Weight Update:

w_m = e_m^{-1}, with $e_m$ evaluated at the updated equalizer $u_m$

  • Beamformer Update:

p_m = w_m u_m \left( \sum_{j=1}^{M} w_j |u_j|^2\, h_j(\Psi)\, h_j^H(\Psi) + \mu I \right)^{-1} h_m(\Psi), subject to dual-variable updates enforcing the constraints; for the power budget, the multiplier $\mu \geq 0$ is adjusted (e.g., by bisection) so that

\sum_m \|p_m\|^2 \leq P_t

Iterations continue until convergence of the objective $J$ and the rates $R_m$.
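The three updates condense into a short NumPy sketch. This is a simplified instance under stated assumptions: random stand-in channels replace the blockage-aware $h_m(\Psi)$, only the power constraint is enforced (the per-user rate duals are omitted), and $\mu$ is found by bisection.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, Pt, sigma2 = 3, 4, 10.0, 1.0          # users, PAs, power budget, noise

# Random stand-in channels (the paper would use the blockage-aware h_m(Psi))
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
P = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
P *= np.sqrt(Pt) / np.linalg.norm(P)        # scale to a feasible starting point

def sum_rate(P):
    s = np.abs(H.conj() @ P.T) ** 2         # s[m, j] = |h_m^H p_j|^2
    sinr = np.diag(s) / (s.sum(1) - np.diag(s) + sigma2)
    return np.log2(1 + sinr).sum()

r0 = sum_rate(P)
for _ in range(50):
    sig = H.conj() @ P.T                    # sig[m, j] = h_m^H p_j
    T = (np.abs(sig) ** 2).sum(1) + sigma2
    u = np.diag(sig) / T                    # equalizer update
    w = 1.0 / (1.0 - np.abs(np.diag(sig)) ** 2 / T)   # weight update w_m = 1/e_m
    A = sum(w[j] * abs(u[j]) ** 2 * np.outer(H[j], H[j].conj()) for j in range(M))
    lo, hi = 0.0, 1e3                       # bisection on the power dual mu
    for _ in range(60):
        mu = 0.5 * (lo + hi)
        Pn = np.stack([w[m] * u[m] * np.linalg.solve(A + mu * np.eye(K), H[m])
                       for m in range(M)])
        lo, hi = (mu, hi) if np.linalg.norm(Pn) ** 2 > Pt else (lo, mu)
    P = Pn                                  # beamformer update at the chosen mu

assert sum_rate(P) > r0                     # the WMMSE objective improves monotonically
```

Warm-starting the loop from the previous $P$ (as the source suggests) only changes the initialization above.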

4. DDPG Network Architecture and Training Procedure

The WMMSE-DDPG scheme employs neural networks for both actor and critic:

  • Actor $\mu_\theta(s)$:
    • Input size $\dim(s)$
    • Two hidden layers (256 ReLU units each)
    • Output: $K$ continuous logits, mapped to PA positions in $[0, L_x]$ via a squashing activation
  • Critic $Q_\phi(s, a)$:
    • Input: concatenated $(s, a)$
    • Two hidden layers (256 ReLU units each)
    • Output: scalar Q-value
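The bounded output mapping can be as simple as a scaled sigmoid; the specific activation below is an assumption (the source states only that actor outputs are mapped into the deployment region):

```python
import numpy as np

def squash_to_positions(logits, Lx=10.0):
    """Map unbounded actor logits to PA positions in [0, Lx] via a scaled
    sigmoid (assumed activation; Lx is the waveguide length)."""
    return Lx / (1.0 + np.exp(-np.asarray(logits, dtype=float)))

print(squash_to_positions([-50.0, 0.0, 50.0]))   # approximately [0, Lx/2, Lx]
```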

Training per DDPG step:

  • Critic update: one gradient step on the squared regression error

L(\phi) = \left( Q_\phi(s, a) - y \right)^2

y = r \quad \text{(or } y = r + \gamma\, Q_{\phi'}(s', \mu_{\theta'}(s')) \text{ with target networks)}

  • Actor update: ascend the deterministic policy gradient

\nabla_\theta J \approx \nabla_a Q_\phi(s, a)\,\big|_{a = \mu_\theta(s)}\; \nabla_\theta \mu_\theta(s)

\theta \leftarrow \theta + \alpha\, \nabla_\theta J

Gaussian or Ornstein–Uhlenbeck noise is added to the actor output for exploration. Because the reward is episodic and essentially single-step (a contextual-bandit setting), training can proceed with or without target networks.
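A stripped-down, NumPy-only illustration of one such update with linear function approximators (the dimensions, learning rate, and linear actor/critic are assumptions for clarity; the actual networks are the MLPs above). The single-step target $y = r$ reflects the contextual-bandit structure:

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_a, lr = 6, 2, 1e-2

Theta = rng.standard_normal((n_a, n_s)) * 0.1    # linear actor a = Theta s (assumed)
Wq = np.zeros(n_s + n_a)                          # linear critic Q = Wq . [s; a]

def actor(s):                 # deterministic policy (output squashing omitted)
    return Theta @ s

def critic(s, a):
    return Wq @ np.concatenate([s, a])

def ddpg_step(s, a, r):
    global Wq, Theta
    # Critic: one SGD step on (Q(s,a) - y)^2 with the single-step target y = r
    x = np.concatenate([s, a])
    td = critic(s, a) - r
    Wq = Wq - lr * 2 * td * x
    # Actor: deterministic policy gradient dQ/da * da/dTheta
    dQ_da = Wq[n_s:]                              # linear critic => constant grad in a
    Theta = Theta + lr * np.outer(dQ_da, s)

s = rng.standard_normal(n_s)
a = actor(s) + 0.1 * rng.standard_normal(n_a)     # Gaussian exploration noise
r = 1.0                                           # stand-in for the WMMSE sum-rate reward
before = critic(s, a)
ddpg_step(s, a, r)
assert abs(critic(s, a) - r) < abs(before - r)    # critic moved toward the target
```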

5. Algorithm Pseudocode and Workflow

The high-level workflow proceeds as follows:

  1. Initialize actor $\mu_\theta$, critic $Q_\phi$, and replay buffer $\mathcal{D}$.
  2. For each episode:
    • Observe the obstacle and user layout (state $s$).
    • Actor outputs a noisy position proposal $\Psi = \mu_\theta(s) + \mathcal{N}$.
    • Construct the blockage-aware channels $h_m(\Psi)$.
    • Run WMMSE to solve for the beamformers $P$.
    • Compute and store the one-step reward $r$.
    • Sample a minibatch from $\mathcal{D}$ and update the critic and actor networks.
    • Optionally update the target networks.
  3. Repeat for the prescribed number of episodes.

Parameter settings in the source cover the learning rates, the minibatch size, and squashed actor outputs enforcing the spatial bounds $[0, L_x]$.
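The loop above can be sketched end to end with hypothetical stand-ins: `layout`, `sum_rate_via_wmmse`, and the noisy proposal rule below are placeholders for the paper's components, kept deliberately trivial so the control flow runs.

```python
import numpy as np

rng = np.random.default_rng(3)
K, Lx, episodes, noise = 2, 10.0, 200, 0.5

def layout():                                # stand-in for user/obstacle geometry
    return rng.uniform(0, Lx, size=2 * K)

def sum_rate_via_wmmse(psi, s):              # stand-in for "build h_m(psi), run WMMSE"
    return -np.abs(psi - s[:K]).sum()        # toy objective: PAs close to the users

buffer = []
for _ in range(episodes):
    s = layout()                                                  # observe layout
    psi = np.clip(s[:K] + noise * rng.standard_normal(K), 0, Lx)  # noisy proposal
    r = sum_rate_via_wmmse(psi, s)           # inner beamforming -> one-step reward
    buffer.append((s, psi, r))               # store; actor/critic updates go here
print(len(buffer))                           # 200 stored one-step transitions
```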

6. Empirical Convergence and Implementation Considerations

Simulation experiments demonstrate rapid convergence of both actor and critic losses, with the sum-rate reward stabilizing after several thousand gradient steps. Key practical strategies include:

  • Pre-computation of obstacle blocking maps for candidate $\Psi$ values to accelerate the channel model.
  • Warm-starting the WMMSE beamforming subroutine from the previous solution to reduce computation.
  • Annealing the penalty coefficient in the reward $r$ for stricter QoS enforcement over training.

Handling abrupt changes in LoS connectivity—when small positional shifts toggle blockage—relies on the black-box nature of the WMMSE-DDPG alternation, enabling the DDPG actor to learn non-smooth policies via experience replay and activation squashing.
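This toggling can be made concrete. In the toy single-user geometry below (all coordinates and the obstacle are illustrative assumptions), sweeping the PA along the waveguide produces a rate curve with a jump where the LoS indicator flips — exactly the non-smoothness that defeats gradient-based placement:

```python
import numpy as np

def rate(pa_x, user=(5.0, 0.0), obs_c=(3.0, 0.35), obs_r=0.5, sigma2=0.01):
    """Single-user rate vs. PA position on a waveguide at height y = 1,
    with one circular obstacle (illustrative geometry, not the paper's)."""
    pa, user, obs_c = np.array([pa_x, 1.0]), np.asarray(user), np.asarray(obs_c)
    ab = user - pa
    t = np.clip((obs_c - pa) @ ab / (ab @ ab), 0.0, 1.0)
    blocked = np.linalg.norm(pa + t * ab - obs_c) <= obs_r
    gain = 0.0 if blocked else 1.0 / (ab @ ab)      # free-space |h|^2 ~ 1/d^2
    return np.log2(1 + gain / sigma2)

xs = np.linspace(0.0, 5.0, 501)
rs = np.array([rate(x) for x in xs])
print(np.abs(np.diff(rs)).max() > 1.0)      # True: a large jump where LoS flips
```

A gradient step near the jump sees either zero gradient (blocked plateau) or no warning of the cliff, which is why the placement is learned from sampled rewards instead.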

7. Context and Significance

WMMSE-DDPG transforms PA deployment in blockage-rich indoor wireless environments. By encapsulating the rapidly convergent WMMSE beamforming subroutine within a DDPG agent, the methodology enables direct learning of PA placement policies that exploit the deterministic obstacle layout. Notably, simulation results in the referenced work (Xie et al., 3 Jan 2026) indicate significant improvements in system throughput and LoS connectivity over baseline approaches. Additionally, pinching-antenna systems can harness physical obstacles to attenuate co-channel interference, thus converting blockages from liabilities into strategic assets. The framework addresses the core optimization challenge of jointly allocating spatial and signal-processing resources in the presence of discrete, combinatorial environmental effects. A plausible implication is its applicability to other blockage-aware network design problems where gradient-based methods fail due to non-smooth physical constraints.

References (1)
