Targeted Beam Alignment Strategy (2PHTS)

Updated 4 February 2026

Targeted beam alignment is an algorithmic framework using multi-armed bandit theory and phased search to efficiently pinpoint optimal beams in high-dimensional spaces.
It exploits angular correlation and heteroscedastic reward modeling to greatly reduce alignment latency and measurement overhead compared to exhaustive search methods.
The 2PHTS approach demonstrates a 4x–20x reduction in probe count and over 97% beam detection accuracy in practical mmWave systems.

A targeted beam alignment strategy is an algorithmic framework for efficiently and reliably identifying optimal beam directions in wireless, optical, or particle beam delivery systems. Such strategies aim to minimize alignment latency and measurement overhead while selecting the correct beam with high probability, often by exploiting correlation and structure in the angular domain or the beam codebook. Contemporary approaches combine pure-exploration multi-armed bandit theory, heteroscedastic reward modeling, and phased search procedures, as exemplified by the Two-Phase Heteroscedastic Track-and-Stop (2PHTS) algorithm (Wei et al., 2022). These techniques matter most in millimeter-wave (mmWave) systems, where the beam space is large and probing resources are scarce, but they also apply to THz links and other high-dimensional alignment settings.

1. Problem Formulation: Beam Alignment as Structured Pure Exploration

Targeted beam alignment is cast as a pure-exploration multi-armed bandit (MAB) problem. The transmitter is equipped with a fixed analog beamforming codebook $\mathcal{C} = \{ \mathbf{f}_0,\ldots,\mathbf{f}_{K-1} \}$, where each $\mathbf{f}_k$ is a unit-norm beam for $N$ antennas. Probing arm $k$ means transmitting a pilot on $\mathbf{f}_k$ and receiving

$R(\mathbf{f}_k) = | \sqrt{p} \mathbf{h}^\mathsf{H} \mathbf{f}_k + n |^2 \sim \mathcal{N}(\mu_k, 2\sigma^2 \mu_k),$

with mean $\mu_k = p |\mathbf{h}^\mathsf{H} \mathbf{f}_k|^2$, where $\mathbf{h}$ is the channel vector, $p$ the transmit power, and $n \sim \mathcal{CN}(0,\sigma^2)$ the receiver noise. The objective is a $(\delta,J)$-PAC guarantee: output a beam $k^\pi$ such that $P(k^\pi = k^*) \ge 1-\delta$, where $k^* = \arg\max_k \mu_k$, while minimizing the total number of probes $\tau$. The method leverages local angular correlation, recognizing that beams within a window of $J$ consecutive indices have similar expected rewards.
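
As a minimal sketch of the probe model above, the following Python snippet simulates pilot measurements on a single beam. The helper names (`inner`, `probe_reward`) and the 4-antenna example vectors are illustrative assumptions, not taken from Wei et al. (2022).

```python
import math
import random


def inner(h, f):
    """Return h^H f for two equal-length lists of complex numbers."""
    return sum(hk.conjugate() * fk for hk, fk in zip(h, f))


def probe_reward(h, f, p, sigma2, rng):
    """Simulate one probe: R = |sqrt(p) * h^H f + n|^2 with n ~ CN(0, sigma2)."""
    # A circularly symmetric complex Gaussian splits its variance evenly
    # between the real and imaginary parts.
    std = math.sqrt(sigma2 / 2.0)
    n = complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
    return abs(math.sqrt(p) * inner(h, f) + n) ** 2


# Hypothetical example: the beam f is matched to the channel h.
N = 4
h = [complex(math.cos(0.3 * i), math.sin(0.3 * i)) for i in range(N)]
f = [hk / math.sqrt(N) for hk in h]        # unit-norm beam
mu = 1.0 * abs(inner(h, f)) ** 2           # mean reward mu_k with p = 1

rng = random.Random(0)
samples = [probe_reward(h, f, 1.0, 0.01, rng) for _ in range(2000)]
print(mu, sum(samples) / len(samples))
```

Strictly, the simulated reward has mean $\mu_k + \sigma^2$ and is non-negative, so the Gaussian form $\mathcal{N}(\mu_k, 2\sigma^2\mu_k)$ is best read as a high-SNR approximation.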

2. Metric Model and Confidence Bounds: Heteroscedasticity and KL Analysis

Unlike classic MAB settings with homoscedastic Gaussian noise, beam alignment rewards exhibit heteroscedasticity:

- Each base-arm $k$ yields $R_k \sim \mathcal{N}(\mu_k, 2\sigma^2 \mu_k)$.
- Super-arms $S$ (sets of $J$ consecutive beams) aggregate rewards: $R_S \sim \mathcal{N}(\mu_S, 2\sigma^2 \mu_S)$ with $\mu_S = p |\mathbf{h}^\mathsf{H} (\sum_{k \in S} \mathbf{f}_k)|^2$.
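
To make the super-arm mean concrete, here is a small sketch that sums neighbouring beams and evaluates $\mu_S$. The half-wavelength uniform linear array, the oversampled codebook ($N=8$, $K=16$), and the single broadside path are all assumptions chosen for illustration; they do not come from the source.

```python
import cmath
import math


def beam(N, s):
    """Unit-norm beam at spatial frequency s = sin(theta) (half-wavelength ULA assumed)."""
    return [cmath.exp(1j * math.pi * n * s) / math.sqrt(N) for n in range(N)]


def mean_reward(h, f, p):
    """mu = p * |h^H f|^2."""
    return p * abs(sum(hn.conjugate() * fn for hn, fn in zip(h, f))) ** 2


def super_arm_mean(h, codebook, S, p):
    """mu_S = p * |h^H (sum_{k in S} f_k)|^2 for a set S of beam indices."""
    summed = [sum(codebook[k][n] for k in S) for n in range(len(h))]
    return mean_reward(h, summed, p)


N, K, p = 8, 16, 1.0
codebook = [beam(N, -1.0 + 2.0 * k / K) for k in range(K)]
h = [1.0 + 0.0j] * N                      # single path at broadside (beam index 8)

mu_best = mean_reward(h, codebook[8], p)
mu_near = super_arm_mean(h, codebook, [7, 8, 9], p)   # super-arm containing the best beam
mu_far = super_arm_mean(h, codebook, [0, 1, 2], p)    # super-arm far from the path
print(mu_best, mu_near, mu_far)
```

In this toy setup the super-arm that contains the best beam has a much larger mean than a distant one, which is the property Phase I relies on.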

Discrimination between arms is quantified by the KL divergence of heteroscedastic Gaussians,

$D_{HG}(\mu_i,\mu_j) = \frac{1}{2} \ln\frac{\mu_j}{\mu_i} + \frac{\mu_i}{2\mu_j} + \frac{(\mu_i-\mu_j)^2}{4\sigma^2 \mu_j} - \frac{1}{2},$

which vanishes when $\mu_i = \mu_j$, as a KL divergence must.
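
The divergence can be sanity-checked numerically. The sketch below evaluates it from the generic closed form for two scalar Gaussians, plugging in the variance $2\sigma^2\mu$ for each arm; the function names are illustrative.

```python
import math


def gaussian_kl(m0, v0, m1, v1):
    """KL( N(m0, v0) || N(m1, v1) ) for scalar Gaussians with variances v0 and v1."""
    return 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)


def d_hg(mu_i, mu_j, sigma2):
    """KL divergence between arms whose variance is 2 * sigma2 * mean."""
    return gaussian_kl(mu_i, 2.0 * sigma2 * mu_i, mu_j, 2.0 * sigma2 * mu_j)


print(d_hg(3.0, 3.0, 0.1))   # identical arms
print(d_hg(1.0, 2.0, 0.1))   # distinct arms
print(d_hg(2.0, 1.0, 0.1))   # the divergence is not symmetric
```

Because the variance depends on the mean, the divergence is asymmetric in its arguments, unlike the equal-variance Gaussian case.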

The stopping rule controls the error probability through the threshold $\beta(t,\delta,\alpha) = \ln(\alpha t/\delta)$.

3. Two-Phase Track-and-Stop Procedure

The 2PHTS strategy partitions alignment into two sequential phases:

Phase I: Beam-Set (Super-arm) Selection

- Group the $K$ beams into $G = K/J$ non-overlapping sets $S_g$, each containing $J$ consecutive beams.
- Apply Heteroscedastic Track-and-Stop (HT&S) to select the super-arm $g^*$:
  - Track the number of pulls $T_g$ and the empirical means $\hat{\mu}_g$.
  - Pull the least-tested super-arms first; otherwise allocate pulls according to the optimal sampling strategy.
  - Stop when the discrimination variable $Z(t)$ between the best super-arm and its alternatives exceeds the bound $\beta(t, \delta_1, 1)$.
  - Select the super-arm with the highest empirical mean.
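
The loop below is a deliberately simplified sketch of the steps above, not the authors' implementation. It assumes a round-robin (least-tested) allocation instead of tracking optimal proportions, and it assumes that $Z(t)$ is a generalized likelihood ratio built from the heteroscedastic KL divergence; the closed-form minimiser in `glr` was derived for this sketch. All names, the toy means, and the `init_pulls` and `max_pulls` parameters are hypothetical.

```python
import math
import random


def d_hg(m, x, sigma2):
    """KL divergence between N(m, 2*sigma2*m) and N(x, 2*sigma2*x)."""
    return (0.5 * math.log(x / m) + m / (2.0 * x)
            + (m - x) ** 2 / (4.0 * sigma2 * x) - 0.5)


def glr(m_a, t_a, m_b, t_b, sigma2):
    """min over a shared mean x of t_a*d_hg(m_a, x) + t_b*d_hg(m_b, x)."""
    # Setting the derivative to zero gives (x + sigma2)^2 equal to the
    # pull-weighted average of (m + sigma2)^2 (assumption of this sketch).
    q = (t_a * (m_a + sigma2) ** 2 + t_b * (m_b + sigma2) ** 2) / (t_a + t_b)
    x = math.sqrt(q) - sigma2
    return t_a * d_hg(m_a, x, sigma2) + t_b * d_hg(m_b, x, sigma2)


def track_and_stop(pull, n_arms, sigma2, delta, alpha=1.0,
                   init_pulls=2, max_pulls=100_000):
    """Return (selected arm, number of pulls) for n_arms >= 2."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, max_pulls + 1):
        arm = min(range(n_arms), key=lambda a: counts[a])   # least-tested arm
        sums[arm] += pull(arm)
        counts[arm] += 1
        if min(counts) < init_pulls:
            continue
        # Floor the means so the logarithm in d_hg stays defined.
        means = [max(s / c, 1e-12) for s, c in zip(sums, counts)]
        best = max(range(n_arms), key=lambda a: means[a])
        z = min(glr(means[best], counts[best], means[b], counts[b], sigma2)
                for b in range(n_arms) if b != best)
        if z > math.log(alpha * t / delta):                 # beta(t, delta, alpha)
            return best, t
    leader = max(range(n_arms), key=lambda a: sums[a] / max(counts[a], 1))
    return leader, max_pulls


# Toy super-arm means (hypothetical); arm 2 is the best one.
true_means = [0.5, 1.0, 8.0, 1.0, 0.5]
sigma2 = 0.05
rng = random.Random(0)


def pull(arm):
    """One noisy reward |sqrt(mu) + n|^2 with n ~ CN(0, sigma2)."""
    std = math.sqrt(sigma2 / 2.0)
    real = math.sqrt(true_means[arm]) + rng.gauss(0.0, std)
    return real ** 2 + rng.gauss(0.0, std) ** 2


best, n_pulls = track_and_stop(pull, len(true_means), sigma2, delta=0.01)
print(best, n_pulls)
```

With well-separated means the rule stops soon after the initial pulls; closer means require more probes before $Z(t)$ clears the threshold.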

Phase II: Beam Identification Within Set

- Form the candidate set $S_f$ as the union of $S_{g^*}$ and its adjacent set with the higher $\hat{\mu}$, for a total of at most $2J$ beams.
- Reapply HT&S with risk $\delta_2 = \delta - \delta_1$ to select $k^*$.
- Output $k^*$ as the recommended beam.

This phased structure exploits local smoothness and beam grouping, reducing the number of candidate arms from $K$ to at most $G + 2J$: $G$ super-arms in Phase I, then at most $2J$ beams in Phase II.
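
The bookkeeping behind the two phases can be sketched as follows. The function names and the Phase I estimates in `mu_hat` are hypothetical, and the sketch assumes that $J$ divides $K$, that there are at least two groups, and that the codebook does not wrap around at its edges.

```python
def group_beams(K, J):
    """Split beam indices 0..K-1 into G = K // J consecutive, non-overlapping sets."""
    if K % J != 0:
        raise ValueError("this sketch assumes J divides K")
    return [list(range(g * J, (g + 1) * J)) for g in range(K // J)]


def phase_two_candidates(groups, g_star, mu_hat):
    """Union of the selected set and its adjacent set with the higher empirical mean."""
    neighbours = [g for g in (g_star - 1, g_star + 1) if 0 <= g < len(groups)]
    best_neighbour = max(neighbours, key=lambda g: mu_hat[g])
    return sorted(groups[g_star] + groups[best_neighbour])


K, J = 120, 3
groups = group_beams(K, J)

mu_hat = [0.1] * len(groups)
mu_hat[17], mu_hat[18] = 5.0, 2.0        # hypothetical Phase I estimates
candidates = phase_two_candidates(groups, 17, mu_hat)

print(len(groups))             # number of super-arms G
print(candidates)              # Phase II candidate beams
print(len(groups) + 2 * J)     # arms considered across both phases
```

For $K=120$ and $J=3$ this gives $G=40$ super-arms and six Phase II candidates, so 46 arms are considered in total instead of 120.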

4. Theoretical Guarantees and Optimality

2PHTS admits the following performance bounds:

Lower bound: For any $(\delta,J)$ -PAC algorithm,

$\mathbb{E}[\tau] \ge c^*(\nu) \ln\frac{1}{4\delta},$

where $c^*(\nu)^{-1} = \sup_{w \in \Delta_K} \inf_{u \in \text{Alt}(\nu)} \sum_{k=1}^K w_k D_{HG}(\mu_k, \mu_k^u)$ .

Upper bound: $\limsup_{\delta \to 0} \mathbb{E}[\tau]/\ln(1/\delta) \le c_u^*(s) + c_u^*(b)$, where $c_u^*(s)$ and $c_u^*(b)$ are the costs of the super-arm and base-arm phases, respectively.

Crucially, the phase decomposition reduces the number of probes by up to an order of magnitude compared to exhaustive search ($4\times$–$20\times$ in the simulations reported below).

5. Practical Implementation, Parameters, and Simulation Results

Parameters:

- $\delta$ is split between the two phases ($\delta_1 + \delta_2 = \delta$).
- $J$ is determined by the codebook: $J = 2\lceil K/N\rceil - 1$ for $N$ antennas.
- Overlapping variants in Phase II use $2J$ arms to prevent boundary effects.

Simulation, as reported in (Wei et al., 2022):

- $N=64$ Tx antennas, $K=120$ beams, $J=3$.
- Channels with $L=3$ paths; noise power $\sigma^2 = -80$ dBm.
- Across synthetic and "ray-tracing city" scenarios, 2PHTS requires $4\times$–$20\times$ fewer probes than exhaustive search or vanilla track-and-stop, while achieving a beam-detection probability above $97\%$.
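
The parameter choices above reduce to a few lines of arithmetic. In this sketch the even split of $\delta$ and the value $\delta = 0.05$ are assumptions made for illustration (the source only requires $\delta_1 + \delta_2 = \delta$), and the helper names are hypothetical.

```python
import math


def window_size(K, N):
    """J = 2 * ceil(K / N) - 1 for K beams and N antennas."""
    return 2 * math.ceil(K / N) - 1


def split_risk(delta, fraction=0.5):
    """Split delta into (delta_1, delta_2); the even default is an assumption."""
    delta_1 = fraction * delta
    return delta_1, delta - delta_1


def dbm_to_watts(dbm):
    """Convert a power level in dBm to watts."""
    return 10.0 ** ((dbm - 30.0) / 10.0)


J = window_size(K=120, N=64)
delta_1, delta_2 = split_risk(0.05)
sigma2 = dbm_to_watts(-80.0)
print(J, delta_1, delta_2, sigma2)
```

With $K=120$ and $N=64$ the formula yields $J=3$, matching the simulation setup, and $-80$ dBm corresponds to a noise power of $10^{-11}$ W.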

6. Underlying Principles: Correlation Exploitation, Heteroscedasticity, and Latency Reduction

Targeted strategies leverage:

- Angular correlation: spatially adjacent beams have similar rewards, so a wide-area sweep over every beam is unnecessary.
- Heteroscedastic modeling: the reward variance scales with the mean reward, and modeling this dependence allows sharper confidence bounds.
- Phased search: coarse localization in super-arm space followed by refined identification within the selected set lowers the expected probe count for a target confidence.

In practical mmWave contexts, where coherence time ( $\sim 35\,\mu$ s) permits $>10^4$ slots, 2PHTS typically uses $O(10^2)$ – $O(10^3)$ slots for full alignment.

7. Significance and Extensions

This paradigm extends to hierarchical codebooks, side-information pre-filtering, and adaptive grouping for non-uniform angular statistics. The method generalizes to broader pure-exploration problems in high-dimensional search, provided the reward topology admits sufficient local smoothness. Hyperparameter tuning (e.g., $\alpha$ in $\beta(t,\delta,\alpha)$ , initial pulls, overlapping window size) is implementation-driven; rigorous bounds serve as design guidance for practical system deployment.

The targeted beam alignment strategy, as exemplified by 2PHTS, marks an intersection of multi-armed bandit theory, angular correlation modeling, and phased search logic that achieves provable latency minimization and high-confidence operation in wide-band beamforming systems (Wei et al., 2022).

References

Wei et al. (2022). Fast Beam Alignment via Pure Exploration in Multi-armed Bandits.
