Squint in Wireless, Learning & Robotics

Updated 4 July 2026

Squint names three unrelated technical concepts: beam squint in wideband arrays, a second-order, parameter-free online learning algorithm, and a visual soft actor–critic method in robotics.
In communications, beam squint introduces frequency-dependent deviations in beam direction, which reduce spectral efficiency and have prompted squint-aware codebook and beamforming designs.
In online learning, Squint provides adaptive second-order regret bounds; in robotics, Squint targets fast training and zero-shot sim-to-real transfer through resolution squinting and an optimized GPU training pipeline.

Squint denotes several technical concepts in current arXiv literature. In wireless communications it most often refers to beam squint, the frequency-dependent shift of a beam’s main lobe or focal point in wideband arrays. In online learning it names a second-order, parameter-free algorithm for prediction with expert advice and later extensions for changing environments and global second-order control. In robotics it names a visual Soft Actor–Critic method engineered for fast wall-clock training and zero-shot sim-to-real transfer (Cai et al., 2017, Neuteboom et al., 2022, Almuzairee et al., 24 Feb 2026).

1. Technical senses of the term

In the literature represented here, the term is used in three distinct ways.

Domain	Meaning	Representative arXiv source
Wideband wireless/array processing	Beam squint: frequency-dependent beam or focus displacement	(Cai et al., 2017)
Online learning	Squint: second-order expert-advice algorithm and later variants	(Neuteboom et al., 2022)
Sim-to-real robotics	Squint: visual SAC method with “resolution squinting”	(Almuzairee et al., 24 Feb 2026)

The first usage is by far the broadest. It spans switched-beam codebooks, hybrid beamforming, THz and sub-THz systems, near-field XL-MIMO, IRS design, integrated sensing and communications, and wideband OTFS. The second usage is specific to sequential decision-making with expert advice, where Squint is defined through a mixture over learning rates and a second-order potential. The third usage is a proper algorithm name in reinforcement learning, where “resolution squinting” denotes a deliberate render-then-downsample observation pipeline rather than any electromagnetic effect (Cai et al., 2017, Luo, 3 Mar 2026, Almuzairee et al., 24 Feb 2026).

2. Beam squint as a wideband array phenomenon

In phased arrays, beam squint arises because the phase shifters are typically fixed for the carrier frequency and do not realize the exact time delays required at other frequencies. For a ULA with $N$ antennas, spacing $d$, incident angle $\theta$, and frequency $f$, the steering vector is

$a(f,\theta)=\bigl[1,e^{j2\pi f(d\sin\theta)/c},\dots,e^{j2\pi f(N-1)(d\sin\theta)/c}\bigr]^T.$

If the beamformer is fixed at $f_c$ to focus at $\theta_F$, the phase shifts are

$\beta_n=-2\pi f_c\,d\,(n-1)\sin\theta_F/c.$

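Using the virtual-angle notation $\psi=\sin\theta$ and $\xi=f/f_c$, the maximum gain occurs when $\xi\psi=\psi_F$ (with $\psi_F=\sin\theta_F$), so the effective pointing direction is $\psi=\psi_F/\xi$, and the squint in virtual angle is

$\Delta\psi=\frac{\psi_F}{\xi}-\psi_F=\frac{(1-\xi)\,\psi_F}{\xi}.$

For small fractional offset, $\Delta\psi\approx-(\xi-1)\,\psi_F$, with $\xi-1=(f-f_c)/f_c$ (Cai et al., 2017).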
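The squint relation can be checked numerically from the steering vector and the carrier-designed phase shifts given above: the combined phase vanishes on every element when $f\sin\theta=f_c\sin\theta_F$. The following is a minimal sketch, assuming half-wavelength spacing at $f_c$; the function names, the 28 GHz carrier, and the 64-element array are illustrative choices, not values from the cited paper.

```python
import numpy as np

C = 3e8  # propagation speed in m/s


def array_gain(psi, f, f_c, psi_f, n_ant):
    """Normalized gain of a ULA whose phase shifters are fixed at f_c.

    psi   : array of virtual angles sin(theta) to evaluate
    psi_f : virtual focus angle sin(theta_F) the shifters were designed for
    """
    d = C / (2 * f_c)                      # assumed half-wavelength spacing at f_c
    n = np.arange(n_ant)                   # element index 0..N-1
    steering = 2 * np.pi * f * d * n[None, :] * psi[:, None] / C
    shifters = -2 * np.pi * f_c * d * n * psi_f / C
    return np.abs(np.exp(1j * (steering + shifters[None, :])).sum(axis=1)) / n_ant


def peak_direction(f, f_c, psi_f, n_ant, grid=20001):
    """Virtual angle at which the fixed beamformer has maximum gain at frequency f."""
    psi = np.linspace(-1.0, 1.0, grid)
    return psi[np.argmax(array_gain(psi, f, f_c, psi_f, n_ant))]


if __name__ == "__main__":
    f_c, psi_f, n_ant = 28e9, 0.5, 64
    for xi in (0.95, 1.0, 1.05):
        measured = peak_direction(xi * f_c, f_c, psi_f, n_ant)
        print(f"xi={xi:.2f}  measured peak={measured:.4f}  psi_F/xi={psi_f / xi:.4f}")
```

The measured peak follows $\psi_F/\xi$ up to the grid resolution, so frequencies above the carrier pull the beam toward broadside and frequencies below push it away.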

An equivalent formulation in wideband OFDM uses the subcarrier frequencies

$d$ 5

and the equivalent spatial angle

$d$ 6

The dimensionless squint factor is

$d$ 7

so $d$ 8. This makes explicit that wider bandwidths cause different subcarriers to “see” different steering directions even when the array geometry is fixed (Yu et al., 2021).

The same mechanism appears beyond ULAs. In UPAs, the normalized array gain becomes the product of two Dirichlet terms, one per spatial dimension, and the paper on THz communications summarizes the severity through a Beam Squint Ratio,

$d$ 9

With half-wavelength spacing and fixed total $\theta$ 0, the BSR is minimized by a square UPA, and the paper states

$\theta$ 1

In a separate wideband large-scale MIMO analysis, the closed-form beam-squint ratio for a ULA is

$\theta$ 2

which again scales linearly with antenna count and fractional bandwidth (Ma et al., 2023, Ma et al., 2022).

Near-field wideband systems generalize angular squint to spatial squint. A phase-shifter vector that focuses at $\theta$ 3 for $\theta$ 4 only perfectly focuses the central subcarrier; the remaining subcarriers shift to different $\theta$ 5. In near-field ISAC work this is described as a continuous spatial trajectory of beam foci across subcarriers, and in wideband XL-MIMO it is the basis for controllable beam squint with true-time-delay lines (Luo et al., 2023, Lei et al., 2024).

Performance degradation follows directly from this frequency dependence. For a single-path OFDM channel, the wideband capacity with beam squint,

$\theta$ 6

satisfies

$\theta$ 7

with strict inequality except at broadside. In fixed-size codebooks with idealized beams, the average spectral efficiency for small $\theta$ 8 satisfies

$\theta$ 9

so the drop is linear in the squint factor (Cai et al., 2017, Yu et al., 2021).

3. Compensation and suppression in communication systems

A first line of work treats beam squint as a codebook-design problem. In switched-beam systems, each beam $f$ 0 is assigned a coverage set

$f$ 1

and the objective is to minimize codebook size subject to complete angular coverage. The resulting beam-alignment procedure starts at broadside, finds coverage edges by binary search, mirrors beams symmetrically, and continues until the target interval is covered. The paper reports that, for $f$ 2, $f$ 3 GHz, and $f$ 4 GHz, the squint-aware design yields up to $f$ 5 higher minimum capacity; at the same operating point, roughly $f$ 6 beams are needed rather than $f$ 7 if squint is ignored. It also identifies a supremum $f$ 8 beyond which no finite codebook can satisfy the capacity constraint (Cai et al., 2017).
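The greedy procedure described above (start at broadside, locate coverage edges, mirror, repeat) can be sketched in a simplified form. This sketch makes several assumptions that are not from the cited paper: it uses a gain threshold in place of the capacity constraint, half-wavelength spacing at $f_c$, and a normalized Dirichlet gain $g(\xi\psi-\psi_F)$ evaluated at the two band edges. Binary search is used only to find the threshold half-width of the main lobe; the coverage edges then follow in closed form. All function and parameter names are hypothetical.

```python
import numpy as np


def dirichlet_gain(x, n_ant):
    """Normalized ULA gain versus virtual-angle offset x = xi*psi - psi_F."""
    if abs(x) < 1e-12:
        return 1.0
    return abs(np.sin(n_ant * np.pi * x / 2) / (n_ant * np.sin(np.pi * x / 2)))


def half_width(n_ant, gain_threshold, iters=60):
    """Binary search for the main-lobe offset x_t where the gain meets the threshold."""
    lo, hi = 0.0, 2.0 / n_ant  # gain falls monotonically from 1 to 0 on this interval
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dirichlet_gain(mid, n_ant) >= gain_threshold:
            lo = mid
        else:
            hi = mid
    return lo


def squint_aware_codebook(n_ant, frac_bw, gain_threshold, psi_max):
    """Greedy beam foci covering [-psi_max, psi_max] at both band edges.

    Returns a sorted list of virtual focus angles, or None when the coverage
    edges stall before psi_max (no finite codebook under this model).
    """
    x_t = half_width(n_ant, gain_threshold)
    xi_lo, xi_hi = 1 - frac_bw / 2, 1 + frac_bw / 2
    # The right edges converge to 2*x_t/frac_bw, so larger targets are unreachable.
    if frac_bw > 0 and psi_max >= 2 * x_t / frac_bw:
        return None
    foci = [0.0]               # first beam at broadside
    edge = x_t / xi_hi         # its right coverage edge
    while edge < psi_max:
        focus = xi_lo * edge + x_t       # next beam's left edge meets the previous right edge
        foci.append(focus)
        edge = (focus + x_t) / xi_hi     # right edge is set by the upper band edge
    return sorted([-f for f in foci[1:]] + foci)   # mirror about broadside


if __name__ == "__main__":
    for b in (0.0, 0.05, 0.2):
        cb = squint_aware_codebook(16, b, 2 ** -0.5, 0.9)
        print(b, None if cb is None else len(cb))
```

Under this simplified model, squint shrinks each beam's usable coverage as the focus moves away from broadside, so more beams are needed than in the narrowband case, and beyond a bandwidth-dependent limit the edges stop advancing.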

A second line assumes the codebook size is fixed and optimizes beam shapes. One formulation samples the expanded angular interval seen under squint, introduces weights $f$ 9, and maximizes a weighted sum of $a(f,\theta)=\bigl[1,e^{j2\pi f(d\sin\theta)/c},\dots,e^{j2\pi f(N-1)(d\sin\theta)/c}\bigr]^T.$ 0 subject to $a(f,\theta)=\bigl[1,e^{j2\pi f(d\sin\theta)/c},\dots,e^{j2\pi f(N-1)(d\sin\theta)/c}\bigr]^T.$ 1. Because these constraints are non-convex, the paper applies the Concave–Convex Procedure, linearizes each quadratic form around the current iterate, and solves the resulting convex program in CVX. The reported outcome is that enlarged coverage alone only partially recovers edge-subcarrier rates, whereas the full optimization slows the spectral-efficiency degradation; at $a(f,\theta)=\bigl[1,e^{j2\pi f(d\sin\theta)/c},\dots,e^{j2\pi f(N-1)(d\sin\theta)/c}\bigr]^T.$ 2, the proposed codebook recovers $a(f,\theta)=\bigl[1,e^{j2\pi f(d\sin\theta)/c},\dots,e^{j2\pi f(N-1)(d\sin\theta)/c}\bigr]^T.$ 3 more spectral efficiency than DFT, and the design guideline given is $a(f,\theta)=\bigl[1,e^{j2\pi f(d\sin\theta)/c},\dots,e^{j2\pi f(N-1)(d\sin\theta)/c}\bigr]^T.$ 4 at worst-case bandwidth (Yu et al., 2021).

Hybrid beamforming generalizes the mitigation problem to array architecture. For THz UPAs, one proposed design builds the frequency-flat analog combiner from the dominant eigenvectors of the subcarrier-averaged sample covariance and then applies phase-only projection. Because the analog part is derived from all subcarriers rather than only the carrier, it is less sensitive to squint. The same study emphasizes that a square UPA is intrinsically more robust than a ULA of the same aperture. In switch-based HBF, the reported contrast is sharper: the expected array gain of PS-based beamforming decreases monotonically with BSR and approaches $a(f,\theta)=\bigl[1,e^{j2\pi f(d\sin\theta)/c},\dots,e^{j2\pi f(N-1)(d\sin\theta)/c}\bigr]^T.$ 5, whereas the switch-based approximation is

$a(f,\theta)=\bigl[1,e^{j2\pi f(d\sin\theta)/c},\dots,e^{j2\pi f(N-1)(d\sin\theta)/c}\bigr]^T.$ 6

which stays above $a(f,\theta)=\bigl[1,e^{j2\pi f(d\sin\theta)/c},\dots,e^{j2\pi f(N-1)(d\sin\theta)/c}\bigr]^T.$ 7 when many antennas are connected. Under $a(f,\theta)=\bigl[1,e^{j2\pi f(d\sin\theta)/c},\dots,e^{j2\pi f(N-1)(d\sin\theta)/c}\bigr]^T.$ 8 GHz, $a(f,\theta)=\bigl[1,e^{j2\pi f(d\sin\theta)/c},\dots,e^{j2\pi f(N-1)(d\sin\theta)/c}\bigr]^T.$ 9, $f_c$ 0, and bandwidth up to $f_c$ 1 GHz, the proposed SW-HBF is reported to achieve over $f_c$ 2 higher spectral efficiency than $f_c$ 3-bit FC-PS-HBF and energy-efficiency gains exceeding $f_c$ 4 at $f_c$ 5 dB in the single-user case (Ma et al., 2023, Ma et al., 2022).

Other mitigations use diversity, delay elements, or geometric reconfiguration. A constant-modulus beamformer designed by semidefinite relaxation can switch between direct beamforming and Alamouti STBC based on the rank structure of the relaxed solution; at $f_c$ 6 GHz and $f_c$ 7, the reported band-edge gain loss is reduced from $f_c$ 8 dB to $f_c$ 9 dB, with throughput gains of $\theta_F$ 0 at $\theta_F$ 1 GHz bandwidth and $\theta_F$ 2 at $\theta_F$ 3 GHz bandwidth. Delay-adjustable metasurfaces for THz IRS communications choose per-element phase shifts and time delays to cancel the affine-in-frequency phase terms; in the representative $\theta_F$ 4 GHz, $\theta_F$ 5 GHz, $\theta_F$ 6 setting, the beam-gain ripple is reported as less than $\theta_F$ 7 in the far field and restored to within $\theta_F$ 8 of the peak in the near field. Wideband near-field suppression via movable antennas formulates a max–min analog-gain problem over antenna positions and solves it with an SGDA-based block-coordinate procedure, producing an almost perfectly flat gain curve across $\theta_F$ 9– $\beta_n=-2\pi f_c\,d\,(n-1)\sin\theta_F/c.$ 0 GHz. A related THz design jointly optimizes analog beamforming and 3D rotation; its reported minimum-gain improvement is $\beta_n=-2\pi f_c\,d\,(n-1)\sin\theta_F/c.$ 1 dB versus no rotation (Liu et al., 2018, Hao et al., 2022, Zhu et al., 2024, Xie et al., 11 Mar 2025).

Beam squint can also couple with other wideband impairments. In massive MIMO-OTFS, beam squint and Doppler squint form a doubly-squint effect. The cited work derives a peak-index-based channel estimator and a hybrid precoder with TTD, phase-shifter, and OTFS-domain compensation, and reports that the proposed design outperforms Doppler-only, delay–phase, and conventional PDMA precoding in achievable delay–Doppler-grid rate (Duan et al., 11 Apr 2025).

4. Beam squint as a sensing and localization resource

A recurrent theme in recent ISAC work is that beam squint need not only be mitigated; it can be engineered as a frequency-domain scanner. In a wideband massive-MIMO OFDM system with TTD lines, one design chooses the phase-shifter setting $\beta_n=-2\pi f_c\,d\,(n-1)\sin\theta_F/c.$ 2 and delays $\beta_n=-2\pi f_c\,d\,(n-1)\sin\theta_F/c.$ 3 so that the beam points to $\beta_n=-2\pi f_c\,d\,(n-1)\sin\theta_F/c.$ 4 at $\beta_n=-2\pi f_c\,d\,(n-1)\sin\theta_F/c.$ 5 and to $\beta_n=-2\pi f_c\,d\,(n-1)\sin\theta_F/c.$ 6 at $\beta_n=-2\pi f_c\,d\,(n-1)\sin\theta_F/c.$ 7. The resulting main-lobe direction satisfies

$\beta_n=-2\pi f_c\,d\,(n-1)\sin\theta_F/c.$ 8

so the subcarriers sweep monotonically across the desired angular interval. If inter-element spacing is enlarged beyond $\beta_n=-2\pi f_c\,d\,(n-1)\sin\theta_F/c.$ 9, beam split creates additional lobes and expands the sensing range. The claimed operational consequence is that $\psi=\sin\theta$ 0 frequency-domain beams can be transmitted within a single OFDM symbol, reducing over-the-air training time by roughly a factor of $\psi=\sin\theta$ 1; with one extra intersection repeat for beam-split disambiguation, only two OFDM symbols are needed (Xu et al., 2022).
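The frequency-scanning idea can be illustrated with a small numerical sketch. It is a generic derivation under stated assumptions, not the cited paper's exact parameterization: a ULA with half-wavelength spacing at $f_c$, and a per-element delay increment `tau` and phase increment `phi` that grow linearly with the element index. Under these assumptions the main lobe sits at $\psi(f)=2f_c\tau+f_c\phi/(\pi f)$, so two anchor conditions fix `tau` and `phi`. All names and the 30 GHz example values are hypothetical.

```python
import numpy as np


def scan_parameters(f_c, f_min, f_max, psi_start, psi_end):
    """Delay and phase increments so the beam points to psi_start at f_min
    and to psi_end at f_max (virtual angles, half-wavelength spacing at f_c)."""
    # Peak condition: psi(f) = a + b / f with a = 2*f_c*tau and b = f_c*phi/pi.
    b = (psi_start - psi_end) / (1.0 / f_min - 1.0 / f_max)
    a = psi_start - b / f_min
    tau = a / (2.0 * f_c)
    phi = np.pi * b / f_c
    return tau, phi


def main_lobe(f, f_c, tau, phi, n_ant, grid=4001):
    """Virtual angle of maximum array gain at frequency f."""
    psi = np.linspace(-1.0, 1.0, grid)
    n = np.arange(n_ant)
    # Residual phase per unit element index after delay and phase compensation.
    per_element = np.pi * (f / f_c) * psi[:, None] - (2 * np.pi * f * tau + phi)
    gain = np.abs(np.exp(1j * n[None, :] * per_element).sum(axis=1))
    return psi[np.argmax(gain)]


if __name__ == "__main__":
    f_c, f_min, f_max = 30e9, 28.5e9, 31.5e9
    tau, phi = scan_parameters(f_c, f_min, f_max, -0.5, 0.5)
    for f in np.linspace(f_min, f_max, 5):
        print(f"{f / 1e9:.2f} GHz -> psi = {main_lobe(f, f_c, tau, phi, 64):+.3f}")
```

Each subcarrier then illuminates its own direction within a single symbol, which is the property the sensing designs exploit.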

Near-field ISAC uses an analogous idea in joint angle–range space. For a phase-only near-field beamformer designed at $\psi=\sin\theta$ 2 and reference frequency $\psi=\sin\theta$ 3, the squinted focus on subcarrier $\psi=\sin\theta$ 4 follows

$\psi=\sin\theta$ 5

With TTDs, the start and end points of the trajectory can be fixed at chosen anchors $\psi=\sin\theta$ 6 and $\psi=\sin\theta$ 7, allowing the system to “draw” a trajectory of beam foci across the near-field volume. Localization then reduces to identifying the peak-power subcarrier and, in higher-accuracy variants, combining multiple sweeps and phase differences. The reported CBS-Low method reduces beam sweeps by $\psi=\sin\theta$ 8, while CBS-High with $\psi=\sin\theta$ 9 sweeps yields angle RMSE $\xi=f/f_c$ 0 and range RMSE $\xi=f/f_c$ 1 m at $\xi=f/f_c$ 2 dB SNR (Luo et al., 2023).

Wideband XL-MIMO localization extends this idea further by combining controllable beam squint and deep learning. The cited formulation derives CRBs for joint angle–range estimation under spatial non-stationarity, then proposes a three-stage CBS-based beam-training procedure: coarse angle, angular refinement by subcarrier grouping, and iterative range refinement. A ConvNeXt model then consumes the measurements and coarse estimates and regresses $\xi=f/f_c$ 3 directly. The reported performance is centimeter-level accuracy, with $\xi=f/f_c$ 4 cm and $\xi=f/f_c$ 5 rad at $\xi=f/f_c$ 6 dB (Lei et al., 2024).

These results correct a common oversimplification in beam-squint discussions. The phenomenon is indeed harmful to communication gain and rate under carrier-designed phase-only beamforming, but the same frequency dependence becomes informative when subcarriers are deliberately assigned distinct directions or focal points. This is the central methodological bridge between the mitigation literature and the sensing/localization literature (Xu et al., 2022, Luo et al., 2023, Lei et al., 2024).

5. Squint in online learning with expert advice

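In online learning, Squint is a second-order algorithm for the expert problem. At round $t$, the learner chooses a weight vector $w_t$ over the $K$ experts, observes losses $\ell_t\in[0,1]^K$, and incurs $\langle w_t,\ell_t\rangle$. The instantaneous regret to expert $k$ is

$r_t^k=\langle w_t,\ell_t\rangle-\ell_t^k,$

with cumulative regret $R_T^k=\sum_{t=1}^{T} r_t^k$ and second-order term $V_T^k=\sum_{t=1}^{T}(r_t^k)^2$. The defining Squint potential is

$\Phi_T=\mathbb{E}_{k\sim\pi,\ \eta\sim\gamma}\bigl[e^{\eta R_T^k-\eta^2 V_T^k}\bigr],$

where $\pi$ is a prior over experts and $\gamma$ is a prior over learning rates $\eta\in[0,1/2]$, and the original update is

$w_{t+1}^k\propto\pi(k)\,\mathbb{E}_{\eta\sim\gamma}\bigl[\eta\,e^{\eta R_t^k-\eta^2 V_t^k}\bigr].$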

As summarized in later notes, this produces a simultaneous $d$ 07-quantile regret guarantee in terms of the variance of the $d$ 08-quantile expert (Luo, 3 Mar 2026).
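A discretized version of the learning-rate mixture is easy to run. The sketch below is an illustration under explicit assumptions, not the algorithm as specified in the cited works: it replaces the continuous prior over learning rates by a uniform prior on a hypothetical dyadic grid $\eta\in\{2^{-1},\dots,2^{-10}\}$, uses a uniform prior over experts, and assumes losses in $[0,1]$. Weights are proportional to the prior times the grid average of $\eta\,e^{\eta R-\eta^2 V}$, where $R$ and $V$ are the cumulative regret and cumulative squared regret to each expert.

```python
import numpy as np


def squint_weights(R, V, etas, prior):
    """Weights proportional to prior_k * sum_eta eta * exp(eta*R_k - eta^2*V_k)."""
    log_terms = np.log(etas)[None, :] + np.outer(R, etas) - np.outer(V, etas ** 2)
    log_terms -= log_terms.max()            # shift for numerical stability
    w = prior * np.exp(log_terms).sum(axis=1)
    return w / w.sum()


def run_squint(losses, etas=None):
    """Run the discretized sketch on a (T, K) loss matrix with entries in [0, 1]."""
    T, K = losses.shape
    if etas is None:
        etas = 0.5 ** np.arange(1, 11)      # assumed grid: 1/2, 1/4, ..., 2^-10
    prior = np.full(K, 1.0 / K)             # assumed uniform prior over experts
    R = np.zeros(K)                         # cumulative regret to each expert
    V = np.zeros(K)                         # cumulative squared regret
    learner_loss = 0.0
    for t in range(T):
        w = squint_weights(R, V, etas, prior)
        mix = float(w @ losses[t])          # learner's loss <w_t, l_t>
        r = mix - losses[t]                 # instantaneous regret to every expert
        R += r
        V += r ** 2
        learner_loss += mix
    return learner_loss, R, squint_weights(R, V, etas, prior)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = np.array([0.2, 0.5, 0.5, 0.5, 0.5])       # synthetic experts, expert 0 is best
    losses = (rng.random((2000, 5)) < means).astype(float)
    total, R, w_final = run_squint(losses)
    print("regret to expert 0:", R[0])
    print("final weights:", np.round(w_final, 3))
```

No learning rate has to be tuned by hand: the mixture lets whichever grid value suits the observed $R$ and $V$ dominate, which is the sense in which the method is parameter-free.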

The 2022 changing-environment extension, Squint-CE, begins from the observation that a conventional black-box meta-wrapper destroys Squint’s favorable second-order behavior: the induced overhead $d$ 09 dominates the sublinear advantages coming from variance adaptation. Squint-CE therefore intertwines Squint’s surrogate reduction with a single layer of exponential-weights meta-combination over geometric intervals. For every contiguous interval $d$ 10, it guarantees

$d$ 11

where

$d$ 12

In big-$O$ form, the bound is

$d$ 14

so the changing-environment version preserves second-order dependence on interval variance up to logarithmic factors (Neuteboom et al., 2022).

A 2026 note proposes a simple variant that replaces the expert-specific $d$ 15 by a single global $d$ 16. The algorithm still computes

$d$ 17

but then updates $d$ 18, where $d$ 19 is chosen as the root of

$d$ 20

with

$d$ 21

The resulting $d$ 22-quantile regret bound has the same form as the original except that it depends on the global $d$ 23 rather than $d$ 24, and the note states that it resembles the guarantee obtained by Freund et al. for a variant of NormalHedge (Luo, 3 Mar 2026).

Within this literature, the main conceptual distinction is therefore between per-expert and global second-order control, and between static and changing-environment regret. The term “Squint” refers to the family of algorithms built around the same potential-based, learning-rate-mixture construction rather than to a single fixed update rule (Neuteboom et al., 2022, Luo, 3 Mar 2026).

6. Squint in fast visual reinforcement learning for robotics

In robotics, Squint is an off-policy, vision-based actor–critic algorithm built on Soft Actor–Critic and designed to minimize wall-clock training time in massively parallel GPU simulation while transferring zero-shot to a real $d$ 25 DoF SO-101 robot arm. The high-level loop uses $d$ 26 parallel ManiSkill3 environments at $d$ 27 Hz, renders wrist-camera RGB images at $d$ 28, downsamples them to $d$ 29, appends proprioception, and stores transitions in a GPU-resident replay buffer of size $d$ 30 M. After each environment step it performs $d$ 31 gradient updates of a shared two-layer CNN encoder, two C51-style distributional critics and their EMA targets, a stochastic Gaussian policy, and an entropy temperature $d$ 32 (Almuzairee et al., 24 Feb 2026).

The defining design choice is resolution squinting. The observation is not rendered directly at low resolution; instead the simulator renders at $d$ 33 and applies area-interpolation downsampling to $d$ 34,

$d$ 35

The paper attributes two effects to this pipeline: lower compute for the two-layer CNN encoder and natural anti-aliasing that preserves object shape under heavy domain randomization. This component is coupled with LayerNorm after every linear layer in actor and critic heads, a tuned update-to-data ratio of roughly $d$ 36, and a systems stack using torch.compile, CUDA Graphs, mixed-precision $d$ 37 convolutions, and an entirely on-GPU replay buffer. The implementation is reported to achieve more than a $d$ 38 end-to-end speed-up over a naive off-policy visual agent (Almuzairee et al., 24 Feb 2026).
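The anti-aliasing effect of render-then-downsample can be seen in a few lines of NumPy. This is a minimal sketch, not the paper's pipeline: for integer factors, area interpolation equals averaging non-overlapping blocks, which is what `area_downsample` does. The 128-to-16 resolutions, the factor of 8, and the function names are hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def area_downsample(img, factor):
    """Average non-overlapping factor x factor blocks of an (H, W, C) image."""
    h, w, c = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the factor")
    blocks = img.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def nearest_downsample(img, factor):
    """Keep every factor-th pixel; a stand-in for rendering with no anti-aliasing."""
    return img[::factor, ::factor]


if __name__ == "__main__":
    # Hypothetical 128x128 render containing a one-pixel-wide vertical line.
    hi_res = np.zeros((128, 128, 3))
    hi_res[:, 5, :] = 1.0

    squinted = area_downsample(hi_res, 8)      # 16x16, line survives at reduced intensity
    naive = nearest_downsample(hi_res, 8)      # 16x16, line falls between sampled columns

    print("area sum:", squinted.sum(), " nearest sum:", naive.sum())
```

Thin structures that fall between sample points vanish under naive subsampling but survive block averaging as a fainter response, which is one way to read the shape-preservation claim above.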

The learning objectives retain standard SAC structure but add a distributional critic loss. The critic minimizes a soft Bellman residual, the actor minimizes

$d$ 39

the temperature is adapted with a target-entropy loss, and each critic also minimizes a C51 cross-entropy to a projected Bellman target distribution. The encoder itself is deliberately small: two $d$ 40 convolutional layers with channels $d$ 41, ReLU activations, and batch size $d$ 42 (Almuzairee et al., 24 Feb 2026).
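The C51-style critic term can be sketched independently of the rest of the agent. The code below shows the standard categorical projection of a shifted return distribution onto a fixed support, followed by the cross-entropy to a predicted distribution. It is a generic sketch: the support range, atom count, and function names are assumptions, and the entropy bonus of the soft Bellman target is omitted for brevity.

```python
import numpy as np


def project_categorical(next_probs, rewards, dones, gamma, v_min, v_max):
    """Project r + gamma * z onto the fixed atoms z_0..z_{n-1} (C51 projection).

    next_probs : (B, n_atoms) probabilities of the target critic at the next state
    rewards    : (B,) rewards; dones : (B,) 1.0 where the episode terminated
    """
    batch, n_atoms = next_probs.shape
    atoms = np.linspace(v_min, v_max, n_atoms)
    dz = (v_max - v_min) / (n_atoms - 1)
    shifted = rewards[:, None] + gamma * (1.0 - dones[:, None]) * atoms[None, :]
    shifted = np.clip(shifted, v_min, v_max)
    pos = np.clip((shifted - v_min) / dz, 0, n_atoms - 1)   # fractional atom index
    lower = np.floor(pos).astype(int)
    upper = np.minimum(lower + 1, n_atoms - 1)
    w_upper = pos - lower                 # share of mass sent to the upper neighbour
    w_lower = 1.0 - w_upper
    target = np.zeros_like(next_probs)
    rows = np.broadcast_to(np.arange(batch)[:, None], (batch, n_atoms))
    np.add.at(target, (rows, lower), next_probs * w_lower)
    np.add.at(target, (rows, upper), next_probs * w_upper)
    return target


def categorical_cross_entropy(target, pred_logits):
    """Mean cross-entropy between projected targets and predicted logits."""
    z = pred_logits - pred_logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-(target * log_p).sum(axis=1).mean())


if __name__ == "__main__":
    atoms = np.linspace(-10.0, 10.0, 51)
    probs = np.zeros((1, 51))
    probs[0, 25] = probs[0, 30] = 0.5     # mass at returns 0 and 2
    tgt = project_categorical(probs, np.array([1.0]), np.array([0.0]), 0.99, -10.0, 10.0)
    print("projected mean:", float(tgt[0] @ atoms))
    print("loss vs uniform prediction:", categorical_cross_entropy(tgt, np.zeros((1, 51))))
```

Linear interpolation between neighbouring atoms keeps the projected mean equal to the shifted mean whenever no mass is clipped at the support boundaries.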

Empirically, Squint is evaluated on eight SO-101 manipulation tasks—Reach Cube, Reach Can, Lift Cube, Lift Can, Place Cube, Place Can, Stack Cube, Stack Can—with heavy visual and physical domain randomization. Policies are trained for $d$ 43 minutes on a single NVIDIA RTX 3090 GPU. The reported simulation mean success rate over all eight tasks after $d$ 44 minutes is $d$ 45, and most tasks converge in under $d$ 46 minutes. In the real world, the zero-shot success rate across $d$ 47 trials is $d$ 48, which the paper describes as a $d$ 49 absolute improvement over state-to-visual DAgger at $d$ 50 when accounting for the time to train its state-based expert. A visual robustness ablation reports that removing color jitter reduces success from $d$ 51 to $d$ 52 (Almuzairee et al., 24 Feb 2026).

In this usage, “Squint” is not related to beam steering or expert-advice regret. It is the proper name of a visual SAC system whose central innovations are parallel simulation, a distributional critic, anti-aliased low-resolution observations, tuned update-to-data ratio, and GPU-level implementation choices (Almuzairee et al., 24 Feb 2026).