Turbo-Muon: Beam Compression & ML Optimization

Updated 7 December 2025

Turbo-Muon is a family of techniques that compress muon beam phase space by 10¹⁰, producing ultra-cold, high-brightness beams for precision particle physics applications.
The name also denotes a machine learning preconditioning method that accelerates Newton-Schulz orthogonalization by reducing the number of iterations required.
Finally, Turbo-Muon designates FPGA-based muon trigger systems that achieve sub-microsecond latency and high spatial resolution for real-time track reconstruction.

Turbo-Muon refers to a family of high-efficiency techniques divided into two principal research streams. The first is an advanced suite of phase-space compression and extraction technologies, building on the muCool methodology, that produces ultra-cold, ultrabright low-energy muon beams for particle physics applications. The second is the Turbo-Muon optimizer for large-scale, orthogonality-based machine learning optimization, which accelerates Newton-Schulz orthogonalization via a matrix preconditioning scheme. A third usage, in real-time FPGA muon triggering, is covered in Section 6. Both principal lines share a core principle: efficiently transforming input distributions, physical or algorithmic, into tightly controlled, application-optimized states.

1. Turbo-Muon in Physical Muon Beam Compression

Turbo-Muon sources are based on an integrated series of physical processes. Together, these processes compress the six-dimensional phase space of conventional surface muon beams by $\sim10^{10}$, with a net transmission efficiency of about $10^{-3}$. This approach is being developed in the muCool program at PSI (Antognini et al., 2018, Belosevic et al., 2019, Bao et al., 2014). The process can be decomposed into the following stages:

Stopping: Surface muons ($p \approx 11$–$13$ MeV/c, intensity $O(10^7\,\mu^+/\mathrm{s})$) are injected into a helium gas cell inside a $5\,\mathrm{T}$ solenoidal magnet. Approximately 1% of them stop in the active region, i.e., the stopping efficiency is $O(10^{-2})$.
Transverse Compression: A cryogenic helium cell ($T=4$–$12$ K, pressure $\sim1$–$10$ mbar) has a vertical density gradient. It is subjected to crossed electric ($E_x=E_y\approx 1$ kV/cm) and magnetic ($B=5$ T along $\hat{z}$) fields. The position-dependent collision frequency $\nu(y)$ produces a drift of the muon swarm that collapses its $y$-extent from $\pm15$ mm to a few mm in $\lesssim 2\,\mu$s.
Longitudinal Compression: A subsequent He cell operates at room temperature and low pressure ($\sim$5 mbar). There, an axial electric field ($E_z\approx\pm50$ V/cm) focuses the muons along $z$ from an initial length of $20$ cm to $\lesssim 1$ mm, also in $\lesssim 2\,\mu$s.
Extraction and Re-acceleration: The fully compressed muon packet is extracted through an orifice (diameter $\sim1$ mm) into vacuum by means of an $E_y\times B$ drift. The muons are then re-accelerated to keV energies for downstream use (Antognini et al., 2018, Belosevic et al., 2019).

The net result is a beam of $10^3$–$10^4$ bunched $\mu^+$ per pulse at kHz repetition rates. It has a normalized emittance of $\sim$0.1 mm mrad and an energy spread of $\sim$1 eV. Such a beam is suitable for high-precision spectroscopy, $\mu$SR, or as a muon collider front end.

2. Physical Principles and Design Parameters

The phase-space compression mechanism exploits the drift velocity of muons in crossed electric and magnetic fields, in the presence of high-frequency $\mu$–He collisions:

$\vec v_D = \frac{\mu |\vec E|}{1+\omega^2/\nu^2} \left[\hat{E} + \frac{\omega}{\nu}\,\hat{E}\times\hat{B} + \frac{\omega^2}{\nu^2}\,(\hat{E}\cdot\hat{B})\hat{B} \right],$

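Here $\mu = \frac{e}{m\,\nu}$ is the mobility (not to be confused with the muon symbol), $\omega = \frac{eB}{m}$ is the cyclotron frequency, and $\nu$ is the $\mu$–He collision frequency. The velocity component along $\hat{B}$ is $\mu|\vec E|\,(\hat{E}\cdot\hat{B})$ for any value of $\omega/\nu$. The component perpendicular to $\hat{B}$ behaves differently in the two density regimes:

- At high density ($\omega \ll \nu$), it points along $\hat{E}$.
- At low density ($\omega \gg \nu$), it turns toward $\hat{E}\times\hat{B}$, with speed approaching $|\vec E_\perp|/B$.

The longitudinal stage uses the drift along $\hat{B}$. The transverse stage uses the density-dependent switch between the $\hat{E}$ and $\hat{E}\times\hat{B}$ directions.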
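The drift formula can be evaluated numerically to see both regimes. The minimal Python sketch below works in SI units for a positive muon. Since $\mu = e/(m\nu)$ and $\omega = eB/m$ imply $\mu = (\omega/\nu)/B$, the drift depends only on $\vec E$, $\vec B$ and the ratio $\omega/\nu$. The function name and the field values are illustrative choices, not part of the cited work.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def drift_velocity(E, B, omega_over_nu):
    """Drift velocity (m/s) of a positive muon in gas.

    E: electric field vector (V/m); B: magnetic field vector (T);
    omega_over_nu: ratio of cyclotron to collision frequency.
    The mobility is mu = (omega/nu) / |B|, which follows from
    mu = e/(m nu) and omega = e B / m.
    """
    E_mag = math.sqrt(dot(E, E))
    B_mag = math.sqrt(dot(B, B))
    e_hat = tuple(x / E_mag for x in E)
    b_hat = tuple(x / B_mag for x in B)
    r = omega_over_nu
    mobility = r / B_mag
    e_cross_b = cross(e_hat, b_hat)
    e_dot_b = dot(e_hat, b_hat)
    prefactor = mobility * E_mag / (1.0 + r * r)
    # Bracket of the drift formula: E_hat + r E_hat x B_hat + r^2 (E_hat . B_hat) B_hat
    return tuple(
        prefactor * (e_hat[i] + r * e_cross_b[i] + r * r * e_dot_b * b_hat[i])
        for i in range(3)
    )


B_FIELD = (0.0, 0.0, 5.0)      # 5 T along z
E_PERP = (1e5, 0.0, 0.0)       # 1 kV/cm perpendicular to B
for ratio in (0.01, 1.0, 100.0):
    print(ratio, drift_velocity(E_PERP, B_FIELD, ratio))
# Field along B (50 V/cm): the axial drift is mu * E for any omega/nu.
print(drift_velocity((0.0, 0.0, 5e3), B_FIELD, 3.0))
```

For $\vec E\perp\vec B$, the velocity rotates from $\hat E$ toward $\hat E\times\hat B$ as $\omega/\nu$ grows. Its angle from $\hat E$ is $\arctan(\omega/\nu)$, and its magnitude approaches $E/B = 2\times10^4$ m/s for these fields.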

Transverse Stage: The gas-density gradient makes the collision frequency $\nu(y)$ position-dependent. Together with the applied fields, this makes muon trajectories converge into a sub-mm cross-section. Field strengths and the density gradient are tuned so that compression completes on a timescale comparable to the muon lifetime, $\tau_{\mu^+}=2.2\,\mu$s, limiting decay losses before extraction.
Longitudinal Stage: A V-shaped potential $V(z)$, implemented by stepped electrodes, generates a focusing drift along $z$. Simulation and experiment verify compression from $L_i\approx200$ mm to $L_f \lesssim 6$ mm within $t_{\mathrm{comp}}\sim2$–$4\,\mu$s (Bao et al., 2014).
Extraction: Electrostatic acceleration and high-vacuum orifices minimize charge exchange and enable beam transport into precision experiments.
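Two back-of-the-envelope numbers follow from the stage parameters above:

- the fraction of muons that have not yet decayed after a compression time $t$, which is $e^{-t/\tau_{\mu^+}}$;
- the mean drift speed a muon starting at the edge of the $L_i\approx200$ mm longitudinal distribution needs in order to reach the center within $t_{\mathrm{comp}}$.

The Python sketch below assumes a symmetric V-shaped potential, so the farthest muons travel $L_i/2$, and a constant drift speed. It accounts only for decay losses.

```python
import math

TAU_MU_US = 2.197  # mean muon lifetime in microseconds


def survival_fraction(t_us):
    """Fraction of muons that have not decayed after t_us microseconds."""
    return math.exp(-t_us / TAU_MU_US)


def required_drift_speed(length_mm, t_us):
    """Mean speed (m/s) needed to cover half of a symmetric distribution of
    total length length_mm within t_us microseconds (1 mm/us = 1e3 m/s)."""
    return (0.5 * length_mm / t_us) * 1e3


for t in (2.0, 4.0):
    print(f"t = {t} us: decay survival {survival_fraction(t):.2f}, "
          f"required mean speed {required_drift_speed(200.0, t):.0f} m/s")
```

For $t=2\,\mu$s, about 40% of the muons survive decay and the required mean speed is $5\times10^4$ m/s. For $4\,\mu$s, these become about 16% and $2.5\times10^4$ m/s. This is why compression times of a few microseconds are the practical limit.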

The table below summarizes the core parameters:

Stage	Gas / Temperature / Pressure	Fields	Compression Effect
Transverse	He, 4–12 K, 1–10 mbar (gradient)	$E_x=E_y\approx1$ kV/cm, $B=5$ T	$\Delta y$: $\sim$20 mm $\to$ 1 mm
Longitudinal	He, 300 K, 5 mbar	$E_z\approx\pm50$ V/cm, $B=5$ T	$\Delta z$: 200 mm $\to$ $<$6 mm
Extraction	He→vac., orifice 1–2 mm	$E_y\times B$ drift, HV columns	Spot $\sigma_{x,y}<1$ mm

3. Experimental Demonstration and Performance

Transverse and longitudinal compression, as well as extraction, have been demonstrated using surface-muon beams at Paul Scherrer Institute:

Transverse compression: $\Delta y$ is reduced by $>10\times$ within a drift length of 35 mm and a time of $\lesssim2\,\mu$s (Belosevic et al., 2019).
Longitudinal compression: A swarm initially $L_i\approx200$ mm long is focused to $L_f\lesssim6$ mm with $>90\%$ survival, neglecting post-extraction losses (Bao et al., 2014). Timing and spatial profiles match full GEANT4 simulations for the relevant field and gas parameters.
Extraction demonstration: Muon packets are observed to drift perpendicular to $B$ and pass through a 1 mm orifice with an eV-scale energy spread. They are then re-accelerated so as to preserve phase-space quality. The projected beams reach a normalized transverse emittance of $0.1$–$1$ mm mrad and an energy spread below $1$ eV (Antognini et al., 2018).

The total phase-space reduction is $10^{10}$. The overall transmission is $10^{-3}$, limited primarily by the stopping efficiency and by in-gas survival.
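Phase-space density is the number of muons per unit phase-space volume. The two figures therefore combine into the net gain in phase-space density, where $N$ is the number of muons and $V$ the occupied phase-space volume before ($i$) and after ($f$) cooling:

$\frac{\rho_f}{\rho_i} = \frac{N_f/V_f}{N_i/V_i} = \frac{N_f}{N_i}\cdot\frac{V_i}{V_f} \approx 10^{-3}\times10^{10} = 10^{7}.$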

4. Engineering and Operational Challenges

Turbo-Muon sources at scale require solutions to several technical challenges (Antognini et al., 2018):

High-voltage, fast-rise pulsing in high magnetic fields and gaseous environments.
Continuous, ppm-level helium purity to suppress muonium (Mu) formation and contamination.
High-throughput differential pumping to protect ultra-high vacuum acceleration stages.
Sub-100 μm field alignment precision to minimize aberrations.
Thermal management and voltage standoff at cryogenic temperatures.
Scaling from $10^4\,\mu^+$/s (single-cell tests) to $10^7$–$10^8\,\mu^+$/s for routine operation while avoiding sparking and space-charge effects.

These aspects are critical for integration into large-scale facilities, future high-luminosity muon colliders, and next-generation precision experiments needing ultra-cold beams.

5. Turbo-Muon Algorithms in Orthogonality-Based Optimization

Turbo-Muon also denotes a matrix preconditioning technique that accelerates the convergence of orthogonality-based optimizers (e.g., Muon) in large-scale machine learning (Boissin et al., 4 Dec 2025). Orthogonality constraints on gradients or weight updates (requiring $Q^\top Q=I$ ) are enforced by projecting onto the polar factor via Newton-Schulz (NS) iterations. The standard NS scheme is computationally expensive for large $n$ ( $\mathcal{O}(n^3)$ per iteration) and typically requires $5$–$9$ iterations for error $\varepsilon\sim10^{-3}$ .
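For reference, a minimal pure-Python sketch of the classical cubic Newton-Schulz iteration $X_{k+1} = \tfrac{3}{2}X_k - \tfrac{1}{2}X_kX_k^\top X_k$ is given below. It first divides by the Frobenius norm so that all singular values lie in $(0, 1]$, inside the iteration's convergence region $(0,\sqrt{3})$. Each step then pushes every singular value toward 1 while keeping the singular vectors, so the iterate approaches the polar factor. Practical Muon implementations use tuned higher-order polynomial variants on GPU tensors. The helper names and the test matrix here are hypothetical.

```python
import math


def matmul(A, B):
    """Naive dense product of two row-major matrices (lists of lists)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def newton_schulz(X, steps):
    """Approximate the orthogonal polar factor of X with cubic NS steps."""
    norm = math.sqrt(sum(x * x for row in X for x in row))
    Y = [[x / norm for x in row] for row in X]  # spectral norm <= Frobenius norm = 1
    for _ in range(steps):
        YYtY = matmul(matmul(Y, transpose(Y)), Y)  # two O(n^3) products per step
        Y = [[1.5 * y - 0.5 * z for y, z in zip(ry, rz)]
             for ry, rz in zip(Y, YYtY)]
    return Y


def orthogonality_error(Q):
    """Frobenius norm of Q^T Q - I (meaningful for square or tall Q)."""
    G = matmul(transpose(Q), Q)
    n = len(G)
    return math.sqrt(sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(n) for j in range(n)))


A = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]
for k in (1, 3, 5, 8):
    print(k, orthogonality_error(newton_schulz(A, k)))
```

Small singular values converge slowest, since each step multiplies them by roughly $3/2$. A poorly scaled starting matrix therefore costs extra iterations.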

Turbo-Muon introduces a diagonal preconditioner based on the “almost-orthogonal layer” (AOL) rescaling:

$P = \mathrm{diag}\Bigl(\sum_j |X_0^\top X_0|_{ij}\Bigr)^{-1/2},\qquad X_1 = X_0 P,$

where $X_0$ is the input matrix. Preconditioning reduces the condition number of the Gram matrix. This gives a smaller initial error for the NS iterations and allows one full NS iteration to be dropped at constant accuracy.

Empirical results show a $2.2$–$2.8\times$ speedup of the NS subroutine for $n=8192$. End-to-end model training time improves by $5$–$10\%$, without any hyperparameter adjustment. The core descent property is preserved, and integration into PyTorch workflows requires only minimal changes to the optimizer's step logic. The method is available at https://github.com/thib-s/flash-newton-schulz.
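The AOL rescaling itself is cheap: it needs one Gram matrix and one row sum per column. The pure-Python sketch below applies $P$ in place of the Frobenius normalization and then runs plain cubic NS steps. The helper names are hypothetical and are not the API of the linked repository. The cubic step is used only for illustration.

No further normalization is needed after the rescaling. The matrix $P^{-2} - X_0^\top X_0$ has diagonal entries $\sum_{j\ne i}|(X_0^\top X_0)_{ij}|$, so it is diagonally dominant with a non-negative diagonal and hence positive semidefinite. This gives $\|X_0P\|_2 \le 1$.

Note that the polar factor of $X_0P$ generally differs from that of $X_0$; the source states that the descent property is nevertheless preserved.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def aol_precondition(X):
    """Return X P with P = diag(sum_j |(X^T X)_ij|)^(-1/2), i.e. column scaling."""
    G = matmul(transpose(X), X)
    d = []
    for row in G:
        s = sum(abs(g) for g in row)
        d.append(1.0 / math.sqrt(s) if s > 0.0 else 0.0)  # a zero column stays zero
    return [[x * d[j] for j, x in enumerate(row)] for row in X]


def frobenius_normalize(X):
    norm = math.sqrt(sum(x * x for row in X for x in row))
    return [[x / norm for x in row] for row in X]


def ns_steps(Y, steps):
    """Cubic Newton-Schulz steps with no further normalization."""
    for _ in range(steps):
        YYtY = matmul(matmul(Y, transpose(Y)), Y)
        Y = [[1.5 * y - 0.5 * z for y, z in zip(ry, rz)]
             for ry, rz in zip(Y, YYtY)]
    return Y


def orthogonality_error(Q):
    """Frobenius norm of Q^T Q - I."""
    G = matmul(transpose(Q), Q)
    n = len(G)
    return math.sqrt(sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(n) for j in range(n)))


def largest_eig_gram(X, iters=200):
    """Power-iteration (Rayleigh quotient) estimate of ||X||_2^2, from below."""
    M = matmul(transpose(X), X)
    n = len(M)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam


A = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]
X_aol = aol_precondition(A)
X_fro = frobenius_normalize(A)
print("||X0 P||_2^2 estimate:", largest_eig_gram(X_aol))
for k in (0, 2, 4):
    print(k, orthogonality_error(ns_steps(X_fro, k)),
          orthogonality_error(ns_steps(X_aol, k)))
```

For this particular matrix, the AOL-scaled start has a smaller initial orthogonality error than the Frobenius-scaled one. This is an illustration of the mechanism, not a general guarantee.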

6. Turbo-Muon in Real-Time Muon Trigger and Track Reconstruction

A further usage of Turbo-Muon is in FPGA-based muon trigger demonstrators for high-energy physics detectors (Migliorini et al., 2021). In this context, Turbo-Muon refers to a low-latency, pipelined processing chain for online muon track reconstruction in drift-tube chambers. It combines compact neural networks with analytical methods:

Hit grouping and de-multiplexing: Buffering and unique assembly of macro-cell hits from raw digitized data streams.
Filtering Neural Network (F-NN): Processes 16 coarse input timestamps to select the best spatially separated hits; implemented as a small (20-unit) quantized NN.
Disambiguation Neural Network (D-NN): Resolves the left/right ambiguity of each selected hit; also a compact quantized NN.
Time pedestal/mean-timer stage: Resolves the global $t_0$ and track inclination $\varphi$ analytically using precomputed combinatorial formulae.
Track parameter extraction: Fixed-point least-squares fitting for local track slope and intercept.
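The last two stages reduce to simple arithmetic once $t_0$ and the left/right assignment are known. Each hit lies at the wire position plus or minus the drift velocity times the drift time, and a straight line is fitted through the hits. The Python sketch below uses integer micrometre and nanosecond units to mimic fixed-point FPGA arithmetic. The drift velocity, layer spacing and number of fractional bits are hypothetical values chosen for illustration, not parameters of the Migliorini et al. design.

```python
V_DRIFT_UM_PER_NS = 54   # hypothetical drift velocity in um/ns
FRAC_BITS = 16           # hypothetical number of fractional bits for the slope


def hit_position_um(wire_x_um, hit_time_ns, t0_ns, side):
    """Hit position from the drift time; side is -1 (left) or +1 (right) of the wire."""
    return wire_x_um + side * V_DRIFT_UM_PER_NS * (hit_time_ns - t0_ns)


def fit_line_fixed_point(z_um, x_um):
    """Least-squares fit x = a + b*z using only integer operations.

    Returns (a_um, b_fixed), where b_fixed approximates b * 2**FRAC_BITS.
    Python's // floors; real firmware may truncate or round differently.
    """
    n = len(z_um)
    sz, sx = sum(z_um), sum(x_um)
    szz = sum(z * z for z in z_um)
    szx = sum(z * x for z, x in zip(z_um, x_um))
    den = n * szz - sz * sz  # nonzero if at least two distinct layer positions
    b_fixed = ((n * szx - sz * sx) << FRAC_BITS) // den
    a_um = ((sx << FRAC_BITS) - b_fixed * sz) // (n << FRAC_BITS)
    return a_um, b_fixed


# Four hypothetical layers 13 mm apart, hits on the track x = 1000 um + 0.25 * z.
layers_z = [0, 13000, 26000, 39000]
hits_x = [1000, 4250, 7500, 10750]
a, b = fit_line_fixed_point(layers_z, hits_x)
print(a, b / (1 << FRAC_BITS))
```

For these noise-free hits, the fit recovers the intercept of 1000 μm and the slope of 0.25 exactly. With real hits, the fixed-point result differs from a floating-point fit by at most the rounding of the last fractional bit.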

The entire pipeline achieves $<1\,\mu$s end-to-end latency at a $40$ MHz clock, a per-macro-cell spatial resolution of $\lesssim250\,\mu$m, an efficiency of $>99\%$, and a small FPGA resource footprint. In both simulation and cosmic-ray data, the ghost rate is $<1\%$ and the timing resolution is $3$–$4.1$ ns. This outperforms purely analytical implementations at high rates. Extensions to multi-chamber tiling and vertical integration are under way for the High-Luminosity LHC upgrades.
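At the quoted clock frequency, the latency bound translates into a fixed cycle budget for the whole pipeline:

$T_{\mathrm{clk}} = \frac{1}{40\,\mathrm{MHz}} = 25\,\mathrm{ns},\qquad N_{\mathrm{cycles}} < \frac{1\,\mu\mathrm{s}}{25\,\mathrm{ns}} = 40.$

One clock period therefore equals the 25 ns LHC bunch-crossing interval.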

7. Impact and Prospects

Turbo-Muon methodologies, in both beam physics and algorithmic optimization, have substantially advanced the state of the art in their respective domains:

In particle physics, Turbo-Muon sources will enable new regimes of $\mu^+$ beam brightness, collimation and timing. This directly affects muon $g-2$ and EDM measurements, muonium spectroscopy, and planned muon collider injectors. The $10^{10}$ phase-space compression at $O(10^{-3})$ efficiency yields a gain of orders of magnitude in usable cold muon flux, providing $\sim10^5$–$10^6\,\mu^+$/s at sub-eV energies (Antognini et al., 2018, Belosevic et al., 2019). The methodology sets a template for gas-based cooling and high-field extraction of other particle species.
In optimization, Turbo-Muon’s preconditioning accelerates orthogonality-based training of both vision models and LLMs, with consistent speedups. It provides a drop-in acceleration for NS-like matrix operations and stimulates further research in generalized preconditioning and matrix-manifold optimization (Boissin et al., 4 Dec 2025).
In real-time signal processing, Turbo-Muon FPGA demonstrators validate hybrid neural/analytic pipelines for sub-microsecond, high-fidelity event characterization in high-background environments, with planned deployment in major detector upgrades (Migliorini et al., 2021).

Collectively, Turbo-Muon approaches exemplify the fusion of advanced mathematical, experimental, and computing techniques in pursuit of precision and efficiency at both the physical and algorithmic frontiers.