Structure-Aware Scaled Multiplexing

Updated 4 July 2026

Structure-Aware Scaled Multiplexing is a design paradigm that exploits inherent channel structures (like optical modes or delay–scale geometry) to enable scalable and efficient signal routing.
It employs tailored control mechanisms—such as mode-specific switching and phase-decoupled processing—to bound interference and manage decoding complexity.
Its practical realizations span integrated photonics, DSP, and computational systems, demonstrating marked improvements in throughput and resource efficiency.

Structure-aware scaled multiplexing is a design paradigm in which multiplexing capacity is increased by exploiting the internal structure of the objects being multiplexed—optical modes and wavelengths, mode-coupling patterns, parity-check locality, delay–scale channel geometry, inference phases, or multiplexed input streams—rather than treating all channels as interchangeable. In the cited literature, the term consistently denotes two linked operations: first, identifying structural coordinates that are either orthogonal or only sparsely coupled; second, assigning control, routing, decoding, or scheduling mechanisms that are matched to those coordinates. The consequence is that scaling is achieved by multiplying along structured dimensions while keeping interference, control overhead, or decoding complexity bounded through locality, sparsity, or phase-specific control (Stern et al., 2015, Li et al., 2023, Cui et al., 20 Apr 2025, Li et al., 4 Jul 2025).

1. Conceptual basis

In the cited works, structure-awareness is not a synonym for mere multiplexing density. It refers to exploiting a physically or algorithmically meaningful basis in which the multiplexed system becomes easier to control. In integrated photonics, the relevant basis may be the modal eigenstates TE0 and TE1 together with wavelength channels, so that aggregate throughput scales as $C = M \times W \times R$ for $M$ modes, $W$ wavelengths, and a per-lane line rate $R$ (Stern et al., 2015). In scaled mode-selective switching for graded-index fiber, the relevant structure is the spatial extent and coupling behavior of Laguerre–Gaussian mode groups, captured by the scaling factor $K$ and by mode-coupling matrices (Ho et al., 2014). In spatially coupled coding for SDM, the structural basis is the decomposition into sub-blocks with local checks and coupled checks, so that only the minimal extrinsic information required by the coupled edges must cross decoder boundaries (Li et al., 2023).

A similar pattern appears in computing systems. In PD-multiplexed LLM serving, the exploited structure is the prefill/decode phase split together with persistent KV-cache locality; scaling then comes from in-place, phase-decoupled compute partitioning on shared GPUs rather than from simple disaggregation (Cui et al., 20 Apr 2025). In PruMUX, the relevant axes are the multiplexing factor $m$ and structured sparsity $s$, with throughput modeled as $T(m,s) \approx m \cdot S_{\mathrm{prune}}(s) \cdot T_0$, where $S_{\mathrm{prune}}(s)$ is the speedup contributed by pruning at sparsity $s$ and $T_0$ is the baseline throughput (Su et al., 2023). In AFDM over wideband doubly-dispersive channels, the structure is the delay–scale geometry induced by time-scaling, which becomes sparse in the DAF domain after suitable chirp design and CPP/CPS insertion (Li et al., 4 Jul 2025).
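As a rough numerical illustration of the two multiplicative models in the text above, the following Python sketch evaluates $C = M \times W \times R$ and $T(m,s) \approx m \cdot S_{\mathrm{prune}}(s) \cdot T_0$. The function names and all numbers are hypothetical placeholders, not values from the cited papers, and the PruMUX model is only first-order: it ignores accuracy loss and any overhead from the multiplexing and demultiplexing layers.

```python
def photonic_capacity(modes: int, wavelengths: int, lane_rate_gbps: float) -> float:
    """Aggregate rate C = M * W * R when every mode-wavelength lane carries R."""
    return modes * wavelengths * lane_rate_gbps


def prumux_throughput(m: int, prune_speedup: float, base_throughput: float) -> float:
    """First-order model T(m, s) ~ m * S_prune(s) * T0.

    prune_speedup stands in for S_prune(s); it has to be measured or assumed,
    because the model does not derive it from the sparsity s itself.
    """
    return m * prune_speedup * base_throughput


# Hypothetical numbers, chosen only to show the multiplicative structure.
print(photonic_capacity(2, 2, 10.0))     # 40.0
print(prumux_throughput(4, 2.0, 100.0))  # 800.0
```

The point of the sketch is that each structured axis contributes an independent factor, so doubling any one of them doubles the modeled total as long as the factors really stay independent.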

Domain	Structural basis	Scaling mechanism
Integrated photonics	Spatial modes and wavelengths	$C = M \times W \times R$ (Stern et al., 2015)
MDM WSS	Mode size and coupling structure	Optical scaling via factor $K$ (Ho et al., 2014)
SC-LDPCL for SDM	Sub-block locality and coupled checks	Helper/window width bounded by locality (Li et al., 2023)
AFDM	Delay–scale sparsity in DAF domain	Sparse path-aligned support (Li et al., 4 Jul 2025)
LLM serving	Prefill/decode phases and KV locality	Phase-decoupled multiplexing on shared GPUs (Cui et al., 20 Apr 2025)
PruMUX	Multiplexing factor and structured sparsity	Compound throughput scaling (Su et al., 2023)

A plausible implication is that structure-aware scaled multiplexing is best understood as a systems principle rather than a domain-specific technique: identify the right coordinates, transform into them if necessary, process there, and only then recombine.

2. Optical and photonic realizations

The most explicit photonic instantiation is the silicon 1×2 switch for simultaneous MDM and WDM. Its core idea is to convert each multimode signal temporarily into the single-mode TE0 domain, process each mode–wavelength lane with single-mode ring resonators and heaters, and then reconvert to the original mode. The device uses a 930 nm multimode bus, a 450 nm single-mode waveguide, and phase matching between TE1 in the bus and TE0 in the single-mode guide at $n_{\mathrm{eff}} \approx 2.46$. With $M = 2$ modes and $W = 2$ wavelengths carrying NRZ data, the paper reports the aggregate data rate per multimode input/output, the intermodal crosstalk, the BER of separately routed channels, and the total switch area (Stern et al., 2015). The important structural point is that the design does not attempt direct multimode switching with a single element; it separates by mode, processes uniformly in single mode, and recombines.
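The separate, process, and recombine pattern can be expressed as per-lane routing. The Python sketch below is a hypothetical abstraction, not a device model: each (mode, wavelength) lane is switched independently by a callable standing in for the ring and heater settings, and one switching element is assumed per lane, so the element count grows as $M \times W$.

```python
from itertools import product


def route_lanes(modes, wavelengths, port_of):
    """Route each (mode, wavelength) lane independently in a single-mode domain.

    port_of(mode, wavelength) -> output port is a hypothetical stand-in for the
    per-lane ring/heater setting. Returns {port: sorted list of lanes}.
    """
    outputs = {}
    for mode, wl in product(modes, wavelengths):
        # 1) Mode conversion: the lane is mapped into TE0 of a single-mode guide.
        # 2) Single-mode switching: the same kind of element handles every lane.
        port = port_of(mode, wl)
        # 3) Reconversion: the lane re-enters the output bus on its original mode.
        outputs.setdefault(port, []).append((mode, wl))
    return {port: sorted(lanes) for port, lanes in outputs.items()}


modes = ["TE0", "TE1"]
wavelengths = ["lambda1", "lambda2"]
# Example setting: send only TE1 at lambda2 to port 1, everything else to port 0.
routing = route_lanes(modes, wavelengths,
                      lambda mode, wl: 1 if (mode, wl) == ("TE1", "lambda2") else 0)
print(routing)
print("switching elements in this abstraction:", len(modes) * len(wavelengths))
```

Because every lane is handled by the same single-mode primitive, adding a mode or a wavelength adds elements rather than requiring a new kind of multimode switch.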

A different optical realization appears in wavelength-selective switches for mode-division multiplexing over graded-index fiber. There the design problem is not per-mode access on a high-index-contrast chip, but scaling a single-mode WSS so that multimode beams with larger effective radii can be switched with preserved passband behavior. The analysis is expressed through a mode-clipping model and mode-coupling matrices. In systems with substantial mode coupling, all modes at a given wavelength must be switched as a unit to preserve MIMO assumptions and minimize ROADM port count. For a graded-index fiber with five mode groups and 50-GHz channel spacing, the analysis quantifies how much the one-sided bandwidth can vary, and different optical scaling strategies trade off port count, pixel pitch, and grating dispersion (Ho et al., 2014). This work establishes an important boundary condition: structure-awareness can require either finer per-mode access or coarser mode-as-a-unit handling, depending on whether the physical platform suppresses or randomizes modal coupling.
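The choice between per-mode and mode-as-a-unit switching can be phrased as a grouping problem over a measured coupling matrix. The Python sketch below is an illustration under stated assumptions, not the method of Ho et al.: it treats two modes as coupled when their power-coupling fraction exceeds a hypothetical threshold and returns the connected groups that would have to be switched together.

```python
def switching_units(coupling, threshold):
    """Group modes whose mutual coupling exceeds `threshold`.

    coupling[i][j]: power fraction coupled from mode j into mode i at one
    wavelength (hypothetical measured matrix). Returns sorted index groups,
    i.e. connected components of the "coupled" graph; each group would be
    switched as one unit.
    """
    n = len(coupling)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if j not in seen and max(coupling[i][j], coupling[j][i]) > threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


weak = [[0.998, 0.001, 0.0], [0.001, 0.997, 0.002], [0.0, 0.002, 0.998]]
strong = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]
print(switching_units(weak, threshold=0.01))    # [[0], [1], [2]]
print(switching_units(strong, threshold=0.01))  # [[0, 1, 2]]
```

A weakly coupled platform yields singleton groups (per-mode access is possible), while strong coupling collapses all modes into one unit, matching the boundary condition described in the text.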

Broadband on-chip mode conversion provides another realization. The three-mode converter and multiplexer built from cascaded symmetric Y-junctions, a 4×4 MMI, and a single switchable phase shifter exploits symmetry-controlled supermode synthesis: in-phase and anti-phase combinations at the arms of a symmetric Y-junction generate specific stem modes. With subwavelength grating engineering, the device reports simulated insertion loss and crosstalk that are bounded simultaneously for all channels over broad wavelength ranges, while supporting TE0, TE1, TE2, and switchable TE3 selection through one switchable phase shifter (González-Andrade et al., 2023). The paper also states an explicit scaling rule relating the number of supported modes to the number of junction stages.
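The in-phase/anti-phase mechanism can be checked with an idealized model. The Python sketch below assumes a lossless, adiabatic symmetric Y-junction in which the symmetric arm supermode evolves into the even stem mode and the antisymmetric one into the odd stem mode; the function names are hypothetical, and real devices add loss, imbalance, and wavelength dependence.

```python
import cmath
import math


def y_junction_stem(a1: complex, a2: complex):
    """Ideal lossless, adiabatic symmetric Y-junction (assumption).

    The symmetric arm supermode (a1 + a2)/sqrt(2) evolves into the even stem
    mode; the antisymmetric one (a1 - a2)/sqrt(2) into the odd stem mode.
    Returns (even_amplitude, odd_amplitude).
    """
    s = 1 / math.sqrt(2)
    return s * (a1 + a2), s * (a1 - a2)


def drive_arms(phase: float):
    """Equal-power arm fields with a relative phase set by a phase shifter."""
    a = 1 / math.sqrt(2)
    return a, a * cmath.exp(1j * phase)


for phase in (0.0, math.pi):
    even, odd = y_junction_stem(*drive_arms(phase))
    print(f"phase={phase:.2f}: even power={abs(even)**2:.3f}, odd power={abs(odd)**2:.3f}")
```

Toggling the relative arm phase between 0 and π moves all power from the even to the odd stem mode, which is why a single switchable phase shifter can select between modes.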

Few-mode-fiber interfacing extends the same principle to chip-to-fiber mode synthesis. The integrated multichannel silicon mode multiplexer for FMFs combines a two-dimensional MMGC, compact mode size converters based on a subwavelength Mikaelian lens, adiabatic directional couplers, and eight thermo-optic phase shifters. It selectively launches eight spatial and polarization channels and reports measured peak efficiencies for LP01, LP11a, LP11b, and LP21b, while the MMGC and MSC block occupies a compact footprint (Zhou et al., 2023). Here the structure being exploited is the degeneracy and polarization diversity of LP mode groups in weakly guiding circular FMFs.

At the level of free-space structured light, the R–D–R cascade for multiplexed vector beam conversion shows that static structured matter can satisfy three arbitrary input–output relations simultaneously. The device consists of a retarder, a horizontal-axis diattenuator, and a second retarder, all spatially varying per pixel. Its accessible nondepolarizing Mueller-matrix family has enough degrees of freedom to satisfy three independent mappings, but a fourth arbitrary mapping generally over-constrains the design family. This makes passive TDM and passive WDM simultaneously possible within one static element, and the paper demonstrates generation and conversion of Stokes skyrmions through this framework (Zhang et al., 28 Dec 2025).

3. Coding, modulation, and channel-aware signal processing

In communication theory, structure-aware scaled multiplexing appears most clearly when the channel or code admits a sparse or local representation. SC-LDPCL for SDM maps each spatial channel to a sub-block in a coupled LDPC chain, with local checks confined to one sub-block and a fraction of the checks serving as coupled checks to neighboring sub-blocks. The resulting band-diagonal parity-check matrix supports separate decoding, full joint decoding, and semi-joint variants such as SJ, SJVar, and SJ-HD. For the regular ensemble studied, the paper reports the SNR that separate decoding, SJ, SJVar, SJ-HD, and full joint decoding each require to reach a target BER (Li et al., 2023). The central structural idea is that only the extrinsic messages associated with coupled checks need to traverse decoder boundaries; the system as a whole requires neither raw-stream exchange nor monolithic joint processing.
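The locality argument can be made concrete by counting which parity-check edges cross sub-block boundaries. The Python sketch below builds a toy band-diagonal check structure with hypothetical sizes and random connections; it is not the protograph ensemble of Li et al., but it shows that local checks contribute no boundary traffic and that each boundary carries only the edges of its coupled checks.

```python
import random


def build_sc_ldpc(num_blocks, vars_per_block, local_checks, coupled_checks,
                  check_degree, seed=0):
    """Toy band-diagonal check structure with hypothetical sizes.

    Sub-block b owns variables [b*V, (b+1)*V). Local checks use only block b;
    coupled checks take half of their edges from block b + 1.
    Returns a list of (kind, owner_block, variable_indices).
    """
    rng = random.Random(seed)
    V = vars_per_block
    half = check_degree // 2
    checks = []
    for b in range(num_blocks):
        own = range(b * V, (b + 1) * V)
        for _ in range(local_checks):
            checks.append(("local", b, rng.sample(own, check_degree)))
        if b + 1 < num_blocks:
            nxt = range((b + 1) * V, (b + 2) * V)
            for _ in range(coupled_checks):
                edges = rng.sample(own, check_degree - half) + rng.sample(nxt, half)
                checks.append(("coupled", b, edges))
    return checks


def boundary_messages(checks, vars_per_block):
    """Count check-to-variable edges that leave the check's own sub-block."""
    return sum(1 for _, owner, vs in checks
               for v in vs if v // vars_per_block != owner)


for blocks in (4, 8):
    h = build_sc_ldpc(blocks, vars_per_block=20, local_checks=8,
                      coupled_checks=2, check_degree=6)
    crossing = boundary_messages(h, 20)
    print(blocks, crossing, crossing / (blocks - 1))  # edges per boundary stays at 6
```

Total boundary traffic grows with the number of boundaries, but the traffic per boundary stays fixed, which is what keeps per-decoder interconnect bounded as spatial channels are added.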

Principal-mode processing in multimode SDM uses an analogous strategy at the receiver front end. By diagonalizing the Wigner–Smith or transfer-matrix delay operator, the system identifies principal modes whose eigenvectors are frequency-invariant to first order. In the reported 50-km, 12-mode, 33-GBd, 16-QAM scenario, this yields a substantial reduction in channel memory and allows operation with fewer optical front-ends than the total number of modes, while maintaining constellation SNR close to the SVD benchmark (Barbosa et al., 2022). The complexity reduction comes from replacing naive equalization over all modes with equalization over a reduced set of principal modes, which the paper summarizes as a realistic range of DSP complexity reduction.
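A minimal numerical illustration of principal modes, assuming a toy multimode channel $U(\omega) = A\,\mathrm{diag}(e^{-j\omega\tau_k})\,B^{H}$ with random unitary $A$ and $B$ and hypothetical delays $\tau_k$. For a unitary $U$, the group-delay operator $G = j\,U^{H}\,dU/d\omega$ is Hermitian; its eigenvalues are the modal group delays and its eigenvectors are the input principal modes. The sketch uses NumPy and a central finite difference; it is not the estimation pipeline of Barbosa et al.

```python
import numpy as np


def random_unitary(n, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # fix column phases


def transfer(omega, A, B, taus):
    """Toy channel U(w) = A diag(exp(-j w tau_k)) B^H."""
    return A @ np.diag(np.exp(-1j * omega * taus)) @ B.conj().T


def principal_modes(omega, A, B, taus, dw=1e-6):
    """Eigen-decompose the group-delay operator G = j U^H dU/dw."""
    U = transfer(omega, A, B, taus)
    dU = (transfer(omega + dw, A, B, taus) - transfer(omega - dw, A, B, taus)) / (2 * dw)
    G = 1j * U.conj().T @ dU
    G = 0.5 * (G + G.conj().T)  # remove small non-Hermitian finite-difference error
    delays, inputs = np.linalg.eigh(G)  # ascending delays; input principal modes as columns
    return delays, inputs


rng = np.random.default_rng(0)
taus = np.array([0.1, 0.5, 1.0, 1.5, 2.0, 3.0])  # hypothetical group delays
A, B = random_unitary(6, rng), random_unitary(6, rng)
delays, inputs = principal_modes(0.3, A, B, taus)
print(np.round(delays, 6))
```

Because this toy channel has no frequency-dependent coupling, its principal modes are exact; in a real fiber they are frequency-invariant only to first order, which is why the text stresses stable estimation of the delay operator.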

Structure-aware modulation for multiuser superposition is represented by S-MUST. Instead of superposing full complex constellations in an undifferentiated way, S-MUST scales the in-phase and quadrature components independently via CPACs, so each user sees two scalar PAM problems rather than one 2D QAM detection problem. This enables IQ separation, lower-complexity SIC, and, in the Cat.3 design, a modulo-based parallel interference cancellation built on co-prime quantization. The reported system improves user fairness relative to conventional MUST and states a spectral efficiency enhancement under symmetric conditions (Fang et al., 2018). The structural insight is that the legacy QAM alphabet already contains an internal decomposition into two independent 1D channels, and the multiplexing rule is designed to preserve that decomposition at the receiver.
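IQ separation can be illustrated with a generic two-user power-domain superposition applied independently to the in-phase and quadrature rails. The Python sketch below is not the CPAC construction of S-MUST: the PAM-4 alphabet and scale factors are hypothetical choices, picked so that the composite on each rail is a uniform 16-level grid, and detection is plain successive interference cancellation performed separately on each real dimension.

```python
PAM4 = (-3, -1, 1, 3)
A_FAR, A_NEAR = 4.0, 1.0  # hypothetical power-scaling factors per rail


def superpose(near_sym, far_sym):
    """Superpose two users' PAM-4 symbols on one real rail (I or Q)."""
    return A_FAR * far_sym + A_NEAR * near_sym


def nearest(value, alphabet=PAM4):
    return min(alphabet, key=lambda s: abs(value - s))


def sic_detect(y):
    """SIC on one rail: decide the far user's symbol, cancel it, then decide the near user's."""
    far_hat = nearest(y / A_FAR)
    near_hat = nearest((y - A_FAR * far_hat) / A_NEAR)
    return near_hat, far_hat


def detect_iq(z):
    """Treat a complex sample as two independent scalar problems."""
    return sic_detect(z.real), sic_detect(z.imag)


z = complex(superpose(1, -3), superpose(-1, 3))
print(detect_iq(z))  # ((1, -3), (-1, 3))
```

Each rail needs only scalar slicing against four levels per stage, instead of a joint search over a 2D composite constellation, which is the complexity benefit the text attributes to IQ separation.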

AFDM over wideband doubly-dispersive channels extends the same logic to time-scaling. The wideband channel is modeled by path-dependent delay–scale kernels rather than narrowband Doppler shifts, and AFDM uses a chirp-periodic prefix and suffix (CPP/CPS) to restore periodicity under pulse widening and shortening. In the DAF domain, each physical path contributes a narrow, affine support band whose location is determined by that path's delay, scale, and Doppler term. Chirp parameter optimization prevents overlap among these bands, and the CD-D-OAMP detector exploits sparsity in the time domain together with symbol priors in the DAF domain. Simulations in underwater acoustic and THz settings show that AFDM with the optimized chirp parameters outperforms OFDM, OCDM, OTFS, and AFDM with narrowband chirp design (Li et al., 4 Jul 2025).
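For intuition about non-overlapping DAF-domain support, the Python sketch below implements the classical narrowband AFDM rule with integer delays and integer normalized Doppler (the symbols $N$, $l$, and $\alpha_{\max}$ are introduced here for illustration): choosing $c_1 = (2\alpha_{\max}+1)/(2N)$ places the path with delay $l$ in a band of width $2\alpha_{\max}+1$ centered at $l(2\alpha_{\max}+1)$ modulo $N$, so bands stay disjoint when $(l_{\max}+1)(2\alpha_{\max}+1) \le N$. This is the narrowband design that the cited wideband scheme generalizes; it does not model the extra spreading caused by time-scaling.

```python
def afdm_c1(alpha_max: int, n: int) -> float:
    """Narrowband AFDM chirp parameter c1 = (2*alpha_max + 1) / (2N)."""
    return (2 * alpha_max + 1) / (2 * n)


def daf_band(delay: int, alpha_max: int, n: int) -> set:
    """DAF indices a path with integer `delay` can occupy for any Doppler in [-alpha_max, alpha_max]."""
    center = round(2 * n * afdm_c1(alpha_max, n) * delay)  # = delay * (2*alpha_max + 1)
    return {(center + a) % n for a in range(-alpha_max, alpha_max + 1)}


def bands_disjoint(max_delay: int, alpha_max: int, n: int) -> bool:
    """True if the bands of all delays 0..max_delay are pairwise disjoint."""
    occupied = set()
    for delay in range(max_delay + 1):
        band = daf_band(delay, alpha_max, n)
        if occupied & band:
            return False
        occupied |= band
    return True


N, ALPHA_MAX = 64, 2  # hypothetical blocklength and maximum integer Doppler
print(bands_disjoint(11, ALPHA_MAX, N))  # True:  12 * 5 = 60 <= 64
print(bands_disjoint(12, ALPHA_MAX, N))  # False: 13 * 5 = 65 > 64
```

The wideband case discussed in the text adds scale-dependent broadening of each band, which is why its feasibility conditions also involve the maximum time-scaling factor.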

4. Computational and inference-system realizations

In LLM serving, structure-aware scaled multiplexing is organized around the two-phase structure of inference. Drift’s PD-multiplexing decouples prefill and decode compute on shared GPUs while preserving in-place KV-cache reuse. It creates independent GreenContext partitions for prefill and decode, chooses among a set of pre-created SM splits on A100-SXM4-80GB, and uses adaptive gang scheduling, contention-free modeling, and SLO-aware dispatch. The reported evaluation shows average and peak throughput improvements over state-of-the-art baselines while consistently meeting SLO targets under complex LLM workloads (Cui et al., 20 Apr 2025). The structure-aware element lies in recognizing that decode attention is memory-bound while other kernels are compute-bound, so spatial co-execution can be arranged with limited contention.
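The following Python sketch illustrates how a split could be chosen when decode attention is memory-bound. The latency models, constants, and candidate splits are all hypothetical and do not reproduce Drift's predictor or its actual SM splits: prefill time falls inversely with its SM share, while decode step time hits a bandwidth floor, so the selector gives prefill as many SMs as possible while the decode partition still meets its TPOT SLO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    prefill_sms: int
    decode_sms: int


def prefill_ms(tokens: int, sms: int, tokens_per_sm_ms: float = 0.5) -> float:
    # Compute-bound model: latency falls inversely with the SM share.
    return tokens / (sms * tokens_per_sm_ms)


def decode_step_ms(batch: int, sms: int, mem_floor_ms: float = 20.0,
                   work_ms_per_seq_sm: float = 40.0) -> float:
    # Memory-bound model: beyond some SM count, HBM bandwidth sets a floor.
    return max(mem_floor_ms, batch * work_ms_per_seq_sm / sms)


def choose_split(splits, prefill_tokens, decode_batch, tpot_slo_ms):
    """Return the split with the fastest prefill whose decode step meets the TPOT SLO."""
    feasible = [s for s in splits
                if decode_step_ms(decode_batch, s.decode_sms) <= tpot_slo_ms]
    if not feasible:
        return None  # caller would fall back, e.g. defer the prefill
    return min(feasible, key=lambda s: prefill_ms(prefill_tokens, s.prefill_sms))


# Hypothetical candidate splits of a 108-SM GPU (not Drift's actual splits).
SPLITS = [Split(96, 12), Split(84, 24), Split(72, 36), Split(54, 54)]
print(choose_split(SPLITS, prefill_tokens=4096, decode_batch=16, tpot_slo_ms=30.0))
```

In this toy model the 72/36 and 54/54 splits give decode no speedup beyond the bandwidth floor, so the selector prefers 84/24: the memory-bound phase gets just enough SMs and the compute-bound phase gets the rest.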

Tropical addresses the same prefill/decode dichotomy from a different architectural angle. It treats TTFT and TPOT as separate SLOs, maintains separate prefill and decode queues, and admits prefills to multiplexing workers only when TPOT slack and HBM thresholds permit. This yields a hybrid between non-disaggregated and disaggregated serving: it reduces prefill queuing without sacrificing decode smoothness. On InternLM-20B with Mooncake traces, Tropical serves more requests at a given SLO attainment level, improves P90 TTFT relative to disaggregated serving, and improves P90 TPOT relative to non-disaggregated serving while maintaining the same P90 TTFT (Ma et al., 15 Jun 2026). The key structural control variable is not a fixed partition but the slack budget attached to decode iterations.
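A minimal sketch of slack-gated admission, assuming a single multiplexing worker whose decode step time and HBM usage are observable. The class, field names, and thresholds are hypothetical illustrations of the policy described in the text, not Tropical's implementation: a prefill is co-located only if its estimated added decode-step delay fits within the current TPOT slack and its KV-cache footprint keeps HBM below the admission threshold; otherwise it stays in the prefill queue.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    tpot_slo_ms: float      # per-token decode latency target
    recent_tpot_ms: float   # recently observed decode step time
    hbm_used_gb: float
    hbm_limit_gb: float     # hypothetical admission threshold, below physical capacity


def tpot_slack_ms(w: WorkerState) -> float:
    return w.tpot_slo_ms - w.recent_tpot_ms


def admit_prefill(w: WorkerState, added_step_ms: float, kv_gb: float) -> bool:
    """Co-locate a prefill only if decode slack and HBM headroom both allow it.

    added_step_ms: estimated extra time the prefill adds to the next decode step.
    kv_gb: estimated KV-cache footprint of the new request.
    """
    fits_slack = added_step_ms <= tpot_slack_ms(w)
    fits_hbm = w.hbm_used_gb + kv_gb <= w.hbm_limit_gb
    return fits_slack and fits_hbm


worker = WorkerState(tpot_slo_ms=50.0, recent_tpot_ms=35.0,
                     hbm_used_gb=60.0, hbm_limit_gb=72.0)
print(admit_prefill(worker, added_step_ms=10.0, kv_gb=5.0))  # True
print(admit_prefill(worker, added_step_ms=20.0, kv_gb=5.0))  # False: exceeds slack
```

Because the gate is re-evaluated per request against current slack, co-location shrinks automatically when decode gets busy, which mirrors the text's point that slack, rather than a fixed partition, is the control variable.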

PruMUX applies the same general principle to transformer inference throughput. It combines DataMUX, which packs $m$ equal-length inputs into one sequence using fixed Gaussian-coded masks, with CoFi structured pruning over layers, heads, hidden dimensions, and FFN dimensions. The multiplexing and demultiplexing layers leave the Transformer core intact, while the pruned hidden dimension is co-pruned in the demultiplexer. On the GLUE tasks MNLI, QNLI, QQP, and SST-2, the reported throughput gains over BERT-base vary by task and by the accuracy threshold imposed (Su et al., 2023). The scaling mechanism is explicitly two-axis: multiplex many inputs into one pass, then shorten that pass through structured sparsity.
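The multiplexing step can be sketched in NumPy, assuming the Hadamard-product variant of DataMUX in which each of the $m$ input embeddings is multiplied elementwise by a fixed Gaussian vector and the results are averaged. The learned demultiplexer and the CoFi pruning stage are not shown, and the function names are hypothetical.

```python
import numpy as np


def make_keys(m: int, d: int, seed: int = 0) -> np.ndarray:
    """One fixed (untrained) Gaussian vector per multiplexing slot."""
    return np.random.default_rng(seed).standard_normal((m, d))


def multiplex(inputs: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Combine m equal-length sequences of embeddings into one.

    inputs: shape (m, L, d); keys: shape (m, d). Each sequence is scaled
    elementwise by its key, then the m results are averaged, giving (L, d).
    """
    m, _, d = inputs.shape
    if keys.shape != (m, d):
        raise ValueError("expected one key of width d per input sequence")
    return (inputs * keys[:, None, :]).mean(axis=0)


m, L, d = 4, 8, 16
batch = np.random.default_rng(1).standard_normal((m, L, d))
mixed = multiplex(batch, make_keys(m, d))
print(mixed.shape)  # (8, 16): one backbone pass now carries m inputs
```

The equal-length requirement in the text follows directly from this construction: the elementwise average is only defined when all $m$ sequences share the same shape.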

5. Comparative interpretations and recurrent trade-offs

A recurring misconception is that structure-aware multiplexing always implies finer-grained control over every individual channel. The optical literature shows that this is contingent on the coupling regime. In silicon multimode waveguides with large index contrast, TE0 and TE1 can be accessed selectively through phase-matched conversion and single-mode processing (Stern et al., 2015). In long-haul mode-division multiplexed WSSs with substantial mode coupling, by contrast, all modes at a given wavelength must be switched together rather than independently (Ho et al., 2014). The structural unit of control is therefore platform-dependent.

A second misconception is that locality means complete separation. SC-LDPCL does not advocate purely independent decoders; it advocates confining most checks locally and limiting global exchange to the coupled checks that actually carry extrinsic information (Li et al., 2023). Likewise, Drift does not isolate prefill and decode into separate instances; it decouples compute while preserving in-place memory sharing (Cui et al., 20 Apr 2025). Tropical similarly avoids permanent role separation and instead uses slack-gated opportunistic co-location (Ma et al., 15 Jun 2026). The common design pattern is not isolation, but selective coupling.

A third recurrent issue is the trade-off between channel count and physical realizability. The R–D–R vector-beam framework can satisfy three arbitrary mappings simultaneously, but a fourth arbitrary mapping generally exceeds the physically realizable subset accessible to the cascade (Zhang et al., 28 Dec 2025). The three-mode Y-junction architecture can be extended to more modes, but only by adding Y-junctions and phase-conditioning elements (González-Andrade et al., 2023). The optical and algorithmic literature therefore converges on the same conclusion: scaling by structure is powerful precisely because it is constrained by the geometry, dispersion, or locality of the underlying medium.

6. Scalability limits and future directions

The scalability of structure-aware scaled multiplexing is never cost-free. In the silicon MDM/WDM switch, total ring count grows with the number of mode–wavelength lanes, the number of independent heaters equals the number of rings, and the paper reports the total thermal tuning power required for resonance alignment (Stern et al., 2015). In scaled multimode WSS design, the factor $K$ governs not only beam size but also port count, SLM pitch, Fourier-optics dimensions, and passband compression; Designs I–IV differ primarily in how these penalties are distributed (Ho et al., 2014). In the Y-junction mode-converter architecture, the number of Y-junctions in a full binary tree grows rapidly with stage count, and the fixed and switchable phase shifters also grow with stage count (González-Andrade et al., 2023).

Coding and DSP systems exhibit analogous constraints. SC-LDPCL keeps per-target interconnect bounded by the helper depth or window size rather than by the total number of modes, but performance approaches the joint-decoding bound only gradually as that depth or window grows (Li et al., 2023). Principal-mode MIMO-DSP scales by keeping the number of processed principal modes below the total mode count, but this presupposes stable estimation of the delay operator and periodic reconfiguration of optical mappings (Barbosa et al., 2022). AFDM retains sparsity only when the chirp parameter, blocklength, and maximum scale satisfy explicit feasibility conditions; violating them, for example through severe time-scaling, broadens the per-path support and erodes the sparse advantage (Li et al., 4 Jul 2025).

In computational systems, practical limits arise from state, memory, and calibration. Drift pre-creates GreenContext groups and records CUDA Graphs per batch size and context, incurring a reported graph-recording memory overhead across eight GPUs for both 8B and 70B models (Cui et al., 20 Apr 2025). Tropical is limited by the collapse of decode slack under extreme burstiness and by the stateful constraint that decode workers cannot be reassigned arbitrarily without KV consequences (Ma et al., 15 Jun 2026). PruMUX encounters instability or unacceptable accuracy degradation at some operating points that combine high sparsity with a high multiplexing factor, for several tasks (Su et al., 2023).

These limits suggest a common future direction. A plausible implication is that the next stage of structure-aware scaled multiplexing will depend less on discovering new multiplexing axes than on learning how to co-optimize structural transformations, sparse control, and calibration overhead. The cited works already point toward that trajectory: tunable couplers and adaptive heater bias in integrated photonics (Stern et al., 2015), dispersion-engineered passive WDM in structured matter (Zhang et al., 28 Dec 2025), predictor-guided adaptive GPU partitioning (Cui et al., 20 Apr 2025), and task-specific meta-selection of multiplexing/pruning points in Auto-PruMUX (Su et al., 2023). Across domains, the same principle remains intact: scaling is most effective when the system is first rewritten in the coordinates in which it is naturally sparse, orthogonal, or local.