
Mixed-Scheme & Intra-Layer Allocation

Updated 30 November 2025
  • Mixed-scheme and intra-layer allocation are methodologies that optimize resource distribution across and within layers using advanced mathematical and heuristic models.
  • They enable joint optimization in diverse applications, such as FPGA quantization, distributed training in deep neural networks, and network-coded communications.
  • Empirical results demonstrate significant improvements in throughput, reduced search times, and better resource utilization in modern heterogeneous systems.

Mixed-scheme and intra-layer allocation refer to rigorous methodologies for optimizing resource distribution and computational scheduling across and within the layers of complex systems, spanning distributed deep learning pipelines, memory-mapped architectures, network-coded communications, and FPGA model deployment. These approaches combine, unify, or specialize allocations along both the inter-layer dimension (between layers, or macro-scheduling) and the intra-layer dimension (within a single layer, or fine-grained scheduling), often leveraging advanced mathematical programming or heuristics. Mixed-scheme allocation typically signifies the simultaneous use of multiple schemes or resource types; intra-layer allocation focuses on distributing resources or computation at fine granularity inside individual layers or substructures.

1. Foundational Concepts and Taxonomy

Mixed-scheme allocation encompasses techniques that simultaneously engage multiple resource modalities, coding schemes, or quantization strategies. For example, mixed-scheme quantization on FPGAs splits multiply-accumulate (MAC) operations within a layer across both DSP-bound (fixed-point) and LUT-bound (power-of-two or SPoT) arithmetic to fully exploit hardware heterogeneity (Chang et al., 2020). In distributed deep learning, mixed-scheme methods unify the global placement of layers across pipelines (inter-layer) with the selection of per-layer parallel strategies (intra-layer) such as data, tensor, and fully-sharded data parallelism (Lin et al., 2023).

Intra-layer allocation specifically refers to the fine-grained distribution of resources or task scheduling within a single layer, channel group, or memory region. For instance, intra-layer multi-precision quantization assigns bitwidths at the row or channel level rather than entire layers (Chang et al., 2020). In neural network training, intra-layer task splitting allows separate scheduling and execution of weight-gradient and activation-gradient computations within a layer, decoupling their dependencies and increasing parallelism (Unnikrishnan et al., 2021).

2. Unified Optimization Frameworks for Deep Learning

The UniAP framework (Lin et al., 2023) exemplifies state-of-the-art joint mixed-scheme and intra-layer allocation through a mixed-integer quadratic programming (MIQP) model. UniAP explicitly encodes both inter-layer decisions (placement of layers on pipeline stages) and intra-layer choices (parallelization scheme selection for each layer) as binary variables:

  • $P_{ui} \in \{0,1\}$: indicates the assignment of layer $u$ to pipeline stage $i$ (inter-layer allocation).
  • $S_{uk} \in \{0,1\}$: encodes the selection of intra-layer parallel scheme $k$ for layer $u$.

Quadratic terms such as $P_{ui} P_{vi}\,(S_u^{T} R_{uv} S_v)$ naturally capture the interaction between inter-layer and intra-layer allocations, e.g., sharding alignment and local/remote reshard costs. The complete MIQP includes constraints for contiguousness, strict memory budgets, exact placement, and selection, with the objective of minimizing training time per iteration (TPI) under the GPipe pipelining model:

$$\min \; tpi_{gpipe} = \sum_{i=1}^{deg} p_i + \sum_{j=1}^{deg-1} o_j + (c-1)\,\max\bigl(\{p_i\} \cup \{o_j\}\bigr)$$

Here, $p_i$ includes both computation and intra-stage communication costs for stage $i$, $o_j$ aggregates cross-stage communication costs, $deg$ is the pipeline degree, and $c$ is the number of micro-batches. Enumeration over pipeline degree and micro-batch splits is performed to achieve a global optimum.
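
Once the per-stage costs are estimated, the objective itself is mechanical to evaluate. A minimal Python sketch, assuming the cost inputs come from some external cost model (the paper's own estimator is not reproduced here):

```python
# Minimal sketch of evaluating the GPipe time-per-iteration objective above;
# p and o are assumed to be precomputed cost estimates.

def gpipe_tpi(p: list[float], o: list[float], c: int) -> float:
    """Time per iteration under GPipe pipelining.

    p: per-stage cost (computation + intra-stage communication), len(p) == deg
    o: cross-stage communication cost between consecutive stages, len(o) == deg - 1
    c: number of micro-batches
    """
    assert len(o) == len(p) - 1
    # One full traversal of the pipeline, plus (c - 1) repetitions of the
    # bottleneck stage or link while the remaining micro-batches stream through.
    return sum(p) + sum(o) + (c - 1) * max(p + o)

print(gpipe_tpi(p=[3.0, 4.0], o=[0.5], c=8))  # stage cost 4.0 is the bottleneck -> 35.5
```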

Experimental results across five large Transformer-based models show up to $1.71\times$ throughput improvement and $107\times$ reduction in strategy-search time compared to hierarchical or separated allocation baselines (Lin et al., 2023).

3. Mixed-Scheme Quantization and Intra-Layer Allocation in FPGAs

The MSP (Mixed-Scheme, Multi-Precision) framework (Chang et al., 2020) formalizes mixed-scheme and intra-layer allocations for deep neural network quantization on FPGAs. MSP orchestrates the split of layer MAC operations among:

  • SPoT (Sum of Power-of-Two, LUT-mapped),
  • 4-bit fixed-point (DSP-mapped),
  • 8-bit fixed-point (DSP-mapped).

Fractions $r_s : r_f : r_8$ are chosen to saturate both LUT and DSP budgets for a given FPGA, yielding optimal pipeline throughput. For intra-layer allocation, MSP employs a per-row sensitivity analysis: the top 5% of rows (by quantization error) are assigned 8-bit quantization, and the rest use 4-bit. This intra-layer split is fixed for all layers, obviating reconfiguration overhead and enabling a single processing engine to serve the entire network.
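
A minimal sketch of the per-row rule, assuming a placeholder uniform symmetric quantizer and per-row L2 error as the sensitivity proxy (MSP itself mixes SPoT and fixed-point arithmetic, which this sketch does not implement):

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    # Placeholder uniform symmetric quantizer with a single per-matrix scale.
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.abs(w).max()), 1e-12) / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def assign_row_bitwidths(weight: np.ndarray, frac_8bit: float = 0.05) -> np.ndarray:
    # Sensitivity proxy: per-row L2 error introduced by 4-bit quantization.
    err = np.linalg.norm(weight - quantize_uniform(weight, 4), axis=1)
    n_hi = max(1, int(frac_8bit * weight.shape[0]))
    bits = np.full(weight.shape[0], 4)
    bits[np.argsort(err)[-n_hi:]] = 8  # most sensitive rows get 8-bit
    return bits

rows = assign_row_bitwidths(np.random.randn(256, 512))
print((rows == 8).sum(), "rows at 8-bit of", len(rows))
```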

Quantitative results on XC7Z045 show resource utilization (5% 8-bit DSP, 30% 4-bit DSP, 65% SPoT LUT), top-1 accuracy (70.47%70.47\%, exceeding baseline), and 3.5×\times throughput improvement over fixed-point-only deployment (Chang et al., 2020).

4. Mixed-Scheme Resource Allocation in Communications and Networking

In network-coded multimedia multicast, mixed-scheme and intra-layer allocation are formalized as optimization programs over coded packet allocation per layer and subchannel (Tassi et al., 2014). Two primary network coding schemes are:

  • Intra-layer only (NOW-RLNC): Coding packets only within each layer.
  • Inter-layer (EW-RLNC): Expanding-window coding across cumulative sets of layers.

Mixed allocation (EW-MA) allows coded transmissions of expanding windows over arbitrary subchannels, whereas intra-layer mixed allocation (NOW-MA) allows intra-layer-only coding, but with arbitrary packet assignment across subchannels. Integer programs minimize total transmission count while guaranteeing layer-wise coverage to fractions of users.
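
The contrast between the two window structures can be made concrete. A small sketch for a stream of `L` quality layers, where each window's coded packets are random linear combinations over the source packets of the layers it contains:

```python
# Window structures for layered RLNC multicast, layers indexed 1..L.

def now_windows(L: int) -> list[set[int]]:
    # Non-Overlapping Windows (NOW): each window covers exactly one layer.
    return [{l} for l in range(1, L + 1)]

def ew_windows(L: int) -> list[set[int]]:
    # Expanding Windows (EW): window l covers layers 1..l cumulatively, so
    # decoding window l recovers every layer up to l at once.
    return [set(range(1, l + 1)) for l in range(1, L + 1)]

print(now_windows(3))  # [{1}, {2}, {3}]
print(ew_windows(3))   # [{1}, {1, 2}, {1, 2, 3}]
```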

Empirical studies demonstrate that EW-MA offers up to a 28% reduction in traffic block footprint compared to intra-layer mixed allocation (NOW-MA) for small field sizes ($q = 2$), with corresponding coverage-distance benefits (up to 252 m vs. 224 m for NOW-MA, and only 203 m for separated allocation NOW-SA). When $q$ increases ($q = 2^8$), the differences narrow, but inter-layer mixed allocation remains more resource-efficient with heterogeneous user channels (Tassi et al., 2014).

5. Mixed-Scheme and Intra-Layer Scheduling in Parallel Training

LayerPipe (Unnikrishnan et al., 2021) defines a scheduling paradigm for deep neural network training that integrates mixed-scheme (overlap of inter- and intra-layer scheduling) and intra-layer allocation by:

  • Decoupling backward gradient computation into weight-gradient and activation-gradient tasks, which are independent once inputs are ready (intra-layer parallelism).
  • Further splitting activation gradient computation across two processors by channel/feature map groups (fine-grained inter-layer pipelining).
  • Formulating task allocation as an integer program targeting balanced per-processor load (makespan minimization), with greedy backward scheduling to achieve near-optimal allocation; a simplified sketch follows this list.
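
A simplified sketch of the makespan-oriented greedy step, assuming a plain least-loaded-processor rule in place of LayerPipe's dependency-aware backward ordering, with hypothetical task names and costs:

```python
import heapq

# Tasks are (cost, name) pairs, e.g. weight-gradient and activation-gradient
# computations after intra-layer splitting.

def greedy_schedule(tasks: list[tuple[float, str]], n_procs: int):
    procs = [(0.0, p, []) for p in range(n_procs)]  # (load, proc id, assigned)
    heapq.heapify(procs)
    for cost, name in sorted(tasks, reverse=True):  # largest tasks first
        load, p, assigned = heapq.heappop(procs)    # least-loaded processor
        heapq.heappush(procs, (load + cost, p, assigned + [name]))
    return sorted(procs, key=lambda t: t[1])        # makespan = max load

tasks = [(4.0, "wgrad_L3"), (3.0, "agrad_L3"), (2.5, "wgrad_L2"), (2.0, "agrad_L2")]
for load, pid, names in greedy_schedule(tasks, 2):
    print(f"proc {pid}: load={load}, tasks={names}")
```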

Measured on VGG16 and ResNet50, LayerPipe's mixed-scheme (intra-layer + inter-layer) approach produces average speedups of 25–45% compared to PipeDream, and up to 70–85% in the saturation regime for 7–9 processors (Unnikrishnan et al., 2021). Communication overhead for the inter-processor split is minimal due to boundary conditions on the size of activation-gradient transfers.

6. Intra-Layer and Mixed Memory Allocation for Hardware Clusters

On kilo-core RISC-V clusters, the Dynamic Allocation Scheme (DAS) enables mixed-scheme and intra-layer allocations through programmable bank mapping per memory region (Wang et al., 2 Aug 2025). The allocator exposes parameters $(p, s)$ that select the hierarchical folding of address partitions and rows onto L1 banks, supporting:

  • Fully interleaved (shared) mapping for broadcast/weights,
  • Per-tile local mapping for private kernel outputs,
  • Region-wise mixing among arbitrary schemes.

In attention-based models (ViT-L/16), intra-layer allocation enables each encoder substage (qkv generation, attention, FFN) to select per-region banking that maximizes PE-local memory access. This reduces remote L1 latency from $4.75$ cycles to $1$ cycle (a theoretical $4.75\times$ access speedup), yielding empirical end-to-end kernel speedups of $1.94\times$ and PE utilization of $0.81$, with negligible area overhead ($<0.1\%$) (Wang et al., 2 Aug 2025).
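
A deliberately simplified sketch of the two extreme per-region mappings described above, with hypothetical bank counts; the real $(p, s)$ folding parameters express a much richer family of mappings:

```python
N_BANKS = 1024        # total L1 banks in the cluster (hypothetical)
BANKS_PER_TILE = 16   # banks physically local to one tile (hypothetical)

def bank_interleaved(word_addr: int) -> int:
    # Shared regions (e.g. broadcast weights): stripe consecutive words
    # across all banks so no single bank hotspots.
    return word_addr % N_BANKS

def bank_tile_local(word_addr: int, tile_id: int) -> int:
    # Private regions (e.g. per-tile kernel outputs): fold the region onto
    # the tile's own banks, keeping every access PE-local (the 1-cycle case).
    return tile_id * BANKS_PER_TILE + word_addr % BANKS_PER_TILE

print(bank_interleaved(1030))    # 6
print(bank_tile_local(1030, 3))  # 54
```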

7. Computational Complexity and Solver Techniques

Mixed-scheme and intra-layer allocation problems are almost always NP-hard (mixed-integer nonlinear or quadratic programs). Approaches differ:

  • MIQP with early stopping and parallel branch-and-bound (Gurobi) in UniAP (Lin et al., 2023); a gurobipy sketch follows this list.
  • Heuristic two-stage allocation (MCS selection, greedy packet assignment) in network-coded multicast (Tassi et al., 2014).
  • Exact linearization and convexification of MINLPs, enabling large-scale practical optimization for intra-operator spectrum scheduling (Kibria et al., 2017).
  • Fixed intra-layer splits and per-layer enumeration for FPGA quantization (Chang et al., 2020).
  • Greedy backward scheduling with task splitting for intra-/inter-layer parallel neural training (Unnikrishnan et al., 2021).
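
Since UniAP's MIQP is solved with Gurobi, the general shape of such a model can be sketched in gurobipy. Everything below (dimensions, the single coupling term, the cost constant) is a hypothetical simplification rather than the paper's model; in particular, the full $P_{ui} P_{vi}\,(S_u^{T} R_{uv} S_v)$ coupling would require further linearization:

```python
import gurobipy as gp
from gurobipy import GRB

n_layers, n_stages, n_schemes = 4, 2, 3

m = gp.Model("uniap_sketch")
P = m.addVars(n_layers, n_stages, vtype=GRB.BINARY, name="P")   # layer -> stage
S = m.addVars(n_layers, n_schemes, vtype=GRB.BINARY, name="S")  # layer -> scheme

# Exactly one pipeline stage and one parallel scheme per layer.
m.addConstrs((P.sum(u, "*") == 1 for u in range(n_layers)), name="one_stage")
m.addConstrs((S.sum(u, "*") == 1 for u in range(n_layers)), name="one_scheme")

# Pay a (hypothetical) communication cost whenever consecutive layers land on
# different stages: 1 - sum_i P[u,i] * P[u+1,i] is 1 exactly in that case.
comm_cost = 2.0
m.setObjective(
    gp.quicksum(
        comm_cost * (1 - gp.quicksum(P[u, i] * P[u + 1, i] for i in range(n_stages)))
        for u in range(n_layers - 1)
    ),
    GRB.MINIMIZE,
)
m.optimize()
```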

A common trend is the combination of precise modeling of costs/constraints with scalable solver or heuristic reductions to attain tractable and near-optimal allocations at practical system scale.


The above summary traces all critical technical developments, formulations, and empirical findings to their source data. All statistics, algorithms, and mathematical notation correspond to published results in the cited papers (Lin et al., 2023, Unnikrishnan et al., 2021, Chang et al., 2020, Tassi et al., 2014, Kibria et al., 2017, Wang et al., 2 Aug 2025).
