Split-Frame Encoding: Theory & GPU Transcoding
- Split-Frame Encoding (SFE) names two distinct frameworks: splittable codes for sequences of integer pairs, and real-time video encoding on NVIDIA NVENC hardware.
- The theoretical approach concatenates two independent prefix codes to obtain uniquely decodable splittable codes for compact data representation.
- The video transcoding application splits UHD frames into slices processed in parallel, nearly doubling throughput with minimal rate-distortion penalties and latency effects.
Split-Frame Encoding (SFE) refers to two distinct frameworks in contemporary information theory and practical video encoding: (1) the formal theory of splittable codes for sequences of integer pairs, established in coding theory, and (2) the hardware-accelerated Split-Frame Encoding employed by NVIDIA NVENC for parallelized UHD video transcoding on multi-chip GPUs. Both operate by partitioning data into independently coded subunits, but their structural motivations, algorithmic workflows, and application domains are distinct and will be treated separately.
1. Formal Definition and Structure of Splittable Codes
Consider sequences of ordered integer pairs. A split-frame codeword (in this context, synonymous with a splittable codeword) is generated by two independent prefix encoders:
- a finite prefix code, for the “first” (finite) component
- an infinite prefix code, for the “second” (infinite) component
A codeword is the bitwise concatenation of the first component's encoding followed by the second component's encoding. Prefix-freeness is guaranteed because both encoders are themselves prefix codes and their blocks do not overlap (Anisimov et al., 2015).
This formalism enables mixing any finite prefix code for bounded symbols and any infinite prefix code for unbounded symbols, facilitating broad applicability in data compression and representation.
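As an illustration of this construction, the following minimal sketch pairs a fixed-width binary code (a finite prefix code) with the Elias gamma code (an infinite prefix code). Both code choices, the function names, and the `width` parameter are assumptions made for this example; they are not taken from Anisimov et al.

```python
def fixed_encode(a: int, width: int) -> str:
    """Finite prefix code: fixed-width binary for 0 <= a < 2**width."""
    if not 0 <= a < 2 ** width:
        raise ValueError("first component out of range")
    return format(a, f"0{width}b")


def elias_gamma_encode(n: int) -> str:
    """Infinite prefix code: Elias gamma for n >= 1."""
    if n < 1:
        raise ValueError("Elias gamma encodes positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def encode_pairs(pairs, width: int) -> str:
    """Concatenate both encodings for every (a, b) pair."""
    return "".join(fixed_encode(a, width) + elias_gamma_encode(b) for a, b in pairs)


def decode_pairs(bits: str, width: int):
    """Invert encode_pairs with a single left-to-right scan (assumes well-formed input)."""
    pairs, i = [], 0
    while i < len(bits):
        a = int(bits[i:i + width], 2)      # fixed-width block
        i += width
        zeros = 0
        while bits[i] == "0":              # unary length prefix of Elias gamma
            zeros += 1
            i += 1
        b = int(bits[i:i + zeros + 1], 2)  # binary part, leading 1 included
        i += zeros + 1
        pairs.append((a, b))
    return pairs
```

Because neither encoder ever emits a codeword that is a prefix of another of its own codewords, the decoder always knows where each block ends, which is the property the concatenation relies on.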
2. Multi-Delimiter Splittable Codes and Properties
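A prominent subclass is the family of multi-delimiter codes, which use runs of ones as delimiters. For a set of integers $M = \{m_1, \dots, m_t\}$ with $0 < m_1 < \dots < m_t$, and the encodings $1^{m_i}0$ and $01^{m_i}0$, one builds the set of all words that are either of the form $1^{m_i}0$ or, for longer codewords, satisfy the following: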
- Terminates in a delimiter
- Lacks interior appearance of such delimiters
- Does not begin with a delimiter
Every encoded sequence can thus be split unambiguously into codewords at the delimiters, which ensures unique decodability. Parsing is a single left-to-right sequential scan (Anisimov et al., 2015).
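The left-to-right scan can be sketched as follows. The sketch assumes that a codeword ends at the first zero that follows a maximal run of exactly m ones, for some m in the delimiter set; that reading of the rules above, and the names `split_codewords` and `delimiter_runs`, are assumptions of this example.

```python
def split_codewords(bits: str, delimiter_runs: set) -> list:
    """Split a bit string into codewords at each delimiter (run of m ones, then a zero)."""
    words = []
    start = 0   # index where the current codeword begins
    run = 0     # length of the current run of ones
    for i, bit in enumerate(bits):
        if bit == "1":
            run += 1
        else:
            if run in delimiter_runs:
                # The zero closes a delimiter, so the codeword ends here.
                words.append(bits[start:i + 1])
                start = i + 1
            run = 0
    if start != len(bits):
        raise ValueError("trailing bits do not form a complete codeword")
    return words
```

For example, with the single delimiter length 2, `split_codewords("110011010110", {2})` yields the three words `110`, `0110`, and `10110`.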
Completeness is shown by verifying that the Kraft-McMillan sum over the codeword lengths equals 1; the number of codewords of each length follows a linear recurrence in which the delimiter patterns contribute the subtracted terms. Universality follows directly from Elias’ lemma: the integer-to-codeword mapping guarantees codeword length $O(\log n)$ for the integer $n$, so, for any source distribution $P$, the expected codeword length is within a constant factor of the entropy $H(P)$.
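A brute-force check of the Kraft-McMillan sum can make the completeness argument concrete. The sketch below enumerates all binary words up to a given length and sums $2^{-\text{length}}$ over those that qualify as codewords, under the same assumed delimiter convention (a maximal run of m ones followed by a zero). The function names are hypothetical, and the enumeration is exponential, so it is only practical for short lengths.

```python
from fractions import Fraction
from itertools import product


def is_codeword(word: str, runs: set) -> bool:
    """True if the first delimiter in `word` ends exactly at its last bit."""
    run = 0
    for i, bit in enumerate(word):
        if bit == "1":
            run += 1
        else:
            if run in runs:
                return i == len(word) - 1
            run = 0
    return False


def kraft_partial_sum(runs: set, max_len: int) -> Fraction:
    """Sum of 2**(-len) over all codewords of length <= max_len."""
    total = Fraction(0)
    for length in range(1, max_len + 1):
        count = sum(
            is_codeword("".join(w), runs) for w in product("01", repeat=length)
        )
        total += Fraction(count, 2 ** length)
    return total
```

For any prefix code the partial sums never exceed 1; for a complete code they approach 1 as `max_len` grows.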
3. Decoding Algorithmic Realization: Byte-Aligned Fast Table Decoding
To enable high-throughput decoding, a small lookup table approach is adopted: the decoder’s state is determined by the remainder (at most 3 bits) from the previous byte and the next byte’s contents. The decoded outputs and new remainder are indexed by this tuple.
The pseudocode principle is as follows:
```
r ← 0                              // initial remainder
for each input byte B:
    T ← Table[r][B]                // packed entry: flags f1..f3, words w1..w3, next remainder
    if f1 = 1: output w1
    if f2 = 1: output w2
    if f3 = 1: output w3
    r ← (T >> remShift) & remMask  // remainder carried into the next byte
```
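A simplified variant of this table-driven idea can be sketched in Python. Instead of the packed entries and remainder bits described above, this sketch uses the current run of ones (capped) as the carried state and records only the bit offsets at which codewords end. The delimiter convention (m ones followed by a zero) and all names here are assumptions of the example, not the authors' table layout.

```python
def build_table(runs: set) -> list:
    """Precompute, for every (state, byte), the codeword ends and the next state."""
    cap = max(runs) + 1  # runs longer than every delimiter are equivalent

    def step(run: int, byte: int):
        ends = []
        for pos in range(8):                 # most significant bit first
            if (byte >> (7 - pos)) & 1:
                run = min(run + 1, cap)
            else:
                if run in runs:
                    ends.append(pos)         # a delimiter closes at this bit
                run = 0
        return tuple(ends), run

    return [[step(r, b) for b in range(256)] for r in range(cap + 1)]


def codeword_end_offsets(data: bytes, table: list) -> list:
    """Return absolute bit offsets of codeword ends, one table lookup per byte."""
    run, offsets = 0, []
    for i, byte in enumerate(data):
        ends, run = table[run][byte]
        offsets.extend(8 * i + pos for pos in ends)
    return offsets
```

The per-bit work happens once, at table construction; decoding then costs one lookup per input byte, which is the source of the speedup the byte-aligned scheme aims for.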
4. Comparative Compression Performance and Density
Splittable multi-delimiter codes demonstrate notable compression improvements over higher-order Fibonacci codes:
- For the Bible, “Fib3” codes average 9.21 bits/word; the multi-delimiter code averages 8.95 bits/word (–2.8%).
- For Hamlet: Fib3 averages 10.00 bits/word, the multi-delimiter code 9.74 bits/word (–2.5%).
- For a 20M-word Wikipedia fragment: Fib3 decodes in 0.321 s, the multi-delimiter code in 0.255 s (about 20% less time).
Asymptotically, codeword set densities satisfy : with more delimiters, density increases but the short-codeword reservoir becomes richer (Anisimov et al., 2015).
5. Split-Frame Encoding in Video Transcoding (NVIDIA NVENC SFE)
Modern NVIDIA GPUs integrate multiple independent on-die NVENC encoder chips. SFE divides each UHD frame into horizontal slices assigned to each chip, which perform fully parallel encode pipelines (motion estimation, intra, entropy coding). Resultant slice bitstreams, each containing their own SPS/PPS, are stitched by pruning redundant headers and concatenating at the NAL boundary into a single elementary bitstream. Decoder compatibility is preserved, but motion estimation across the slice boundary is disabled (Arunruangsirilert et al., 24 Nov 2025).
| Component | Functionality |
|---|---|
| Frame partitioning | Top/bottom horizontal slicing, one per NVENC chip |
| Encoding pipeline | Parallel submission (`-split_encode_mode 2`) via ffmpeg/driver |
| Bitstream stitching | Host prunes SPS/PPS, concatenates slices at NAL boundary |
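The stitching step can be illustrated with a minimal sketch for HEVC Annex B byte streams. It keeps the parameter sets (VPS/SPS/PPS, NAL unit types 32 to 34) from the first slice stream only and concatenates the remaining NAL units in order. The function names are hypothetical, and the sketch does not touch slice headers; in the real pipeline the driver is responsible for producing slices that form a valid picture together.

```python
START_CODE = b"\x00\x00\x00\x01"
PARAM_SET_TYPES = {32, 33, 34}  # HEVC VPS, SPS, PPS


def split_nal_units(stream: bytes) -> list:
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    units = []
    for chunk in stream.split(b"\x00\x00\x01"):
        chunk = chunk.rstrip(b"\x00")  # drop the leading zero of a 4-byte start code
        if chunk:
            units.append(chunk)
    return units


def nal_type(nal: bytes) -> int:
    """HEVC nal_unit_type: bits 1..6 of the first header byte."""
    return (nal[0] >> 1) & 0x3F


def stitch_slices(slice_streams: list) -> bytes:
    """Concatenate slice streams, keeping parameter sets from the first one only."""
    out = []
    for index, stream in enumerate(slice_streams):
        for nal in split_nal_units(stream):
            if index > 0 and nal_type(nal) in PARAM_SET_TYPES:
                continue  # redundant header, already emitted by the first stream
            out.append(START_CODE + nal)
    return b"".join(out)
```

Splitting on the 3-byte start code is safe here because emulation prevention keeps that pattern from occurring inside a NAL unit payload.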
For typical workloads (RTX 4070 Ti SUPER, ffmpeg, test suite 10x4K/11x8K, CBR 10–100 Mbps), the SFE pipeline nearly doubles throughput (81–96% increase in FPS) compared to single-chip while incurring a negligible RD penalty. At 4K, mean PSNR degradation is below 0.05 dB (all presets), with worst case at –0.41 dB. At 8K, the penalty is negligible. Power draw for two-chip SFE rises by only 4.5–6 W over one chip, and remains much lower than CPU-based encoding (~150 W) (Arunruangsirilert et al., 24 Nov 2025).
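For orientation, a command line of the kind used in such tests can be assembled as below. This is a sketch only: the study's exact invocation is not given in the text, so the file names, the bitrate, and the choice of `-preset`, `-rc cbr`, and `-an` are assumptions. `-split_encode_mode` is the option named above; whether it is available depends on the FFmpeg build, driver version, and GPU.

```python
def nvenc_sfe_command(src: str, dst: str, bitrate_mbps: int,
                      preset: str = "p7", split_mode: str = "2") -> list:
    """Build an ffmpeg argument list for HEVC NVENC with split-frame encoding."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-c:v", "hevc_nvenc",
        "-preset", preset,
        "-rc", "cbr",                      # constant bitrate rate control
        "-b:v", f"{bitrate_mbps}M",
        "-split_encode_mode", split_mode,  # "2" requests a two-way split
        "-an",                             # video only
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run to execute it.
    print(" ".join(nvenc_sfe_command("input_4k.mp4", "output_4k.mp4", 50)))
```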
6. Rate-Distortion, Throughput, Power, and Latency Trade-offs
The RD optimization objective for each encoder is to minimize $J = D + \lambda R$, with $D$ the measured distortion (e.g., MSE, PSNR), $R$ the output bitrate, and $\lambda$ the Lagrange parameter from CBR rate control (Arunruangsirilert et al., 24 Nov 2025).
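The objective can be made concrete with a small sketch, assuming the standard Lagrangian form $J = D + \lambda R$ (distortion plus the Lagrange parameter times rate). The candidate values and function names are illustrative only; since PSNR is a quality score rather than a distortion, a helper converts it to MSE first.

```python
def mse_from_psnr(psnr_db: float, peak: float = 255.0) -> float:
    """Convert PSNR in dB to mean squared error for a given peak sample value."""
    return peak ** 2 / 10 ** (psnr_db / 10)


def rd_cost(distortion: float, rate: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate


def pick_candidate(candidates, lam: float):
    """Return the (name, distortion, rate) tuple with the lowest RD cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))


# Hypothetical candidates: (name, distortion as MSE, rate in bits)
candidates = [("low_distortion", 40.0, 1200), ("low_rate", 55.0, 300)]
```

A small $\lambda$ favors the low-distortion candidate and a large $\lambda$ favors the low-rate one, which is how rate control steers the encoder toward its bitrate target.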
Tables of performance (selected data):
Encoding Throughput (4K HEVC, P7 preset):
| SFE | FPS | ΔFPS (%) |
|---|---|---|
| Disabled | 45.24 | – |
| Enabled | 88.77 | +96.2 % |
Average RD Degradation (4K HEVC, P7):
| Metric | Δ Value |
|---|---|
| PSNR (dB) | –0.151 |
| VMAF (pt) | –0.389 |
Power Consumption (HEVC):
| NVENC Chips | Power (W) |
|---|---|
| 1 | 38.5 |
| 2 (SFE) | 43.0 |
End-to-End Latency (4K60 HEVC, P4):
| SFE | Latency (frames) |
|---|---|
| Disabled | 5 |
| Enabled | 5 |
Notably, SFE confers no latency penalty at 4K and reduces 8K latency by up to 1 frame. Throughput decreases (by 10–20%) only under “Ultra-High-Quality (2 Pass)” tuning, owing to serialized dependencies.
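The headline figures in the tables above can be cross-checked with a few lines of arithmetic. The sketch uses only the tabulated FPS and power values; the helper name is hypothetical.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


fps_gain = percent_change(45.24, 88.77)   # 4K HEVC, P7: SFE disabled vs enabled
power_delta_w = 43.0 - 38.5               # two-chip SFE vs one chip, in watts
power_gain = percent_change(38.5, 43.0)

print(f"throughput gain: {fps_gain:.1f}%")
print(f"power increase: {power_delta_w:.1f} W ({power_gain:.1f}%)")
```

The throughput gain reproduces the +96.2% in the table, and the 4.5 W difference matches the lower end of the 4.5–6 W range quoted earlier.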
7. Unified Significance and Application Domains
The split-frame/splittable paradigm provides a uniform abstraction for both theoretical coding and high-throughput parallel video encoding. In the theory of prefix codes, splitting integer pairs enables the design of efficient, complete, and universal codes applicable to variable-length integer compression and text analytics. In practice, SFE on NVENC hardware enables real-time, power-efficient UHD transcoding: it nearly doubles throughput with a mean PSNR penalty below 0.05 dB in production presets, which makes 4K/8K live applications feasible at modest power budgets (Anisimov et al., 2015, Arunruangsirilert et al., 24 Nov 2025).
The SFE workflow is recommended for real-time 4K/8K use cases except for offline ultra-high-fidelity transcodes, where serialized dependencies reduce gains. In coding theory, splittable codes with multi-delimiters outperform Fibonacci codes for both compression efficiency and decoding speed.
These approaches thus collectively enable scalable, resource-efficient, and theoretically grounded solutions spanning digital communications, storage, and high-performance multimedia streaming.