
Split-Frame Encoding: Theory & GPU Transcoding

Updated 1 December 2025
  • Split-Frame Encoding (SFE) names two distinct frameworks: splittable codes for sequences of integer pairs, and real-time video encoding on NVIDIA NVENC hardware.
  • The theoretical approach uses two independent prefix codes to generate uniquely decodable splittable codes, ensuring efficient data compression and representation.
  • The video transcoding application splits UHD frames into slices encoded in parallel, nearly doubling throughput with a minimal rate-distortion penalty and no added latency at 4K.

Split-Frame Encoding (SFE) refers to two distinct frameworks in contemporary information theory and practical video encoding: (1) the formal theory of splittable codes for sequences of integer pairs, established in coding theory, and (2) the hardware-accelerated Split-Frame Encoding employed by NVIDIA NVENC for parallelized UHD video transcoding on multi-chip GPUs. Both operate by partitioning data into independently coded subunits, but their structural motivations, algorithmic workflows, and application domains are distinct and will be treated separately.

1. Formal Definition and Structure of Splittable Codes

Let $S$ be the set of sequences of ordered integer pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. A split-frame codeword (in this context, synonymous with a splittable codeword) is generated by two independent prefix encoders:

  • $f: \{0, \ldots, d\} \rightarrow \{0,1\}^*$, for the “first” (finite) component $x_i$
  • $g: \mathbb{Z}^+ \rightarrow \{0,1\}^*$, for the “second” (infinite) component $y_i$

A codeword is the concatenation $C = f(x_1) \| g(y_1) \| \cdots \| f(x_n) \| g(y_n)$, where $\|$ denotes bitwise concatenation. Because $f$ and $g$ are themselves prefix codes, every pair block $f(x_i) \| g(y_i)$ is again a prefix codeword, so a decoder that alternates between $f$ and $g$ can delimit each block without lookahead and the blocks never overlap (Anisimov et al., 2015).

This formalism allows any finite prefix code $f$ for bounded symbols to be combined with any infinite prefix code $g$ for unbounded symbols, which makes it broadly applicable in data compression and representation.
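To make the construction concrete, here is a minimal Python sketch under choices that are assumptions, not taken from the source: $f$ is a fixed-width binary code for $\{0, \ldots, d\}$ (fixed-length codes are trivially prefix-free) and $g$ is the unary code $g(y) = 1^{y-1}0$ that also appears in Section 2. The decoder simply alternates between the two codes, which is why every block boundary can be recovered. The names `f_code`, `g_code`, `encode_pairs` and `decode_pairs` are hypothetical.

```python
def f_code(x: int, d: int) -> str:
    """Assumed f: fixed-width binary code for {0, ..., d}."""
    width = max(1, d.bit_length())
    return format(x, f"0{width}b")


def g_code(y: int) -> str:
    """Assumed g: unary code 1^(y-1) 0 for y >= 1."""
    return "1" * (y - 1) + "0"


def encode_pairs(pairs: list[tuple[int, int]], d: int) -> str:
    """C = f(x1) || g(y1) || ... || f(xn) || g(yn)."""
    return "".join(f_code(x, d) + g_code(y) for x, y in pairs)


def decode_pairs(bits: str, d: int) -> list[tuple[int, int]]:
    """Alternate between f and g; each prefix code tells us where its own block ends."""
    width = max(1, d.bit_length())
    pairs, i = [], 0
    while i < len(bits):
        x = int(bits[i:i + width], 2)   # f block: exactly `width` bits
        i += width
        j = bits.index("0", i)          # g block ends at the next 0
        pairs.append((x, j - i + 1))    # 1^(y-1) 0 has length y
        i = j + 1
    return pairs


if __name__ == "__main__":
    pairs = [(2, 1), (0, 3), (3, 2)]
    code = encode_pairs(pairs, d=3)
    print(code, decode_pairs(code, d=3))
```

Swapping in a different prefix code for either component only changes `f_code`/`g_code` and the matching block reader in `decode_pairs`; the alternation itself stays the same.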

2. Multi-Delimiter Splittable Codes and Properties

A prominent subclass uses multi-delimiter codes, in which runs of ones act as delimiters. For a set $M = \{m_1, \ldots, m_t\}$ with $1 \le m_1 < \cdots < m_t$, and the encodings $f(x) = x \bmod 2$ and $g(y) = 1^{y-1}0$, one builds the set $D_{m_1,\ldots,m_t}$ of all words that either have the form $1^{m_i}0$ or, for longer codewords, satisfy all of the following:

  • Ends with a delimiter $01^{m_i}0$
  • Contains no other occurrence of such a delimiter
  • Does not begin with a delimiter $1^{m_j}0$

Each codeword therefore splits unambiguously into $(f\|g)$-groups, which guarantees unique decodability. Parsing can be carried out efficiently with a single left-to-right scan (Anisimov et al., 2015).
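The left-to-right scan can be sketched as follows, using strings of '0'/'1' characters for readability (an assumption of this sketch). The scanner tracks only the length of the current run of ones: a 0 that closes a run whose length lies in $M$ ends a codeword. That single rule covers the three conditions above, since a leading $1^{m_j}0$ or an interior $01^{m_j}0$ would end the word early. `to_pairs` then cuts a codeword into $(f\|g)$-groups with $f(x) = x \bmod 2$ (one bit) and $g(y) = 1^{y-1}0$. Function names are illustrative.

```python
def split_codewords(bits: str, M: set[int]) -> tuple[list[str], str]:
    """Split a bit string into D_M codewords; return (codewords, unfinished tail)."""
    words, start, run = [], 0, 0
    for i, b in enumerate(bits):
        if b == "1":
            run += 1                      # extend the current run of ones
        else:
            if run in M:                  # this 0 closes a run of exactly m ones, m in M
                words.append(bits[start:i + 1])
                start = i + 1
            run = 0
    return words, bits[start:]


def to_pairs(word: str) -> list[tuple[int, int]]:
    """Parse one codeword into (x, y) groups: f(x) is one bit, g(y) = 1^(y-1) 0."""
    pairs, i = [], 0
    while i < len(word):
        x = int(word[i])                  # f block: a single bit
        j = word.index("0", i + 1)        # g block occupies positions i+1 .. j
        pairs.append((x, j - i))          # its length j - i equals y
        i = j + 1
    return pairs


if __name__ == "__main__":
    stream = "110" + "0110" + "10110" + "11"   # three D_2 codewords plus an unfinished tail
    words, tail = split_codewords(stream, {2})
    print(words, tail, [to_pairs(w) for w in words])
```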

Completeness is shown by verifying that the Kraft–McMillan sum over codeword lengths equals 1, where the number $f_n$ of codewords of length $n$ follows a linear recurrence whose subtracted terms account for the delimiters. Universality follows from Elias' lemma: the mapping from integers to codewords has codeword lengths $|c(n)| = O(\log n)$, so for any nonincreasing source distribution $p_1 \ge p_2 \ge \cdots$ the expected codeword length is at most $K_1 H(p) + K_2$ for constants $K_1, K_2$, i.e. within a constant factor and an additive constant of the entropy $H(p)$.
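A quick numerical check of completeness, written for this note rather than taken from the source: running the run-length scanner as a dynamic program over its states counts the codewords of each length without enumerating words, and the partial Kraft sum $\sum_{n \le N} f_n 2^{-n}$ approaches 1 as $N$ grows. The gap $1 - \sum$ is exactly the probability that $N$ random bits contain no codeword end, which shrinks geometrically. Exact `Fraction` arithmetic keeps the partial sums strictly below 1.

```python
from fractions import Fraction


def codeword_counts(M: set[int], n_max: int) -> list[int]:
    """counts[n] = number of codewords of length n in D_M, for n = 0..n_max."""
    cap = max(M) + 1                   # runs longer than max(M) can never close a delimiter
    alive = [0] * (cap + 1)            # alive[s]: unfinished prefixes whose trailing run of ones has length s
    alive[0] = 1                       # the empty prefix
    counts = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        nxt = [0] * (cap + 1)
        for s, ways in enumerate(alive):
            nxt[min(s + 1, cap)] += ways   # append 1: the run grows
            if s in M:
                counts[n] += ways          # append 0 closing a delimiter: a codeword of length n ends
            else:
                nxt[0] += ways             # append 0 closing no delimiter: keep scanning
        alive = nxt
    return counts


def kraft_partial_sum(M: set[int], n_max: int) -> Fraction:
    """Sum of 2^-length over all codewords of length <= n_max."""
    return sum((Fraction(c, 2**n) for n, c in enumerate(codeword_counts(M, n_max))), Fraction(0))


if __name__ == "__main__":
    print(codeword_counts({2}, 12)[3:])          # D_2 codeword counts for lengths 3..12
    for n_max in (10, 50, 200):
        print(n_max, float(kraft_partial_sum({2}, n_max)))
```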

3. Decoding in Practice: Byte-Aligned Table Lookup

High-throughput decoding relies on a small lookup table. The decoder state is the remainder (at most 3 bits) carried over from the previous byte; the table is indexed by this remainder together with the next input byte, and each entry stores the decoded outputs and the new remainder.

The decoding loop in pseudocode:

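r ← 0                              // initial remainder: nothing carried over yet
for each input byte B:
    T ← Table[r][B]                // entry for (remainder, byte); flags f1..f3, words w1..w3 and the remainder are packed into T
    if T.f1 = 1: output T.w1       // flag fi = 1: this entry completes decoded word wi
    if T.f2 = 1: output T.w2
    if T.f3 = 1: output T.w3
    r ← (T >> remShift) & remMask  // unpack the new remainder for the next byte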
A 6 × 256 = 1536-entry table fits in approximately 6 KB, and decoding reaches throughputs of hundreds of MB/s. The decoder performs one table lookup per input byte, i.e. $O(L/8)$ steps for an $L$-bit input (Anisimov et al., 2015).
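A runnable variant of the same idea is sketched below. Its table layout is an assumption of this sketch, not the source's design: instead of a remainder of at most 3 bits, the state is the length of the current run of ones, capped at $\max(M) + 1$ because longer runs can never close a delimiter, and each entry lists the bit offsets at which codewords end inside the byte together with the next state. For $M = \{2, 3, 5\}$ this gives $7 \times 256$ entries. Bits are read most-significant first.

```python
def build_table(M: set[int]) -> list[list[tuple[tuple[int, ...], int]]]:
    """table[state][byte] = (bit offsets where codewords end in this byte, next state)."""
    cap = max(M) + 1                                # state `cap` means "run too long to matter"
    table = []
    for s in range(cap + 1):
        row = []
        for byte in range(256):
            run, ends = s, []
            for k in range(8):                      # scan the byte MSB-first
                if (byte >> (7 - k)) & 1:
                    run = min(run + 1, cap)
                else:
                    if run in M:
                        ends.append(k)              # a codeword ends at bit k of this byte
                    run = 0
            row.append((tuple(ends), run))
        table.append(row)
    return table


def decode(data: bytes, M: set[int]) -> list[str]:
    """Split a byte stream into D_M codewords with one table lookup per byte."""
    table = build_table(M)
    bits = "".join(f"{b:08b}" for b in data)        # only used to slice out readable codewords
    state, start, words = 0, 0, []
    for i, byte in enumerate(data):
        ends, state = table[state][byte]
        for k in ends:
            end = 8 * i + k + 1
            words.append(bits[start:end])
            start = end
    return words


if __name__ == "__main__":
    # "110" + "0110" + "10110" followed by an unfinished run "1111", packed into two bytes
    print(decode(bytes([0b11001101, 0b01101111]), {2}))
```

Each byte costs one lookup no matter how many codewords it closes, which is where the $O(L/8)$ step count comes from; the string slicing exists only to make the output readable.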

4. Comparative Compression Performance and Density

Splittable multi-delimiter codes $D_{m_1, \ldots, m_t}$ improve on higher-order Fibonacci codes in both compression and decoding speed:

  • For the Bible ($|\text{Alphabet}| \approx 12{,}500$), “Fib3” averages 9.21 bits/word, while $D_{2,3,5}$ averages 8.95 bits/word (–2.8%).
  • For Hamlet ($\approx 4{,}500$ words), Fib3 averages 10.00 bits/word and $D_{2,3,5}$ 9.74 bits/word (–2.5%).
  • For a 20M-word Wikipedia fragment, Fib3 decodes in 0.321 s and $D_2$ in 0.255 s (about 20% less decoding time).

Asymptotically, the codeword-set densities satisfy $S_n(D_2) \sim 1.867^n$ and $S_n(\mathrm{Fib}_3) \sim 1.839^n$, so $D_2$ asymptotically offers more codewords of a given length than Fib3; adding delimiters increases the density further, enlarging the pool of short codewords (Anisimov et al., 2015).
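The 1.867 figure for $D_2$ can be reproduced with a short calculation derived for this sketch rather than quoted from the source. Tracking the run-length states of the left-to-right scanner gives the recurrence $c_n = 2c_{n-1} - c_{n-3} + c_{n-4}$ for the number $c_n$ of $D_2$ codewords of length $n$, so their growth rate is the largest root of $x^4 - 2x^3 + x - 1$. The Fib3 figure of 1.839 coincides with the tribonacci constant, the largest root of $x^3 - x^2 - x - 1$.

```python
def d2_counts(n_max: int) -> list[int]:
    """Counts of D_2 codewords of lengths 3..n_max.

    Seeds come from direct enumeration (110 | 0110 | x0110 | xy0110 minus 110110);
    the recurrence c[n] = 2c[n-1] - c[n-3] + c[n-4] is derived from the run-length scanner.
    """
    c = {3: 1, 4: 1, 5: 2, 6: 3}
    for n in range(7, n_max + 1):
        c[n] = 2 * c[n - 1] - c[n - 3] + c[n - 4]
    return [c[n] for n in range(3, n_max + 1)]


def root_in(poly, lo: float, hi: float, iters: int = 60) -> float:
    """Bisection; assumes poly changes sign exactly once on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if poly(lo) * poly(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def d2_char(x: float) -> float:
    return x**4 - 2 * x**3 + x - 1      # characteristic polynomial of the D_2 recurrence


def tribonacci_char(x: float) -> float:
    return x**3 - x**2 - x - 1


if __name__ == "__main__":
    c = d2_counts(40)
    print("D_2 counts, n = 3..12:", c[:10])
    print("ratio c[40]/c[39]:", c[-1] / c[-2])
    print("D_2 growth root:", root_in(d2_char, 1.5, 2.0))
    print("tribonacci constant:", root_in(tribonacci_char, 1.5, 2.0))
```

The consecutive-count ratio settles on the dominant root quickly because the other roots of the quartic are all smaller than 1 in magnitude.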

5. Split-Frame Encoding in Video Transcoding (NVIDIA NVENC SFE)

Many recent NVIDIA GPUs integrate multiple independent on-die NVENC encoders, referred to here as chips. SFE divides each UHD frame into horizontal slices, one per chip, and each chip runs a complete encode pipeline (motion estimation, intra prediction, entropy coding) in parallel. The resulting slice bitstreams, each carrying its own SPS/PPS, are stitched into a single elementary bitstream by pruning the redundant headers and concatenating at NAL-unit boundaries. The output remains decodable by standard decoders, but motion estimation across the slice boundary is disabled (Arunruangsirilert et al., 24 Nov 2025).

| Component | Functionality |
|---|---|
| Frame partitioning | Top/bottom horizontal slicing, one slice per NVENC chip |
| Encoding pipeline | Parallel submission via ffmpeg/driver (`-split_encode_mode 2`) |
| Bitstream stitching | Host prunes redundant SPS/PPS and concatenates slices at NAL boundaries |
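The stitching step can be illustrated on HEVC Annex B byte streams. This is a hypothetical host-side helper, not NVIDIA's implementation: it splits each slice bitstream at start codes, reads `nal_unit_type` from the HEVC NAL header (VPS = 32, SPS = 33, PPS = 34), keeps parameter sets only from the first slice stream, and concatenates the remaining NAL units. It assumes the encoders have already written slice headers with the correct slice addresses; the demo payload bytes are toy values, not valid HEVC.

```python
PARAM_SET_TYPES = {32, 33, 34}          # HEVC nal_unit_type values for VPS, SPS, PPS
START_CODE = b"\x00\x00\x00\x01"


def split_nals(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units with start codes removed."""
    nals = []
    for part in stream.split(b"\x00\x00\x01")[1:]:   # drop whatever precedes the first start code
        nal = part.rstrip(b"\x00")   # remove the extra zero of a following 4-byte start code
        if nal:                       # a NAL unit never ends in 0x00, so stripping loses nothing
            nals.append(nal)
    return nals


def nal_type(nal: bytes) -> int:
    """nal_unit_type: the six bits after forbidden_zero_bit in the first header byte."""
    return (nal[0] >> 1) & 0x3F


def stitch(slice_streams: list[bytes]) -> bytes:
    """Keep parameter sets from the first slice stream only, then concatenate all NAL units."""
    out = bytearray()
    for idx, stream in enumerate(slice_streams):
        for nal in split_nals(stream):
            if idx > 0 and nal_type(nal) in PARAM_SET_TYPES:
                continue              # redundant VPS/SPS/PPS from a later slice stream
            out += START_CODE + nal
    return bytes(out)


if __name__ == "__main__":
    headers = (START_CODE + bytes([0x40, 0x01, 0x0C])      # toy VPS
               + START_CODE + bytes([0x42, 0x01, 0x01])    # toy SPS
               + START_CODE + bytes([0x44, 0x01, 0xC1]))   # toy PPS
    top = headers + START_CODE + bytes([0x26, 0x01, 0xAF])     # toy IDR slice (type 19)
    bottom = headers + START_CODE + bytes([0x26, 0x01, 0x9C])
    print([nal_type(n) for n in split_nals(stitch([top, bottom]))])
```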

In the reported evaluation (RTX 4070 Ti SUPER, ffmpeg, a test suite of 10 4K and 11 8K sequences, CBR at 10–100 Mbps), the SFE pipeline nearly doubles throughput (81–96% more FPS) compared with single-chip encoding, with a negligible RD penalty. At 4K, the mean PSNR degradation is below 0.05 dB across all presets, with a worst case of –0.41 dB; at 8K, the penalty is negligible. Two-chip SFE draws only 4.5–6 W more power than one chip and remains far below CPU-based encoding (~150 W) (Arunruangsirilert et al., 24 Nov 2025).
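For reference, an invocation in the spirit of this setup can be assembled from ffmpeg's `hevc_nvenc` encoder options. `-c:v hevc_nvenc`, `-preset p7`, `-rc cbr` and `-b:v` are standard ffmpeg NVENC options; the `-split_encode_mode 2` value is copied from the table above, and the values it accepts depend on the ffmpeg and driver version (check `ffmpeg -h encoder=hevc_nvenc`). File names and the 50 Mbps bitrate are placeholders, and `sfe_command` is a hypothetical helper.

```python
import shlex


def sfe_command(src: str, dst: str, bitrate: str = "50M", preset: str = "p7") -> list[str]:
    """Build an ffmpeg argv for a CBR HEVC NVENC encode with split-frame encoding requested."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "hevc_nvenc",        # NVENC HEVC encoder
        "-preset", preset,           # p1 (fastest) .. p7 (slowest, highest quality)
        "-rc", "cbr",                # constant-bitrate rate control
        "-b:v", bitrate,
        "-split_encode_mode", "2",   # value from the text; availability depends on ffmpeg/driver version
        dst,
    ]


if __name__ == "__main__":
    cmd = sfe_command("input_4k.mp4", "output_sfe.mp4")
    print(shlex.join(cmd))
    # To actually encode, pass cmd to subprocess.run(cmd, check=True) on a machine with a multi-NVENC GPU.
```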

6. Rate-Distortion, Throughput, Power, and Latency Trade-offs

The RD optimization objective for each encoder (chip) $i$ is $J_i = D_i + \lambda R_i$, where $D_i$ is the measured distortion (e.g., MSE, reported as PSNR), $R_i$ the output bitrate, and $\lambda$ the Lagrange multiplier set by CBR rate control (Arunruangsirilert et al., 24 Nov 2025).
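As an illustration of the objective, Lagrangian mode decision picks, among candidate coding choices, the one with the smallest $J = D + \lambda R$. The candidate names and $(D, R)$ values below are made up, and NVENC's internal mode decision is not public, so this only shows how $\lambda$ shifts the balance between distortion and rate.

```python
def best_choice(candidates: list[tuple[str, float, float]], lam: float) -> str:
    """Return the name of the candidate (name, D, R) that minimizes J = D + lam * R."""
    name, _, _ = min(candidates, key=lambda c: c[1] + lam * c[2])
    return name


if __name__ == "__main__":
    # Hypothetical (distortion, rate) pairs for three coding choices of one block.
    modes = [("skip", 100.0, 10.0), ("inter", 60.0, 30.0), ("intra", 80.0, 15.0)]
    for lam in (0.5, 2.0, 10.0):
        print(lam, best_choice(modes, lam))
```

A small $\lambda$ favors the lowest-distortion choice and a large $\lambda$ the cheapest one; a rate controller typically raises $\lambda$ (through a higher QP) when the bitrate runs over target.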

Selected performance results:

Encoding Throughput (4K HEVC, P7 preset):

| SFE | FPS | ΔFPS (%) |
|---|---|---|
| Disabled | 45.24 | baseline |
| Enabled | 88.77 | +96.2 |
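The relative throughput gain in this table follows directly from the two FPS values:

```latex
\Delta\mathrm{FPS}
  = \frac{\mathrm{FPS}_{\mathrm{SFE}} - \mathrm{FPS}_{\mathrm{single}}}{\mathrm{FPS}_{\mathrm{single}}} \times 100\%
  = \frac{88.77 - 45.24}{45.24} \times 100\%
  = \frac{43.53}{45.24} \times 100\%
  \approx 96.2\%
```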

Average RD Degradation (4K HEVC, P7):

| Metric | Δ value |
|---|---|
| PSNR (dB) | –0.151 |
| VMAF (points) | –0.389 |

Power Consumption (HEVC):

| NVENC chips | Power (W) |
|---|---|
| 1 | 38.5 |
| 2 (SFE) | 43.0 |

End-to-End Latency (4K60 HEVC, P4):

| SFE | Latency (frames) |
|---|---|
| Disabled | 5 |
| Enabled | 5 |

On latency, SFE adds no penalty at 4K and reduces 8K latency by up to one frame. Throughput decreases (by 10–20%) only under the “Ultra-High-Quality (2 Pass)” tuning, because of serialized dependencies.

7. Unified Significance and Application Domains

The split-frame/splittable paradigm provides a common abstraction for both theoretical coding and high-throughput parallel video encoding. In prefix-code theory, splitting integer pairs enables the design of efficient, complete, and universal codes for variable-length integer compression and text analytics. In practice, SFE on NVENC hardware enables real-time, power-efficient UHD transcoding, nearly doubling throughput with an RD penalty below 0.05 dB in production presets and making 4K/8K live applications feasible at modest power budgets (Anisimov et al., 2015; Arunruangsirilert et al., 24 Nov 2025).

The SFE workflow is recommended for real-time 4K/8K use cases, but not for offline ultra-high-fidelity transcodes, where serialized dependencies reduce throughput. In coding theory, splittable multi-delimiter codes outperform higher-order Fibonacci codes (Fib3) in both compression efficiency and decoding speed on the reported benchmarks.

These approaches thus collectively enable scalable, resource-efficient, and theoretically grounded solutions spanning digital communications, storage, and high-performance multimedia streaming.
