Differential Erasure Coding (DEC)
- Differential Erasure Coding (DEC) is a method that leverages the sparse differences between data versions to reduce storage and I/O overhead.
- It integrates compressed sensing with traditional erasure coding for archival storage and employs fountain codes with adaptive doping for multimedia broadcast.
- DEC offers predictable performance with significant storage savings and minimal broadcast overhead, enabling efficient, differentiated service levels.
Differential Erasure Coding (DEC) encompasses a class of schemes that leverage the temporal or spatial structure of data to optimize reliable storage or broadcast transmission. Its core principle is to store or transmit only the differences ("deltas") between versions or blocks of data, combining erasure coding with sparsity-exploiting mechanisms such as compressed sensing, or with advanced rateless codes and decoder-level differentiation. DEC arises in two distinct research threads: compressed DEC for archival of versioned data (Harshan et al., 2015), and application-layer DEC for scalable, differentiated multimedia broadcast (Kokalj-Filipovic et al., 2012).
1. Formal Models and Definitions
In compressed DEC for distributed storage, an object is modeled as a vector $x \in \mathbb{F}_q^k$, with successive versions $x^{(0)},\,x^{(1)},\,\dots,\,x^{(t)} \in \mathbb{F}_q^k$. Each version differs from its predecessor by a "delta" $d^{(i)} = x^{(i)} - x^{(i-1)}$, which is typically $\gamma_i$-sparse ($\gamma_i$ nonzero entries), reflecting in-place or localized modifications common in real-world editing workloads. The focus is on minimizing the storage and I/O required for highly versioned data while retaining erasure resilience (Harshan et al., 2015).
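As a minimal illustration of the versioning model, the sketch below computes a delta and its sparsity for two toy versions. The field size `Q = 257`, the helper names, and the sample vectors are assumptions chosen for readability; they do not come from the cited work.

```python
Q = 257  # assumed prime field size, for illustration only


def delta(prev, curr, q=Q):
    """Symbol-wise difference d = curr - prev over GF(q)."""
    return [(c - p) % q for p, c in zip(prev, curr)]


def sparsity(d):
    """Number of nonzero entries of a delta (gamma in the text)."""
    return sum(1 for v in d if v != 0)


x0 = [10, 20, 30, 40, 50, 60, 70, 80]
x1 = [10, 20, 31, 40, 50, 60, 70, 85]  # two symbols edited in place

d1 = delta(x0, x1)
gamma1 = sparsity(d1)
print(d1, gamma1)
```

With only two of the eight symbols modified, the delta carries far less information than the full version, which is the structure compressed DEC exploits.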
For multimedia wireless broadcast, DEC refers to a framework atop Fountain codes. Here, the framework orchestrates encoding and two-phase (peeling + doping) decoding of blocks of source symbols, with broadcast output symbols generated via the Ideal Soliton (IS) distribution. The adaptation lies at the decoder, where feedback-driven "doping" enables class-based differentiation without altering the broadcast stream (Kokalj-Filipovic et al., 2012).
2. Compressed Sensing–Enhanced DEC for Versioned Storage
Compressed DEC exploits the sparse structure of deltas with a compressed sensing (CS) stage, followed by erasure-protected storage:
- For each delta $d^{(i)}$ with sparsity $\gamma_i$, a measurement vector $y^{(i)} = \Phi_{\gamma_i}d^{(i)} \in \mathbb{F}_q^{2\gamma_i}$ is formed, using a measurement matrix $\Phi_{\gamma_i} \in \mathbb{F}_q^{2\gamma_i \times k}$ chosen so that any $2\gamma_i$ columns are linearly independent.
- Exact recovery of $d^{(i)}$ from $y^{(i)}$ is guaranteed if this condition holds, as per the unique sparse recovery proposition (Harshan et al., 2015):
$$\Phi_{\gamma_i} d = \Phi_{\gamma_i} d' \ \text{ with } d,\, d' \ \gamma_i\text{-sparse} \;\Longrightarrow\; d = d'.$$
- Overhead for storing $y^{(i)}$ is $2\gamma_i$ symbols, much less than storing the full $k$-dimensional vector when $\gamma_i \ll k$.
- For denser deltas ($\gamma_i \ge k/2$), the delta is stored in full using a standard $(n, k)$ erasure code.
This process is summarized in the following table:
| Delta Sparsity | Compression/Encoding | Storage Overhead |
|---|---|---|
| $\gamma_i < k/2$ | $\Phi_{\gamma_i}$ + inner EC | $2\gamma_i$ symbols |
| $\gamma_i \ge k/2$ | Direct EC: $(n, k)$ | $k$ symbols |
By applying compressed sensing only when the update sparsity is low, and using "inner" erasure codes on the compressed deltas, DEC achieves storage savings proportional to the sum of delta sparsities, while retaining bounded I/O and full data reliability (Harshan et al., 2015).
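The compression step can be sketched as follows. This is an illustrative toy, not the construction from the cited paper: it assumes a prime field GF(257), uses a Vandermonde matrix as the measurement matrix (any $2\gamma$ of its columns are linearly independent when the evaluation points are distinct), and recovers the delta by brute-force search over supports, which is only practical for very small $k$. All function names are hypothetical.

```python
from itertools import combinations

P = 257  # assumed prime modulus (requires k < P for distinct evaluation points)


def vandermonde(rows, k, p=P):
    """rows x k matrix with entry (r, j) = (j + 1)^r mod p."""
    return [[pow(j + 1, r, p) for j in range(k)] for r in range(rows)]


def measure(phi, d, p=P):
    """Measurement vector y = phi * d over GF(p)."""
    return [sum(a * b for a, b in zip(row, d)) % p for row in phi]


def solve_mod(A, b, p=P):
    """Solve A z = b over GF(p) for a tall matrix A; None if inconsistent."""
    m, s = len(A), len(A[0])
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    piv = 0
    for col in range(s):
        pr = next((r for r in range(piv, m) if M[r][col] % p), None)
        if pr is None:
            return None  # rank-deficient column set
        M[piv], M[pr] = M[pr], M[piv]
        inv = pow(M[piv][col], -1, p)  # modular inverse (Python 3.8+)
        M[piv] = [(v * inv) % p for v in M[piv]]
        for r in range(m):
            if r != piv and M[r][col]:
                f = M[r][col]
                M[r] = [(a - f * c) % p for a, c in zip(M[r], M[piv])]
        piv += 1
    if any(M[r][s] for r in range(piv, m)):
        return None  # leftover equations read 0 = nonzero
    return [M[r][s] for r in range(s)]


def recover(phi, y, k, gamma, p=P):
    """Brute-force sparse recovery: try every support of size gamma."""
    for support in combinations(range(k), gamma):
        sub = [[row[j] for j in support] for row in phi]
        z = solve_mod(sub, y, p)
        if z is not None:
            d = [0] * k
            for j, v in zip(support, z):
                d[j] = v
            return d
    return None


k, gamma = 8, 2
d = [0, 0, 1, 0, 0, 0, 0, 5]        # 2-sparse delta
phi = vandermonde(2 * gamma, k)     # 4 x 8 measurement matrix
y = measure(phi, d)                 # 4 symbols stored instead of 8
recovered = recover(phi, y, k, gamma)
print(y, recovered)
```

Because any four columns of `phi` are independent, only one 2-sparse vector is consistent with `y`, so the first consistent support found yields the original delta. A practical system would replace the exhaustive search with an algebraic decoder.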
3. Erasure Coding Integration and Decoding Workflow
The DEC architecture for storage interleaves compressed sensing with erasure coding. An $(n, k)$ linear code with generator matrix $G \in \mathbb{F}_q^{n \times k}$ produces codewords $c = Gx$ for full version or delta vectors. For compressed deltas, a smaller generator matrix is applied to the $2\gamma_i$-symbol measurement vector $y^{(i)}$, with codewords distributed across select storage nodes.
Retrieval is iterative: begin with the base version $x^{(0)}$, then for each $i = 1, \dots, t$,
- Retrieve and decode $d^{(i)}$ (either directly, or by sparse recovery from $y^{(i)}$),
- Reconstruct $x^{(i)} = x^{(i-1)} + d^{(i)}$.
The total I/O is $k$ symbols for the initial version and $\sum_{i=1}^{t} \min(2\gamma_i,\, k)$ symbols for the deltas. In practical variants, a single fixed sparsity threshold yields a "two-level" DEC with only two generator matrices required.
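The retrieval loop and its I/O cost can be sketched as below. The sketch assumes deltas have already been decoded, takes arithmetic over an illustrative prime field (`Q = 257`), and counts symbols before erasure-code redundancy, charging $2\gamma$ symbols for a compressed delta and $k$ for one stored in full. The function names are hypothetical.

```python
Q = 257  # assumed prime field size, for illustration only


def reconstruct(base, deltas, q=Q):
    """Return [x0, x1, ..., xt] given the base version and decoded deltas."""
    versions = [list(base)]
    for d in deltas:
        prev = versions[-1]
        versions.append([(a + b) % q for a, b in zip(prev, d)])
    return versions


def symbols_read(k, gammas):
    """Symbols read to rebuild all versions: k for the base,
    min(2 * gamma, k) for each delta (compressed if that is smaller)."""
    return k + sum(min(2 * g, k) for g in gammas)


base = [10, 20, 30, 40, 50, 60, 70, 80]
deltas = [
    [0, 0, 1, 0, 0, 0, 0, 5],   # gamma = 2
    [3, 0, 0, 0, 0, 0, 0, 0],   # gamma = 1
]
versions = reconstruct(base, deltas)
io_dec = symbols_read(len(base), [2, 1])
io_full = len(base) * len(versions)   # reading every version in full
print(versions[-1], io_dec, io_full)
```

In this toy case the differential scheme reads 14 symbols instead of 24; the gap widens as $k$ grows relative to the delta sparsities.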
4. Application-Layer DEC for Differentiated Broadcast
In multimedia wireless broadcast, DEC is instantiated as an Ideal Soliton–LT code with per-client two-phase decoding:
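- The encoder uses the IS degree distribution $\rho(d) = \frac{1}{d(d-1)}$ for $d = 2, \dots, k$ and $\rho(1) = \frac{1}{k}$, where $k$ is the number of source symbols per block, to generate the broadcast output packets.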
- Each client runs a peeling decoder; if it stalls (ripple empty), a doping feedback prompts the sender to provide a missing source symbol, enabling decoding to proceed.
- Differentiation is realized by provisioning a per-class upfront allocation of packets and a per-class doping budget: high-priority classes receive higher redundancy or more allowable dopes, translating to lower latency and overhead.
The combined impact is sub-1% resource overhead (for 4), linear time decoding, and a flexible trade-off between decode latency and resource allocation, with all clients receiving the same broadcast (Kokalj-Filipovic et al., 2012).
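A minimal sketch of the two-phase decoder described above follows. It assumes integer symbols combined by XOR, a scan-based ripple search (quadratic, chosen for brevity rather than the linear-time bookkeeping a real decoder would use), and a doping rule that requests the lowest-index undecoded symbol; the cited work may select doping symbols differently. The `source_oracle` callback stands in for the feedback channel and is hypothetical.

```python
import random


def ideal_soliton(k):
    """Weights rho(1) = 1/k, rho(d) = 1/(d(d-1)) for d = 2..k."""
    return [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]


def encode(source, n_packets, rng):
    """Generate LT packets as (set of source indices, XOR of those symbols)."""
    k = len(source)
    degrees = rng.choices(range(1, k + 1), weights=ideal_soliton(k), k=n_packets)
    packets = []
    for deg in degrees:
        idx = set(rng.sample(range(k), deg))
        val = 0
        for i in idx:
            val ^= source[i]
        packets.append((idx, val))
    return packets


def decode_with_doping(packets, k, source_oracle, dope_budget):
    """Peel degree-1 packets; when the ripple is empty, request one
    undecoded source symbol. Returns (decoded or None, dopes used)."""
    pkts = [[set(idx), val] for idx, val in packets]
    decoded = [None] * k
    dopes, remaining = 0, k
    while remaining:
        ripple = next((p for p in pkts if len(p[0]) == 1), None)
        if ripple is not None:
            i, val = next(iter(ripple[0])), ripple[1]
        else:
            if dopes >= dope_budget:
                return None, dopes          # stalled with no budget left
            i = next(j for j in range(k) if decoded[j] is None)
            val = source_oracle(i)          # feedback to the sender
            dopes += 1
        decoded[i] = val
        remaining -= 1
        for p in pkts:                      # remove symbol i from the graph
            if i in p[0]:
                p[0].discard(i)
                p[1] ^= val
    return decoded, dopes


rng = random.Random(7)
k = 50
source = [rng.randrange(256) for _ in range(k)]
packets = encode(source, n_packets=k, rng=rng)
decoded, dopes = decode_with_doping(packets, k, lambda i: source[i], dope_budget=k)
print(decoded == source, dopes)
```

The broadcast packets are identical for every receiver; only `dope_budget` and the number of packets a client collects vary, which is where class-based differentiation enters.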
5. Performance, Complexity, and Overhead Analysis
For compressed DEC, storage size is reduced by exploiting the typically low delta sparsity $\gamma_i$ observed in versioned data repositories. Experimental results demonstrate storage overhead reductions of up to 60% against Rsync-inspired baselines under typical workloads. The I/O and storage load for the deltas are proportional to $\sum_i \gamma_i$ rather than to $k$ times the number of versions.
In application-layer DEC:
- Phase I decoding (pure peeling) requires 9 XOR operations.
- Phase II doping consists of 0 singleton updates (1), plus at most 2 work if a small residual system remains for Gaussian elimination; since 3, this term remains sublinear.
- Overhead approaches 1% as 4, outperforming Raptor and other pre-coded rateless schemes for similar block sizes and reliability targets.
6. Practical Variants and Differentiation Mechanisms
In the storage setting, a practical two-level DEC compresses all deltas below a fixed sparsity threshold with a single measurement matrix and generator matrix. Only two generator matrices and one CS measurement matrix need to be maintained, which keeps the implementation simple (Harshan et al., 2015).
For broadcast, differentiation does not alter the physical broadcast or the code; it is enabled by per-class control of the upfront allocation (extra packets) and the doping budget. Analytical models based on IS stationarity and Poisson ripple increments provide closed-form performance predictions, guiding system design for service-level guarantees without physical-layer changes (Kokalj-Filipovic et al., 2012).
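How the two per-class knobs interact can be explored with a small index-only simulation, sketched below. The `ServiceClass` fields, the class names "gold" and "bronze", and the specific budgets are hypothetical, and the doping rule (lowest-index unknown symbol) is an assumption. The simulation tracks only which source indices each packet covers, not payloads, and reports whether a class finishes decoding within its doping budget.

```python
import random
from dataclasses import dataclass


@dataclass
class ServiceClass:
    name: str
    extra_packets: int  # hypothetical: packets collected beyond k
    dope_budget: int    # hypothetical: maximum feedback requests


def sample_packet(k, rng):
    """Index set of one LT packet drawn from the Ideal Soliton distribution."""
    weights = [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]
    deg = rng.choices(range(1, k + 1), weights=weights)[0]
    return set(rng.sample(range(k), deg))


def dopes_needed(packets, k):
    """Dopes a peeling decoder needs to recover all k symbols."""
    pkts = [set(p) for p in packets]
    known, dopes = set(), 0
    while len(known) < k:
        i = next((next(iter(p)) for p in pkts if len(p) == 1), None)
        if i is None:                       # ripple empty: dope
            i = min(set(range(k)) - known)
            dopes += 1
        known.add(i)
        for p in pkts:
            p.discard(i)
    return dopes


def served(cls, stream, k):
    """True if the class decodes using its packet allocation and dope budget."""
    return dopes_needed(stream[: k + cls.extra_packets], k) <= cls.dope_budget


rng = random.Random(3)
k = 40
stream = [sample_packet(k, rng) for _ in range(k + 20)]  # one shared broadcast
gold = ServiceClass("gold", extra_packets=20, dope_budget=k)
bronze = ServiceClass("bronze", extra_packets=0, dope_budget=2)
results = {c.name: served(c, stream, k) for c in (gold, bronze)}
print(results)
```

Both classes read a prefix of the same stream, so the sender's broadcast is unchanged; only the receiver-side allocation differs.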
7. Summary and Significance
Differential Erasure Coding unifies concepts from versioned storage and broadcast communication by leveraging temporal or structural sparsity and erasure coding. In archival storage, compressed DEC delivers substantial savings in storage and I/O for versioned datasets when edit operations are sparse, and generalizes efficiently to workloads with insertions and deletions. In multimedia broadcast, application-layer DEC achieves universally efficient rateless transmission with class-based quality-of-service controls, retaining linear-time decoding and minimal overhead. In both domains, DEC offers analytically predictable performance, efficient resource utilization, and system-level differentiation, as substantiated by the foundational works (Harshan et al., 2015) and (Kokalj-Filipovic et al., 2012).