Delta Decomposition: DeRS Paradigm

Updated 20 November 2025
  • Delta Decomposition is a dual-framework method that splits complex objects into a shared base and compact deltas, applicable to positive Boolean DNFs and to neural MoE models.
  • In Boolean analysis, the approach employs polynomial factorization to achieve the finest Δ-partition, ensuring efficient and unique DNF decomposition.
  • In deep learning, the DeRS paradigm compresses expert weights using sparse, quantized, or low-rank representations, significantly reducing memory and computation costs.

Delta Decomposition (DeRS Paradigm) encompasses two distinct but conceptually related frameworks for structured decomposition: (1) the Δ-decomposition of positive Disjunctive Normal Forms (DNFs) in Boolean function analysis, as formalized using the Delta-and-Rooted-Semiring (DeRS) paradigm (Ponomaryov, 2018); and (2) the Decompose-Replace-Synthesis (DeRS) paradigm for parameter-efficient upcycled Mixture-of-Experts (MoE) models in deep learning (Huang et al., 3 Mar 2025). Both leverage the same core principle, decomposing a complex object into a shared "base" plus compact "deltas", but the mathematical and algorithmic contexts differ substantially.

1. Definition and Theoretical Foundation

In Boolean function analysis, Δ-decomposition refers to expressing a positive DNF $\varphi$ as a conjunction of DNFs $\psi_1,\dots,\psi_k$ whose variable sets may intersect only on a shared set of "delta" variables $\Delta$, with each non-$\Delta$ variable block non-empty. The decomposition is called finest if it admits no further nontrivial refinement.
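
A toy illustration (constructed here for exposition, not taken from Ponomaryov, 2018): take $\Delta=\{d\}$; then the positive DNF below splits into two sub-DNFs whose variable sets overlap only in $d$.

```latex
% Toy Δ-decomposition with Δ = {d} (illustrative example).
\[
  \varphi \;=\; (x\wedge z)\vee(x\wedge d)\vee(y\wedge d)
  \;\equiv\;
  \underbrace{\bigl(x\vee(y\wedge d)\bigr)}_{\psi_1,\ \text{variables } \{x,y\}\cup\Delta}
  \;\wedge\;
  \underbrace{(z\vee d)}_{\psi_2,\ \text{variables } \{z\}\cup\Delta}
\]
% The non-Δ blocks {x, y} and {z} are disjoint and non-empty, and the two
% factors share only the Δ-variable d, so this is a valid Δ-decomposition.
```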

In neural model upcycling, DeRS refers to decomposing dense expert weights $W_i\in\mathbb{R}^{d\times d_h}$ as $W_i = W_{\mathrm{base}} + \Delta_i$, optimizing storage and computation by expressing $\Delta_i$ in a lightweight representation while $W_{\mathrm{base}}$ remains expert-shared.

Both frameworks exploit high redundancy, whether logical or algebraic, in the composed object, enabling a transition to more compact or structured representations without loss of essential information (Ponomaryov, 2018; Huang et al., 3 Mar 2025).

2. DeRS for Positive DNF Decomposition

A positive DNF is a disjunction of terms over Boolean variables, where terms are conjunctions of unnegated variables. In the DeRS paradigm for Boolean functions, a positive DNF $\varphi(x_1,\dots,x_n)=\bigvee_{t\in T}\bigwedge_{i\in t} x_i$ is represented as a multilinear Boolean polynomial $f(x) = \sum_{t\in T}\prod_{i\in t} x_i$. Disjunctions correspond to addition, conjunctions to multiplication, and variables are interpreted in the Boolean ring.
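
Continuing the toy example above (illustrative, not from the source), the mapping acts term by term, replacing disjunction with addition and conjunction with multiplication:

```latex
% Polynomial form of the toy DNF (illustrative).
\[
  \varphi = (x\wedge z)\vee(x\wedge d)\vee(y\wedge d)
  \quad\longmapsto\quad
  f(x,y,z,d) = xz + xd + yd .
\]
```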

The key insight is that $\Delta$-decomposition corresponds precisely to the factorization of $f(x)$ into irreducible multilinear Boolean polynomials whose variable sets intersect only at $\Delta$. Each irreducible factor maps back to a sub-DNF, producing the finest partitioning of the original function (Ponomaryov, 2018).
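
A minimal Python sketch of this correspondence on the toy example, assuming a positive DNF is represented as a set of term variable-sets. It does not perform the factorization itself; it only brute-force checks that a candidate set of sub-DNFs overlaps only on Δ and that their conjunction reproduces $\varphi$:

```python
from itertools import product

def dnf_eval(terms, assignment):
    """Evaluate a positive DNF, given as a set of frozensets of variables, under a 0/1 assignment."""
    return any(all(assignment[v] for v in t) for t in terms)

def is_delta_decomposition(phi, factors, delta):
    """Check that the factor DNFs overlap only on delta and that their conjunction equals phi."""
    var_sets = [{v for t in f for v in t} for f in factors]
    for i in range(len(var_sets)):
        for j in range(i + 1, len(var_sets)):
            if (var_sets[i] & var_sets[j]) - set(delta):
                return False  # variables shared outside Δ are not allowed
    variables = sorted({v for t in phi for v in t} | set().union(*var_sets))
    for bits in product([0, 1], repeat=len(variables)):
        a = dict(zip(variables, bits))
        if dnf_eval(phi, a) != all(dnf_eval(f, a) for f in factors):
            return False  # the conjunction of factors disagrees with phi
    return True

# Toy example from above: phi = xz ∨ xd ∨ yd, Δ = {d}, ψ1 = x ∨ yd, ψ2 = z ∨ d.
phi  = {frozenset("xz"), frozenset("xd"), frozenset("yd")}
psi1 = {frozenset("x"), frozenset("yd")}
psi2 = {frozenset("z"), frozenset("d")}
print(is_delta_decomposition(phi, [psi1, psi2], delta={"d"}))  # expected: True
```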

The following table summarizes the logical correspondence:

| Aspect | DNF Decomposition | Polynomial Factorization |
| --- | --- | --- |
| Object | Positive DNF $\varphi$ | Multilinear polynomial $f(x)$ |
| Decomposition | $\varphi = \psi_1\wedge\cdots\wedge\psi_k$ | $f = f_1 \cdots f_k$ |
| Shared variables | $\Delta$ | Overlaps $U_i\cap U_j$ |
| Fineness | No further $\Delta$-splitting | All factors irreducible |

This correspondence enables exploitation of algebraic factoring algorithms for logic decomposition, yielding a unique, finest $\Delta$-partition in polynomial time for positive DNFs.

3. Algorithmic Framework and Complexity

The DeRS algorithm for positive DNF Δ-decomposition consists of: (1) removing redundant terms, (2) computing Δ-atoms (intersections of terms with $\Delta$), (3) for each Δ-atom pair, testing decomposition via specialized restrictions and polynomial partitioning (FindPartition subroutine), (4) constructing a partition graph from obtained blocks, and (5) extracting the finest partition via connected components.

Algorithmic steps include:

  1. Reduce $\varphi$ by eliminating redundant terms.
  2. Extract all Δ-atoms $a_j$.
  3. For each pair $(a_1,a_2)$, restrict $\varphi$ to the assignment $L=a_1\cup a_2$ (forcing the other $\Delta$ variables to zero), and apply polynomial factorization (FindPartition) to the restricted DNF's polynomial form.
  4. Aggregate all variable blocks into a graph with cliques for shared blocks (see the sketch after this list).
  5. Identify connected components as distinct variable blocks for DNF decomposition.
  6. Project and minimize the original DNF onto each block $\cup\,\Delta$ to obtain the corresponding $\psi_i$.
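
The following minimal Python sketch covers steps 4 and 5 only, assuming the FindPartition-style calls have already produced variable blocks as plain Python sets (the block contents below are illustrative):

```python
from collections import defaultdict

def finest_partition(blocks):
    """Merge variable blocks into connected components via union-find.
    `blocks` is an iterable of sets of (non-Δ) variables returned by the
    FindPartition-style subcalls; variables that co-occur in any block must
    end up in the same final block of the Δ-decomposition."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    def union(a, b):
        parent[find(a)] = find(b)

    for block in blocks:
        block = list(block)
        for v in block:
            find(v)                         # register every variable, even singletons
        for v in block[1:]:
            union(block[0], v)              # the clique over a block collapses to a star

    components = defaultdict(set)
    for v in list(parent):
        components[find(v)].add(v)
    return list(components.values())

# Blocks gathered from restricted factorizations (illustrative values).
print(finest_partition([{"x", "y"}, {"z"}, {"y"}]))  # e.g. [{'x', 'y'}, {'z'}]
```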

The complexity is $O(\mathrm{poly}(m,n))$ for $m$ terms and $n$ variables, with the polynomial time bound achieved by efficient factoring algorithms leveraging formal derivatives and substructure exploitation (Ponomaryov, 2018).

4. Delta Decomposition in Upcycled Mixture-of-Experts Models

In upcycled MoE neural models, DeRS employs the decomposition $W_i = W_{\mathrm{base}} + \Delta_i$, where $W_i$ is the $i$-th expert's weight matrix, $W_{\mathrm{base}}$ the shared base (often a pretrained FFN weight), and $\Delta_i$ a small expert-specific correction. Empirical cosine similarity ($\cos(W_i, W_{\mathrm{base}})>0.999$) supports the intuition that $\Delta_i$ is structurally redundant, motivating storage reduction (Huang et al., 3 Mar 2025).
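
A small numpy sketch of the decomposition and the cosine-similarity check on flattened weights; the shapes, perturbation scale, and random matrices are illustrative stand-ins, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_h = 64, 256

# Shared base weight (e.g. an upcycled pretrained FFN matrix) and two experts
# that deviate from it only slightly (illustrative stand-ins, not real weights).
W_base = rng.standard_normal((d, d_h))
experts = [W_base + 1e-2 * rng.standard_normal((d, d_h)) for _ in range(2)]

def cosine(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for i, W_i in enumerate(experts):
    delta_i = W_i - W_base            # Decompose: W_i = W_base + Δ_i
    print(i, cosine(W_i, W_base))     # close to 1 when the expert stays near the base
```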

To exploit this redundancy, DeRS replaces the full $\Delta_i$ with one of several lightweight encodings:

  • Sparse-matrix (DeRS-SM): Store only a small subset of nonzero entries, defined by a binary mask with high drop rate ($p\geq 0.9$).
  • Quantized form (DeRS-Q): Uniformly quantize $\Delta_i$ to low bit-width ($k\ll K$).
  • Low-rank factorization (DeRS-LM): Represent $\Delta_i$ as $U_i V_i^\top$, with low rank $r$.

Each representation yields drastic reductions in parameter and memory cost.
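
A minimal numpy sketch of the three encodings (magnitude-based sparsification, uniform quantization, and truncated SVD); the drop rate, bit-width, rank, and shapes are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def sparsify(delta, drop_rate=0.9):
    """DeRS-SM style: keep only the largest-magnitude (1 - drop_rate) fraction of entries."""
    k = max(1, int(round(delta.size * (1 - drop_rate))))
    thresh = np.sort(np.abs(delta).ravel())[-k]
    mask = np.abs(delta) >= thresh
    return delta * mask               # in practice stored as (indices, values)

def quantize(delta, bits=2):
    """DeRS-Q style: uniform quantization of delta to a low bit-width."""
    levels = 2 ** bits - 1
    lo, hi = delta.min(), delta.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((delta - lo) / scale)
    return q * scale + lo             # dequantized approximation

def low_rank(delta, rank=4):
    """DeRS-LM style: truncated SVD, stored as U_i V_i^T."""
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
W_base = rng.standard_normal((64, 256))
delta = 1e-2 * rng.standard_normal((64, 256))
for f in (sparsify, quantize, low_rank):
    approx = W_base + f(delta)        # synthesized expert weight
    print(f.__name__, np.linalg.norm(W_base + delta - approx))
```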

5. Practical Algorithms for Compression and Training

Inference-Time Compression (DeRS Compression)

  1. Decompose: $\Delta_i \leftarrow W_i - W_{\mathrm{base}}$.
  2. Compress: $\Delta_i \mapsto \mathcal{F}_{\mathrm{post}}(\Delta_i)$ via sparsification, quantization, or low-rank factorization.
  3. (Optional) Fine-tune the compression parameters to minimize $\sum_{i=1}^N\|W_i - (W_{\mathrm{base}} + \mathcal{F}_{\mathrm{post}}(\Delta_i))\|_F^2 + \lambda R$ (sketched after this list).
  4. At inference, synthesize $W_i$ on demand.
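
A compact sketch of the optional step-3 objective, where `F_post` stands for any of the encodings above and the L1 penalty is an assumed placeholder for the regularizer $R$:

```python
import numpy as np

def ders_reconstruction_loss(experts, W_base, F_post, lam=1e-4):
    """Sum of squared Frobenius reconstruction errors plus a simple regularization term."""
    loss, reg = 0.0, 0.0
    for W_i in experts:
        delta_i = W_i - W_base
        approx = F_post(delta_i)              # e.g. sparsify / quantize / low_rank from above
        loss += np.linalg.norm(W_i - (W_base + approx), "fro") ** 2
        reg += np.abs(approx).sum()           # assumed L1 penalty standing in for R
    return loss + lam * reg

# Usage with the earlier sketches: ders_reconstruction_loss(experts, W_base, low_rank)
```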

Training-Time Upcycling (DeRS Upcycling)

  1. Instantiate expert deltas $\mathcal{F}_{\mathrm{pre}}(\Delta_i)$ efficiently (zero-filled sparse or low-rank).
  2. Forward pass: route input, synthesize $W_i$ for active experts.
  3. Backpropagate and update $W_{\mathrm{base}}$ and the compact deltas. Optionally, regularize for sparsity or rank.

This decomposed approach lets upcycled MoEs grow in expert count at an added parameter and memory cost orders of magnitude below naive per-expert weight allocation.
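
A compact PyTorch-style sketch of this training-time setup with low-rank deltas; the module name, rank, initialization, and single-expert routing are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class DeRSLowRankExperts(nn.Module):
    """Shared base weight plus per-expert low-rank deltas: W_i = W_base + U_i @ V_i^T."""
    def __init__(self, d_in, d_out, num_experts, rank=8):
        super().__init__()
        self.W_base = nn.Parameter(torch.empty(d_out, d_in))
        nn.init.xavier_uniform_(self.W_base)      # in practice: copy of the pretrained FFN weight
        # Zero-initialized U so every expert starts exactly at the shared base.
        self.U = nn.Parameter(torch.zeros(num_experts, d_out, rank))
        self.V = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.02)

    def forward(self, x, expert_idx):
        # Synthesize the active expert's weight on demand, then apply it.
        delta = self.U[expert_idx] @ self.V[expert_idx].transpose(-1, -2)
        return x @ (self.W_base + delta).T

# Illustrative usage: send a batch to one expert (routing logic omitted).
layer = DeRSLowRankExperts(d_in=64, d_out=256, num_experts=4, rank=8)
x = torch.randn(10, 64)
y = layer(x, expert_idx=2)            # shape (10, 256)
y.sum().backward()                    # gradients reach W_base and the active expert's factors
```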

6. Empirical Results and Application Domains

DeRS achieves high compression while maintaining or slightly improving accuracy on a range of benchmarks:

| Task/Model | Vanilla MoE Params Added | DeRS-SM Params | DeRS-LM Params | Accuracy Delta (SM, LM) |
| --- | --- | --- | --- | --- |
| MoE-LLaVA-Phi | +2.52B | 1.11M | 2.42M | +0.3%, +0.2% |
| Med-MoE-StableLM | +1.24B | 0.26M | 1.20M | +0.2%, +0.3% |
| Coder-MoE | +2.43B | 325M | 9.09M | +0.8%, +0.7% |

In Coder-MoE scenarios, model size is reduced by up to 52.7%, training memory by 21.2%, and inference memory by 43.8% (Huang et al., 3 Mar 2025). Application domains include multi-modal learning, medical VQA, program synthesis, and vision-language models.

7. Limitations, Extensions, and Open Problems

For Boolean DNFs, Δ-decomposition is polynomial-time tractable only in the positive (negation-free) case; extending these results to general or non-Boolean settings, or to other logical formats (BDD, CNF), remains open and is known to be coNP-hard in some cases. Selection of the "optimal" $\Delta$ is combinatorial and not addressed by existing algorithms (Ponomaryov, 2018).

In upcycled MoE models, DeRS achieves maximal efficiency when all experts are close to $W_{\mathrm{base}}$; large deviations may exceed what sparse or low-rank $\Delta_i$ can represent. The choice between sparse and low-rank forms trades off memory efficiency against expressivity. Keeping $W_{\mathrm{base}}$ trainable is empirically preferable.

In both domains, the DeRS paradigm demonstrates that leveraging structural redundancy yields substantial gains in computational efficiency, compactness, and interpretability, while open questions remain about extensions to less constrained settings and criteria for choosing an optimal decomposition.
