Random Partition Relaxation (RPR)
- Random Partition Relaxation (RPR) is a framework that relaxes discrete partition tasks using differentiable surrogates like Gumbel-Softmax and straight-through estimators.
- It enables end-to-end trainable models for applications from neural network quantization to variational clustering, achieving competitive accuracy on benchmarks.
- Depending on the variant, RPR employs either a two-stage reparameterization of subset sizes and assignments or random partitioning with a geometric freezing schedule, balancing hard quantization with continuous optimization.
Random Partition Relaxation (RPR) refers to a family of principled methods for relaxing inherently discrete random partitioning tasks. By introducing differentiable surrogates, chiefly Gumbel-Softmax reparameterizations and straight-through estimators, applied either to set partitions or to quantized weight projections, these approaches let all-or-nothing subset assignments and hard quantization constraints participate in gradient-based optimization pipelines. RPR first emerged in neural network quantization for binary- and ternary-weight models and was subsequently generalized to probabilistic modeling of partitions in variational inference. Its core objective is end-to-end trainable modeling of partition structures that would otherwise resist differentiable handling.
1. Mathematical Frameworks
Set Partition Relaxation
Given a ground set of $n$ elements, $[n] = \{1, \dots, n\}$, a partition is a decomposition into $K$ mutually disjoint subsets, $S_1, \dots, S_K$, satisfying $\bigcup_{k=1}^{K} S_k = [n]$ and $S_i \cap S_j = \emptyset$ for $i \neq j$. The assignment can be encoded by a binary indicator matrix $Y \in \{0,1\}^{K \times n}$ with $Y_{kj} = 1$ if element $j$ is assigned to $S_k$. Alternatively, one may specify the subset sizes $n_1, \dots, n_K$, with $\sum_{k=1}^{K} n_k = n$, along with a permutation $\pi$ of $[n]$ representing the assignment order.
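To make the size-and-permutation encoding concrete, the following minimal sketch (illustrative only; the function name is hypothetical) builds the indicator matrix $Y$ from subset sizes and an ordering, as described above.

```python
import numpy as np

def assignment_matrix(subset_sizes, perm):
    """Build the binary indicator matrix Y from subset sizes and a permutation.

    subset_sizes: sizes n_1, ..., n_K summing to n.
    perm: length-n array; perm[p] is the element placed at position p of the ordering.
    Returns Y of shape (K, n) with Y[k, e] = 1 iff element e falls in subset k.
    """
    n, K = sum(subset_sizes), len(subset_sizes)
    Y = np.zeros((K, n), dtype=int)
    start = 0
    for k, n_k in enumerate(subset_sizes):
        # the next n_k positions of the ordering are grouped into subset k
        Y[k, perm[start:start + n_k]] = 1
        start += n_k
    return Y

# Example: 5 elements, subset sizes (2, 3), ordering (3, 0, 4, 1, 2)
# -> subset 0 = {3, 0}, subset 1 = {4, 1, 2}
print(assignment_matrix([2, 3], np.array([3, 0, 4, 1, 2])))
```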
Partitioned Weight Quantization
In the network quantization context, the space of admissible weight values $Q$ is
- $\{-1, +1\}$ for binary-weight networks,
- $\{-1, 0, +1\}$ for ternary-weight networks.
Given real-valued parameters $w \in \mathbb{R}^d$, the quantization operation is the nearest-level projection $\mathrm{quant}(w_i) = \arg\min_{q \in Q} |w_i - q|$. The training objective becomes the constrained empirical risk minimization $\min_{w} \mathcal{L}(w)$ subject to $w_i \in Q$ for all $i$, where $\mathcal{L}$ denotes the task loss. RPR introduces a (random) partial quantization: the index set is partitioned into "frozen" indices (held at quantized values) and "relaxed" indices (kept real-valued), with alternation over random partitions during training.
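As a minimal illustration (not code from Cavigelli et al., 2020; helper names are hypothetical), the sketch below projects weights onto the ternary grid and applies the frozen/relaxed split:

```python
import torch

def project_to_levels(w, levels):
    """Nearest-level projection of each weight onto the quantization grid Q."""
    d = (w.reshape(-1, 1) - levels.reshape(1, -1)).abs()   # |w_i - q| for every level q
    return levels[d.argmin(dim=1)].reshape(w.shape)

def partial_quantize(w, frozen_mask, levels):
    """RPR-style partial quantization: frozen entries are projected onto Q,
    relaxed entries keep their real values."""
    return torch.where(frozen_mask, project_to_levels(w, levels), w)

# Example: ternary grid, roughly 60% of the weights frozen at random.
levels = torch.tensor([-1.0, 0.0, 1.0])
w = torch.randn(8)
frozen = torch.rand_like(w) < 0.6
print(partial_quantize(w, frozen, levels))
```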
2. Algorithmic Procedures
Differentiable Random Partition Models
RPR leverages a two-stage reparameterization strategy (Sutter et al., 2023):
- Relaxing Subset Sizes: The vector of subset sizes is sampled from a Fisher noncentral hypergeometric distribution. Differentiable, reparameterizable samples are obtained via the Gumbel-Softmax trick applied recursively to the marginals of the count vector.
- Assignment via Soft Permutation: A random ordering of the elements is sampled from a Plackett–Luce model parameterized by (learned) positive scores. The corresponding permutation matrix is relaxed to a doubly stochastic matrix using a differentiable sorting operator such as SoftSort.
Given these samples, the assignment matrix $Y$ is constructed by grouping (with smooth, temperature-controlled indicators) consecutive rows of the relaxed permutation matrix, as sketched below.
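The following simplified sketch (hypothetical code, not from Sutter et al., 2023) keeps the subset sizes fixed and relaxes only the ordering with a SoftSort-style operator, then groups consecutive rows into a soft assignment matrix; the full two-stage method also relaxes the sizes via Gumbel-Softmax.

```python
import torch
import torch.nn.functional as F

def soft_sort(s, tau=1.0):
    """SoftSort-style relaxation: row i is a softmax concentrated on the
    element with the i-th largest score in s."""
    s_sorted = torch.sort(s, descending=True).values             # (n,)
    pairwise = -(s_sorted.unsqueeze(1) - s.unsqueeze(0)).abs()    # (n, n)
    return F.softmax(pairwise / tau, dim=-1)                      # relaxed permutation matrix

def relaxed_assignment(scores, subset_sizes, tau=1.0):
    """Group consecutive rows of the relaxed permutation into K soft subsets."""
    P = soft_sort(scores, tau)                                    # (n, n)
    groups = torch.split(P, list(subset_sizes), dim=0)            # K blocks of rows
    return torch.stack([g.sum(dim=0) for g in groups])            # (K, n) soft assignment

# 5 elements, subsets of sizes 2 and 3; gradients flow back into the scores.
scores = torch.rand(5, requires_grad=True)
Y = relaxed_assignment(scores, (2, 3), tau=0.5)
Y.sum().backward()
```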
Random Partition Relaxation for Quantized Neural Networks
RPR for quantization (Cavigelli et al., 2020) operates in stages:
- For each stage, select a uniform-random partition of the weights into a frozen subset (a randomly chosen fraction FF of the weights) and a relaxed remainder.
- Quantize the frozen weights and keep the relaxed weights real-valued.
- Perform SGD/Adam updates only on the relaxed weights (and the remaining continuous parameters) for a fixed number of epochs per stage, then reshuffle the partition with an increased freezing fraction FF.
- Continue the geometric FF schedule until FF = 1 (all weights quantized); fine-tune the batch-norm layers after the final quantization.
Partition sampling is uniform random at each stage, and all weights eventually experience both quantized and relaxed phases.
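A compact, self-contained sketch of this alternating scheme on a toy layer is shown below (illustrative only; the class and helper names are hypothetical, and a production implementation would also mask optimizer state so that momentum does not drift frozen entries):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize(w, levels):
    """Nearest-level projection onto the quantization grid (here ternary)."""
    d = (w.reshape(-1, 1) - levels.reshape(1, -1)).abs()
    return levels[d.argmin(dim=1)].reshape(w.shape)

class RPRLinear(nn.Module):
    """Toy linear layer trained with RPR-style alternating partial quantization."""
    def __init__(self, d_in, d_out, levels):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(d_out, d_in))
        self.register_buffer("levels", levels)
        self.register_buffer("frozen", torch.zeros(d_out, d_in, dtype=torch.bool))

    def reshuffle(self, ff):
        """Draw a fresh uniform-random partition with freezing fraction ff."""
        self.frozen = torch.rand_like(self.weight) < ff

    def forward(self, x):
        # Frozen entries: quantized and detached (receive no gradient).
        # Relaxed entries: the real-valued master weights, trained as usual.
        w_q = quantize(self.weight.detach(), self.levels)
        return F.linear(x, torch.where(self.frozen, w_q, self.weight))

# Hypothetical usage: geometric FF schedule on random data.
layer = RPRLinear(16, 4, levels=torch.tensor([-1.0, 0.0, 1.0]))
opt = torch.optim.Adam(layer.parameters(), lr=1e-2)
x, y = torch.randn(64, 16), torch.randint(0, 4, (64,))
for ff in (0.9, 0.95, 0.975, 0.9875, 1.0):
    layer.reshuffle(ff)
    for _ in range(20):          # a few optimization steps per stage
        loss = F.cross_entropy(layer(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
# After FF = 1.0, all weights are quantized; batch-norm layers would be fine-tuned here.
```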
3. Variational Inference and Optimization
ELBO Derivation in RPR Models
For partitioned latent-variable models, the variational lower bound takes the usual form $\mathbb{E}_{q_\phi(Y \mid x)}[\log p_\theta(x \mid Y)] - D_{\mathrm{KL}}\big(q_\phi(Y \mid x)\,\|\,p(Y)\big)$, where the variational posterior over partitions $q_\phi(Y \mid x)$ is parametrized using the two-stage RPR relaxation. Since the probability mass function of the partition is intractable (it requires summing over permutations), a perturb-and-bound strategy is used to bound the offending log-probability terms. All sampling steps are differentiable, enabling low-variance pathwise gradients.
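For intuition, the hedged sketch below estimates an ELBO of this shape with a single relaxed 1-of-K assignment per data point; it is a simplified stand-in for the full partition model, and it evaluates the KL term in closed form on the categorical distributions instead of using the perturb-and-bound construction (all names and modeling choices are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau):
    """Pathwise (reparameterized) sample from a relaxed categorical."""
    g = -torch.log(-torch.log(torch.rand_like(logits)))   # Gumbel(0, 1) noise
    return F.softmax((logits + g) / tau, dim=-1)

def toy_elbo(x, logits, component_means, tau=0.5):
    """Single-sample Monte-Carlo ELBO for a toy mixture: each point is softly
    assigned to one of K components via a Gumbel-Softmax relaxation."""
    q = F.softmax(logits, dim=-1)                          # (N, K) posterior probs
    y = gumbel_softmax_sample(logits, tau)                 # (N, K) relaxed assignment
    recon_mean = y @ component_means                       # (N, D) reconstruction mean
    log_px = -0.5 * ((x - recon_mean) ** 2).sum(dim=-1)    # unit-variance Gaussian, up to a constant
    log_k = torch.log(torch.tensor(float(logits.shape[-1])))
    kl = (q * (q.clamp_min(1e-9).log() + log_k)).sum(dim=-1)  # KL to a uniform prior over K
    return (log_px - kl).mean()

# Hypothetical usage with random data: gradients flow into logits and means.
x = torch.randn(32, 10)
logits = torch.zeros(32, 4, requires_grad=True)
means = torch.randn(4, 10, requires_grad=True)
loss = -toy_elbo(x, logits, means)
loss.backward()
```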
For quantized networks, optimization alternates between continuous relaxation and strict projection, minimizing empirical loss without explicit regularization penalties.
4. Empirical Results and Validation
Partition Modeling Applications
Results reported for the differentiable partition model (Sutter et al., 2023):
- Variational Clustering (MNIST, Fashion-MNIST): RPR achieved NMI~0.89, ARI~0.88, ACC~0.94 on MNIST, exceeding VaDE and latent-GMM baselines.
- Weakly-supervised Factor Inference (MPI3D-toy): Superior estimation of the number of shared factors (minimum MSE), and the highest balanced accuracy in recovering shared dimensions.
- Multitask Learning (MultiMNIST): Comparable or superior per-task accuracy relative to Unitary Loss Scaling (ULS). RPR adaptively allocates more neurons to harder tasks.
Quantized Neural Network Performance
For ImageNet classification (Cavigelli et al., 2020), RPR (for both binary and ternary levels) achieves:
- ResNet-18 ternary: 66.31% (top-1), binary: 64.62% (top-1).
- GoogLeNet ternary: 64.88% (top-1, best reported), binary: 62.01% (top-1, best reported).
In the reported metrics, RPR exceeds the TWN and XNOR-Net baselines and is competitive with ADMM-based quantization at a similar total number of training epochs.
| Model | Method | Levels | Top-1 (%) | Top-5 (%) |
|---|---|---|---|---|
| GoogLeNet | RPR | {-1,0,1} | 64.88 | 86.05 |
| GoogLeNet | RPR | {-1,1} | 62.01 | 84.83 |
| ResNet-18 | RPR | {-1,0,1} | 66.31 | 87.84 |
| ResNet-18 | RPR | {-1,1} | 64.62 | 86.01 |
5. Implementation Considerations
Hyperparameters and Scheduling
- For differentiable RPR: the subset-size and permutation temperatures are annealed from 1.0 to roughly 0.5 over training; fixed weighting coefficients on the ELBO terms are used as reported in (Sutter et al., 2023).
- In quantization: the freezing fraction FF is swept over a geometric schedule (0.9, 0.95, 0.975, 0.9875, 1.0) with a fixed number of epochs per stage. The initial learning rate is chosen as for the baseline models and decayed within each stage. Robustness across these settings is noted; the total number of epochs matches full-precision runs (Cavigelli et al., 2020).
- PyTorch/TF integration via custom forward operations and masked gradients.
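As one possible realization of the masked-gradient idea (a minimal sketch assuming a plain SGD update; with momentum-based optimizers the frozen entries' optimizer state would also need masking), frozen entries can have their gradients zeroed with a tensor hook:

```python
import torch

# Keep frozen (quantized) entries fixed by zeroing their gradients with a hook,
# so a plain SGD step leaves them untouched while relaxed entries keep training.
w = torch.nn.Parameter(torch.randn(4, 4))
frozen_mask = torch.rand_like(w) < 0.9           # e.g. FF = 0.9 partition

def mask_frozen_grad(grad):
    # Zero the gradient wherever the weight is frozen; pass the rest through.
    return grad.masked_fill(frozen_mask, 0.0)

w.register_hook(mask_frozen_grad)

loss = (w ** 2).sum()                            # toy loss just to trigger backward
loss.backward()
assert torch.all(w.grad[frozen_mask] == 0)
```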
Scalability and Limitations
- For differentiable random partitions, the quadratic cost of the $n \times n$ relaxed permutation matrix constrains the number of elements $n$ that can be handled.
- The bound on the partition probability mass function becomes loose when the learned scores vary strongly across elements.
- Hard alternating quantization in neural-network RPR can be sensitive to the freezing-fraction (FF) schedule, and no formal convergence proof is given.
6. Distinctions, Limitations, and Extensions
RPR fundamentally differs from Gumbel-Softmax for categorical variables or Sinkhorn relaxations for permutations by jointly relaxing both subset size and element-to-subset assignment within a single modular framework (Sutter et al., 2023). The alternating quantization variant (for neural weights) leverages stochastic partitioning to avoid bias and enable fair gradient updates to all parameters (Cavigelli et al., 2020).
Limitations include:
- Quadratic computational scaling in the number of elements for the differentiable models.
- The potential looseness of the surrogate bounds used for the KL-divergence terms.
- The complexity of the two-stage relaxation; improved relaxations (e.g., earth-mover or transport-based alternatives) are discussed but not yet incorporated.
Suggested extensions involve improved estimation of the partition mass function, alternative assignment orderings, and new applications in multimodal and weakly-supervised tasks.
7. Related Work and Research Trajectory
The differentiable RPR approach synthesizes advances in stochastic relaxations, such as Gumbel-Softmax for counts and SoftSort for structured assignment, to generalize partition sampling within variational inference frameworks (Sutter et al., 2023). In quantized networks, RPR provides an efficient alternative to methods such as ADMM and Straight-Through Estimators by introducing systematic randomization and relaxation schedules (Cavigelli et al., 2020).
Current and future research directions focus on improving scalability, tightening bounds for variational inference, and broadening the utility of RPR for confounding-structure discovery and resource-aware deep learning.