
Random Partition Relaxation (RPR)

Updated 11 January 2026
  • Random Partition Relaxation (RPR) is a framework that relaxes discrete partition tasks using differentiable surrogates like Gumbel-Softmax and straight-through estimators.
  • It enables end-to-end trainable models for applications from neural network quantization to variational clustering, achieving competitive accuracy on benchmarks.
  • RPR employs a two-stage reparameterization and random partitioning with a geometric schedule, balancing hard quantization with continuous optimization.

Random Partition Relaxation (RPR) encompasses a family of principled methodologies for relaxing inherently discrete random partitioning tasks. These approaches allow all-or-nothing subset assignments or hard quantization constraints to participate in gradient-based optimization pipelines by introducing differentiable surrogates, chiefly Gumbel-Softmax reparameterizations and straight-through estimators, applied either to set partitions or to quantized weight-vector projections. RPR first emerged in neural network quantization for binary- and ternary-weight models and was subsequently generalized to probabilistic modeling of partitions in variational inference. Its core objective is end-to-end trainable modeling of partition structures that would otherwise resist differentiable handling.

1. Mathematical Frameworks

Set Partition Relaxation

Given a ground set of $n$ elements, $[n] = \{1, \ldots, n\}$, a partition is a decomposition into $K$ mutually disjoint subsets, $\rho = (S_1, \ldots, S_K)$, satisfying $S_1 \cup \cdots \cup S_K = [n]$ and $S_i \cap S_j = \varnothing$ for $i \ne j$. The assignment can be encoded by a binary indicator matrix $Y \in \{0,1\}^{K \times n}$ with $y_{k,i} = 1$ if element $i$ is assigned to $S_k$. Alternatively, one may specify the subset sizes $\mathbf{n} = (n_1, \ldots, n_K)$, $\sum_k n_k = n$, along with a permutation $\pi$ representing the assignment order.
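
As a concrete illustration of the indicator-matrix encoding, the following minimal NumPy sketch builds $Y$ from a vector of subset assignments and recovers the subset sizes $\mathbf{n}$ as its row sums. The helper name is illustrative and not taken from either reference.

    import numpy as np

    def indicator_matrix(assignment, K):
        """Encode a partition of [n] as the binary K x n matrix Y with
        Y[k, i] = 1 iff element i belongs to subset k (0-indexed here)."""
        n = len(assignment)
        Y = np.zeros((K, n), dtype=int)
        Y[assignment, np.arange(n)] = 1
        return Y

    # n = 5 elements, K = 3 subsets: S_1 = {0, 3}, S_2 = {1}, S_3 = {2, 4}
    assignment = np.array([0, 1, 2, 0, 2])   # subset index of each element
    Y = indicator_matrix(assignment, K=3)
    print(Y)                # every column contains exactly one 1 (disjoint, exhaustive)
    print(Y.sum(axis=1))    # subset sizes n = (n_1, ..., n_K) = [2 1 2]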

Partitioned Weight Quantization

In the network quantization context, the space of quantized weights is

  • $\mathbb{Q}_2 = \{-1, +1\}$ for binary-weight networks,
  • $\mathbb{Q}_3 = \{-1, 0, +1\}$ for ternary-weight networks.

Given parameters $w \in \mathbb{R}^d$, the elementwise quantization operation is $Q(w) = \arg\min_{\ell \in \mathbb{Q}} |w - \ell|$. The overall training objective becomes the constrained empirical risk minimization
$$\min_{w_q, w_c} \mathcal{L}(w_q, w_c) \quad \text{s.t.} \quad w_q \in \mathbb{Q}^{d_q},\; w_c \in \mathbb{R}^{d_c},$$
where $w_q$ collects the quantizable weights and $w_c$ the weights kept in full precision. RPR introduces a (random) partial quantization, partitioning $w_q$ into "frozen" indices $I_f$ (quantized) and "relaxed" indices $I_r$ (kept floating-point), and alternates over random partitions.
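
The projection $Q(\cdot)$ is a nearest-level lookup. The sketch below shows one way to implement it elementwise in PyTorch; the function name is illustrative, and the per-layer scaling factors commonly used in practice are omitted for brevity.

    import torch

    def project_to_levels(w, levels):
        """Elementwise nearest-level quantization Q(w) = argmin_{l in levels} |w - l|."""
        levels = torch.as_tensor(levels, dtype=w.dtype, device=w.device)
        # distance of every weight to every level: shape (..., num_levels)
        dist = (w.unsqueeze(-1) - levels).abs()
        return levels[dist.argmin(dim=-1)]

    w = torch.tensor([-0.8, -0.2, 0.05, 0.6])
    print(project_to_levels(w, [-1.0, 1.0]))        # binary:  [-1., -1.,  1.,  1.]
    print(project_to_levels(w, [-1.0, 0.0, 1.0]))   # ternary: [-1.,  0.,  0.,  1.]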

2. Algorithmic Procedures

Differentiable Random Partition Models

RPR leverages a two-stage reparameterization strategy (Sutter et al., 2023):

  1. Relaxing Subset Sizes: The vector of subset sizes $\mathbf{n}$ is sampled from Fisher's noncentral hypergeometric distribution. Differentiable, reparameterizable samples are obtained by applying the Gumbel-Softmax trick recursively to the marginals of the count vector.
  2. Assignment via Soft Permutation: A random ordering $\pi$ is sampled using a Plackett-Luce model parameterized by (learned) positive scores $s_i$; $\pi$ is relaxed to a continuous (approximately doubly stochastic) matrix using a differentiable sorting operator such as SoftSort.

Given these samples, the assignment matrix $Y$ is constructed by grouping (with smooth, temperature-controlled indicators) consecutive rows of the permutation matrix.
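
A compressed sketch of stage 2 and the grouping step is given below, assuming the subset sizes from stage 1 are already available. The Gumbel perturbation of the Plackett-Luce log-scores and the SoftSort-style operator follow their standard formulations; the function names are illustrative, and the grouping here uses a hard sum over consecutive rows rather than the smooth, temperature-controlled indicators of the reference.

    import torch
    import torch.nn.functional as F

    def soft_sort(s, tau=1.0):
        """SoftSort-style relaxation: a row-stochastic matrix whose rows
        approximate the permutation sorting s in decreasing order."""
        s = s.unsqueeze(-1)                               # (n, 1)
        s_sorted = s.sort(dim=0, descending=True).values  # (n, 1)
        pairwise = -(s_sorted - s.transpose(0, 1)).abs()  # (n, n)
        return F.softmax(pairwise / tau, dim=-1)

    def sample_relaxed_partition(log_scores, subset_sizes, tau=1.0):
        """Draw a Plackett-Luce ordering via Gumbel perturbation, relax it with
        soft_sort, then sum consecutive rows into subsets to get a K x n matrix Y."""
        gumbel = -torch.log(-torch.log(torch.rand_like(log_scores)))
        p_hat = soft_sort(log_scores + gumbel, tau)       # (n, n) relaxed permutation
        # rows 0..n_1-1 -> subset 1, the next n_2 rows -> subset 2, ...
        groups = torch.repeat_interleave(torch.arange(len(subset_sizes)), subset_sizes)
        Y = torch.zeros(len(subset_sizes), p_hat.shape[1]).index_add_(0, groups, p_hat)
        return Y                                          # rows sum to the subset sizes

    log_scores = torch.zeros(6, requires_grad=True)       # learnable Plackett-Luce log-scores
    Y = sample_relaxed_partition(log_scores, torch.tensor([2, 1, 3]), tau=0.5)
    print(Y.shape, Y.sum(dim=0))                          # (3, 6); column sums are close to 1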

Random Partition Relaxation for Quantized Neural Networks

RPR for quantization (Cavigelli et al., 2020) operates in stages:

  • For each stage $k$, select a random partition of the quantizable weights into $I_f$ (a random subset of size $\lfloor \mathrm{FF}_k \, d_q \rfloor$) and $I_r$ (the remainder).
  • Quantize $w_i$ for $i \in I_f$ and keep $w_j$ real-valued for $j \in I_r$.
  • Perform SGD/Adam updates only on $I_r$ and $w_c$ for $E$ epochs, then reshuffle the partition with an increased frozen fraction $\mathrm{FF}_{k+1}$.
  • Continue the geometric schedule until $\mathrm{FF}_K = 1.0$; fine-tune the batch-norm layers after the final quantization.

Partition sampling is uniformly random at each stage, so all weights eventually pass through both quantized and relaxed phases.
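
A minimal sketch of this stage loop is shown below. The names `quant_params` and `train_epochs` are hypothetical placeholders for the quantizable weight tensors and the user's training routine, and keeping full-precision shadow copies for frozen weights is one plausible realization of the bookkeeping rather than the exact procedure of Cavigelli et al. (2020).

    import torch

    def nearest_level(w, levels):
        """Elementwise projection onto the quantization levels."""
        levels = torch.as_tensor(levels, dtype=w.dtype, device=w.device)
        return levels[(w.unsqueeze(-1) - levels).abs().argmin(-1)]

    def rpr_quantize_train(quant_params, train_epochs, levels=(-1.0, 0.0, 1.0),
                           schedule=(0.9, 0.95, 0.975, 0.9875, 1.0), epochs_per_stage=15):
        """Hypothetical stage loop: quant_params is a dict of quantizable weight tensors,
        train_epochs(masks, n_epochs) runs SGD/Adam while zeroing gradients at frozen entries."""
        shadow = {k: w.detach().clone() for k, w in quant_params.items()}  # float copies
        for ff in schedule:                                   # geometric FF schedule
            masks = {}
            for name, w in quant_params.items():
                frozen = torch.rand_like(w) < ff              # random partition: I_f vs. I_r
                with torch.no_grad():
                    w.copy_(torch.where(frozen, nearest_level(shadow[name], levels),
                                        shadow[name]))
                masks[name] = frozen
            train_epochs(masks, epochs_per_stage)             # updates relaxed entries and w_c
            for name, w in quant_params.items():              # keep the trained relaxed values
                shadow[name] = torch.where(masks[name], shadow[name], w.detach()).clone()
        # FF_K = 1.0: all quantizable weights are frozen; fine-tune batch norm afterwards.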

3. Variational Inference and Optimization

ELBO Derivation in RPR Models

For partitioned latent-variable models, the variational lower bound is
$$\log p(X) \geq \mathbb{E}_{q(Y,Z|X)} \left[ \log p(X|Z) - \log \frac{q(Z|Y, X)}{p(Z|Y)} - \log \frac{q(Y|X)}{p(Y)} \right].$$
Here, $q(Y|X)$ is parametrized using the two-stage RPR relaxation. Since $p(Y)$ is intractable (it requires summing over permutations), a perturb-and-bound strategy is used to upper-bound the KL term:
$$\mathrm{KL}\left[ q(Y|X) \,\|\, p(Y) \right] \leq \mathbb{E}_{q(Y|X)} \left[ \log \frac{|\Pi_Y|\, q(\mathbf{n})}{p(\mathbf{n}) \max_\pi p(\pi)} \right],$$
where $\Pi_Y$ denotes the set of permutations consistent with a given partition $Y$. All sampling steps are differentiable, enabling low-variance pathwise gradients.
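
The surrogate combines the proposal and prior over subset sizes with the Plackett-Luce probability of the most likely ordering, which is obtained by ranking the scores in decreasing order. The sketch below computes the standard Plackett-Luce log-probability and assembles the bracketed term for one sample; the function names are illustrative, and the count $|\Pi_Y|$ is passed in as a given quantity since its exact form follows the reference.

    import torch

    def plackett_luce_log_prob(scores, order):
        """log p(pi) for a Plackett-Luce model with positive scores s_i:
        p(pi) = prod_i  s_{pi(i)} / sum_{j >= i} s_{pi(j)}."""
        s = scores[order]
        # at step i the denominator sums the scores of all items not yet placed
        denom = torch.flip(torch.cumsum(torch.flip(s, [0]), 0), [0])
        return (torch.log(s) - torch.log(denom)).sum()

    scores = torch.tensor([3.0, 1.0, 2.0])
    # the most probable Plackett-Luce ordering ranks items by decreasing score: (0, 2, 1)
    log_p_max = plackett_luce_log_prob(scores, torch.argsort(scores, descending=True))

    # bracketed surrogate term from the bound above, for one sampled Y with counts n:
    #   log |Pi_Y| + log q(n) - log p(n) - max_pi log p(pi)
    def kl_surrogate_term(log_q_n, log_p_n, log_num_perms, log_p_max):
        return log_num_perms + log_q_n - log_p_n - log_p_max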

For quantized networks, optimization alternates between continuous relaxation and strict projection, minimizing empirical loss without explicit regularization penalties.

4. Empirical Results and Validation

Partition Modeling Applications

Results reported for the differentiable partition model (Sutter et al., 2023):

  • Variational Clustering (MNIST, Fashion-MNIST): RPR achieved NMI~0.89, ARI~0.88, ACC~0.94 on MNIST, exceeding VaDE and latent-GMM baselines.
  • Weakly-supervised Factor Inference (MPI3D-toy): Superior estimation of the number of shared factors (minimum MSE), and the highest balanced accuracy in recovering shared dimensions.
  • Multitask Learning (MultiMNIST): Comparable or superior per-task accuracy relative to Unitary Loss Scaling (ULS). RPR adaptively allocates more neurons to harder tasks.

Quantized Neural Network Performance

For ImageNet classification (Cavigelli et al., 2020), RPR (for both binary and ternary levels) achieves:

  • ResNet-18 ternary: 66.31% (top-1), binary: 64.62% (top-1).
  • GoogLeNet ternary: 64.88% (top-1, best reported), binary: 62.01% (top-1, best reported).

RPR exceeds the TWN and XNOR-Net baselines on the reported metrics and is competitive with ADMM-based quantization at a similar total number of training epochs.

Model      Method  Levels       Top-1 (%)  Top-5 (%)
GoogLeNet  RPR     {-1, 0, +1}  64.88      86.05
GoogLeNet  RPR     {-1, +1}     62.01      84.83
ResNet-18  RPR     {-1, 0, +1}  66.31      87.84
ResNet-18  RPR     {-1, +1}     64.62      86.01

5. Implementation Considerations

Hyperparameters and Scheduling

  • For differentiable RPR: the subset-size temperature $\tau_M$ and permutation temperature $\tau_{PI}$ are annealed from 1.0 to roughly 0.5 over training; $\beta = 1$, $\gamma = 1$, $\delta = 0.01$ are commonly used for ELBO weighting (Sutter et al., 2023).
  • In quantization: the freezing fraction $\mathrm{FF}_k$ is swept over a geometric schedule (0.9, 0.95, 0.975, 0.9875, 1.0) with $E = 15$ epochs per stage. The initial learning rate matches the baseline models and is decayed within each stage. Robustness is noted for $\mathrm{FF} \in [0.75, 0.925]$; the total budget is $\sim 120$ epochs, as in full-precision runs (Cavigelli et al., 2020).
  • PyTorch/TF integration is achieved via custom forward operations and masked gradients (a minimal sketch follows this list).
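
The sketch below illustrates the last two points: an illustrative helper generating the geometric freezing schedule and a PyTorch gradient hook that masks updates to frozen entries. Both function names are assumptions for this example, not part of either reference.

    import torch

    def geometric_ff_schedule(start=0.9, stages=5):
        """Halve the unfrozen fraction each stage, FF_{k+1} = 1 - (1 - FF_k) / 2,
        and end at 1.0, e.g. 0.9, 0.95, 0.975, 0.9875, 1.0."""
        ffs, ff = [], start
        for _ in range(stages - 1):
            ffs.append(ff)
            ff = 1.0 - (1.0 - ff) / 2.0
        return ffs + [1.0]

    def mask_frozen_gradients(param, frozen_mask):
        """Register a hook that zeroes gradients on frozen (quantized) entries,
        so the optimizer only updates the relaxed ones; returns the hook handle."""
        return param.register_hook(lambda g: g * (~frozen_mask).to(g.dtype))

    print(geometric_ff_schedule())   # [0.9, 0.95, 0.975, 0.9875, 1.0]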

Scalability and Limitations

  • For differentiable random partitions, $O(n^2)$ scaling due to the $n \times n$ permutation matrix constrains the feasible $n$.
  • The bound on the partition mass function (PMF) is loose for highly variable scores $s_i$.
  • Hard alternating quantization in neural-network RPR can be sensitive to the $\mathrm{FF}$ schedule, and no formal convergence proof is given.

6. Distinctions, Limitations, and Extensions

RPR fundamentally differs from Gumbel-Softmax for categorical variables or Sinkhorn relaxations for permutations by jointly relaxing both subset size and element-to-subset assignment within a single modular framework (Sutter et al., 2023). The alternating quantization variant (for neural weights) leverages stochastic partitioning to avoid bias and enable fair gradient updates to all parameters (Cavigelli et al., 2020).

Limitations include:

  • Quadratic ($O(n^2)$) computational scaling for large $n$ in the differentiable models.
  • The limited tightness of the surrogate bounds for the KL-divergence terms.
  • The complexity of the two-stage relaxation, with potentially improved relaxations (e.g., earth-mover or transport-based alternatives) discussed but not yet incorporated.

Suggested extensions involve improved partition mass function estimation, alternative assignment orderings, and new applications in multimodal and weakly supervised tasks.

The differentiable RPR approach synthesizes advances in stochastic relaxations, such as Gumbel-Softmax for counts and SoftSort for structured assignment, to generalize partition sampling within variational inference frameworks (Sutter et al., 2023). In quantized networks, RPR provides an efficient alternative to methods such as ADMM and Straight-Through Estimators by introducing systematic randomization and relaxation schedules (Cavigelli et al., 2020).

Current and future research directions focus on improving scalability, tightening bounds for variational inference, and broadening the utility of RPR for confounding structure discovery and resource-aware deep learning.

References

  • Cavigelli, L., and Benini, L. (2020). RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks.
  • Sutter, T. M., et al. (2023). Differentiable Random Partition Models.
