Random Partition Relaxation (RPR)
- Random Partition Relaxation (RPR) is a framework that relaxes discrete partition tasks using differentiable surrogates like Gumbel-Softmax and straight-through estimators.
- It enables end-to-end trainable models for applications from neural network quantization to variational clustering, achieving competitive accuracy on benchmarks.
- Depending on the variant, RPR employs either a two-stage reparameterization of subset sizes and assignments or random partitioning with a geometric freezing schedule, balancing hard quantization with continuous optimization.
Random Partition Relaxation (RPR) refers to a family of principled methods for relaxing inherently discrete random partitioning tasks. By introducing differentiable surrogates, chiefly Gumbel-Softmax reparameterizations and straight-through estimators, applied either to set partitions or to quantized weight projections, these approaches let all-or-nothing subset assignments and hard quantization constraints participate in gradient-based optimization pipelines. RPR first emerged in neural network quantization for binary- and ternary-weight models and was subsequently generalized to probabilistic modeling of partitions in variational inference. Its core objective is end-to-end trainable modeling of partition structures that would otherwise resist differentiable handling.
1. Mathematical Frameworks
Set Partition Relaxation
Given a ground set of $n$ elements, $[n] = \{1, \dots, n\}$, a partition is a decomposition into $K$ mutually disjoint subsets, $S_1, \dots, S_K$, satisfying $\bigcup_{k=1}^{K} S_k = [n]$ and $S_i \cap S_j = \emptyset$ for $i \neq j$. The assignment can be encoded by a binary indicator matrix $Y \in \{0,1\}^{K \times n}$ with $Y_{kj} = 1$ if element $j$ is assigned to $S_k$. Alternatively, one may specify the subset sizes $n_1, \dots, n_K$, with $\sum_{k=1}^{K} n_k = n$, along with a permutation $\pi$ of $[n]$ representing the assignment order.
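To make the size-and-permutation encoding concrete, the following minimal sketch (illustrative only; the function name is hypothetical) builds the indicator matrix $Y$ from subset sizes and an ordering, as described above.

```python
import numpy as np

def assignment_matrix(subset_sizes, perm):
    """Build the binary indicator matrix Y from subset sizes and a permutation.

    subset_sizes: sizes n_1, ..., n_K summing to n.
    perm: length-n array; perm[p] is the element placed at position p of the ordering.
    Returns Y of shape (K, n) with Y[k, e] = 1 iff element e falls in subset k.
    """
    n, K = sum(subset_sizes), len(subset_sizes)
    Y = np.zeros((K, n), dtype=int)
    start = 0
    for k, n_k in enumerate(subset_sizes):
        # the next n_k positions of the ordering are grouped into subset k
        Y[k, perm[start:start + n_k]] = 1
        start += n_k
    return Y

# Example: 5 elements, subset sizes (2, 3), ordering (3, 0, 4, 1, 2)
# -> subset 0 = {3, 0}, subset 1 = {4, 1, 2}
print(assignment_matrix([2, 3], np.array([3, 0, 4, 1, 2])))
```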
Partitioned Weight Quantization
In the network quantization context, the space of admissible weight values $Q$ is
- $\{-1, +1\}$ for binary-weight networks,
- $\{-1, 0, +1\}$ for ternary-weight networks.
Given real-valued parameters $w \in \mathbb{R}^d$, the quantization operation is the nearest-level projection $\mathrm{quant}(w_i) = \arg\min_{q \in Q} |w_i - q|$. The training objective becomes the constrained empirical risk minimization $\min_{w} \mathcal{L}(w)$ subject to $w_i \in Q$ for all $i$, where $\mathcal{L}$ denotes the task loss. RPR introduces a (random) partial quantization: the index set is partitioned into "frozen" indices (held at quantized values) and "relaxed" indices (kept real-valued), with alternation over random partitions during training.
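As a minimal illustration (not code from Cavigelli et al., 2020; helper names are hypothetical), the sketch below projects weights onto the ternary grid and applies the frozen/relaxed split:

```python
import torch

def project_to_levels(w, levels):
    """Nearest-level projection of each weight onto the quantization grid Q."""
    d = (w.reshape(-1, 1) - levels.reshape(1, -1)).abs()   # |w_i - q| for every level q
    return levels[d.argmin(dim=1)].reshape(w.shape)

def partial_quantize(w, frozen_mask, levels):
    """RPR-style partial quantization: frozen entries are projected onto Q,
    relaxed entries keep their real values."""
    return torch.where(frozen_mask, project_to_levels(w, levels), w)

# Example: ternary grid, roughly 60% of the weights frozen at random.
levels = torch.tensor([-1.0, 0.0, 1.0])
w = torch.randn(8)
frozen = torch.rand_like(w) < 0.6
print(partial_quantize(w, frozen, levels))
```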
2. Algorithmic Procedures
Differentiable Random Partition Models
RPR leverages a two-stage reparameterization strategy (Sutter et al., 2023):
- Relaxing Subset Sizes: The vector of subset sizes is sampled from a Fisher noncentral hypergeometric distribution. Differentiable, reparameterizable samples are obtained via the Gumbel-Softmax trick applied recursively to the marginals of the count vector.
- Assignment via Soft Permutation: A random ordering of the elements is sampled from a Plackett–Luce model parameterized by (learned) positive scores. The corresponding permutation matrix is relaxed to a doubly stochastic matrix using a differentiable sorting operator such as SoftSort.
Given these samples, the assignment matrix $Y$ is constructed by grouping (with smooth, temperature-controlled indicators) consecutive rows of the relaxed permutation matrix, as sketched below.
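The following simplified sketch (hypothetical code, not from Sutter et al., 2023) keeps the subset sizes fixed and relaxes only the ordering with a SoftSort-style operator, then groups consecutive rows into a soft assignment matrix; the full two-stage method also relaxes the sizes via Gumbel-Softmax.

```python
import torch
import torch.nn.functional as F

def soft_sort(s, tau=1.0):
    """SoftSort-style relaxation: row i is a softmax concentrated on the
    element with the i-th largest score in s."""
    s_sorted = torch.sort(s, descending=True).values             # (n,)
    pairwise = -(s_sorted.unsqueeze(1) - s.unsqueeze(0)).abs()    # (n, n)
    return F.softmax(pairwise / tau, dim=-1)                      # relaxed permutation matrix

def relaxed_assignment(scores, subset_sizes, tau=1.0):
    """Group consecutive rows of the relaxed permutation into K soft subsets."""
    P = soft_sort(scores, tau)                                    # (n, n)
    groups = torch.split(P, list(subset_sizes), dim=0)            # K blocks of rows
    return torch.stack([g.sum(dim=0) for g in groups])            # (K, n) soft assignment

# 5 elements, subsets of sizes 2 and 3; gradients flow back into the scores.
scores = torch.rand(5, requires_grad=True)
Y = relaxed_assignment(scores, (2, 3), tau=0.5)
Y.sum().backward()
```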
Random Partition Relaxation for Quantized Neural Networks
RPR for quantization (Cavigelli et al., 2020) operates in stages:
- For each stage, select a uniform-random partition of the weights into a frozen subset (a randomly chosen fraction FF of the weights) and a relaxed remainder.
- Quantize the frozen weights and keep the relaxed weights real-valued.
- Perform SGD/Adam updates only on the relaxed weights (and the remaining continuous parameters) for a fixed number of epochs per stage, then reshuffle the partition with an increased freezing fraction FF.
- Continue the geometric FF schedule until FF = 1 (all weights quantized); fine-tune the batch-norm layers after the final quantization.
Partition sampling is uniform random at each stage, and all weights eventually experience both quantized and relaxed phases.
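A compact, self-contained sketch of this alternating scheme on a toy layer is shown below (illustrative only; the class and helper names are hypothetical, and a production implementation would also mask optimizer state so that momentum does not drift frozen entries):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize(w, levels):
    """Nearest-level projection onto the quantization grid (here ternary)."""
    d = (w.reshape(-1, 1) - levels.reshape(1, -1)).abs()
    return levels[d.argmin(dim=1)].reshape(w.shape)

class RPRLinear(nn.Module):
    """Toy linear layer trained with RPR-style alternating partial quantization."""
    def __init__(self, d_in, d_out, levels):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(d_out, d_in))
        self.register_buffer("levels", levels)
        self.register_buffer("frozen", torch.zeros(d_out, d_in, dtype=torch.bool))

    def reshuffle(self, ff):
        """Draw a fresh uniform-random partition with freezing fraction ff."""
        self.frozen = torch.rand_like(self.weight) < ff

    def forward(self, x):
        # Frozen entries: quantized and detached (receive no gradient).
        # Relaxed entries: the real-valued master weights, trained as usual.
        w_q = quantize(self.weight.detach(), self.levels)
        return F.linear(x, torch.where(self.frozen, w_q, self.weight))

# Hypothetical usage: geometric FF schedule on random data.
layer = RPRLinear(16, 4, levels=torch.tensor([-1.0, 0.0, 1.0]))
opt = torch.optim.Adam(layer.parameters(), lr=1e-2)
x, y = torch.randn(64, 16), torch.randint(0, 4, (64,))
for ff in (0.9, 0.95, 0.975, 0.9875, 1.0):
    layer.reshuffle(ff)
    for _ in range(20):          # a few optimization steps per stage
        loss = F.cross_entropy(layer(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
# After FF = 1.0, all weights are quantized; batch-norm layers would be fine-tuned here.
```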
3. Variational Inference and Optimization
ELBO Derivation in RPR Models
For partitioned latent-variable models, the variational lower bound takes the usual form $\mathbb{E}_{q_\phi(Y \mid x)}[\log p_\theta(x \mid Y)] - D_{\mathrm{KL}}\big(q_\phi(Y \mid x)\,\|\,p(Y)\big)$, where the variational posterior over partitions $q_\phi(Y \mid x)$ is parametrized using the two-stage RPR relaxation. Since the probability mass function of the partition is intractable (it requires summing over permutations), a perturb-and-bound strategy is used to bound the offending log-probability terms. All sampling steps are differentiable, enabling low-variance pathwise gradients.
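For intuition, the hedged sketch below estimates an ELBO of this shape with a single relaxed 1-of-K assignment per data point; it is a simplified stand-in for the full partition model, and it evaluates the KL term in closed form on the categorical distributions instead of using the perturb-and-bound construction (all names and modeling choices are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau):
    """Pathwise (reparameterized) sample from a relaxed categorical."""
    g = -torch.log(-torch.log(torch.rand_like(logits)))   # Gumbel(0, 1) noise
    return F.softmax((logits + g) / tau, dim=-1)

def toy_elbo(x, logits, component_means, tau=0.5):
    """Single-sample Monte-Carlo ELBO for a toy mixture: each point is softly
    assigned to one of K components via a Gumbel-Softmax relaxation."""
    q = F.softmax(logits, dim=-1)                          # (N, K) posterior probs
    y = gumbel_softmax_sample(logits, tau)                 # (N, K) relaxed assignment
    recon_mean = y @ component_means                       # (N, D) reconstruction mean
    log_px = -0.5 * ((x - recon_mean) ** 2).sum(dim=-1)    # unit-variance Gaussian, up to a constant
    log_k = torch.log(torch.tensor(float(logits.shape[-1])))
    kl = (q * (q.clamp_min(1e-9).log() + log_k)).sum(dim=-1)  # KL to a uniform prior over K
    return (log_px - kl).mean()

# Hypothetical usage with random data: gradients flow into logits and means.
x = torch.randn(32, 10)
logits = torch.zeros(32, 4, requires_grad=True)
means = torch.randn(4, 10, requires_grad=True)
loss = -toy_elbo(x, logits, means)
loss.backward()
```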
For quantized networks, optimization alternates between continuous relaxation and strict projection, minimizing empirical loss without explicit regularization penalties.
4. Empirical Results and Validation
Partition Modeling Applications
Results reported for the differentiable partition model (Sutter et al., 2023):
- Variational Clustering (MNIST, Fashion-MNIST): RPR achieved NMI~0.89, ARI~0.88, ACC~0.94 on MNIST, exceeding VaDE and latent-GMM baselines.
- Weakly-supervised Factor Inference (MPI3D-toy): Superior estimation of the number of shared factors (minimum MSE), and the highest balanced accuracy in recovering shared dimensions.
- Multitask Learning (MultiMNIST): Comparable or superior per-task accuracy relative to Unitary Loss Scaling (ULS). RPR adaptively allocates more neurons to harder tasks.
Quantized Neural Network Performance
For ImageNet classification (Cavigelli et al., 2020), RPR (for both binary and ternary levels) achieves:
- ResNet-18 ternary: 66.31% (top-1), binary: 64.62% (top-1).
- GoogLeNet ternary: 64.88% (top-1, best reported), binary: 62.01% (top-1, best reported).
In the reported metrics, RPR exceeds the TWN and XNOR-Net baselines and is competitive with ADMM-based quantization at a similar total number of training epochs.
| Model | Method | Levels | Top-1 (%) | Top-5 (%) |
|---|---|---|---|---|
| GoogLeNet | RPR | {-1,0,1} | 64.88 | 86.05 |
| GoogLeNet | RPR | {-1,1} | 62.01 | 84.83 |
| ResNet-18 | RPR | {-1,0,1} | 66.31 | 87.84 |
| ResNet-18 | RPR | {-1,1} | 64.62 | 86.01 |
5. Implementation Considerations
Hyperparameters and Scheduling
- For differentiable RPR: the subset-size and permutation temperatures are annealed from 1.0 to roughly 0.5 over training; fixed weighting coefficients on the ELBO terms are used as reported in (Sutter et al., 2023).
- In quantization: the freezing fraction FF is swept over a geometric schedule (0.9, 0.95, 0.975, 0.9875, 1.0) with a fixed number of epochs per stage. The initial learning rate is chosen as for the baseline models and decayed within each stage. Robustness across these settings is noted; the total number of epochs matches full-precision runs (Cavigelli et al., 2020).
- PyTorch/TF integration via custom forward operations and masked gradients.
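As one possible realization of the masked-gradient idea (a minimal sketch assuming a plain SGD update; with momentum-based optimizers the frozen entries' optimizer state would also need masking), frozen entries can have their gradients zeroed with a tensor hook:

```python
import torch

# Keep frozen (quantized) entries fixed by zeroing their gradients with a hook,
# so a plain SGD step leaves them untouched while relaxed entries keep training.
w = torch.nn.Parameter(torch.randn(4, 4))
frozen_mask = torch.rand_like(w) < 0.9           # e.g. FF = 0.9 partition

def mask_frozen_grad(grad):
    # Zero the gradient wherever the weight is frozen; pass the rest through.
    return grad.masked_fill(frozen_mask, 0.0)

w.register_hook(mask_frozen_grad)

loss = (w ** 2).sum()                            # toy loss just to trigger backward
loss.backward()
assert torch.all(w.grad[frozen_mask] == 0)
```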
Scalability and Limitations
- For differentiable random partitions, the quadratic cost of the $n \times n$ relaxed permutation matrix constrains the number of elements $n$ that can be handled.
- The bound on the partition probability mass function becomes loose when the learned scores vary strongly across elements.
- Hard alternating quantization in neural-network RPR can be sensitive to the freezing-fraction (FF) schedule, and no formal convergence proof is given.
6. Distinctions, Limitations, and Extensions
RPR fundamentally differs from Gumbel-Softmax for categorical variables or Sinkhorn relaxations for permutations by jointly relaxing both subset size and element-to-subset assignment within a single modular framework (Sutter et al., 2023). The alternating quantization variant (for neural weights) leverages stochastic partitioning to avoid bias and enable fair gradient updates to all parameters (Cavigelli et al., 2020).
Limitations include:
- Quadratic computational scaling in the number of elements for the differentiable models.
- The potential looseness of the surrogate bounds used for the KL-divergence terms.
- The complexity of the two-stage relaxation; improved relaxations (e.g., earth-mover or transport-based alternatives) are discussed but not yet incorporated.
Suggested extensions involve improved estimation of the partition mass function, alternative assignment orderings, and new applications in multimodal and weakly-supervised tasks.
7. Related Work and Research Trajectory
The differentiable RPR approach synthesizes advances in stochastic relaxations, such as Gumbel-Softmax for counts and SoftSort for structured assignment, to generalize partition sampling within variational inference frameworks (Sutter et al., 2023). In quantized networks, RPR provides an efficient alternative to methods such as ADMM and Straight-Through Estimators by introducing systematic randomization and relaxation schedules (Cavigelli et al., 2020).
Current and future research directions focus on improving scalability, tightening bounds for variational inference, and broadening the utility of RPR for confounding-structure discovery and resource-aware deep learning.