RS-IMLE: Rejection Sampling for IMLE Correction
- RS-IMLE is a method that improves train-test alignment in implicit generative models by modifying the latent prior with a rejection sampling rule.
- The approach corrects the mismatch between localized latents during training and dispersed latents at inference, enhancing sample quality and diversity.
- Empirical results show RS-IMLE achieving lower FID scores and improved recall in few-shot image synthesis and robust performance in real-time multimodal policy learning.
Rejection Sampling Implicit Maximum Likelihood Estimation (RS-IMLE) is a method for improving the train-test alignment and sample quality of deep implicit generative models, most notably in the few-shot regime and in fast policy learning. RS-IMLE modifies the classical IMLE objective by changing the training latent prior via a rejection sampling rule, thereby correcting the mismatch between latents selected during training and those encountered at inference in standard IMLE. RS-IMLE has achieved leading performance across few-shot image synthesis tasks and has been effectively extended to imitation learning for multimodal policy generation in robotics.
1. Foundations: IMLE and Latent Prior Mismatch
Implicit Maximum Likelihood Estimation (IMLE) was originally introduced as a mode-collapse-resistant alternative to GANs. Given a generator $T_\theta$ mapping latent vectors $z$ to images, standard batch IMLE minimizes
$$\mathcal{L}(\theta) = \mathbb{E}_{z_1,\dots,z_m \sim \mathcal{N}(0, I)} \left[ \sum_{i=1}^{n} \min_{j \in \{1,\dots,m\}} d\big(x_i, T_\theta(z_j)\big) \right]$$
where $n$ is the dataset size, $m$ is the number of samples per iteration, and $d$ is a suitable distance metric. One key property is that each data point $x_i$ is matched to its closest generated sample. While effective for large datasets, standard IMLE suffers a pronounced mismatch in few-shot settings: when $n \ll m$, the latent codes selected during training become highly localized near data examples, whereas unconditional latents sampled at inference are far more dispersed. This distributional misalignment leads to degraded sample quality at test time (Vashist et al., 2024).
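The matching step in the objective above can be illustrated with a minimal NumPy sketch. It assumes squared Euclidean distance for $d$ and a toy linear generator; the names `imle_loss` and `W` are hypothetical and are not taken from the cited work.

```python
import numpy as np

def imle_loss(data, samples):
    """Sum over data points of the squared L2 distance to the nearest sample."""
    # Pairwise squared distances between data (n, D) and samples (m, D): shape (n, m).
    diffs = data[:, None, :] - samples[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Each data point is matched to its closest generated sample.
    nearest = np.argmin(sq_dists, axis=1)
    loss = sq_dists[np.arange(len(data)), nearest].sum()
    return loss, nearest

rng = np.random.default_rng(0)
n, m, latent_dim, data_dim = 4, 64, 8, 16
W = rng.normal(size=(latent_dim, data_dim))  # toy linear generator (assumption)
data = rng.normal(size=(n, data_dim))
z = rng.normal(size=(m, latent_dim))
loss, nearest = imle_loss(data, z @ W)
print("loss:", loss, "matched sample indices:", nearest)
```

Only the latents indexed by `nearest` receive gradient signal, which is why the selected latents concentrate near the data when $m$ is much larger than $n$.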
2. Theoretical Construction of RS-IMLE
RS-IMLE addresses the alignment problem by designing a new prior over latents via rejection sampling. The approach begins by quantifying, for each data point $x_i$, the CDF $F_i$ of distances between $x_i$ and generic samples $T_\theta(z)$ from the generator, and the order statistics for the minimal distance among $m$ latents. The theoretical analysis shows:
$$F_i^{\min}(t) = 1 - \big(1 - F_i(t)\big)^{m}$$
with corresponding density
$$f_i^{\min}(t) = m \big(1 - F_i(t)\big)^{m-1} f_i(t), \qquad f_i = F_i'$$
This distribution, sharply skewed toward small distances when $m$ is large, underlies the mismatch. To compensate, RS-IMLE introduces a rejection rule:
- Sample $z \sim \mathcal{N}(0, I)$.
- Compute $d_{\min}(z) = \min_i d\big(x_i, T_\theta(z)\big)$.
- Accept $z$ only if $d_{\min}(z) \ge \epsilon$, for threshold $\epsilon > 0$.
Analytically, this alters the prior to exclude latents that are "too close" to any training exemplar, thereby widening the effective support for both training and test latents and making their distance distributions compatible (Vashist et al., 2024). Optimal selection of the threshold $\epsilon$ balances acceptance rate and coverage; values in the range 1–2 (normalized units) yield 30–50% acceptance.
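The rejection rule and its effect on the acceptance rate can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the generator is a toy linear map, the distance is Euclidean, and the helper names `min_dist_to_data` and `rejection_filter` as well as the threshold values are hypothetical.

```python
import numpy as np

def min_dist_to_data(samples, data):
    """For each generated sample, the distance to its closest training example."""
    diffs = samples[:, None, :] - data[None, :, :]
    return np.sqrt(np.sum(diffs ** 2, axis=-1)).min(axis=1)

def rejection_filter(z, generator, data, eps):
    """Keep latents whose outputs lie at least eps away from every data point."""
    keep = min_dist_to_data(generator(z), data) >= eps
    return z[keep], keep

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16)) / np.sqrt(8)  # toy linear generator (assumption)
generator = lambda z: z @ W
data = rng.normal(size=(5, 16))
z = rng.normal(size=(2000, 8))

# Sweep a few illustrative thresholds and report the fraction of latents kept.
for eps in (0.0, 3.0, 4.0, 5.0):
    _, keep = rejection_filter(z, generator, data, eps)
    print(f"eps={eps:.1f}  acceptance rate={keep.mean():.2f}")
```

The acceptance rate can only decrease as the threshold grows, which is the trade-off between coverage and sampling cost described above.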
3. RS-IMLE Algorithms and Variants
The implementation of RS-IMLE involves a rejection filter applied to each batch of sampled latents, followed by nearest neighbor assignment and gradient update for the generator. The key procedural steps are:
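- For each training iteration, sample $m$ latent codes $z_1, \dots, z_m$.
- Retain only those $z_j$ for which $\min_i d\big(x_i, T_\theta(z_j)\big) \ge \epsilon$.
- For each data point $x_i$, find the nearest retained $z_j$ and update $\theta$ via gradients of $d\big(x_i, T_\theta(z_j)\big)$.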
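The three steps above can be combined into a single update, sketched here for a toy linear generator $G(z) = zW$ with a squared Euclidean loss so that the gradient can be written by hand. The function name `rs_imle_step` and all hyperparameter values are assumptions chosen for illustration; a real model would use a neural generator and automatic differentiation.

```python
import numpy as np

def rs_imle_step(W, data, rng, m=256, eps=1.0, lr=1e-3):
    """One RS-IMLE-style update for a toy linear generator G(z) = z @ W."""
    # Step 1: sample m latent codes.
    z = rng.normal(size=(m, W.shape[0]))
    out = z @ W
    # Distances between every data point and every sample: shape (n, m).
    d = np.linalg.norm(data[:, None, :] - out[None, :, :], axis=-1)
    # Step 2: reject samples closer than eps to any data point.
    keep = d.min(axis=0) >= eps
    if not keep.any():
        return W, None  # nothing survived the filter, so skip this update
    z_kept, d_kept = z[keep], d[:, keep]
    # Step 3: match each data point to its nearest retained sample.
    nearest = d_kept.argmin(axis=1)
    z_sel = z_kept[nearest]                # (n, latent_dim)
    residual = data - z_sel @ W            # (n, data_dim)
    loss = np.sum(residual ** 2)
    grad = -2.0 * z_sel.T @ residual       # gradient of the squared L2 loss w.r.t. W
    return W - lr * grad, loss

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16)) * 0.1
data = rng.normal(size=(5, 16))
for step in range(3):
    W, loss = rs_imle_step(W, data, rng)
    print(step, loss)
```

Because every retained sample is at least `eps` away from every data point, the per-step loss in this sketch is bounded below by `n * eps**2`.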
In the policy learning context, as in the PRISM system, RS-IMLE is augmented with a batch-global rejection criterion. Here, each candidate trajectory generated for a batch element is compared against all demonstrations in the batch, and only those not "too close" (according to a robust sequence-level Charbonnier distance) to any demonstration other than its assigned target are accepted. The rejection threshold $\epsilon$ is set via an online quantile calibration: a quantile of the distances observed in the batch is smoothed by an exponential moving average and clamped within a fixed range (Bhaskar et al., 2 Feb 2026). This global rejection increases diversity across the batch and prevents mode-averaging artifacts.
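A minimal sketch of this batch-global criterion and the threshold calibration is given below. It is not PRISM's implementation: the quantile level `q`, EMA decay `beta`, clamp bounds `lo` and `hi`, the Charbonnier constant `delta`, and all function and class names are hypothetical choices made for illustration.

```python
import numpy as np

def charbonnier_seq_dist(a, b, delta=1e-3):
    """Mean Charbonnier penalty sqrt(diff^2 + delta^2) over a whole trajectory."""
    return float(np.mean(np.sqrt((a - b) ** 2 + delta ** 2)))

class EmaQuantileThreshold:
    """Threshold tracked as a clamped EMA of a batch distance quantile."""
    def __init__(self, q=0.1, beta=0.9, lo=0.01, hi=1.0):
        self.q, self.beta, self.lo, self.hi = q, beta, lo, hi
        self.value = None

    def update(self, batch_dists):
        target = np.quantile(batch_dists, self.q)
        if self.value is None:
            self.value = target  # initialize from the first batch
        else:
            self.value = self.beta * self.value + (1 - self.beta) * target
        self.value = float(np.clip(self.value, self.lo, self.hi))
        return self.value

def batch_global_accept(candidates, demos, eps):
    """candidates: (B, K, T, A), demos: (B, T, A). Returns a (B, K) bool mask.

    A candidate for batch element b is rejected if it is closer than eps
    to any demonstration other than its own target demos[b].
    """
    B, K = candidates.shape[:2]
    accept = np.ones((B, K), dtype=bool)
    for b in range(B):
        for k in range(K):
            for other in range(B):
                if other != b and charbonnier_seq_dist(candidates[b, k], demos[other]) < eps:
                    accept[b, k] = False
    return accept

rng = np.random.default_rng(0)
demos = rng.normal(size=(4, 10, 2))
candidates = demos[:, None] + 0.5 * rng.normal(size=(4, 6, 10, 2))
dists = [charbonnier_seq_dist(c, d) for c in candidates.reshape(-1, 10, 2) for d in demos]
eps = EmaQuantileThreshold().update(np.array(dists))
print("threshold:", eps, "accepted per element:", batch_global_accept(candidates, demos, eps).sum(axis=1))
```

Excluding the assigned target from the comparison lets a candidate match its own demonstration closely while still being pushed away from the other modes in the batch.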
4. Empirical Results and Performance
RS-IMLE achieves substantial improvements over GANs, diffusion models, and prior IMLE variants in few-shot image synthesis, as measured by Fréchet Inception Distance (FID) and Precision/Recall metrics.
| Dataset | FastGAN (FID) | AdaIMLE (FID) | RS-IMLE (FID) |
|---|---|---|---|
| Obama | 41.1 | 25.0 | 14.0 |
| Grumpy Cat | 26.6 | 19.1 | 11.5 |
| Panda | 10.0 | 7.6 | 3.5 |
| FFHQ-100 | 54.2 | 33.2 | 12.9 |
Across nine datasets, RS-IMLE reduces average FID by nearly 46% versus the strongest baseline. In terms of Precision/Recall:
| Dataset | Prec. (AdaIMLE) | Prec. (RS-IMLE) | Rec. (AdaIMLE) | Rec. (RS-IMLE) |
|---|---|---|---|---|
| Obama | 0.99 | 0.98 | 0.68 | 0.82 |
| Grumpy Cat | 0.97 | 0.93 | 0.72 | 0.95 |
| FFHQ-100 | 0.99 | 1.00 | 0.77 | 0.99 |
RS-IMLE maintains or improves sample fidelity (precision) and achieves marked gains in recall (mode coverage) (Vashist et al., 2024).
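As a worked check on the tables above, the per-dataset relative FID reduction and recall gain can be computed directly from the listed rows. Only four of the nine datasets are shown here, so the average over these rows need not match the nine-dataset figure quoted in the text; the variable names are illustrative.

```python
# FID values from the table above: (FastGAN, AdaIMLE, RS-IMLE).
fid = {
    "Obama": (41.1, 25.0, 14.0),
    "Grumpy Cat": (26.6, 19.1, 11.5),
    "Panda": (10.0, 7.6, 3.5),
    "FFHQ-100": (54.2, 33.2, 12.9),
}
# Recall values from the table above: (AdaIMLE, RS-IMLE).
recall = {
    "Obama": (0.68, 0.82),
    "Grumpy Cat": (0.72, 0.95),
    "FFHQ-100": (0.77, 0.99),
}

# Relative FID reduction versus the strongest listed baseline (lower FID is better).
fid_reduction = {
    name: 1.0 - rs / min(fastgan, adaimle)
    for name, (fastgan, adaimle, rs) in fid.items()
}
# Absolute recall gain of RS-IMLE over AdaIMLE.
recall_gain = {name: rs - ada for name, (ada, rs) in recall.items()}

for name, r in fid_reduction.items():
    print(f"{name}: FID reduced by {100 * r:.1f}%")
for name, g in recall_gain.items():
    print(f"{name}: recall gain {g:+.2f}")
```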
In imitation learning, PRISM (Performer RS-IMLE) demonstrates state-of-the-art single-pass, real-time control and multimodal behavior coverage:
- MetaWorld: 96.4% (Easy) to 58.0% (Hard) success, surpassing diffusion policies by 10–25% absolute.
- CALVIN: 65.2% success (no dropout) vs. diffusion 36.4% and IMLE 56.2%.
- Real hardware: 10–30% absolute success improvements in manipulation tasks; trajectory jerk reduced by 20x–50x relative to diffusion (Bhaskar et al., 2 Feb 2026).
5. Technical and Computational Considerations
Efficient nearest-neighbor search is critical to the scalability of RS-IMLE, particularly as the rejection filter may require $m$ samples per batch and distances must be computed for up to $n \times m$ pairs. Accelerations such as approximate NN search (e.g., DCI) and random projections are recommended. For policy learning, FAVOR+ linear attention is used in PRISM to enable low-latency, high-throughput candidate generation.
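The random-projection idea can be sketched in a few lines: project both data and samples to a lower dimension with a Gaussian matrix, then run the nearest-neighbor search there. This is a generic illustration, not DCI and not the pipeline used in the cited work; the projection dimension `k` and the function names are assumptions.

```python
import numpy as np

def nearest_exact(data, samples):
    """Index of the nearest sample for each data point (brute force)."""
    sq = ((data[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
    return sq.argmin(axis=1)

def nearest_random_projection(data, samples, k, rng):
    """Approximate nearest neighbors after a Gaussian random projection to k dims."""
    dim = data.shape[1]
    # Scaling by 1/sqrt(k) roughly preserves squared distances in expectation.
    proj = rng.normal(size=(dim, k)) / np.sqrt(k)
    return nearest_exact(data @ proj, samples @ proj)

rng = np.random.default_rng(0)
data = rng.normal(size=(10, 512))
samples = rng.normal(size=(200, 512))
exact = nearest_exact(data, samples)
approx = nearest_random_projection(data, samples, k=32, rng=rng)
print("agreement with exact search:", np.mean(exact == approx))
```

The pairwise distance computation drops from `n * m * 512` to `n * m * 32` multiply-adds in this example, at the cost of occasional mismatches with the exact neighbor.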
Threshold tuning represents the main hyperparameter concern: excessively low $\epsilon$ yields negligible deviation from standard IMLE, while large $\epsilon$ values decrease acceptance and risk under-representation of difficult regions. Ablation studies indicate stable performance across a moderate $\epsilon$ range (Vashist et al., 2024). In PRISM, threshold calibration is automated via an EMA-quantile scheme, reducing the need for manual adjustment (Bhaskar et al., 2 Feb 2026).
6. Extensions and Limitations
The RS-IMLE principle may be extended by:
- Adaptive or per-sample threshold adjustment.
- Soft reweighting of latents instead of hard rejection, using the analytically derived importance function.
- Application to conditional generative models and to VAE prior design.
- Alternative sequence-level distances and batch selection strategies for multimodal output coverage.
Limitations include the additional computational overhead of rejection sampling and the introduction of (possibly multiple) threshold hyperparameters. Candidate sampling efficiency depends critically on the dimensionality of both latent and data spaces; approximate search and tailored metric selection mitigate this issue to some extent (Vashist et al., 2024, Bhaskar et al., 2 Feb 2026).
7. Impact and Context in Generative Modeling
RS-IMLE constitutes a theoretically motivated and empirically validated correction to the distribution mismatch inherent in IMLE and related models when used for few-shot generation or fast policy sampling. It has provided state-of-the-art results on standard few-shot synthesis benchmarks and enabled practical single-pass, multimodal robotic policy deployment at real-time frequencies, with improved success rates and action smoothness relative to diffusion and flow matching. The method has been applied in both computer vision (few-shot synthesis) and robotics (PRISM), establishing RS-IMLE as a versatile mechanism for aligning training and inference distributions in generative modeling and imitation learning (Vashist et al., 2024, Bhaskar et al., 2 Feb 2026).