
RS-IMLE: Rejection Sampling for IMLE Correction

Updated 19 May 2026
  • RS-IMLE is a method that improves train-test alignment in implicit generative models by modifying the latent prior with a rejection sampling rule.
  • The approach corrects the mismatch between localized latents during training and dispersed latents at inference, enhancing sample quality and diversity.
  • Empirical results show RS-IMLE achieving lower FID scores and improved recall in few-shot image synthesis and robust performance in real-time multimodal policy learning.

Rejection Sampling Implicit Maximum Likelihood Estimation (RS-IMLE) is a method for improving the train-test alignment and sample quality of deep implicit generative models, most notably in the few-shot regime and in fast policy learning. RS-IMLE modifies the classical IMLE objective by changing the training latent prior via a rejection sampling rule, thereby correcting the mismatch between latents selected during training and those encountered at inference in standard IMLE. RS-IMLE has achieved leading performance across few-shot image synthesis tasks and has been effectively extended to imitation learning for multimodal policy generation in robotics.

1. Foundations: IMLE and Latent Prior Mismatch

Implicit Maximum Likelihood Estimation (IMLE) was originally introduced as a mode-collapse-resistant alternative to GANs. Given a generator $T_\theta(z)$ mapping latent vectors $z \sim \mathcal{N}(0, I)$ to images, standard batch IMLE minimizes

$$\theta_{\mathrm{IMLE}} = \arg\min_\theta \mathbb{E}_{z_1, \dots, z_m \sim \mathcal{N}(0, I)} \left[\sum_{i=1}^n \min_{j \in [m]} d(x_i, T_\theta(z_j)) \right]$$

where $n$ is the dataset size, $m \geq n$ is the number of latent samples drawn per iteration, and $d(\cdot,\cdot)$ is a suitable distance metric. The key property is that each data point $x_i$ is matched to its closest generated sample. While effective for large datasets, standard IMLE suffers a pronounced mismatch in few-shot settings: as $m \gg n$, the latent codes selected during training become highly localized near data examples, whereas unconditional latents sampled at inference are far more dispersed. This distributional misalignment degrades sample quality at test time (Vashist et al., 2024).
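For concreteness, the inner term of the batch IMLE objective can be sketched in NumPy. This is a minimal sketch that assumes squared Euclidean distance for $d$ and treats the generator outputs $T_\theta(z_j)$ as a precomputed array; the helper name `imle_batch_loss` is hypothetical.

```python
import numpy as np

def imle_batch_loss(x, samples):
    """Inner term of the batch IMLE objective for one draw of latents.

    x:       (n, D) data points x_i
    samples: (m, D) generator outputs T_theta(z_j), with m >= n
    d is taken to be squared Euclidean distance (an assumption).
    Returns sum_i min_j d(x_i, T_theta(z_j)) and the matched index j for each x_i.
    """
    dists = ((x[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)  # (n, m)
    nearest = dists.argmin(axis=1)              # closest sample for each data point
    loss = dists[np.arange(len(x)), nearest].sum()
    return loss, nearest

x = np.array([[0.0, 0.0], [10.0, 10.0]])
samples = np.array([[1.0, 0.0], [9.0, 10.0], [5.0, 5.0]])
loss, nearest = imle_batch_loss(x, samples)
print(loss, nearest.tolist())  # 2.0 [0, 1]
```

Because each $x_i$ chooses its own nearest sample, some samples may be matched to several data points and others to none; only matched samples receive gradient.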

2. Theoretical Construction of RS-IMLE

RS-IMLE addresses the alignment problem by designing a new prior over latents via rejection sampling. The approach begins by quantifying, for each data point $x_i$, the CDF $F_{D_{i,1}}(t)$ of the distance $D_{i,1}$ between $x_i$ and a single generated sample, and then the order statistics of the minimal distance $D_{i,m}$ among $m$ latents. Since $D_{i,m}$ is the minimum of $m$ independent copies of $D_{i,1}$, the analysis gives

$$F_{D_{i,m}}(t) = 1 - \left(1 - F_{D_{i,1}}(t)\right)^m$$

with corresponding density

$$f_{D_{i,m}}(t) = m \left(1 - F_{D_{i,1}}(t)\right)^{m-1} f_{D_{i,1}}(t)$$

This distribution, sharply skewed toward small distances when $m$ is large, underlies the mismatch. To compensate, RS-IMLE introduces a rejection rule:
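The order-statistic formula for the minimum distance can be checked by Monte Carlo. The sketch below is illustrative only: it assumes Euclidean distance, an identity generator $T(z) = z$, an 8-dimensional latent, and a data point at the origin. It also shows how the minimum over $m$ samples concentrates at much smaller distances than a single draw.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, m, trials = 8, 64, 5_000
x_i = np.zeros(dim)  # a fixed data point; generator assumed to be the identity, T(z) = z

def dist_to_x(points):
    return np.linalg.norm(points - x_i, axis=-1)

# D_{i,1}: distance from x_i to a single generated sample
single = dist_to_x(rng.normal(size=(200_000, dim)))
# D_{i,m}: minimum distance over m independent samples, repeated `trials` times
minima = dist_to_x(rng.normal(size=(trials, m, dim))).min(axis=1)

t = np.median(minima)              # evaluate both CDFs at a typical D_{i,m} value
F1 = (single <= t).mean()          # empirical F_{D_{i,1}}(t)
predicted = 1 - (1 - F1) ** m      # order-statistic formula for F_{D_{i,m}}(t)
empirical = (minima <= t).mean()   # about 0.5, since t is the median of the minima
print(f"predicted={predicted:.3f}, empirical={empirical:.3f}")
print(f"median D_i1={np.median(single):.2f}, median D_im={t:.2f}")
```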

  • Sample $z \sim \mathcal{N}(0, I)$.
  • Compute the distance from the generated sample to its nearest training point, $\min_{i \in [n]} d(x_i, T_\theta(z))$.
  • Accept $z$ only if $\min_{i \in [n]} d(x_i, T_\theta(z)) \geq \epsilon$, for a threshold $\epsilon > 0$.
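The rejection rule amounts to a boolean mask over candidate samples. A minimal sketch, assuming Euclidean distance and a threshold passed as `eps`; the helper name `rs_imle_filter` is hypothetical.

```python
import numpy as np

def rs_imle_filter(samples, x, eps):
    """Boolean acceptance mask over candidate samples T_theta(z_j).

    A candidate is accepted only if min_i d(x_i, T_theta(z_j)) >= eps, with d
    taken to be Euclidean distance (an assumption).
    samples: (m, D); x: (n, D); eps: rejection threshold.
    """
    dists = np.linalg.norm(samples[:, None, :] - x[None, :, :], axis=-1)  # (m, n)
    return dists.min(axis=1) >= eps

x = np.array([[0.0, 0.0], [3.0, 0.0]])                # two training points
samples = np.array([[0.1, 0.0], [1.5, 0.0], [3.0, 2.0], [2.9, 0.2]])
mask = rs_imle_filter(samples, x, eps=0.5)
print(mask.tolist())  # [False, True, True, False]
```

In this sketch rejected latents are simply discarded rather than resampled; the accepted ones are the pool used for nearest-neighbor matching in that iteration.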

Analytically, this alters the prior to exclude latents whose generated samples are "too close" to any training exemplar, thereby widening the effective support for both training and test latents and making their distance distributions compatible (Vashist et al., 2024). The choice of $\epsilon$ trades acceptance rate against coverage; values within a moderate range (in normalized distance units) yield 30–50% acceptance.
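The link between the threshold and the acceptance rate can be made explicit: a candidate passes when its nearest-data distance is at least the threshold, so setting the threshold to the $(1-\alpha)$-quantile of those distances accepts roughly a fraction $\alpha$ of candidates. The sketch below uses this as an illustrative calibration heuristic; it is an assumption, not the procedure reported by Vashist et al., and the data are random placeholders.

```python
import numpy as np

def eps_for_acceptance(samples, x, target_rate):
    """Pick eps so that roughly `target_rate` of candidates pass the filter.

    Heuristic (an assumption, not the paper's procedure): the acceptance rate
    is the fraction of candidates whose nearest-data distance is >= eps, so
    eps is the (1 - target_rate) quantile of those distances.
    """
    nearest = np.linalg.norm(samples[:, None, :] - x[None, :, :], axis=-1).min(axis=1)
    return np.quantile(nearest, 1.0 - target_rate), nearest

rng = np.random.default_rng(1)
x = rng.normal(size=(10, 16))          # few-shot "dataset" of n = 10 points
samples = rng.normal(size=(2000, 16))  # m = 2000 candidate generator outputs
eps, nearest = eps_for_acceptance(samples, x, target_rate=0.4)
rate = (nearest >= eps).mean()
print(f"eps={eps:.3f}, acceptance={rate:.2f}")  # acceptance close to 0.40
```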

3. RS-IMLE Algorithms and Variants

The implementation of RS-IMLE involves a rejection filter applied to each batch of sampled latents, followed by nearest neighbor assignment and gradient update for the generator. The key procedural steps are:

  1. For each training iteration, sample $m$ latent codes $z_1, \dots, z_m \sim \mathcal{N}(0, I)$.
  2. Retain only those $z_j$ for which $\min_{i \in [n]} d(x_i, T_\theta(z_j)) \geq \epsilon$.
  3. For each data point $x_i$, find the nearest retained sample $T_\theta(z_{\sigma(i)})$, where $\sigma(i)$ is the index of that latent, and update $\theta$ via gradients of $\sum_{i=1}^n d(x_i, T_\theta(z_{\sigma(i)}))$.
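The three steps can be sketched end to end with a toy generator. This is a minimal NumPy sketch under stated assumptions: the generator is a hypothetical linear map $T_\theta(z) = Wz + b$, rejection uses Euclidean distance while the loss uses squared Euclidean distance, `eps` and the learning rate are arbitrary illustrative values, and gradients are written out by hand instead of using an autodiff framework.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, latent_dim, data_dim = 8, 128, 4, 2
eps, lr = 0.1, 0.05
x = rng.normal(size=(n, data_dim))                 # training data x_1..x_n
W = 0.1 * rng.normal(size=(data_dim, latent_dim))  # toy linear generator (assumption):
b = np.zeros(data_dim)                             #   T_theta(z) = W z + b

def rs_imle_step(W, b):
    # Step 1: sample m latent codes z_j ~ N(0, I) and generate T_theta(z_j).
    z = rng.normal(size=(m, latent_dim))
    out = z @ W.T + b
    # Step 2: keep only z_j whose output is at least eps from every data point.
    d2 = ((x[:, None, :] - out[None, :, :]) ** 2).sum(axis=-1)  # (n, m) squared L2
    keep = np.sqrt(d2.min(axis=0)) >= eps
    if not keep.any():
        return W, b, np.nan                                     # nothing accepted; skip
    z, out, d2 = z[keep], out[keep], d2[:, keep]
    # Step 3: match each x_i to its nearest retained sample (index sigma(i)) and
    # take a gradient step on mean_i ||T_theta(z_sigma(i)) - x_i||^2.
    sigma = d2.argmin(axis=1)
    resid = out[sigma] - x                                      # (n, data_dim)
    grad_W = 2 * resid.T @ z[sigma] / n
    grad_b = 2 * resid.mean(axis=0)
    loss = (resid ** 2).sum(axis=1).mean()
    return W - lr * grad_W, b - lr * grad_b, loss

losses = []
for _ in range(300):
    W, b, loss = rs_imle_step(W, b)
    losses.append(loss)
print(f"first loss {losses[0]:.3f}, mean of last 20 {np.nanmean(losses[-20:]):.3f}")
```

Because retained samples are at least `eps` from every data point, the matched loss cannot fall below roughly `eps**2`, which is the intended effect: the generator is not rewarded for collapsing samples onto the training examples.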

In the policy learning context, as in the PRISM system, RS-IMLE is augmented with a batch-global rejection criterion. Each candidate trajectory generated for a batch element is compared against all demonstrations in the batch, and only candidates that are not "too close" (under a robust sequence-level Charbonnier distance) to any demonstration other than their assigned target are accepted. The rejection threshold is set via online quantile calibration, smoothed by an exponential moving average and clamped within a fixed range (Bhaskar et al., 2 Feb 2026). This global rejection increases diversity across the batch and prevents mode-averaging artifacts.
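A minimal sketch of the batch-global criterion, assuming a Charbonnier penalty $\sqrt{\delta^2 + c^2} - c$ averaged over time steps and action dimensions as the sequence-level distance. The aggregation, the constant `c`, and the helper names are assumptions; PRISM's actual implementation may differ.

```python
import numpy as np

def charbonnier_seq_dist(a, b, c=1e-3):
    """Sequence-level Charbonnier distance between action trajectories.

    a, b: broadcastable arrays of shape (..., T, A). The per-element penalty
    sqrt(diff^2 + c^2) - c is averaged over time steps and action dimensions;
    both the aggregation and c are assumptions, not PRISM's exact settings.
    """
    diff = a - b
    return (np.sqrt(diff ** 2 + c ** 2) - c).mean(axis=(-2, -1))

def batch_global_accept(candidates, demos, targets, eps):
    """Boolean mask: candidate k is accepted unless it lies within eps of some
    demonstration other than its assigned target demos[targets[k]].

    candidates: (K, T, A); demos: (B, T, A); targets: (K,) ints in [0, B).
    """
    d = charbonnier_seq_dist(candidates[:, None], demos[None, :])  # (K, B)
    d[np.arange(len(candidates)), targets] = np.inf                  # skip own target
    return d.min(axis=1) >= eps

demos = np.stack([np.zeros((3, 1)), np.ones((3, 1))])   # two 3-step, 1-D demos
candidates = np.stack([np.full((3, 1), 0.05),            # near demo 0, its target
                       np.full((3, 1), 0.98),            # near demo 1, target is 0
                       np.full((3, 1), 0.98)])           # near demo 1, its target
targets = np.array([0, 0, 1])
mask = batch_global_accept(candidates, demos, targets, eps=0.5)
print(mask.tolist())  # [True, False, True]
```

The Charbonnier penalty behaves like an absolute difference for large deviations and stays smooth near zero, which is why it is often described as a robust alternative to squared error.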

4. Empirical Results and Performance

RS-IMLE achieves substantial improvements over GANs, diffusion models, and prior IMLE variants in few-shot image synthesis, as measured by Fréchet Inception Distance (FID) and Precision/Recall metrics.

FID (lower is better):

Dataset     | FastGAN | AdaIMLE | RS-IMLE
Obama       | 41.1    | 25.0    | 14.0
Grumpy Cat  | 26.6    | 19.1    | 11.5
Panda       | 10.0    | 7.6     | 3.5
FFHQ-100    | 54.2    | 33.2    | 12.9
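The relative FID reduction per dataset, measured against the better of the two baselines shown (AdaIMLE on every row), follows directly from the table:

```python
fid = {  # dataset: (FastGAN, AdaIMLE, RS-IMLE), FID values from the table
    "Obama": (41.1, 25.0, 14.0),
    "Grumpy Cat": (26.6, 19.1, 11.5),
    "Panda": (10.0, 7.6, 3.5),
    "FFHQ-100": (54.2, 33.2, 12.9),
}
reductions = {}
for name, (fastgan, adaimle, rs_imle) in fid.items():
    best_baseline = min(fastgan, adaimle)                 # lower FID is better
    reductions[name] = 100 * (1 - rs_imle / best_baseline)
    print(f"{name}: {reductions[name]:.1f}% lower FID")
# Obama: 44.0%, Grumpy Cat: 39.8%, Panda: 53.9%, FFHQ-100: 61.1%
```

These four datasets are only part of the nine-dataset evaluation, so their mean reduction (about 49.7%) need not match the nine-dataset average.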

Across nine datasets, RS-IMLE reduces average FID by nearly 46% relative to the strongest baseline. In terms of Precision/Recall:

Dataset     | Precision (AdaIMLE) | Precision (RS-IMLE) | Recall (AdaIMLE) | Recall (RS-IMLE)
Obama       | 0.99                | 0.98                | 0.68             | 0.82
Grumpy Cat  | 0.97                | 0.93                | 0.72             | 0.95
FFHQ-100    | 0.99                | 1.00                | 0.77             | 0.99

RS-IMLE keeps precision (sample fidelity) within 0.04 of AdaIMLE on these datasets while achieving marked gains in recall (mode coverage), for example from 0.72 to 0.95 on Grumpy Cat (Vashist et al., 2024).
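The precision and recall changes from AdaIMLE to RS-IMLE can be read off the table with a few lines of arithmetic:

```python
pr = {  # dataset: ((precision AdaIMLE, RS-IMLE), (recall AdaIMLE, RS-IMLE))
    "Obama": ((0.99, 0.98), (0.68, 0.82)),
    "Grumpy Cat": ((0.97, 0.93), (0.72, 0.95)),
    "FFHQ-100": ((0.99, 1.00), (0.77, 0.99)),
}
# (change in precision, change in recall), rounded to two decimals
deltas = {name: (round(p[1] - p[0], 2), round(r[1] - r[0], 2))
          for name, (p, r) in pr.items()}
print(deltas)
# {'Obama': (-0.01, 0.14), 'Grumpy Cat': (-0.04, 0.23), 'FFHQ-100': (0.01, 0.22)}
```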

In imitation learning, PRISM (Performer RS-IMLE) demonstrates state-of-the-art single-pass, real-time control and multimodal behavior coverage:

  • MetaWorld: 96.4% (Easy) to 58.0% (Hard) success, surpassing diffusion policies by 10–25% absolute.
  • CALVIN: 65.2% success (no dropout) vs. diffusion 36.4% and IMLE 56.2%.
  • Real hardware: 10–30% absolute success improvements in manipulation tasks; trajectory jerk reduced by 20x–50x relative to diffusion (Bhaskar et al., 2 Feb 2026).
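Trajectory jerk is the third time derivative of position, so small high-frequency fluctuations in an action sequence inflate it sharply. A minimal NumPy sketch, assuming finite-difference estimation on uniformly sampled positions; the exact smoothness metric used for PRISM is not given here, and the trajectories are synthetic.

```python
import numpy as np

def mean_squared_jerk(positions, dt):
    """Mean squared jerk of a trajectory sampled every dt seconds.

    positions: (T, D). Jerk (third time derivative of position) is estimated
    with third-order finite differences; this metric definition is an
    assumption, since the exact smoothness measure is not specified here.
    """
    jerk = np.diff(positions, n=3, axis=0) / dt ** 3
    return float((jerk ** 2).sum(axis=1).mean())

dt = 0.01
t = np.arange(0, 1 + dt / 2, dt)[:, None]
smooth = np.hstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])  # circular path
noisy = smooth + 1e-3 * np.random.default_rng(0).normal(size=smooth.shape)
ratio = mean_squared_jerk(noisy, dt) / mean_squared_jerk(smooth, dt)
print(f"noise with std 1e-3 raises mean squared jerk by a factor of {ratio:.0f}")
```

Because the third difference divides by `dt ** 3`, even tiny jitter dominates the metric, so single-pass policies that emit smooth trajectories can show large jerk ratios against samplers that add small step-to-step noise.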

5. Technical and Computational Considerations

Efficient nearest-neighbor search is critical to the scalability of RS-IMLE, particularly as the rejection filter may require many candidate samples per batch and distances must be computed for up to $m \cdot n$ sample–data pairs. Accelerations such as approximate nearest-neighbor search (e.g., Dynamic Continuous Indexing, DCI) and random projections are recommended. For policy learning, PRISM uses FAVOR+ linear attention (from the Performer architecture) to enable low-latency, high-throughput candidate generation.
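Two of the cost-saving ideas can be illustrated in a few lines. The sketch below is not DCI: it computes nearest-data distances in chunks, which avoids building a full (m, n, D) difference tensor, and optionally applies a Gaussian random projection first, which approximately preserves Euclidean distances in the Johnson–Lindenstrauss sense. The helper name and its parameters are hypothetical.

```python
import numpy as np

def nearest_data_dist(samples, x, proj_dim=None, chunk=1024, seed=0):
    """Euclidean distance from each sample to its nearest data point.

    samples: (m, D); x: (n, D). Works chunk by chunk so no (m, n, D) tensor is
    built. If proj_dim is given, both sets are first mapped through the same
    Gaussian random projection, which approximately preserves distances.
    """
    if proj_dim is not None:
        rng = np.random.default_rng(seed)
        P = rng.normal(size=(x.shape[1], proj_dim)) / np.sqrt(proj_dim)
        samples, x = samples @ P, x @ P
    x_sq = (x ** 2).sum(axis=1)
    out = np.empty(len(samples))
    for start in range(0, len(samples), chunk):
        s = samples[start:start + chunk]
        # ||s - x||^2 = ||s||^2 - 2 s.x + ||x||^2, via one matrix product
        d2 = (s ** 2).sum(axis=1)[:, None] - 2 * s @ x.T + x_sq[None, :]
        out[start:start + chunk] = np.sqrt(np.maximum(d2.min(axis=1), 0.0))
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 1024))           # n = 10 data points, D = 1024
samples = rng.normal(size=(4000, 1024))   # m = 4000 candidates
exact = nearest_data_dist(samples, x)
approx = nearest_data_dist(samples, x, proj_dim=256)
rel_err = np.abs(approx - exact) / exact
print(f"median relative error with 256-dim projection: {np.median(rel_err):.3f}")
```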

Threshold tuning represents the main hyperparameter concern: an excessively low $\epsilon$ yields negligible deviation from standard IMLE, while large $\epsilon$ values decrease acceptance and risk under-representation of difficult regions. Ablation studies indicate stable performance across a moderate $\epsilon$ range (Vashist et al., 2024). In PRISM, threshold calibration is automated via an EMA-quantile scheme, reducing the need for manual adjustment (Bhaskar et al., 2 Feb 2026).
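The EMA-quantile calibration can be sketched as a small stateful object. This is a hypothetical illustration: the quantile level, momentum, and clamp bounds are placeholder values rather than PRISM's, and the text does not specify whether the clamp applies to the stored average or only to the returned threshold (here the stored value is clamped).

```python
import numpy as np

class EMAQuantileThreshold:
    """Online threshold: per-batch quantile of observed distances, smoothed by
    an exponential moving average and clamped to [lo, hi].
    q, momentum, lo, hi are illustrative values, not PRISM's settings."""

    def __init__(self, q=0.5, momentum=0.9, lo=0.01, hi=1.0):
        self.q, self.momentum, self.lo, self.hi = q, momentum, lo, hi
        self.value = None

    def update(self, distances):
        batch_q = float(np.quantile(distances, self.q))
        if self.value is None:
            self.value = batch_q          # first batch initializes the average
        else:
            self.value = self.momentum * self.value + (1 - self.momentum) * batch_q
        self.value = min(max(self.value, self.lo), self.hi)
        return self.value

calib = EMAQuantileThreshold(q=0.5, momentum=0.5, lo=0.0, hi=10.0)
print(calib.update(np.array([1.0, 2.0, 3.0])))  # 2.0
print(calib.update(np.array([4.0, 5.0, 6.0])))  # 3.5  (0.5 * 2.0 + 0.5 * 5.0)
print(calib.update(np.array([100.0] * 3)))      # 10.0 (EMA of 51.75, clamped to hi)
```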

6. Extensions and Limitations

The RS-IMLE principle may be extended by:

  • Adaptive or per-sample threshold adjustment.
  • Soft reweighting of latents instead of hard rejection, using an analytically derived importance function.
  • Application to conditional generative models and to VAE prior design.
  • Alternative sequence-level distances and batch selection strategies for multimodal output coverage.
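To make the soft-reweighting idea concrete, the hard indicator $\mathbb{1}[d \geq \epsilon]$ can be relaxed into a smooth weight. The sketch below is a hypothetical logistic relaxation chosen for illustration; it is not the analytically derived importance function, and the `temperature` parameter is an assumption.

```python
import numpy as np

def soft_acceptance(nearest_dist, eps, temperature):
    """Smooth surrogate for the hard acceptance rule 1[d >= eps].

    Hypothetical illustration: a logistic relaxation, NOT the analytically
    derived importance function referred to in the text. As temperature -> 0
    it recovers hard rejection.
    """
    return 1.0 / (1.0 + np.exp(-(np.asarray(nearest_dist) - eps) / temperature))

d = np.array([0.1, 0.5, 0.9])            # nearest-data distances of three candidates
w = soft_acceptance(d, eps=0.5, temperature=0.01)
print(w.round(3).tolist())               # [0.0, 0.5, 1.0]
```

Such weights could, for example, serve as per-candidate acceptance probabilities instead of a hard cut; how best to use them is left open here.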

Limitations include the additional computational overhead of rejection sampling and the introduction of (possibly multiple) threshold hyperparameters. Candidate sampling efficiency depends critically on the dimensionality of both latent and data spaces; approximate search and tailored metric selection mitigate this issue to some extent (Vashist et al., 2024, Bhaskar et al., 2 Feb 2026).

7. Impact and Context in Generative Modeling

RS-IMLE constitutes a theoretically motivated and empirically validated correction to the distribution mismatch inherent in IMLE and related models when used for few-shot generation or fast policy sampling. It has provided state-of-the-art results on standard few-shot synthesis benchmarks and enabled practical single-pass, multimodal robotic policy deployment at real-time frequencies, with improved success rates and action smoothness relative to diffusion and flow matching. The method has been applied in both computer vision (few-shot synthesis) and robotics (PRISM), establishing RS-IMLE as a versatile mechanism for aligning training and inference distributions in generative modeling and imitation learning (Vashist et al., 2024, Bhaskar et al., 2 Feb 2026).

