
Min-K% Probability Analysis

Updated 28 January 2026
  • Min-K% Probability Analysis is a framework that quantifies the lowest k% outcomes using order statistics to assess risk and detect anomalies.
  • It utilizes tight concentration inequalities and quantile estimation techniques to provide high-confidence performance guarantees.
  • The framework applies to diverse settings including language models, discrete distributions, and robust estimation for both model behavior and risk control.

Min-K% Probability Analysis is a statistical framework for the analysis and high-confidence quantile estimation of the minimum of a collection of random variables, typically via order statistics such as the minimum of i.i.d. samples or the minima of token-level log-probabilities in autoregressive models. Its central idea is to quantify the lowest fraction of a sequence of probabilistic outcomes (the worst-performing k%) for robust risk and anomaly assessment, pre-training data detection, and reliable high-probability guarantees.

1. Mathematical Formulation and Core Principles

The Min-K% criterion evaluates the behavior of the lowest k% order statistics. Given a sequence $\{Z_1, \dots, Z_T\}$ (e.g., token-level log-likelihoods), the Min-K% score is the mean of the lowest $m$ values, where $m = \lceil k\% \times T \rceil$ and the order statistics are sorted as $Z_{(1)} \le \dots \le Z_{(T)}$. For log-probabilities in LLMs:

$$\text{Min-K\%}_k(x) = \frac{1}{m} \sum_{j=1}^{m} \ell_{(j)}$$

where $\ell_{(j)}$ are the sorted log-probabilities. This technique generalizes to other settings, including coverage of quantiles and the minimum across i.i.d. random variables. In probabilistic estimation, "Min-K% Probability Analysis" refers to the explicit control and quantification of the smallest (worst) outcomes, and their deviation probabilities, via information-theoretic bounds.
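As a concrete illustration, the score above can be computed directly from a list of token log-probabilities (a minimal sketch; `min_k_score` is a hypothetical helper name, not from any specific library):

```python
import math

def min_k_score(log_probs, k=20.0):
    """Mean of the lowest k% token log-probabilities (the Min-K% statistic)."""
    T = len(log_probs)
    m = math.ceil(k / 100.0 * T)      # m = ceil(k% * T)
    lowest = sorted(log_probs)[:m]    # order statistics l_(1) <= ... <= l_(m)
    return sum(lowest) / m

# Toy example: 10 token log-probs; with k=20 the two smallest are averaged.
lp = [-0.1, -0.2, -5.0, -0.3, -0.4, -6.0, -0.5, -0.2, -0.1, -0.3]
score = min_k_score(lp, k=20.0)       # mean of {-6.0, -5.0} = -5.5
```

Lower (more negative) scores indicate tokens the model finds surprising; thresholding the score turns it into a decision rule.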

2. Concentration Inequalities for the Minimum

Tight finite-sample bounds on the minimum of i.i.d. random variables, particularly binomial or general discrete/continuous distributions, underpin rigorous Min-K% analysis. Let $X_1, \dots, X_m$ be independent $\mathrm{Bin}(n, p)$ random variables and $M = \min_i X_i$. Explicit nonasymptotic high-probability bounds can be derived as follows (Zhu et al., 25 Feb 2025):

  • Lower bound: for a threshold $k$ with $t = k/n < p$, a union bound combined with the Chernoff tail gives

$$\Pr[M \le k] \;\le\; m\, e^{-n\, d(t \,\|\, p)},$$

so with probability at least $1-\delta$, $M$ stays above any $k$ for which $n\, d(k/n \,\|\, p) \ge \ln(m/\delta)$.

  • Upper bound: by independence and the type-counting lower bound $\Pr[X_1 \le k] \ge e^{-n\, d(t \,\|\, p)}/(n+1)$,

$$\Pr[M > k] \;\le\; \left(1 - \frac{e^{-n\, d(t \,\|\, p)}}{n+1}\right)^{m} \;\le\; \exp\!\left(-\frac{m\, e^{-n\, d(t \,\|\, p)}}{n+1}\right),$$

where $d(t \,\|\, p) = t \ln\frac{t}{p} + (1-t)\ln\frac{1-t}{1-p}$ is the binary KL divergence and $t = k/n$.

For quantile-based selection, one identifies the threshold $k_\delta$ corresponding to the $\delta$-quantile of $M$ by solving $n\, d(k/n \,\|\, p) \approx \ln(m/\delta)$. This framework generalizes: for any collection of i.i.d. variables, the Min-K% behavior is tightly controlled by Sanov-type and Chernoff-style exponential bounds in terms of KL divergence.
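Such bounds are easy to sanity-check numerically. The sketch below assumes the standard union-bound form $\Pr[M \le k] \le m\, e^{-n\, d(k/n \| p)}$ (the cited paper's constants may differ) and compares it with a Monte Carlo estimate for the minimum of i.i.d. binomials:

```python
import math
import random

def d_kl(t, p):
    """Binary KL divergence d(t || p), in nats."""
    return t * math.log(t / p) + (1 - t) * math.log((1 - t) / (1 - p))

def chernoff_min_bound(n, p, k, m):
    """Union bound + Chernoff: P(min_i X_i <= k) <= m * exp(-n d(k/n || p)), for k/n < p."""
    return min(1.0, m * math.exp(-n * d_kl(k / n, p)))

random.seed(0)
n, p, m, k = 100, 0.5, 40, 30          # threshold t = k/n = 0.3 < p = 0.5
trials = 1000
hits = 0
for _ in range(trials):
    # Minimum of m Bin(n, p) draws, each simulated as a sum of Bernoulli(p) indicators.
    M = min(sum(random.random() < p for _ in range(n)) for _ in range(m))
    hits += (M <= k)
empirical = hits / trials
bound = chernoff_min_bound(n, p, k, m)  # the empirical frequency should not exceed this
```

The empirical frequency of $\{M \le k\}$ stays below the analytic bound, as the union-bound derivation guarantees.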

3. Min-K% in Statistical Estimation: High-Confidence Quantiles and Missing Mass

In distribution estimation, Min-K% analysis allows for robust large-deviation quantile control. For the missing mass $M_0$ (the total probability of symbols not observed in a sample of size $n$ from a discrete distribution), explicit quantile bounds follow from variance-sensitive large-deviation inequalities (Berend et al., 2012), e.g. an upper-deviation bound of the form

$$\Pr\!\left[M_0 \ge \mathbb{E}[M_0] + \varepsilon\right] \;\le\; e^{-n \varepsilon^2},$$

ensuring $M_0 \le \mathbb{E}[M_0] + \sqrt{\ln(1/\delta)/n}$ with probability at least $1-\delta$, thus controlling the upper Min-K% quantiles. This principle extends to other order statistics, quantiles, and risk measures for high-probability guarantees.
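A quick simulation makes the missing-mass quantity concrete (a sketch; the uniform distribution and the sizes chosen here are arbitrary illustrative choices):

```python
import random

def missing_mass(probs, sample):
    """Total probability of symbols that never appear in the sample."""
    seen = set(sample)
    return sum(p for i, p in enumerate(probs) if i not in seen)

random.seed(1)
d, n, trials = 50, 200, 500
probs = [1.0 / d] * d                      # uniform distribution over d symbols
masses = [
    missing_mass(probs, random.choices(range(d), k=n))
    for _ in range(trials)
]
mean_m0 = sum(masses) / trials
# For the uniform law, E[M0] = (1 - 1/d)^n exactly (each symbol is missed
# independently of its own draws with that probability).
expected = (1 - 1.0 / d) ** n
```

The empirical mean of $M_0$ matches the closed-form expectation, and the spread of `masses` around it is what the deviation inequality controls.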

4. Min-K% for Model Behavior: Memorization and Outlier Detection

The Min-K% statistic has been adopted for pre-training data detection in LLMs. For each position $t$, using the model's conditional distribution $p(x_t \mid x_{<t})$, Min-K% selects the lowest $k\%$ of token log-likelihoods across a sequence. Empirically, these measure the model's weakest predictions, which are highly indicative of whether a sample was seen during training (Zhang et al., 2024).

  • Min-K%++ Extension: advances beyond averaging the lowest log-probabilities by normalizing each token log-probability against the context's conditional mean and variance, i.e.,

$$\text{Min-K\%++}(x_t) = \frac{\log p(x_t \mid x_{<t}) - \mu_{x_{<t}}}{\sigma_{x_{<t}}},$$

where $\mu_{x_{<t}} = \mathbb{E}_{z \sim p(\cdot \mid x_{<t})}[\log p(z \mid x_{<t})]$ and $\sigma_{x_{<t}}$ is the corresponding standard deviation.

This discrete curvature score sharply identifies local maxima ("memorized" points). The Min-K%++ method shows substantial AUROC improvements on training- vs. non-training data detection benchmarks.
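A minimal sketch of this normalization, assuming access to the full next-token log-probability vector (a toy 4-symbol vocabulary, not tied to any particular model API):

```python
import math

def min_k_pp_token_score(log_probs_vocab, token_id):
    """Min-K%++-style score: the token's log-prob standardized by the mean and
    standard deviation of log-probabilities under the next-token distribution."""
    probs = [math.exp(lp) for lp in log_probs_vocab]
    mu = sum(p * lp for p, lp in zip(probs, log_probs_vocab))   # E_z[log p(z|ctx)]
    var = sum(p * (lp - mu) ** 2 for p, lp in zip(probs, log_probs_vocab))
    return (log_probs_vocab[token_id] - mu) / math.sqrt(var)

# Toy next-token distribution over a 4-symbol vocabulary.
dist = [0.7, 0.2, 0.05, 0.05]
log_dist = [math.log(p) for p in dist]
s_top = min_k_pp_token_score(log_dist, 0)   # most likely token: above-mean score
s_rare = min_k_pp_token_score(log_dist, 2)  # rare token: below-mean score
```

The sequence-level Min-K%++ score then averages the lowest k% of these per-token values, exactly as in the plain Min-K% statistic.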

| Method | Statistic | Application |
|---|---|---|
| Min-K% | Mean of worst-$k\%$ log-probs | Pre-training data detection |
| Min-K%++ | Curvature-normalized minimum | Mode/memorization detection |

These Min-K% analytics are robust to context variability and reveal sharp boundaries between memorized and non-memorized sequences.

5. Universal Lower Bounds and Sharpness in Continuous Laws

In continuous probability contexts, universal Min-K% lower bounds quantify how likely it is that, conditional on an extreme event (the sum exceeding a threshold), the minimum of a set takes small values. For i.i.d. $X_1, \dots, X_n$ with continuous density $f$ and median $m$, a bound of the form

$$\Pr\!\left[\min_i X_i \le m \;\middle|\; X_1 + \dots + X_n \ge \alpha\right] \;\ge\; \frac{c}{\log n}$$

holds for a universal constant $c > 0$. This logarithmic denominator is unavoidable: sharpened constructions prove the bound is optimal up to constants (Steinerberger, 2018).
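The flavor of such conditional-minimum statements can be checked by Monte Carlo. The sketch below (an illustration with Exp(1) variables, not the cited paper's exact construction) estimates the probability that the minimum falls below the median given that the sum is atypically large:

```python
import math
import random

random.seed(2)
n = 20
threshold = 1.5 * n        # condition on the sum exceeding 1.5x its mean (Exp(1) has mean 1)
med = math.log(2.0)        # median of Exp(1)
cond = hit = 0
for _ in range(20000):
    xs = [random.expovariate(1.0) for _ in range(n)]
    if sum(xs) >= threshold:
        cond += 1
        hit += (min(xs) <= med)
p_est = hit / cond         # conditional probability estimate
```

Even after conditioning on an unusually large sum, the minimum still falls below the median with substantial probability, consistent with a lower bound that decays only logarithmically in $n$.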

6. High-Probability Minimax Rates in Discrete Estimation

In large-deviation regimes, Min-K% quantile control is essential in establishing high-probability minimax lower bounds. For estimating a discrete distribution $p$ of support size $d$ from $n$ samples, the minimax lower bound for the KL risk at confidence level $1 - \delta$ is

$$\Omega\!\left(\frac{d + \log(1/\delta)}{n}\right).$$

No estimator can beat this rate for the Min-K% quantile of the KL loss (Hoeven et al., 23 Jul 2025). Efficient algorithms such as OTB (Online-to-Batch) with suffix averaging achieve matching upper bounds up to additional logarithmic factors. The penalty in controlling rare Min-K% probability events accounts for the additional sample complexity compared to expected-risk settings.
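To see the $(d + \log(1/\delta))/n$ scale in practice, the sketch below estimates a high quantile of the KL risk of a simple add-one (Laplace) estimator; this estimator and the uniform ground truth are illustrative stand-ins, not the OTB algorithm from the cited work:

```python
import math
import random

def kl(p, q):
    """KL divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def laplace_estimate(counts, n, d, alpha=1.0):
    """Add-alpha smoothing: qhat_i = (counts_i + alpha) / (n + alpha * d)."""
    return [(c + alpha) / (n + alpha * d) for c in counts]

random.seed(3)
d, n, trials, delta = 20, 2000, 200, 0.05
p = [1.0 / d] * d                          # uniform ground truth
risks = []
for _ in range(trials):
    counts = [0] * d
    for s in random.choices(range(d), k=n):
        counts[s] += 1
    risks.append(kl(p, laplace_estimate(counts, n, d)))
risks.sort()
q95 = risks[int((1 - delta) * trials)]     # empirical (1 - delta)-quantile of the KL risk
rate = (d + math.log(1 / delta)) / n       # the (d + log(1/delta))/n benchmark
```

For this easy instance the empirical high-confidence risk sits below the minimax benchmark, as it should; the lower bound says no estimator can do better than that rate on the hardest instances.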

| Setting | Min-K% Rate | Reference |
|---|---|---|
| Expected KL risk | $\Theta(d/n)$ | (Hoeven et al., 23 Jul 2025) |
| High-prob Min-K% KL quantile | $\Theta\!\left((d + \log(1/\delta))/n\right)$ | (Hoeven et al., 23 Jul 2025) |

7. Connections and Open Problems

Min-K% probability analysis bridges order statistics, large deviations, concentration of measure, and robust estimation. Its methodology elucidates phenomena in both discrete (model quantiles, missing mass, high-confidence tail behavior) and continuous (scale-invariant conditional minima) probabilistic systems. Open questions remain regarding optimal Min-K% bounds for sums of independent random variables, weightings in scale-dependent inequalities, and tightness classes for particular distribution shapes (Steinerberger, 2018).

The Min-K% paradigm thus delivers a general toolkit for quantifying tail events, calibrating risk, and detecting outlier or memorized behavior in high-dimensional statistical and machine learning settings, supported by sharp probabilistic inequalities and algorithmic adaptivity.
