Min-K% Probability Analysis
- Min-K% Probability Analysis is a framework that quantifies the lowest k% outcomes using order statistics to assess risk and detect anomalies.
- It utilizes tight concentration inequalities and quantile estimation techniques to provide high-confidence performance guarantees.
- The framework applies to diverse settings including language models, discrete distributions, and robust estimation for both model behavior and risk control.
Min-K% Probability Analysis is a statistical framework focused on the analysis and high-confidence quantile estimation of the minimum of a collection of random variables, typically using order statistics such as the minimum of i.i.d. samples or the minima of token-level log-probabilities in autoregressive models. Its central concept is to quantify the most extreme (lowest) fraction—namely, the worst-performing k%—of a sequence of probabilistic outcomes for robust risk and anomaly assessment, pre-training data detection, and reliable high-probability guarantees.
1. Mathematical Formulation and Core Principles
The Min-K% criterion evaluates the behavior of the lowest k% order statistics. Given a sequence of $T$ values (e.g., token-level log-likelihoods $\ell_t = \log p(x_t \mid x_{<t})$), the Min-K% score is the mean of the lowest $m$ values, where $m = \lceil kT/100 \rceil$ and the order statistics $\ell_{(1)} \le \ell_{(2)} \le \cdots \le \ell_{(T)}$ are sorted in increasing order. For log-probabilities in LLMs:
$$\operatorname{Min\text{-}K\%}(x) = \frac{1}{m} \sum_{i=1}^{m} \ell_{(i)},$$
where $\ell_{(1)}, \dots, \ell_{(m)}$ are the $m$ smallest sorted log-probabilities. This technique generalizes to other settings, including coverage of quantiles and the minimum across i.i.d. random variables. In probabilistic estimation, "Min-K% Probability Analysis" refers to the explicit control and quantification of the smallest (worst) outcomes, and their deviation probabilities, via information-theoretic bounds.
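As a minimal sketch, the Min-K% score can be computed directly from a list of token log-probabilities (the function name and toy values below are illustrative, not from any cited implementation):

```python
import math

def min_k_score(log_probs, k=20.0):
    """Mean of the lowest k% token log-probabilities (the Min-K% statistic)."""
    m = max(1, math.ceil(len(log_probs) * k / 100.0))
    lowest = sorted(log_probs)[:m]  # ascending: most surprising tokens first
    return sum(lowest) / m

# One very unlikely token dominates the lowest 20% of a 5-token sequence.
toy_log_probs = [-0.1, -0.2, -5.0, -0.3, -0.15]
print(min_k_score(toy_log_probs, k=20.0))  # m = 1, so this is just -5.0
```

A lower (more negative) score indicates weaker model predictions on the sequence's hardest tokens.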
2. Concentration Inequalities for the Minimum
Tight finite-sample bounds on the minimum of i.i.d. random variables, particularly binomial or general discrete/continuous distributions, underpin rigorous Min-K% analysis. Let $X_1, \dots, X_n$ be independent $\mathrm{Bin}(m, p)$, and $Y = \min_{1 \le i \le n} X_i$. Explicit nonasymptotic high-probability bounds can be derived as follows (Zhu et al., 25 Feb 2025):
- Lower bound: For threshold $k$ with $k/m < p$,
$$\mathbb{P}(Y \le k) \;\ge\; 1 - \Big(1 - \tfrac{1}{m+1}\, e^{-m\,\mathrm{kl}(k/m,\,p)}\Big)^{n}.$$
- Upper bound:
$$\mathbb{P}(Y \le k) \;\le\; n\, e^{-m\,\mathrm{kl}(k/m,\,p)},$$
where $\mathrm{kl}(a, b) = a \log\frac{a}{b} + (1-a)\log\frac{1-a}{1-b}$ is the binary KL divergence and the upper bound follows from a union bound over the $n$ variables.
For quantile-based selection, identify the threshold $k_\delta$ corresponding to the $\delta$-quantile of $Y$ via $\mathrm{kl}(k_\delta/m,\, p) \approx \frac{1}{m}\log\frac{n}{\delta}$, i.e., by inverting the exponential bound. This framework generalizes: for any collection of i.i.d. variables, the Min-K% behavior is tightly controlled by Sanov-type and Chernoff-style exponential bounds in terms of KL divergence.
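These exponential tail bounds are easy to sanity-check numerically. The sketch below compares a Monte Carlo estimate of $\mathbb{P}(\min_i X_i \le k)$ for binomial samples against the union-bound-plus-Chernoff upper bound $n\,e^{-m\,\mathrm{kl}(k/m,\,p)}$ (a standard form assumed here; the cited work's exact constants may differ):

```python
import math
import random

def kl_bern(a, b):
    """Binary KL divergence kl(a || b) in nats."""
    def term(x, y):
        return 0.0 if x == 0 else x * math.log(x / y)
    return term(a, b) + term(1 - a, 1 - b)

def min_tail_upper_bound(n, m, p, k):
    """Chernoff + union bound on P(min_i X_i <= k) for X_i ~ Bin(m, p), k/m < p."""
    return min(1.0, n * math.exp(-m * kl_bern(k / m, p)))

def mc_min_tail(n, m, p, k, trials=3000, seed=0):
    """Monte Carlo estimate of P(min_i X_i <= k)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        smallest = min(sum(rng.random() < p for _ in range(m)) for _ in range(n))
        hits += smallest <= k
    return hits / trials

n, m, p, k = 20, 50, 0.5, 15
print(mc_min_tail(n, m, p, k), "<=", min_tail_upper_bound(n, m, p, k))
```

The empirical tail probability should sit comfortably below the analytic bound; tightening $k$ toward $mp$ makes the bound vacuous, as expected for Chernoff-style inequalities.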
3. Min-K% in Statistical Estimation: High-Confidence Quantiles and Missing Mass
In distribution estimation, Min-K% analysis allows for robust large-deviation quantile control. For the missing mass $M_0$ (the remaining probability of symbols not observed in a sample of size $n$ from a discrete distribution), explicit quantile bounds follow from variance-sensitive large-deviation inequalities (Berend et al., 2012):
$$\mathbb{P}\big(M_0 \ge \mathbb{E}[M_0] + \varepsilon\big) \;\le\; e^{-n\varepsilon^{2}},$$
ensuring that $M_0 \le \mathbb{E}[M_0] + \sqrt{\log(1/\delta)/n}$ with probability at least $1 - \delta$, thus controlling the upper Min-K% quantiles. This principle extends to other order statistics, quantiles, and risk measures for high-probability guarantees.
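A short simulation illustrates this concentration; for a uniform distribution over $d$ symbols, $\mathbb{E}[M_0] = (1 - 1/d)^n$, and the sampled values of $M_0$ cluster tightly around it (the setup below is a hypothetical toy example):

```python
import random

def missing_mass(probs, sample):
    """M0: total probability of symbols never observed in the sample."""
    seen = set(sample)
    return sum(p for sym, p in enumerate(probs) if sym not in seen)

def simulate_missing_mass(probs, n, trials=2000, seed=1):
    """Draw `trials` independent samples of size n; return their missing masses."""
    rng = random.Random(seed)
    symbols = list(range(len(probs)))
    return [missing_mass(probs, rng.choices(symbols, weights=probs, k=n))
            for _ in range(trials)]

# Uniform over 100 symbols, n = 100 draws: E[M0] = 0.99**100 ~= 0.366.
probs = [0.01] * 100
masses = simulate_missing_mass(probs, n=100)
mean_m0 = sum(masses) / len(masses)
print(round(mean_m0, 3), round(max(masses), 3))
```

Even the largest observed $M_0$ stays within a few multiples of $\sqrt{1/n}$ of the mean, consistent with the sub-Gaussian upper-deviation bound.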
4. Min-K% for Model Behavior: Memorization and Outlier Detection
The Min-K% statistic has been adopted for pre-training data detection in LLMs. For each position $t$, using the model conditional distribution $p(x_t \mid x_{<t})$, Min-K% selects the lowest-$k\%$ log-likelihoods across a sequence. Empirically, these measure the model's weakest predictions, which are highly indicative of whether a sample was seen during training (Zhang et al., 2024).
- Min-K%++ Extension: Advances beyond averaging the lowest log-probabilities by normalizing each token log-prob against the context's conditional mean and variance, i.e.,
$$\operatorname{Min\text{-}K\%{+}{+}}(x_t) = \frac{\log p(x_t \mid x_{<t}) - \mu_{x_{<t}}}{\sigma_{x_{<t}}},$$
where $\mu_{x_{<t}} = \mathbb{E}_{z \sim p(\cdot \mid x_{<t})}\big[\log p(z \mid x_{<t})\big]$ and $\sigma_{x_{<t}}$ is the corresponding standard deviation. This normalized score acts as a discrete curvature criterion and sharply identifies local maxima of the model's distribution ("memorized" points). The Min-K%++ method shows substantial AUROC improvements on training- vs. non-training data detection benchmarks.
| Method | Statistic | Application |
|---|---|---|
| Min-K% | Mean of the worst-k% log-probs | Pre-training data detection |
| Min-K%++ | Curvature-normalized minima | Memorization detection |
These Min-K% analytics are robust to context variability and reveal sharp boundaries between memorized and non-memorized sequences.
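The Min-K%++ normalization can be sketched as follows, assuming access to the model's full next-token distributions at each position (the input format and function names here are illustrative, not an official implementation):

```python
import math

def min_k_pp_token_scores(next_token_dists, token_ids):
    """Per-token Min-K%++ scores: the realized log-prob, standardized by the
    mean and std of log-probs under the model's own next-token distribution."""
    scores = []
    for dist, tok in zip(next_token_dists, token_ids):
        logs = [math.log(p) for p in dist]
        mu = sum(p * lp for p, lp in zip(dist, logs))              # E_z[log p(z)]
        var = sum(p * (lp - mu) ** 2 for p, lp in zip(dist, logs))
        scores.append((logs[tok] - mu) / math.sqrt(var) if var > 0 else 0.0)
    return scores

def min_k_pp(next_token_dists, token_ids, k=20.0):
    """Mean of the lowest k% normalized token scores."""
    s = sorted(min_k_pp_token_scores(next_token_dists, token_ids))
    m = max(1, math.ceil(len(s) * k / 100.0))
    return sum(s[:m]) / m

# A likely token scores above the conditional mean; an unlikely one below it.
print(min_k_pp_token_scores([[0.7, 0.2, 0.1]], [0])[0] > 0)   # True
print(min_k_pp_token_scores([[0.7, 0.2, 0.1]], [2])[0] < 0)   # True
```

The standardization makes scores comparable across contexts of very different entropy, which is what drives the robustness to context variability noted above.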
5. Universal Lower Bounds and Sharpness in Continuous Laws
In continuous probability contexts, universal Min-K% lower bounds quantify how likely it is that, conditional on an extreme event (the sum exceeding a threshold), the minimum of a set takes small values. For i.i.d. $X_1, \dots, X_n$ with continuous density and median $m$,
$$\mathbb{P}\Big(\min_{1 \le i \le n} X_i \le m \;\Big|\; \sum_{i=1}^{n} X_i \ge t\Big) \;\ge\; \frac{c}{\log n}$$
for a universal constant $c > 0$. This logarithmic denominator is unavoidable: sharpened constructions prove the bound is optimal up to constants (Steinerberger, 2018).
6. High-Probability Minimax Rates in Discrete Estimation
In large-deviation regimes, Min-K% quantile control is essential in establishing high-probability minimax lower bounds. For estimating a discrete distribution $p$ of support size $d$ from $n$ i.i.d. samples, the minimax lower bound for the KL risk at confidence level $\delta$ is
$$\inf_{\hat p}\, \sup_{p}\; Q_{1-\delta}\big[\mathrm{KL}(p \,\|\, \hat p)\big] \;=\; \Omega\!\left(\frac{d + \log(1/\delta)}{n}\right),$$
where $Q_{1-\delta}$ denotes the $(1-\delta)$-quantile of the loss. No estimator can beat this rate for the Min-K% quantile of the KL loss (Hoeven et al., 23 Jul 2025). Efficient algorithms such as OTB (Online-to-Batch) with suffix averaging achieve matching upper bounds up to additional logarithmic factors. The $\log(1/\delta)$ penalty in controlling rare Min-K% probability events accounts for the additional sample complexity compared to expected-risk settings.
| Setting | Min-K% Rate | Reference |
|---|---|---|
| Expected KL risk | $\Theta(d/n)$ | (Hoeven et al., 23 Jul 2025) |
| High-prob Min-K% KL quantile | $\Theta\big((d + \log(1/\delta))/n\big)$ | (Hoeven et al., 23 Jul 2025) |
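To make the gap between expected risk and high-probability quantiles concrete, the sketch below simulates the KL loss of a simple add-$\beta$ smoothed estimator (a toy stand-in, not the OTB algorithm of the cited work) and compares the mean loss with a high quantile:

```python
import math
import random

def kl_div(p, q):
    """KL(p || q) in nats; assumes q has full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def add_beta_estimate(counts, n, beta=0.5):
    """Add-beta (Krichevsky-Trofimov-style) smoothed frequency estimate."""
    d = len(counts)
    return [(c + beta) / (n + beta * d) for c in counts]

def kl_risk_samples(p, n, trials=2000, seed=2):
    """Sorted samples of KL(p || p_hat) across independent datasets of size n."""
    rng = random.Random(seed)
    d = len(p)
    out = []
    for _ in range(trials):
        counts = [0] * d
        for sym in rng.choices(range(d), weights=p, k=n):
            counts[sym] += 1
        out.append(kl_div(p, add_beta_estimate(counts, n)))
    return sorted(out)

# Uniform distribution on d = 20 symbols, n = 200 samples per dataset.
risks = kl_risk_samples([1 / 20] * 20, n=200)
mean_risk = sum(risks) / len(risks)
q99 = risks[int(0.99 * len(risks))]
print(round(mean_risk, 4), round(q99, 4))  # the 99% quantile exceeds the mean
```

The spread between `mean_risk` and `q99` is exactly the phenomenon the $\log(1/\delta)$ term accounts for: controlling rare bad datasets is strictly harder than controlling the average one.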
7. Connections and Open Problems
Min-K% probability analysis bridges order statistics, large deviations, concentration of measure, and robust estimation. Its methodology elucidates phenomena in both discrete (model quantiles, missing mass, high-confidence tail behavior) and continuous (scale-invariant conditional minima) probabilistic systems. Open questions remain regarding optimal Min-K% bounds for sums of independent random variables, weightings in scale-dependent inequalities, and tightness classes for particular distribution shapes (Steinerberger, 2018).
The Min-K% paradigm thus delivers a general toolkit for quantifying tail events, calibrating risk, and detecting outlier or memorized behavior in high-dimensional statistical and machine learning settings, supported by sharp probabilistic inequalities and algorithmic adaptivity.