MX+ Outlier Repurposing Protocols
- Outlier repurposing (MX+) is a set of techniques that converts standard datasets into outlier benchmarks using methods like down-sampling and mixup-based pseudo-samples.
- In graph-level outlier detection and in misclassification/OOD detection, MX+ protocols expose detector biases and improve detection quality, as evidenced by gains in ROC-AUC and calibration on repurposed benchmark datasets.
- In quantized inference for LLMs, MX+ reallocates bits for block outliers to reduce quantization error, thereby maintaining accuracy at ultra-low bitwidths.
Outlier Repurposing (MX+), also known as the MX+ protocol, encompasses a set of techniques in machine learning and systems designed to leverage, transform, or systematically utilize outlier samples for enhanced detection, quantization, or benchmarking across disparate applications. Notably, MX+ appears in three domains: (1) graph-level outlier detection (GLOD) via repurposing classification datasets, (2) misclassification and out-of-distribution detection using pseudo-sample interpolation, and (3) efficient quantized inference for LLMs by handling block outliers in microscaling formats. Each instantiation exploits the unique challenges and opportunities presented by outlier phenomena in its context.
1. Formal Definition and Protocols of MX+ in Outlier Repurposing
The MX+ protocol in graph-level outlier detection involves converting a binary graph classification dataset into an outlier detection benchmark by down-sampling one class to function as outliers. Given two classes $\mathcal{C}_0$ and $\mathcal{C}_1$ with cardinalities $n_0$, $n_1$, and a down-sampling ratio $\rho \in (0,1)$, a uniform random subset of one class (say, class $\mathcal{C}_1$) is selected:
- Outlier set: $\mathcal{O} \subset \mathcal{C}_1$ with $|\mathcal{O}| = \lceil \rho\, n_1 \rceil$, drawn uniformly at random,
- Inlier set: $\mathcal{I} = \mathcal{C}_0$, i.e., all graphs of the other class.
This repurposing strategy enables empirical study of graph-level outlier detectors in situations where genuine labeled outlier datasets are rare. The construction yields an outlier fraction of $|\mathcal{O}| / (|\mathcal{O}| + n_0)$ and supports two variants per dataset (either class as the source of outliers) (Zhao et al., 2020).
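A minimal sketch of this repurposing step, assuming graphs and binary labels are held in parallel Python lists; the function name, the default ratio, and the 1 = outlier labeling convention are illustrative choices, not a reference implementation:

```python
import random

def repurpose_for_glod(graphs, labels, outlier_class=1, rho=0.1, seed=0):
    """Convert a binary graph classification dataset into an outlier benchmark
    by down-sampling one class to serve as outliers (MX+ style)."""
    rng = random.Random(seed)
    source = [g for g, y in zip(graphs, labels) if y == outlier_class]
    inliers = [g for g, y in zip(graphs, labels) if y != outlier_class]

    # A uniform random down-sample of the source class becomes the outlier set;
    # the rest of the source class is dropped in this sketch.
    n_out = max(1, round(rho * len(source)))
    outliers = rng.sample(source, n_out)

    data = outliers + inliers
    y_true = [1] * len(outliers) + [0] * len(inliers)   # 1 = outlier, 0 = inlier
    return data, y_true, len(outliers) / len(data)
```

Running the function twice, once with each class as `outlier_class`, produces the two variants discussed above.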
Within deep learning for misclassification and out-of-distribution (OOD) detection, MX+ is realized through methods such as OpenMix, which synthesizes samples along the outlier–in-distribution (ID) boundary:
- Outlier samples (from auxiliary, non-target sources) are mixup-interpolated with ID samples to form pseudo-samples.
- These pseudo-samples receive soft target labels whose mass is split between the true class and a dedicated “reject” class.
- A $(k{+}1)$-way classifier (the $k$ in-distribution classes plus the reject class) is trained using both standard and pseudo-sample cross-entropy loss components (Zhu et al., 2023).
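A minimal PyTorch-style sketch of this pseudo-sample construction, assuming an auxiliary outlier batch `x_out` at least as large as the ID batch; the function name, tensor shapes, and the Beta(α, α) sampling of the mixing coefficient are assumptions for illustration:

```python
import torch

def make_openmix_pseudo_samples(x_id, y_id, x_out, num_classes, alpha=1.0):
    """Mix ID and outlier inputs; label the result partly as the true class
    and partly as a dedicated reject class (index num_classes)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()

    # Interpolate inputs along the ID-outlier boundary (outlier batch is
    # assumed to be at least as large as the ID batch).
    x_mix = lam * x_id + (1.0 - lam) * x_out[: x_id.size(0)]

    # Soft targets over k+1 classes: lam mass on the true class, 1-lam on reject.
    y_soft = torch.zeros(x_id.size(0), num_classes + 1)
    y_soft.scatter_(1, y_id.unsqueeze(1), lam)
    y_soft[:, num_classes] = 1.0 - lam
    return x_mix, y_soft
```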
For reduced-precision inference in LLMs, MX+ denotes an enhancement to block floating-point (BFP) microscaling formats. It specifically addresses the deleterious effects of block outliers during quantization:
- The exponent bits of the block-max element (BM) are repurposed as extended mantissa bits, dramatically increasing quantization precision for the outlier without overhead for remaining block elements.
- A minimal increase in per-block metadata (e.g., 8 bits for BM index over 32 elements) achieves robust accuracy at ultra-low bitwidth (especially 4 bits/element) (Lee et al., 16 Oct 2025).
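The effect of repurposing the BM's exponent field can be illustrated with a simplified numeric model. The sketch below quantizes one block under an FP4-like (E2M1) element grid with a shared power-of-two scale, then quantizes the BM alone under an MX+-like encoding with sign plus three mantissa bits at an implied exponent; the grid, scale selection, and rounding are simplifications rather than the exact MXFP4/MX+ specification:

```python
import math

# Representable magnitudes of an E2M1 (FP4) element before scaling.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block_mxfp4(block):
    """Quantize a block with a shared power-of-two scale derived from the
    block max (simplified OCP-MX-style scale selection)."""
    bm = max(abs(v) for v in block)
    scale = 2.0 ** (math.floor(math.log2(bm)) - 2)   # FP4 emax = 2
    def q(v):
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        return math.copysign(mag * scale, v)
    return [q(v) for v in block], scale

def quantize_bm_mx_plus(bm):
    """Block max stored with sign + 3 mantissa bits at its implied exponent
    (the exponent field is freed because the shared scale already encodes it)."""
    exp = math.floor(math.log2(abs(bm)))
    mant = round((abs(bm) / 2.0 ** exp - 1.0) * 8) / 8   # 3 mantissa bits
    return math.copysign((1.0 + mant) * 2.0 ** exp, bm)

block = [0.02, -0.15, 0.08, 5.3]      # a single outlier dominates the scale
q_block, scale = quantize_block_mxfp4(block)
bm = max(block, key=abs)
print("shared scale      :", scale)                           # 1.0
print("MXFP4 block       :", q_block)                         # small values flush to 0
print("MXFP4 error on BM :", abs(bm - q_block[3]))            # ~0.7
print("MX+   error on BM :", abs(bm - quantize_bm_mx_plus(bm)))  # ~0.2
```

In this toy block the non-BM values quantize identically under both formats, while the BM's reconstruction error shrinks noticeably because of the extended mantissa.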
2. Characteristic Phenomena and Theoretical Considerations
Across domains, MX+ methodologies underpin several non-trivial phenomena:
- ROC-AUC Flip in GLOD: When applying MX+ to propagation-based GLOD, the choice of which class is down-sampled (the outlier source) can dramatically change the ROC-AUC, with the "flipped" variant performing below random (AUC < 0.5), exposing detector bias (Zhao et al., 2020).
- Embedding Dynamics: Propagation-layer depth amplifies within-class density disparities and class-support overlap, worsening the AUC flip. Deeper propagation sparsifies the embeddings, increasing outlier detectability for the sparser class but reducing it for the denser one.
- Pseudo-sample Interpolation: In OpenMix, interpolation between ID and outlier samples populates low-density regions of feature space, anchoring classifier uncertainty and improving confidence calibration near the decision boundary (Zhu et al., 2023).
- Block Outlier Precision Bottleneck: In BFP microscaling, a single large element (“block outlier”) distorts shared scale, coarsely quantizing most values and the outlier itself. MX+ mitigates this by reallocating representational bits for the block-max (Lee et al., 16 Oct 2025).
3. Methodological Instantiations
3.1. Graph-Level Outlier Repurposing (GLOD)
The MX+ GLOD pipeline can be summarized:
| Component | Description | Reference |
|---|---|---|
| Outlier Set ($\mathcal{O}$) | Uniform random subset of one class (down-sampled) | (Zhao et al., 2020) |
| Inlier Set ($\mathcal{I}$) | All remaining graphs, i.e., the other class | (Zhao et al., 2020) |
| Embedding Model | Propagation-based kernel or GNN | (Zhao et al., 2020) |
| Outlier Detector | Density-based (LOF, k-NN), 1-class SVM, or GNN | (Zhao et al., 2020) |
Two critical geometric factors are quantified:
- $k$-NN radius: higher values indicate a sparser (outlier-prone) class.
- $k$-NN impurity: high impurity signals overlapping class support, prone to AUC flip.
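Both diagnostics can be estimated directly from graph embeddings. The following sketch, assuming embeddings as an (n, d) NumPy array with binary class labels, uses scikit-learn's NearestNeighbors; the exact measure definitions in the paper may differ, so treat these as illustrative proxies:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_radius_and_impurity(X, y, k=5):
    """Per-class k-NN radius (sparsity proxy) and k-NN impurity (overlap proxy).

    X : (n, d) graph embeddings, y : (n,) binary class labels.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)                  # column 0 is the point itself
    radius = dist[:, -1]                          # distance to the k-th neighbor
    impurity = (y[idx[:, 1:]] != y[:, None]).mean(axis=1)

    stats = {}
    for c in np.unique(y):
        mask = y == c
        stats[int(c)] = {
            "knn_radius": float(radius[mask].mean()),      # larger => sparser class
            "knn_impurity": float(impurity[mask].mean()),  # higher => more overlap
        }
    return stats
```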
3.2. OpenMix for Misclassification/OOD Detection
The OpenMix protocol is implemented as follows:
- For each minibatch, interpolate ID and outlier samples with a mixing coefficient $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$: $\tilde{x} = \lambda\, x_{\mathrm{id}} + (1-\lambda)\, x_{\mathrm{out}}$, with soft target $\tilde{y} = \lambda\, y_{\mathrm{id}} + (1-\lambda)\, y_{k+1}$, where $y_{k+1}$ is the one-hot label of the reject class.
- Train a $(k{+}1)$-way classifier with the standard cross-entropy loss on ID samples plus a cross-entropy term on the pseudo-samples $(\tilde{x}, \tilde{y})$.
- At inference, threshold the maximum softmax score on ID classes for acceptance or rejection.
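A sketch of the corresponding training objective and inference-time decision rule, complementing the pseudo-sample construction above; the loss weight `gamma`, the acceptance threshold, and taking the maximum over only the ID columns of the full $(k{+}1)$-way softmax are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def openmix_loss(logits_id, y_id, logits_mix, y_soft, gamma=0.5):
    """Standard CE on ID samples plus soft-target CE on pseudo-samples."""
    loss_id = F.cross_entropy(logits_id, y_id)
    loss_mix = -(y_soft * F.log_softmax(logits_mix, dim=1)).sum(dim=1).mean()
    return loss_id + gamma * loss_mix

@torch.no_grad()
def accept_or_reject(logits, threshold=0.7):
    """Maximum softmax probability over the k ID classes only; the reject
    logit (last column) is excluded from the score."""
    probs = torch.softmax(logits, dim=1)[:, :-1]
    msp, pred = probs.max(dim=1)
    return pred, msp >= threshold
```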
3.3. MX+ in Low-Bitwidth Quantization
| Entity | MXFP4 | MXFP4+ (MX+) | Overhead / Advantage |
|---|---|---|---|
| Block element | [S] [EE] [M] | Non-BM: [S] [EE] [M]; BM: [S] [MMM] | BM: 3 mantissa bits, others unchanged |
| Block metadata | None | 8 bits per block (BM index) | 4.25 bits/element (+6.25%) |
| Maximum quantization error | Coarse for the BM (1 mantissa bit) | Finer for the BM (3 mantissa bits) | BM quantization error substantially reduced |
| Integration | Standard MX pipeline | Software drop-in, minor hardware extension | No new API, <0.1% TCU area (Lee et al., 16 Oct 2025) |
The MX++ variant provides a multi-scale block option, handling multiple outliers by storing additional shared exponents per block.
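The per-element storage figures in the table follow from simple bookkeeping over a 32-element block; the sketch below reproduces them (shared-scale bits excluded, matching the table), with the helper name chosen for illustration:

```python
def bits_per_element(elem_bits, block_size=32, metadata_bits=0):
    """Amortized per-element storage of one MX-style block
    (shared-scale bits excluded, as in the table above)."""
    return (block_size * elem_bits + metadata_bits) / block_size

mxfp4 = bits_per_element(4)                     # 4.00 bits/element
mxfp4p = bits_per_element(4, metadata_bits=8)   # 4.25 bits/element (BM index)
print(f"MXFP4+: {mxfp4p:.2f} bits/elem (+{100 * (mxfp4p - mxfp4) / mxfp4:.2f}%)")
# -> MXFP4+: 4.25 bits/elem (+6.25%)
```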
4. Empirical Findings and Comparative Metrics
4.1. GLOD with MX+
- When evaluated over two MX+ variants, ROC-AUCs are approximately complementary: high for one variant, low for the other, with the sum close to 1.
- The AUC gap is amplified as propagation depth increases, emphasizing the need to balance expressivity against density disparity.
- Down-sampling the sparser class as outliers aligns with the assumption that outliers occupy low-density regions, leading to more faithful anomaly detection (Zhao et al., 2020).
4.2. OpenMix and Outlier Transformation
- On CIFAR-10 with a WideResNet backbone, OpenMix lowers AURC and FPR95 and raises AUROC relative to the baseline, while test accuracy remains essentially unchanged.
- Comparable gains persist on CIFAR-100 and under distribution shift (CIFAR-10-C), where AUROC again improves.
- OpenMix simultaneously boosts misclassification and OOD detection without compressing in-distribution feature space excessively (Zhu et al., 2023).
4.3. MX+ Quantization for LLM Inference
Representative perplexity (lower is better):
| Model | BF16 | MXFP6 | MXFP6+ | MXFP4 | MXFP4+ |
|---|---|---|---|---|---|
| Llama-3.1-8B (W2) | 6.97 | 7.19 | 7.10 | 9.54 | 9.22 |
| OPT-66B (C4) | 10.90 | 11.19 | 10.94 | 12.01 | 11.85 |
- Zero-shot accuracy: MXFP4 degrades severely on average, while MXFP4+ restores accuracy to roughly the 70% range, a substantial recovery.
- Latency overhead: a software-only implementation adds roughly 4–8% (small); a direct hardware implementation incurs even less.
- Memory overhead: +6.25% storage for 4-bit blocks (Lee et al., 16 Oct 2025).
5. Practical Guidelines and Limitations
- GLOD Application:
- Always test both down-sample variants and report their AUCs individually (a sketch follows after this list).
- Choose a small down-sampling ratio $\rho$ (e.g., $0.05$–$0.1$) to preserve outlier rarity.
- Control propagation/embedding depth to avoid excessive sparsification.
- Evaluate both class impurity and density for informed practice.
- Avoid reporting averaged AUC over variants, as this can conceal flip scenarios (Zhao et al., 2020).
- OpenMix/OOD Detection:
- Avoid compressing ID feature space as with Outlier Exposure; instead, apply mixup-style interpolation for pseudo-sample generation.
- Both misclassification detection and OOD detection benefit, regardless of distribution shift (Zhu et al., 2023).
- Quantized LLM Serving (MX+):
- Deploy MX+ for ultra-low bitwidth scenarios (notably MXFP4).
- Infrastructure overhead is negligible; software/hardware paths are minimally affected.
- For blocks with multiple outliers, “multi-BM” (MX++) or channel-wise reordering offer incremental improvements.
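Returning to the GLOD guideline that both variants be tested, the following sketch evaluates each variant with a detector and reports the two AUCs individually. It assumes precomputed graph embeddings and binary labels as NumPy arrays; LOF stands in for any detector from Section 3.1, and the sampling mirrors the protocol of Section 1:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor

def evaluate_both_variants(embeddings, labels, rho=0.1, seed=0):
    """Run the MX+ repurposing with each class as the outlier source and
    report both ROC-AUCs individually (never their average)."""
    rng = np.random.default_rng(seed)
    aucs = {}
    for outlier_class in np.unique(labels):
        src = np.flatnonzero(labels == outlier_class)
        inl = np.flatnonzero(labels != outlier_class)
        out = rng.choice(src, size=max(1, int(rho * len(src))), replace=False)

        X = np.vstack([embeddings[out], embeddings[inl]])
        y = np.r_[np.ones(len(out)), np.zeros(len(inl))]

        lof = LocalOutlierFactor(n_neighbors=20, novelty=False)
        lof.fit(X)
        scores = -lof.negative_outlier_factor_   # higher => more outlying
        aucs[int(outlier_class)] = roc_auc_score(y, scores)
    return aucs
```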
6. Limitations, Future Directions, and Open Challenges
- In GLOD benchmarks, the MX+ protocol's reliance on down-sampling a classification class may not always correspond to semantically meaningful “anomalies”, particularly if that class overlaps significantly with the inlier class or has higher within-class density. This suggests that current outlier repurposing protocols require further refinement to robustly associate outliers with true low-density or isolated regions.
- Block outlier handling in microscaling remains limited when more than one outlier is present; “multi-BM” schemes or enhanced channel-wise data layouts may address this for LLMs.
- Formal error bound analysis for full pipeline post-training quantization under MX+ remains an open problem.
- Embedding strategies in GLOD could further benefit from techniques that reduce overlap or equalize class densities post-propagation.
Systematic design and selection of outlier repurposing protocols, along with embedding and quantization strategies that correctly balance class density and overlap, remain important open areas for robust extensions of MX+ across domains. Continued progress will require detailed empirical characterization and development of model/embedding-aware outlier repurposing methods (Zhao et al., 2020, Zhu et al., 2023, Lee et al., 16 Oct 2025).