MultiOSR: Diverse Paradigms in Recognition & Quantization

Updated 4 July 2026

MultiOSR is a context-dependent term that covers several distinct frameworks: multi-head open set recognition, multi-attribute open set recognition, Over-Sampling Ratio allocation in LLM quantization, and multiple-obstacle radiation-condition approximations in wave scattering.
In open set recognition, it denotes collective decision-making with multiple one-vs-rest classifiers, or with per-attribute semantic heads, that separate known from unknown inputs.
In LLM quantization, it denotes layer-wise and module-wise allocation of the Over-Sampling Ratio; in wave physics, it denotes an approximation of multiple scattering among obstacles.

MultiOSR is a context-dependent research term rather than a single standardized method. In open set recognition, it is used for a multi-head one-vs-rest architecture that classifies known classes while rejecting unknowns (Jang et al., 2021), and for a multi-attribute formulation in which each semantic attribute is treated as its own open-set task (Saranrittichai et al., 2022). In extremely low-bit LLM quantization, it denotes a layer- and linear-wise Over-Sampling Ratio allocation strategy within SDQ-LLM (Xia et al., 27 Sep 2025). In wave scattering, the closely related multiple-OSRC formulation extends on-surface radiation conditions from a single obstacle to multiple obstacles (Acosta, 2013). This suggests that the term is best interpreted through its local disciplinary context.

1. Terminological range

Context	Meaning of MultiOSR	Representative paper
Open set recognition	Shared backbone with multiple one-vs-rest heads and collective rejection	(Jang et al., 2021)
Multi-attribute OSR	Attribute-wise open-set recognition with multiple semantic heads	(Saranrittichai et al., 2022)
LLM quantization	Fine-grained Over-Sampling Ratio allocation across layers and linear modules	(Xia et al., 27 Sep 2025)
Wave scattering	Multiple-obstacle extension of on-surface radiation conditions	(Acosta, 2013)

The open set recognition usages share a concern with separating known from unknown regions, but they operate at different granularities. One formulation uses multiple class-specific heads whose outputs are fused by a collective decision rule (Jang et al., 2021), whereas the other generalizes open set recognition from a single label to a vector of semantically independent attributes (Saranrittichai et al., 2022). By contrast, the SDQ-LLM usage concerns quantization resource allocation rather than novelty detection (Xia et al., 27 Sep 2025), and the scattering usage concerns outgoing-wave boundary modeling rather than machine learning (Acosta, 2013).

A common misconception is that “MultiOSR” names a single architecture. The cited literature instead uses the label for distinct technical constructs. A plausible implication is that any rigorous use of the term requires explicit specification of the underlying field and paper lineage.

2. MultiOSR as collective decision of one-vs-rest networks

In open set recognition, the problem is to classify samples from known classes while rejecting samples from unknown classes that were never observed during training. The main difficulty is overgeneralization: a softmax classifier forces every test input into one of the known classes through

$P(y_i \mid l_{y_1},\dots,l_{y_M}) = \frac{\exp(l_{y_i})}{\sum_{m=1}^M \exp(l_{y_m})},$

so an entirely novel input can still receive high confidence for some known class (Jang et al., 2021).
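A small numeric illustration of this overgeneralization, using made-up logits for an input that resembles no known class: softmax renormalizes uniformly weak evidence into a distribution whose top class still looks plausible, whereas independent sigmoid outputs keep every class score low. The logit values are hypothetical and chosen only to show the contrast.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(l):
    return 1.0 / (1.0 + math.exp(-l))

# Hypothetical logits for a novel input: every class has weak evidence.
novel_logits = [-3.0, -3.0, -4.0]

p_softmax = softmax(novel_logits)
p_sigmoid = [sigmoid(l) for l in novel_logits]

print([round(p, 3) for p in p_softmax])  # [0.422, 0.422, 0.155]
print([round(p, 3) for p in p_sigmoid])  # [0.047, 0.047, 0.018]
```

Softmax only sees differences between logits, so the absolute weakness of the evidence disappears; per-class sigmoid scores preserve it, which is what a reject option needs.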

The architecture proposed in “Collective Decision of One-vs-Rest Networks for Open Set Recognition” uses a shared CNN feature extractor $\mathcal{F}$ followed by $M$ one-vs-rest networks (OVRNs) $\mathcal{G}_i$, one for each known class $y_i$. For an input $\mathbf{x}_j$, the shared feature is $\mathbf{z}_j=\mathcal{F}(\mathbf{x}_j)$, and each head produces

$P(y_i \mid \mathbf{x}_j) = \mathcal{G}_i(\mathcal{F}(\mathbf{x}_j)).$

Each $\mathcal{G}_i$ is a feed-forward network with one hidden layer, ReLU activation, a scalar logit $l_{j y_i}$, and a sigmoid output

$P(y_i \mid \mathbf{x}_j) = \sigma(l_{j y_i}) = \frac{1}{1 + \exp(-l_{j y_i})}.$

Unlike a softmax head, each OVRN is trained as an independent binary classifier with positives from class $y_i$ and negatives from all other known classes (Jang et al., 2021).

The distinctive part of this MultiOSR formulation is the collective decision score

$\mathrm{CD}(y_i \mid \mathbf{x}_j) = P(y_i \mid \mathbf{x}_j) - \frac{1}{M-1}\sum_{m \neq i} P(y_m \mid \mathbf{x}_j),$

which measures relative evidence for class $y_i$ against the average evidence of the remaining heads. The final decision chooses

$\hat{y}_j = \arg\max_{y_i} \mathrm{CD}(y_i \mid \mathbf{x}_j)$

and accepts class $\hat{y}_j$ only if $\mathrm{CD}(\hat{y}_j \mid \mathbf{x}_j) \geq \tau_{\hat{y}_j}$; otherwise the sample is rejected as unknown. Thresholds $\tau_{y_i}$ are set per class so that 95% of training samples of class $y_i$ satisfy $\mathrm{CD}(y_i \mid \mathbf{x}_j) \geq \tau_{y_i}$ (Jang et al., 2021).
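A minimal sketch of the collective-decision rule, assuming the score for a class is that head's sigmoid output minus the mean output of the other $M-1$ heads, and that each class threshold is a low percentile of the class's own training scores so that roughly 95% of them pass. The function names, the nearest-rank percentile, and the toy head outputs are illustrative choices, not the paper's code.

```python
import math

def cd_scores(probs):
    """Collective-decision scores: each head's probability minus the mean of the other heads."""
    m = len(probs)
    total = sum(probs)
    return [p - (total - p) / (m - 1) for p in probs]

def fit_thresholds(train_probs, train_labels, num_classes, keep=0.95):
    """Per-class threshold so that about `keep` of that class's training CD scores are >= it.
    Assumes every class has at least one training sample."""
    thresholds = []
    for c in range(num_classes):
        scores = sorted(cd_scores(p)[c] for p, y in zip(train_probs, train_labels) if y == c)
        # Nearest-rank lower percentile at the (1 - keep) quantile.
        idx = int(math.floor((1.0 - keep) * len(scores)))
        thresholds.append(scores[min(idx, len(scores) - 1)])
    return thresholds

def decide(probs, thresholds):
    """Return the predicted class index, or None to reject the input as unknown."""
    scores = cd_scores(probs)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= thresholds[best] else None

# Hypothetical sigmoid outputs of three one-vs-rest heads on training data.
train_probs = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2],
               [0.2, 0.7, 0.1], [0.1, 0.2, 0.95], [0.2, 0.1, 0.8]]
train_labels = [0, 0, 1, 1, 2, 2]
taus = fit_thresholds(train_probs, train_labels, num_classes=3)

print(decide([0.85, 0.15, 0.1], taus))  # 0
print(decide([0.3, 0.3, 0.3], taus))    # None: no head stands out from the others
```

Because the score is relative, an input on which all heads respond equally gets a score near zero and is rejected even if each individual sigmoid output is moderate.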

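Training uses one-vs-rest binary cross-entropy over all heads,

$\mathcal{L} = -\frac{1}{N}\sum_{j=1}^{N}\sum_{i=1}^{M}\Big[t_{ji}\log P(y_i \mid \mathbf{x}_j) + (1 - t_{ji})\log\big(1 - P(y_i \mid \mathbf{x}_j)\big)\Big],$

where $N$ is the number of training samples and $t_{ji} = 1$ if $\mathbf{x}_j$ belongs to class $y_i$ and $0$ otherwise. For MNIST, the OVRNs use a single hidden layer with 64 hidden units; for CIFAR and other datasets, they use 128 hidden units. Optimization uses Adam (Jang et al., 2021).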
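The one-vs-rest objective can be sketched as a sum of independent binary cross-entropies, one per head, averaged over the batch, with target 1 for the head of the sample's own class and 0 for every other head. The epsilon clamp is an assumption added for numerical safety, and the function name is hypothetical.

```python
import math

def ovr_bce(batch_probs, batch_labels, eps=1e-7):
    """Mean over samples of the summed per-head binary cross-entropy."""
    total = 0.0
    for probs, label in zip(batch_probs, batch_labels):
        for i, p in enumerate(probs):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            target = 1.0 if i == label else 0.0
            total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
    return total / len(batch_probs)

# One sample of class 0 with two heads: -(log 0.9 + log 0.8).
loss = ovr_bce([[0.9, 0.2]], [0])
print(round(loss, 4))  # 0.3285
```

Unlike softmax cross-entropy, pushing one head up does not force the others down; each head is penalized only against its own binary target.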

Empirically, the method was evaluated on MNIST, EMNIST, Omniglot, CIFAR-10, CIFAR-100, ImageNet subsets, and LSUN, using macro-averaged F1 over the known classes plus the unknown class (Jang et al., 2021). The ablation study compared seven variants: CNN-SoftMax, CNN-Sigmoid, CNN-OVRN, Sigmoid-GF, OVRN-GF, Sigmoid-CD, and OVRN-CD. The key observation was that OVRNs alone add little under naive thresholding, but OVRNs combined with collective decision give the strongest performance, especially at high openness; for CIFAR-10 known versus CIFAR-100 unknown at the highest openness, collective decision improved CNN-OVRN by 0.336 in F1 (Jang et al., 2021).

On MNIST with Omniglot, MNIST-Noise, and Noise as unknowns, OVRN-CD achieved F1 scores of 0.918, 0.926, and 0.953, respectively, outperforming DOC and CGDL. On CIFAR-10 with ImageNet-crop, ImageNet-resize, LSUN-crop, and LSUN-resize as unknowns, OVRN-CD reached an average F1 of 0.836, above CROSR, DOC, MLOSR, and CGDL (Jang et al., 2021).

The main significance of this usage is architectural. MultiOSR here means a shared feature space with multiple class-specific binary boundaries plus an explicit reject option. The paper frames this as a way to minimize open space for known classes rather than to model unknowns directly (Jang et al., 2021).

3. MultiOSR as multi-attribute open set recognition

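A second meaning of MultiOSR is “multi-attribute Open Set Recognition,” which generalizes conventional single-label OSR to a vector-valued label

$\mathbf{y} = (y_1, y_2, \dots, y_K).$

Each component $y_k$ is the value of the $k$-th attribute, such as shape, color, object type, background, shoe material, or shoe type. Training labels lie in

$\mathcal{Y}_{\text{train}} = \mathcal{Y}_1 \times \mathcal{Y}_2 \times \cdots \times \mathcal{Y}_K,$

where $\mathcal{Y}_k$ denotes the known values of attribute $k$, while test labels lie in

$\mathcal{Y}_{\text{test}} = (\mathcal{Y}_1 \cup \{u\}) \times (\mathcal{Y}_2 \cup \{u\}) \times \cdots \times (\mathcal{Y}_K \cup \{u\}),$

with $u$ standing for any value outside the known set, allowing any attribute to take an unknown value never seen during training (Saranrittichai et al., 2022).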
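To make the label spaces concrete, the sketch below represents each attribute's known values as a Python set and reports which attributes of a label fall outside them; a valid training label has none. The attribute values are hypothetical (loosely modeled on a digit-and-color setup), and indices are 0-based, unlike the 1-based $k$ in the text.

```python
def unknown_attributes(label, known_values):
    """0-based indices k where label[k] is not among the known values of attribute k."""
    return [k for k, (v, known) in enumerate(zip(label, known_values)) if v not in known]

def is_valid_training_label(label, known_values):
    # Training labels lie in the Cartesian product of the known-value sets.
    return not unknown_attributes(label, known_values)

# Hypothetical attributes: (digit shape, color).
known = [{"0", "1", "2"}, {"red", "green"}]

print(unknown_attributes(("1", "red"), known))   # [] -> fully known
print(unknown_attributes(("7", "red"), known))   # [0] -> unknown on the shape attribute only
print(unknown_attributes(("7", "blue"), known))  # [0, 1] -> unknown on both attributes
```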

The model output is

$(\hat{\mathbf{y}}, \mathbf{s}) = g(\mathbf{x}),$

where $\hat{\mathbf{y}} = (\hat{y}_1, \dots, \hat{y}_K)$ is the vector of predicted known attribute values and $\mathbf{s} = (s_1, \dots, s_K)$ contains attribute-wise confidence scores. Each attribute has its own threshold $\tau_k$: if $s_k \geq \tau_k$, the model predicts a known value $\hat{y}_k$; otherwise attribute $k$ is declared unknown (Saranrittichai et al., 2022). In this formulation, out-of-distribution status is attribute-wise. The paper introduces $\mathrm{OOD}_k$ for $k \in \{1, \dots, K\}$, denoting samples that are partially unknown on the $k$-th attribute (Saranrittichai et al., 2022).

Architecturally, the default extension from conventional OSR is a shared feature extractor $f$ with multiple heads $h_1, \dots, h_K$, one per attribute, so that both $\hat{y}_k$ and $s_k$ are computed from $h_k(f(\mathbf{x}))$. The paper evaluates multi-head generalizations of MSP, OpenMax, MLS, ARPL, and ARPL+CS, along with duplicated-model variants such as MSP-D, OpenMax-D, and ARPL-D in which the entire model is duplicated per attribute (Saranrittichai et al., 2022).
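A minimal sketch of the multi-head MSP baseline together with the attribute-wise threshold rule, assuming each head returns logits over its attribute's known values and that the confidence $s_k$ is that head's maximum softmax probability. The shared extractor is omitted (head logits are given directly), and the logits and thresholds are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def multi_head_msp(head_logits, thresholds):
    """Per attribute: index of the predicted known value if MSP >= threshold, else 'unknown'."""
    outputs = []
    for logits, tau in zip(head_logits, thresholds):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        score = probs[best]  # s_k: maximum softmax probability of head k
        outputs.append(best if score >= tau else "unknown")
    return outputs

# Two heads (e.g. shape with 3 known values, color with 2), hypothetical logits.
print(multi_head_msp([[4.0, 0.0, 0.0], [0.1, 0.0]], thresholds=[0.7, 0.7]))  # [0, 'unknown']
```

Each attribute is accepted or rejected on its own, so a single input can be known on one attribute and unknown on another, which is exactly the partial-unknown case $\mathrm{OOD}_k$.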

The central empirical result is that these simple MultiOSR baselines are vulnerable to shortcuts when spurious cross-attribute correlations exist in the training data. Three data regimes are defined: uncorrelated (UC), semi-correlated (SC), and correlated (C). In UC all combinations of known attribute values are present; in SC only some combinations are present; in C the training set contains a near one-to-one mapping between complex and simple attributes (Saranrittichai et al., 2022). Under correlation, the complex attribute $y_1$ becomes much more fragile than the simple attribute $y_2$. For MSP on Color-MNIST, complex-attribute OSCR drops from 76.2 in UC to 30.3 in SC and 11.4 in C, whereas simple-attribute OSCR changes from 84.9 to 81.8 and 74.4 (Saranrittichai et al., 2022). Average OSCR for MSP on Color-MNIST declines from 80.5 in UC to 56.0 in SC and 42.9 in C (Saranrittichai et al., 2022).

This failure mode is analyzed through a cross-attribute confidence matrix $\mathbf{C}$, whose rows are heads and whose columns are input types: known, $\mathrm{OOD}_1$, and $\mathrm{OOD}_2$. Ideal behavior would satisfy approximate cross-attribute independence, such as $C_{1,\text{known}} \approx C_{1,\mathrm{OOD}_2}$ and $C_{2,\text{known}} \approx C_{2,\mathrm{OOD}_1}$, because each head’s confidence should depend only on whether its own attribute is known or unknown (Saranrittichai et al., 2022). In SC and C, however, the complex-attribute head’s confidence drops when the simple attribute is unknown, even if the complex attribute remains known; in the correlated case, the network “only uses the information of the second attribute to detect the unknown of both attributes” (Saranrittichai et al., 2022).
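The cross-attribute confidence matrix can be estimated by averaging each head's confidence over test samples of each input type. The sketch below also computes the gap between a head's confidence on fully known inputs and on inputs where only the other attribute is unknown; under the ideal independence described above this gap should be near zero. The record format and all scores are hypothetical; heads are 0-based while the `OOD_k` names keep the text's 1-based attribute numbering.

```python
from collections import defaultdict

def confidence_matrix(records, num_heads, input_types):
    """records: (input_type, [s_1, ..., s_K]) pairs. Returns C[head][input_type] = mean score."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for input_type, scores in records:
        counts[input_type] += 1
        for k in range(num_heads):
            sums[(k, input_type)] += scores[k]
    return [
        {t: sums[(k, t)] / counts[t] for t in input_types if counts[t]}
        for k in range(num_heads)
    ]

def independence_gap(C, head, other_ood):
    """|C[head][known] - C[head][OOD of the other attribute]|; ideally close to 0."""
    return abs(C[head]["known"] - C[head][other_ood])

types = ["known", "OOD_1", "OOD_2"]
# Hypothetical scores: head 0 = complex attribute, head 1 = simple attribute.
records = [
    ("known", [0.9, 0.95]), ("known", [0.8, 0.9]),
    ("OOD_1", [0.4, 0.9]),
    ("OOD_2", [0.5, 0.3]),  # head 0 drops although its own attribute is known
]
C = confidence_matrix(records, num_heads=2, input_types=types)
print(round(independence_gap(C, head=0, other_ood="OOD_2"), 2))  # 0.35 -> shortcut symptom
print(round(independence_gap(C, head=1, other_ood="OOD_1"), 3))  # 0.025 -> close to independent
```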

The evaluation protocol includes per-attribute OSCR and AUROC, plus the Open-Set Explainability Matrix $\mathbf{E}$, which measures whether the model identifies which attribute or attributes are unknown. In UC, $\mathbf{E}$ is near-diagonal. In SC and C, off-diagonal entries increase and the attribution of unknownness deteriorates, especially for the complex attribute (Saranrittichai et al., 2022). The behavior persists on synthetic datasets such as Color-MNIST, Color-Object, and Scene-Object, and on the real UT-Zappos dataset. On UT-Zappos, MSP gives AUROC 45.3 for shoe material and 67.7 for shoe type, while ARPL gives 51.0 and 69.6 respectively, again showing the asymmetry between complex and simple attributes (Saranrittichai et al., 2022).
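Per-attribute AUROC can be computed from one head's confidences on samples where that attribute is known versus unknown. The sketch uses the pairwise (Mann-Whitney) form, counting ties as one half; it is a generic metric implementation, not the paper's evaluation code, and OSCR and the explainability matrix are not reproduced. The scores are hypothetical.

```python
def auroc(known_scores, unknown_scores):
    """Probability that a random known sample scores higher than a random unknown one."""
    wins = 0.0
    for a in known_scores:
        for b in unknown_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))

# Hypothetical confidences of one attribute head.
print(auroc([0.9, 0.8, 0.6], [0.7, 0.5]))  # 5 of 6 pairs ordered correctly -> 0.8333...
```

An AUROC of 0.5 (50 on the percentage scale used above) corresponds to chance, so a per-attribute value below 50 means the head ranks unknown values of that attribute above known ones more often than not.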

The significance of this version of MultiOSR is conceptual. It turns open-set recognition into a multi-dimensional problem in which novelty is localized to semantic factors rather than only to whole-image class identity. It also demonstrates that multi-head decomposition alone does not guarantee attribute-wise independence of confidence scores (Saranrittichai et al., 2022).

4. MultiOSR in SDQ-LLM: fine-grained Over-Sampling Ratio allocation

In SDQ-LLM, MultiOSR refers not to open set recognition but to a layer- and linear-wise Over-Sampling Ratio allocation strategy for extremely low-bit quantization of LLMs. SDQ-LLM applies sigma-delta quantization to weight matrices by first upsampling them by an Over-Sampling Ratio and then quantizing to binary or ternary values. In Algorithm 1, for a weight block $\mathbf{W} \in \mathbb{R}^{m \times n}$, the column dimension is resampled from $n$ to $n' = \mathrm{OSR} \cdot n$, so OSR directly controls representation length and approximation fidelity (Xia et al., 27 Sep 2025).
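The sketch below illustrates the general idea with a textbook first-order sigma-delta modulator: a weight row is linearly interpolated to about OSR times its length and then quantized to $\pm\alpha$ while the running quantization error is fed back into the next sample. It is not SDQ-LLM's Algorithm 1; the interpolation scheme, the first-order loop, and the scale `alpha` (set to the row's maximum magnitude) are all assumptions, and the weight row is hypothetical.

```python
def upsample_linear(row, osr):
    """Linearly interpolate `row` to round(osr * len(row)) samples."""
    n = len(row)
    m = max(1, round(osr * n))
    if n == 1 or m == 1:
        return [row[0]] * m
    out = []
    for j in range(m):
        t = j * (n - 1) / (m - 1)  # position on the original grid
        i = min(int(t), n - 2)
        frac = t - i
        out.append(row[i] * (1.0 - frac) + row[i + 1] * frac)
    return out

def sigma_delta_binary(signal, alpha):
    """First-order sigma-delta: emit +/-alpha and carry the residual error forward."""
    err = 0.0
    codes, errors = [], []
    for x in signal:
        v = x + err
        q = alpha if v >= 0 else -alpha
        err = v - q  # stays in [-alpha, alpha] when |x| <= alpha
        codes.append(q)
        errors.append(err)
    return codes, errors

row = [0.30, -0.10, 0.05, 0.20]  # hypothetical weight row
alpha = max(abs(w) for w in row)
up = upsample_linear(row, osr=2)
codes, errors = sigma_delta_binary(up, alpha)
print(len(up), sorted(set(codes)))  # 8 [-0.3, 0.3]
```

Because the accumulated error never exceeds `alpha`, the running sum of the binary codes tracks the running sum of the upsampled weights; this is the noise-shaping property that pushes quantization error toward high frequencies, and a longer, higher-OSR sequence gives a low-pass reconstruction more samples to average over.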

The corresponding compression ratio is

$\mathrm{CR} = \frac{16}{b \cdot \mathrm{OSR}},$

where $b$ is the quantizer bit width, such as $b = 1$ for binary or $b = \log_2 3 \approx 1.58$ for ternary, and full precision is taken as 16 bits. Higher OSR increases storage and compute cost but reduces quantization error; lower OSR reduces memory but risks large accuracy loss (Xia et al., 27 Sep 2025).
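Taking the compression ratio as 16 bits divided by $b \cdot \mathrm{OSR}$, the bits spent per original weight, the trade-off can be tabulated directly, and solving the same relation for OSR gives the average OSR that a target ratio allows. The function names are hypothetical and the ternary width $\log_2 3$ is an illustrative choice.

```python
import math

FULL_PRECISION_BITS = 16

def compression_ratio(bits, osr):
    """Original bits per weight divided by bits spent per weight after over-sampling."""
    return FULL_PRECISION_BITS / (bits * osr)

def osr_for_ratio(bits, target_ratio):
    """Average OSR that meets a target compression ratio (the relation solved for OSR)."""
    return FULL_PRECISION_BITS / (bits * target_ratio)

for osr in (1.0, 1.5, 2.0):
    binary = compression_ratio(1, osr)
    ternary = compression_ratio(math.log2(3), osr)
    print(f"OSR={osr}: binary {binary:.2f}x, ternary {ternary:.2f}x")
```

For example, binary weights at OSR 2 spend 2 bits per original weight, an 8x reduction from 16-bit weights.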

The MultiOSR contribution is a static OSR schedule $\{\mathrm{OSR}_{\ell, k}\}$ that assigns an OSR to each linear module $k$ of each transformer layer $\ell$, so that different layers, and different linear modules within a layer, can receive different OSRs. The paper describes it as “a layer- and linear-wise OSR allocation strategy,” where, given a target average OSR, layers are ranked by total weight variance, and within each layer the allocated OSR is further distributed across linear modules inversely proportional to weight variance and directly proportional to module size (Xia et al., 27 Sep 2025). The motivation is empirical: smaller variance is associated with greater quantization sensitivity, so low-variance weights require higher OSR (Xia et al., 27 Sep 2025).

Operationally, MultiOSR proceeds in four stages. First, an offline analysis computes weight variance and parameter count for each linear submodule. Second, a target average OSR $\overline{\mathrm{OSR}}$ is chosen from the memory budget by solving the compression-ratio relation for OSR, $\overline{\mathrm{OSR}} = 16 / (b \cdot \mathrm{CR}_{\text{target}})$. Third, OSR is distributed across layers and then within layers according to variance and size. Fourth, SDQ quantization is run with the assigned per-module OSR, and at inference the activation for each linear module is upsampled by the same OSR to ensure dimensional alignment (Xia et al., 27 Sep 2025). The allocation is static rather than dynamic: OSRs are fixed at quantization time and are not adapted online during inference (Xia et al., 27 Sep 2025).
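The exact allocation formula is not reproduced here, so the following is a hypothetical two-stage scheme that follows the stated rules: layers with lower total variance receive a higher OSR (a proportional rule standing in for the ranking), and within a layer each module's OSR is proportional to its size divided by its variance. At each stage the OSRs are rescaled so that the parameter-weighted average matches the budget, which keeps the total number of upsampled parameters equal to the target. All names and numbers are illustrative.

```python
def rescale(weights, sizes, target):
    """Scale weights so that sum(size * osr) / sum(size) == target."""
    weighted = sum(w * s for w, s in zip(weights, sizes))
    c = target * sum(sizes) / weighted
    return [c * w for w in weights]

def allocate_osr(layers, target_osr):
    """
    layers: list of dicts {module_name: (variance, num_params)}, one dict per layer.
    Returns a list of dicts {module_name: osr}. Hypothetical scheme (see lead-in).
    """
    layer_sizes = [sum(n for _, n in layer.values()) for layer in layers]
    layer_vars = [sum(v for v, _ in layer.values()) for layer in layers]
    # Stage 1: lower total variance -> assumed more sensitive -> higher layer OSR.
    layer_osr = rescale([1.0 / v for v in layer_vars], layer_sizes, target_osr)
    schedule = []
    for layer, osr in zip(layers, layer_osr):
        names = list(layer)
        sizes = [layer[m][1] for m in names]
        # Stage 2: weight ~ size / variance, rescaled to this layer's OSR budget.
        weights = [layer[m][1] / layer[m][0] for m in names]
        schedule.append(dict(zip(names, rescale(weights, sizes, osr))))
    return schedule

# Hypothetical two-layer model: (weight variance, parameter count) per module.
layers = [
    {"q_proj": (0.02, 100), "mlp_up": (0.05, 400)},
    {"q_proj": (0.04, 100), "mlp_up": (0.08, 400)},
]
schedule = allocate_osr(layers, target_osr=2.0)
for i, layer_schedule in enumerate(schedule):
    print(i, {name: round(osr, 3) for name, osr in layer_schedule.items()})
```

Normalizing by parameter count rather than by module count is the design choice that ties the schedule to memory: the budget fixes the total upsampled length, and the schedule only moves it between modules.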

MultiOSR is integrated with Hadamard-based weight smoothing and sigma-delta quantization. Hadamard smoothing is used before quantization because it makes weight distributions smoother in the time domain and concentrates frequency energy in low-to-mid frequencies, while sigma-delta quantization shapes quantization noise toward high frequencies (Xia et al., 27 Sep 2025). MultiOSR then decides where higher sampling density should be spent.

The ablation on LLaMA3-8B at a single fixed OSR setting isolates the contribution of MultiOSR. Without Hadamard smoothing and without MultiOSR, SDQ gives WikiText2 perplexity 2434.87 and C4 perplexity 708.39. Hadamard alone reduces these to 20.13 and 26.26. MultiOSR alone gives mixed results, raising WikiText2 perplexity to 2751.13 while roughly halving C4 perplexity to 358.08. Hadamard plus MultiOSR yields the best combination, 17.02 on WikiText2 and 24.72 on C4 (Xia et al., 27 Sep 2025). The paper therefore presents MultiOSR as a clear but modest gain on top of Hadamard smoothing, especially under aggressive low-OSR settings (Xia et al., 27 Sep 2025).
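The size of the MultiOSR gain on top of Hadamard smoothing can be expressed as a relative perplexity reduction. The sketch only re-derives percentages from the LLaMA3-8B figures quoted above; the helper name is hypothetical.

```python
def relative_reduction(before, after):
    """Fractional perplexity reduction from `before` to `after`."""
    return (before - after) / before

# LLaMA3-8B ablation figures quoted in the text.
hadamard_only = {"wikitext2": 20.13, "c4": 26.26}
hadamard_multiosr = {"wikitext2": 17.02, "c4": 24.72}

for name in hadamard_only:
    r = relative_reduction(hadamard_only[name], hadamard_multiosr[name])
    print(name, f"{100 * r:.1f}%")  # wikitext2 15.4%, c4 5.9%
```

Both reductions are real but far smaller than the jump from plain SDQ to Hadamard-smoothed SDQ, which is what "clear but modest" amounts to numerically.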

This usage of MultiOSR is important because it reinterprets “OSR” as a continuously adjustable resource dimension. Here the “multi” aspect is a structured, sensitivity-aware schedule over modules rather than a collection of recognition heads (Xia et al., 27 Sep 2025).

5. MultiOSR as multiple on-surface radiation conditions

In wave scattering, the relevant construct is the multiple-obstacle extension of on-surface radiation conditions. Classical OSRC replaces the exact Dirichlet-to-Neumann map on a single obstacle boundary $\Gamma$ by an approximate local boundary operator $\mathcal{B}$, so that $\partial_\nu u \approx \mathcal{B} u$ on $\Gamma$. The multiple-obstacle extension considers $J$ disjoint obstacles with boundaries $\Gamma_1, \dots, \Gamma_J$,

$\Gamma = \bigcup_{m=1}^{J} \Gamma_m, \qquad \Gamma_m \cap \Gamma_n = \emptyset \ \text{for} \ m \neq n,$

and solves the Helmholtz equation in the unbounded domain $\Omega^{+}$ exterior to all obstacles, with Dirichlet boundary data and the Sommerfeld radiation condition (Acosta, 2013).

The central theorem states that the exact scattered field can be decomposed as

$u = \sum_{m=1}^{J} u_m,$

where each $u_m$ is a purely outgoing field radiating from a single boundary $\Gamma_m$. Each component has its own single-obstacle Dirichlet-to-Neumann map $\mathcal{M}_m$, and the coupling between obstacles is expressed through propagation operators $\mathcal{P}_{nm}$ that transport the field generated on $\Gamma_m$ to another boundary $\Gamma_n$ (Acosta, 2013).

In block form, the exact system becomes

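$\begin{pmatrix} \mathcal{I} & \mathcal{P}_{12} & \cdots & \mathcal{P}_{1J} \\ \mathcal{P}_{21} & \mathcal{I} & \cdots & \mathcal{P}_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ \mathcal{P}_{J1} & \mathcal{P}_{J2} & \cdots & \mathcal{I} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_J \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_J \end{pmatrix},$

where $u_m$ is the trace of the $m$-th component on $\Gamma_m$, $f_m$ is the Dirichlet data on $\Gamma_m$, the diagonal blocks are identities, and the off-diagonal blocks $\mathcal{P}_{nm}$ ($n \neq m$) represent inter-obstacle interactions. Since each $\mathcal{P}_{nm}$ depends on the Dirichlet-to-Neumann map $\mathcal{M}_m$ of the source obstacle, replacing each $\mathcal{M}_m$ by an OSRC approximation $\mathcal{B}_m$ yields the MultiOSR system

$(\mathbf{I} + \widetilde{\mathbf{P}})\,\tilde{\mathbf{u}} = \mathbf{f},$

where $\widetilde{\mathbf{P}}$ has zero diagonal blocks and off-diagonal blocks $\widetilde{\mathcal{P}}_{nm}$ obtained from $\mathcal{P}_{nm}$ with $\mathcal{M}_m$ replaced by $\mathcal{B}_m$.

This captures both outgoing-wave behavior and multiple reflections among obstacles (Acosta, 2013). The off-diagonal blocks encode one scattering hop, and under weak scattering the Neumann series

$\tilde{\mathbf{u}} = \sum_{q=0}^{\infty} \big(-\widetilde{\mathbf{P}}\big)^{q} \mathbf{f}$

has the standard interpretation in terms of scattering order: the $q$-th term accounts for waves that have been scattered $q$ times between obstacles. The truncated series with $q \leq Q$ gives an approximate multiple-scattering expansion (Acosta, 2013).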
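The multiple-scattering interpretation can be checked on a toy discretization: when the diagonal blocks are identities, the iteration $\mathbf{u}^{(q+1)} = \mathbf{f} - \mathbf{P}\mathbf{u}^{(q)}$ starting from $\mathbf{u}^{(0)} = \mathbf{f}$ reproduces the partial sums of the Neumann series, adding one scattering hop per step. The two "obstacles" with two unknowns each and the small complex coupling entries are hypothetical stand-ins for discretized off-diagonal coupling blocks, not an actual OSRC discretization.

```python
def matvec(P, u):
    return [sum(P[r][c] * u[c] for c in range(len(u))) for r in range(len(P))]

def neumann_solve(P, f, num_hops):
    """Partial sum of sum_q (-P)^q f up to q = num_hops, built one hop at a time."""
    u = list(f)  # q = 0: each obstacle responds to its own boundary data only
    for _ in range(num_hops):
        Pu = matvec(P, u)
        u = [fi - pi for fi, pi in zip(f, Pu)]
    return u

def residual(P, f, u):
    """Max-norm of (I + P) u - f."""
    Pu = matvec(P, u)
    return max(abs(ui + pi - fi) for ui, pi, fi in zip(u, Pu, f))

# Unknowns ordered as [obstacle 1 (2 values), obstacle 2 (2 values)].
# Diagonal blocks of P are zero (the identity is handled separately);
# off-diagonal entries model weak inter-obstacle coupling.
P = [
    [0, 0, 0.2 + 0.1j, 0.1],
    [0, 0, 0.1, 0.2 - 0.1j],
    [0.15j, 0.1, 0, 0],
    [0.1, 0.15, 0, 0],
]
f = [1.0, 0.5, 0.0, -0.5]
for hops in (0, 1, 5, 20):
    print(hops, residual(P, f, neumann_solve(P, f, hops)))
```

Every absolute row sum of this `P` is below 1, a simple sufficient condition for the series to converge; each extra hop shrinks the residual by roughly that factor, which mirrors the weak-scattering requirement in the text.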

A major property of this formulation is that, unlike standard boundary integral equations, the kernels in the coupling operators are smooth, because the integration point on $\Gamma_m$ and the evaluation point on $\Gamma_n$ lie on disjoint surfaces for $m \neq n$. As a result, the Fredholm system can be discretized with standard numerical techniques without singularity removal (Acosta, 2013). Under weak scattering, expressed heuristically through $\|\widetilde{\mathbf{P}}\| < 1$, the resulting iteration bypasses the need to solve a single-scattering problem at each step (Acosta, 2013).

The paper explicitly describes the method as “generally a crude approximate method.” Its numerical examples show improved accuracy with increasing frequency and with higher-order OSRC approximations, but not the systematic refinement behavior of exact boundary integral formulations (Acosta, 2013). The paper therefore emphasizes a practical role for MultiOSR as an inexpensive initial guess or preconditioner for Krylov solutions of boundary integral equations (Acosta, 2013).

This usage is conceptually remote from machine-learning OSR, but structurally it is still about decomposing a global problem into multiple coupled surface-level components. That recurring decomposition theme is one of the few cross-domain commonalities in the literature.

6. Related open-set recognition methods

Two adjacent open-set recognition lines are particularly relevant for interpreting the machine-learning meanings of MultiOSR. MMF proposes a loss extension that polarizes class mean activation vectors by maximizing each class’s most significant feature and minimizing its least significant feature. It is added to cross-entropy, ii loss, or triplet loss, and unknown detection is then performed by thresholding the distance from a sample embedding to the nearest class centroid (Jia et al., 2020). On MNIST, for example, ii improves from AUC 0.9578 to 0.9649 with MMF, and on AG from 0.8427 to 0.8694; in overall F1, ii+MMF achieves 0.9308 on MNIST and 0.8991 on MC, while incurring almost no extra training cost relative to ii alone (Jia et al., 2020).

M2IOSR, by contrast, discards pixel-level reconstruction entirely and learns class-specific features by maximizing mutual information between inputs and latent features across multiple scales, while regularizing latent codes toward class-conditional Gaussian distributions (Sun et al., 2021). Its open-set decision remains a fixed threshold on maximum softmax confidence, but the latent structure substantially improves rejection performance: macro F1 reaches 0.892 on MNIST, 0.788 on SVHN, 0.733 on CIFAR10, 0.796 on CIFAR+10, and 0.744 on CIFAR+50, exceeding Softmax, OpenMax, CROSR, CGDL, and GDFR (Sun et al., 2021).
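Returning to MMF, a minimal sketch of its two ingredients: a polarization term computed from per-class mean activation vectors, and unknown detection by thresholding the distance to the nearest class centroid. The exact form of the term here (average over classes of each mean vector's smallest component minus its largest, which training would minimize alongside the base loss) is an assumption for illustration, as are the embeddings, labels, and threshold.

```python
import math

def class_means(embeddings, labels):
    """Mean activation vector per class label."""
    sums, counts = {}, {}
    for z, y in zip(embeddings, labels):
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(z), 0
        sums[y] = [a + b for a, b in zip(sums[y], z)]
        counts[y] += 1
    return {y: [a / counts[y] for a in s] for y, s in sums.items()}

def mmf_term(means):
    """Assumed MMF-style term: minimizing it raises each class's largest mean
    feature and lowers its smallest one."""
    return sum(min(mu) - max(mu) for mu in means.values()) / len(means)

def predict_or_reject(z, means, threshold):
    """Nearest-centroid label, or None if the nearest centroid is farther than `threshold`."""
    label, dist = min(((y, math.dist(z, mu)) for y, mu in means.items()), key=lambda t: t[1])
    return label if dist <= threshold else None

# Hypothetical 2-D embeddings for two known classes.
embeddings = [[2.0, 0.1], [1.8, -0.1], [0.0, 2.1], [0.2, 1.9]]
labels = ["a", "a", "b", "b"]
means = class_means(embeddings, labels)

print(round(mmf_term(means), 3))                              # -1.9
print(predict_or_reject([1.9, 0.2], means, threshold=0.5))    # a
print(predict_or_reject([-2.0, -2.0], means, threshold=0.5))  # None
```

The polarization term only shapes the embedding during training; the reject decision at test time is the plain distance threshold.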

These related methods clarify that open-set performance does not depend only on the final reject rule. In the OVRN-based MultiOSR formulation, gains arise from class-specific heads and collective scoring (Jang et al., 2021). In multi-attribute OSR, the decisive issue is whether attribute-wise confidences remain independent under correlation shifts (Saranrittichai et al., 2022). In MMF and M2IOSR, the emphasis shifts to feature geometry and latent statistics rather than head structure (Jia et al., 2020; Sun et al., 2021). A common misconception is therefore that “multi-head” alone explains robust unknown rejection. The multi-attribute study shows that even duplicated models remain vulnerable to shortcuts when training correlations induce cross-attribute confidence dependencies (Saranrittichai et al., 2022).

Taken together, the literature suggests a broad editorial interpretation. In open-set recognition, “MultiOSR” is best read as a family of structured decompositions: multiple class heads, multiple semantic attributes, or multiple latent constraints. In SDQ-LLM, it is a structured OSR schedule over modules rather than classes (Xia et al., 27 Sep 2025). In wave scattering, it is a structured decomposition over obstacles (Acosta, 2013). This suggests that the term consistently signals multiplicity plus explicit coordination, but not a single invariant algorithmic definition.