Cross-Class Batch Formation Schemes
- Cross-class batch formation is a method for constructing diverse minibatches by deliberately combining samples from different classes or task types, enhancing learning signal and model generalization.
- Techniques like P-way K-shot sampling, XML, and B³ enable structured negative mining in contrastive and deep metric learning, while fair scheduling in LLM inference optimizes throughput and latency.
- Empirical benchmarks demonstrate improved retrieval accuracy, enhanced zero-shot performance, and substantial capacity gains, validating the theoretical advantages of cross-class batching.
Cross-class batch formation schemes encompass algorithmic strategies designed to systematically construct minibatches for training or inference, wherein individual samples from distinct classes (or heterogeneous task types) are deliberately included within each batch to optimize learning signal, fairness, robustness, or computational objectives. These schemes notably appear in contrastive learning, deep metric learning, and inference scheduling for LLMs. The cross-class constraint contrasts with random sampling or class-homogeneous batching, with substantial empirical and theoretical implications for model generalization, transferability, and system throughput.
1. Definition and Problem Motivation
Cross-class batch formation stipulates that in each minibatch, samples originate from different semantic classes or task partitions. This design is essential in domains such as deep metric learning (DML) and contrastive learning (CL), where the effectiveness of in-batch relationships—positives vs. negatives—directly impacts the loss landscape, feature space geometry, and achievable generalization (Gurbuz et al., 2023, Thirukovalluru et al., 16 May 2025). In systems-level batch scheduling, particularly for LLM inference, cross-class batch formation refers to packing computationally heterogeneous tasks—such as prefill and decode requests—based on real-time priority and service-level objectives to balance throughput and latency (Lyu et al., 16 Oct 2025).
2. Methodologies for Cross-Class Batch Construction
Deep Metric Learning and Prototype-Based Schemes
In DML, batch formation is typically executed via "P-way, K-shot" sampling, where each minibatch consists of P classes and K instances per class. The cross-batch metric learning (XML) approach systematically splits these sampled classes into two non-overlapping groups, fits prototypes on each half via ridge regression, and cross-regularizes by reconstructing embeddings of one group using prototypes from the other. This cross-class batching and prototype sharing directly enforce feature-space generalizability across classes, thereby improving transfer to unseen classes (Gurbuz et al., 2023).
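The following minimal Python sketch illustrates P-way K-shot sampling together with a cross-class prototype regularizer in the spirit of XML. The function names, the one-hot ridge-regression targets, and the subspace-reconstruction penalty are assumptions made for exposition, not the published XML objective (Gurbuz et al., 2023).

```python
import numpy as np

def pk_sample(labels, P=8, K=4, rng=None):
    """Draw a P-way K-shot minibatch: P distinct classes, K instances of each.
    Assumes every sampled class has at least K examples."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(np.unique(labels), size=P, replace=False)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=K, replace=False)
        for c in classes
    ])
    return idx, classes

def cross_class_proto_loss(emb, batch_labels, classes, lam=1e-2):
    """Hypothetical cross-class regularizer: split the P sampled classes into two
    disjoint halves, fit ridge-regression prototypes on one half, and penalize
    how poorly those prototypes span the embeddings of the *other* half."""
    half = len(classes) // 2
    groups = (classes[:half], classes[half:])
    total = 0.0
    for fit_cls, eval_cls in (groups, groups[::-1]):
        fit = np.isin(batch_labels, fit_cls)
        X = emb[fit]                                            # (n_fit, d) embeddings
        Y = (batch_labels[fit][:, None] == np.asarray(fit_cls)[None, :]).astype(float)
        d = X.shape[1]
        # ridge-regression prototypes: W = (X^T X + lam I)^{-1} X^T Y, one column per class
        W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
        Z = emb[np.isin(batch_labels, eval_cls)]                # other group's embeddings
        proj = Z @ W @ np.linalg.pinv(W)                        # projection onto span(W)
        total += np.mean(np.sum((Z - proj) ** 2, axis=1))
    return total / 2.0
```

In practice a regularizer of this kind would be added, with a tunable weight, to the primary metric-learning loss computed on the same minibatch.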
Contrastive Learning: Batch Mining and Community Detection
In contrastive learning, the Breaking the Batch Barrier (B³) scheme formulates cross-class batch formation as an explicit mining problem. Given that not all in-batch negatives provide useful learning signal, B³ leverages a pretrained teacher encoder to compute a dense similarity graph over the entire training set. The graph is then sparsified: each item's nearest neighbors (likely true positives or duplicates) are excluded, and only a fixed number of the next-hardest negatives are retained, yielding edges that encode strong inter-class similarity without duplication. METIS graph partitioning is then used to produce communities such that nodes within a community (a future batch) act as mutually strong negatives, turning each batch into a dense set of high-quality cross-class negatives (Thirukovalluru et al., 16 May 2025).
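A rough offline sketch of this pipeline is shown below. It assumes the pymetis binding for METIS; the parameter names n_skip and n_keep are placeholders rather than the paper's notation, and a real implementation over a large corpus would use approximate nearest neighbors instead of a dense similarity matrix (Thirukovalluru et al., 16 May 2025).

```python
import numpy as np
import pymetis  # assumed METIS binding; any graph partitioner could play the same role

def build_b3_style_batches(teacher_emb, batch_size=64, n_skip=5, n_keep=30):
    """Mine future batches offline: build a teacher-similarity graph, drop each
    item's closest neighbours as likely positives/duplicates, keep the next-hardest
    candidates as edges, and cut the graph into communities of mutual hard negatives."""
    emb = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    sims = emb @ emb.T                          # dense cosine similarities (sketch only)
    n = len(emb)
    order = np.argsort(-sims, axis=1)           # neighbours of each item, hardest first
    adjacency = [set() for _ in range(n)]
    for i in range(n):
        ranked = order[i][order[i] != i]        # drop self-similarity
        hard = ranked[n_skip:n_skip + n_keep]   # skip likely positives, keep hard negatives
        for j in hard:                          # symmetrise for the partitioner
            adjacency[i].add(int(j))
            adjacency[int(j)].add(i)
    n_parts = max(1, n // batch_size)
    _, membership = pymetis.part_graph(n_parts, adjacency=[sorted(a) for a in adjacency])
    batches = [[] for _ in range(n_parts)]
    for idx, part in enumerate(membership):
        batches[part].append(idx)
    return batches
```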
Inference Scheduling: Multi-Class Resource Fairness
In LLM inference, cross-class batching is reframed as fair and adaptive resource allocation. FairBatching partitions concurrent requests into urgent decode, prefill, and non-urgent decode classes. At each batching iteration, requests are sorted and selected based on per-token deadlines and resource slack, ensuring that compute resources are not monopolized by any single task class. This cross-class batch scheduler merges heterogeneous requests into a single batch to optimize both TTFT (time to first token) and TPOT (time per output token), breaking the rigid decode-prioritizing paradigm (Lyu et al., 16 Oct 2025).
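A simplified slack-driven scheduler in this spirit is sketched below. The class names, the urgency threshold, and the budget rule are assumptions for illustration, not the FairBatching implementation (Lyu et al., 16 Oct 2025).

```python
from dataclasses import dataclass

@dataclass
class Request:
    kind: str      # "decode" or "prefill"
    slack: float   # seconds of headroom before the request misses its SLO
    tokens: int    # tokens this request would contribute to the batch

def form_batch(pending, token_budget, urgent_slack=0.01):
    """Assemble one mixed batch: urgent decodes are admitted first to protect TPOT,
    then prefill and non-urgent decode requests share the remaining token budget
    in least-slack-first order, so no single class monopolises the iteration."""
    urgent, rest = [], []
    for r in pending:
        (urgent if r.kind == "decode" and r.slack <= urgent_slack else rest).append(r)
    rest.sort(key=lambda r: r.slack)
    batch, used = list(urgent), sum(r.tokens for r in urgent)
    for req in rest:
        if used + req.tokens <= token_budget:
            batch.append(req)
            used += req.tokens
    return batch
```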
3. Algorithmic Frameworks
The algorithmic realization of cross-class batch formation varies by application domain, yet common patterns include deterministic partitioning, offline batch mining, and dynamic scheduling. The following table summarizes representative frameworks:
| Domain | Core Mechanism | Batch Membership Constraint |
|---|---|---|
| Deep Metric Learning | P-way K-shot, XML | Disjoint class partitions; prototype sharing |
| Contrastive Learning | B³, Graph clustering | Each batch: community of mutual hard negatives |
| LLM Inference Scheduling | FairBatching | Mixed prefill, decode; slack-driven assignment |
B³ batch mining proceeds offline, using teacher embeddings to form similarity graphs and METIS to extract communities, while XML repeats cross-class partitioning and regularization within each training iteration. FairBatching computes per-batch budgets from envelope deadlines and sorts heterogeneous requests by slack at each batch formation step.
4. Theoretical and Empirical Implications
Cross-class batch formation alters the loss and gradient landscape, providing the following effects:
- For DML and CL: The presence of diverse, hard negatives in each batch increases the informativeness of the InfoNCE or metric loss denominator, leading to tighter clustering of true positives and more dispersed embeddings for negatives (see the InfoNCE sketch after this list). XML demonstrates that by fitting prototypes on one class group and applying them to another, feature extractors learn representations that transfer to unseen classes. B³ ensures that each batch is nearly saturated with hard negatives, allowing strong performance even at small batch sizes (e.g., batch_size = 64) (Thirukovalluru et al., 16 May 2025, Gurbuz et al., 2023).
- For Inference Scheduling: Cross-class batch formation enforces fairness and resource efficiency by pooling requests of different types (decode, prefill), guaranteeing TPOT SLO compliance while reducing TTFT latency. FairBatching empirically increases peak per-node throughput and substantially lowers long-tail latency, particularly under bursty heterogeneous workloads (Lyu et al., 16 Oct 2025).
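To make the first point concrete, the sketch below computes the in-batch InfoNCE loss for a batch of anchor/positive pairs in which every other pair acts as a negative; the temperature value and function name are illustrative rather than drawn from the cited papers. The harder the cross-class negatives, the larger their contribution to the denominator and the stronger the resulting gradient signal.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.07):
    """In-batch InfoNCE: row i of `positives` is the positive for row i of `anchors`;
    all other rows in the batch serve as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (B, B); diagonal holds the positives
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # average over anchors
```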
5. Practical Implementation and Hyperparameters
Practical deployment of cross-class batch formation involves several dataset- and domain-dependent choices, gathered in the configuration sketch after this list:
- Class and sample selection: In DML, the number of classes per batch (P) and the number of samples per class (K) must be tuned for coverage and computational feasibility.
- Prototype configuration: The prototype count, smoothing temperature, and ridge regularization strength in XML affect feature granularity and transfer (Gurbuz et al., 2023).
- Graph parameters: In B³, the neighbor-exclusion threshold (how many nearest neighbors to drop as likely positives) and the retention threshold (how many next-hardest negatives to keep) determine the sparsity pattern and the strength of community negatives (Thirukovalluru et al., 16 May 2025).
- Batch and resource budgets: FairBatching adapts time and token budgets per batch based on envelope deadlines, SLOs, and real-time system state (Lyu et al., 16 Oct 2025).
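These knobs can be gathered into a single configuration object, as in the sketch below; the grouping and every default value are placeholders for exposition, not recommended settings from the cited papers.

```python
from dataclasses import dataclass

@dataclass
class CrossClassBatchConfig:
    # DML / XML-style sampling (all defaults are placeholders)
    classes_per_batch: int = 8         # P in P-way K-shot sampling
    samples_per_class: int = 4         # K in P-way K-shot sampling
    proto_temperature: float = 0.1     # smoothing temperature for prototype assignment
    ridge_lambda: float = 1e-2         # ridge regularization for prototype fitting
    # B3-style offline graph mining
    neighbors_excluded: int = 5        # nearest neighbours dropped as likely positives
    hard_negatives_kept: int = 30      # next-hardest candidates retained as graph edges
    mined_batch_size: int = 64         # community/batch size after partitioning
    # FairBatching-style inference scheduling
    token_budget_per_step: int = 4096  # per-iteration compute budget in tokens
    tpot_slo_ms: float = 50.0          # per-output-token deadline (TPOT SLO)
    ttft_slo_ms: float = 500.0         # first-token deadline (TTFT SLO)
```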
6. Empirical Evidence and Benchmarks
Cross-class batch formation schemes demonstrate measurable advantages on established benchmarks:
- B³ achieves state-of-the-art accuracy on the 36-task MMEB multimodal benchmark at both 2B- and 7B-parameter scales, surpassing prior art by +1.3 and +2.9 points, respectively, with batches as small as 64 (versus prior requirements of 256–1024) (Thirukovalluru et al., 16 May 2025).
- XML yields consistent improvements in Recall@1 and MAP@R across CUB, Cars, SOP, and InShop benchmarks, demonstrating that cross-class regularization enhances zero-shot retrieval performance (Gurbuz et al., 2023).
- FairBatching attains on average +20.0% single-node and up to +54.3% cluster-level capacity gain compared to stall-free or decode-prioritizing baselines, with 2.29x TTFT tail latency reduction and strict TPOT compliance (Lyu et al., 16 Oct 2025).
7. Connections, Limitations, and Research Directions
Cross-class batch formation aligns with broader trends in information-rich minibatch construction, curriculum learning, and computational fairness. Efficient offline construction (e.g., B³ graph mining) reduces runtime overhead but fixes batch structure, possibly limiting adaptation to evolving distributions. Online dynamic schemes (e.g., FairBatching) enable responsive resource allocation but are sensitive to system estimation and scheduling latency. A plausible implication is that future developments may combine offline mining and real-time batch adaptivity, further integrating cross-class semantics, difficulty-aware sampling, and holistic system-level SLO control. Cross-domain transfer of these strategies to reinforcement learning, natural language processing, and federated learning remains largely unexplored.