FMAPLS: Bayesian EM for Label Shift
- The paper introduces a Bayesian EM framework that jointly estimates target class priors and Dirichlet hyperparameters to correct label shift in supervised learning.
- It leverages a closed-form linear surrogate function for efficient hyperparameter updates, reducing KL divergence by up to 40% in severe imbalance scenarios.
- Empirical evaluations on datasets like CIFAR100 and ImageNet-LT demonstrate significant accuracy gains and robustness in both batch and online adaptations.
Full Maximum A Posteriori Label Shift (FMAPLS) is a Bayesian framework for label-shift correction in supervised learning. Under the label shift assumption—where the class prior distribution varies between source (training) and target (test) domains, but class-conditional likelihoods remain fixed—FMAPLS enables joint and dynamic estimation of both the unknown target priors and the Dirichlet hyperparameters that govern uncertainty over these priors. The method leverages Expectation-Maximization (EM) algorithms in both batch and online variants and introduces a closed-form Linear Surrogate Function (LSF) for efficient hyperparameter updates. Empirical results demonstrate that FMAPLS and its online form outperform previous maximum a posteriori-based label-shift estimators, particularly under severe class imbalance and distributional uncertainty, in terms of Kullback–Leibler divergence and classification accuracy (Hu et al., 23 Nov 2025).
1. Problem Formulation and Generative Model
FMAPLS addresses the canonical label shift scenario with the following structure:
- Source (training) data: $\mathcal{D}_s = \{(x_i, y_i)\}_{i=1}^{n} \sim p_s(x, y) = p_s(y)\,p(x \mid y)$, where $p_s(y)$ is the source class prior and $p(x \mid y)$ is the known class-conditional likelihood.
- Target (test) data: $\mathcal{D}_t = \{x_j\}_{j=1}^{m} \sim p_t(x)$, assuming $p_t(x \mid y) = p_s(x \mid y)$ but $p_t(y) \neq p_s(y)$.
- Classifier: Trained on $\mathcal{D}_s$, provides $p_s(y \mid x)$. Under label shift, $p_t(y \mid x) \neq p_s(y \mid x)$.
- Bayesian model: Places a Dirichlet prior on the target prior $\pi = (\pi_1, \dots, \pi_K)$ with hyperparameter $\alpha = (\alpha_1, \dots, \alpha_K)$:

$$\pi \sim \mathrm{Dir}(\alpha), \qquad p(\pi \mid \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} \pi_k^{\alpha_k - 1}.$$

Optionally, a weak prior $p(\alpha)$ may be included.

Given test samples $x_1, \dots, x_m$, the joint posterior for the parameters $(\pi, \alpha)$ is (up to normalization):

$$p(\pi, \alpha \mid x_{1:m}) \;\propto\; p(\alpha)\, p(\pi \mid \alpha) \prod_{j=1}^{m} \sum_{k=1}^{K} \pi_k\, p(x_j \mid y = k).$$

The (incomplete-data) log-posterior is:

$$\mathcal{L}(\pi, \alpha) = \sum_{j=1}^{m} \log \sum_{k=1}^{K} \pi_k\, p(x_j \mid y = k) \;+\; \sum_{k=1}^{K} (\alpha_k - 1) \log \pi_k \;-\; \log B(\alpha) \;+\; \log p(\alpha).$$
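Under these definitions, the unnormalized log-posterior can be evaluated directly. The following is a minimal sketch assuming the class-conditional likelihoods $p(x_j \mid y = k)$ are available as an array and a flat prior on $\alpha$; the function name and toy data are illustrative:

```python
import math
import numpy as np

def log_posterior(pi, alpha, lik):
    """Unnormalized log-posterior log p(pi, alpha | x_1..m).

    pi    : (K,) target class prior (a point on the simplex)
    alpha : (K,) Dirichlet hyperparameters (all > 0)
    lik   : (m, K) class-conditional likelihoods p(x_j | y=k)
    Assumes a flat prior p(alpha); illustrative sketch only.
    """
    # Data term: sum_j log sum_k pi_k p(x_j | k)
    data = np.sum(np.log(lik @ pi))
    # Dirichlet prior term: sum_k (alpha_k - 1) log pi_k - log B(alpha)
    log_B = sum(math.lgamma(a) for a in alpha) - math.lgamma(float(sum(alpha)))
    prior = np.sum((alpha - 1.0) * np.log(pi)) - log_B
    return data + prior

# Toy check: K = 3 classes, m = 4 samples
rng = np.random.default_rng(0)
lik = rng.random((4, 3)) + 0.1
pi = np.array([0.5, 0.3, 0.2])
alpha = np.array([2.0, 2.0, 2.0])
print(log_posterior(pi, alpha, lik))
```

With $\alpha = \mathbf{1}$ the Dirichlet term reduces to $-\log B(\mathbf{1}) = \log\Gamma(K)$, so the posterior is the data likelihood up to a constant.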
2. Batch EM Algorithm for Joint Estimation
FMAPLS employs a batch Expectation-Maximization (EM) procedure by treating the unknown test labels as latent variables:
- E-step: Computes posterior responsibilities

$$\gamma_{jk} = \frac{\pi_k^{(t)}\, p(x_j \mid y = k)}{\sum_{k'} \pi_{k'}^{(t)}\, p(x_j \mid y = k')} \;\propto\; \frac{\pi_k^{(t)}}{p_s(y = k)}\, p_s(y = k \mid x_j).$$

- M-step: Separately maximizes with respect to $\pi$ and $\alpha$ using the expected complete-data log-posterior.
- Update for $\pi$ (closed form):

$$\pi_k^{(t+1)} = \frac{\sum_{j=1}^{m} \gamma_{jk} + \alpha_k^{(t)} - 1}{m + \sum_{k'} \bigl(\alpha_{k'}^{(t)} - 1\bigr)}.$$

- Update for $\alpha$ (MAP estimate for the Dirichlet):

$$\alpha^{(t+1)} = \arg\max_{\alpha} \; \sum_{k} (\alpha_k - 1) \log \pi_k^{(t+1)} - \log B(\alpha) + \log p(\alpha).$$

In standard MAPLS, this subproblem is solved via gradient ascent involving digamma functions, which incurs significant computation when the number of classes $K$ is large.
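The batch E-step and closed-form $\pi$-update can be sketched as follows, using the classifier-reweighting form of the responsibilities and holding $\alpha$ fixed (the FMAPLS $\alpha$-update is omitted; all names are illustrative):

```python
import numpy as np

def batch_em_map(p_src_post, pi_src, alpha, n_iter=100):
    """Batch EM sketch for MAP estimation of the target class prior.

    p_src_post : (m, K) source-classifier posteriors p_s(y | x_j)
    pi_src     : (K,) source class prior p_s(y)
    alpha      : (K,) fixed Dirichlet hyperparameters (alpha_k >= 1)
    Illustrative only: the FMAPLS alpha-update is not included.
    """
    m, K = p_src_post.shape
    pi = np.full(K, 1.0 / K)                     # uniform initialization
    for _ in range(n_iter):
        # E-step: reweight classifier posteriors by the prior ratio
        gamma = p_src_post * (pi / pi_src)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step (closed form): expected counts plus Dirichlet pseudo-counts
        pi = (gamma.sum(axis=0) + alpha - 1.0) / (m + (alpha - 1.0).sum())
    return pi
```

With a perfectly confident classifier and $\alpha = \mathbf{1}$ (flat prior), the estimate reduces to the empirical class frequencies of the test batch, as expected.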
3. Linear Surrogate Function (LSF) Update
To overcome the computational and tuning issues of gradient-based updates for $\alpha$, FMAPLS introduces a Linear Surrogate Function (LSF):
- Key mechanism: Replace the $\alpha$-subproblem with a linear surrogate of the Dirichlet log-density, enforced through a large constant $C$, which turns the update into a single closed-form assignment rather than an iterative optimization.
- Rationale: Direct substitution yields updates that are asymptotically stationary as $C \to \infty$ (the gradient terms vanish), so in practice a suitably large $C$ provides an accurate approximation without iterative gradient steps.
- Computational benefit: The per-iteration cost of the $\alpha$-update drops from the iterative cost of digamma-based gradient ascent to $O(K)$ for the closed-form LSF update.
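To illustrate the cost that the LSF removes, the following sketch evaluates the digamma-based gradient of the Dirichlet log-density with respect to $\alpha$—the quantity that standard MAPLS-style gradient ascent must recompute at every step (the digamma approximation and function names are illustrative):

```python
import math
import numpy as np

def digamma(x):
    """Digamma psi(x) for x > 0 via upward recurrence plus an asymptotic series."""
    r = 0.0
    while x < 6.0:            # shift argument upward so the series is accurate
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0/12 - f * (1.0/120 - f/252))

def dirichlet_log_prior_grad(alpha, pi):
    """Gradient of sum_k (alpha_k - 1) log pi_k - log B(alpha) w.r.t. alpha.

    Each evaluation needs K + 1 digamma calls, repeated every ascent step;
    the LSF update replaces this inner loop with one closed-form pass.
    """
    psi_sum = digamma(float(alpha.sum()))
    return np.log(pi) - np.array([digamma(a) for a in alpha]) + psi_sum

alpha = np.array([2.0, 3.0, 4.0])
pi = np.array([0.2, 0.3, 0.5])
print(dirichlet_log_prior_grad(alpha, pi))
```

The gradient vanishes exactly when $\log \pi_k = \psi(\alpha_k) - \psi(\sum_{k'} \alpha_{k'})$, which is the stationarity condition the gradient ascent iterates toward.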
4. Online-FMAPLS for Streaming Data
The online-FMAPLS variant enables real-time adaptation to non-stationary or streaming data by employing stochastic approximation of sufficient statistics:
- Stochastic responsibilities: At time step $t$, for the incoming sample $x_t$, compute

$$\gamma_{t,k} = \frac{\pi_k^{(t)}\, p(x_t \mid y = k)}{\sum_{k'} \pi_{k'}^{(t)}\, p(x_t \mid y = k')}.$$

  Maintain running statistics $N_k$ (per-class) and $N$ (total), initialized as $N_k = 0$, $N = 0$.
- Online update (with forgetting rate $\rho_t$):

$$N_k \leftarrow (1 - \rho_t)\, N_k + \rho_t\, \gamma_{t,k}, \qquad N \leftarrow (1 - \rho_t)\, N + \rho_t.$$

- M-step: Update $\pi^{(t+1)}$ in closed form from the running statistics and the Dirichlet pseudo-counts (the online analogue of the batch $\pi$-update), and set $\alpha^{(t+1)}$ via the closed-form LSF update.
- Complexity: $O(K)$ per data sample, enabling scalable, real-time operation.
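The streaming recursion above can be sketched as follows, assuming a uniform source prior and a $1/t$ forgetting schedule (both assumptions; the paper's schedule may differ), with $\alpha$ held fixed:

```python
import numpy as np

def online_em_step(x_post, pi, stats, t, alpha, rho0=1.0):
    """One online EM step (sketch) for a single streaming sample.

    x_post : (K,) source-classifier posterior p_s(y | x_t) for the new sample
    pi     : (K,) current target-prior estimate
    stats  : (K,) running responsibility statistics (initialized to zeros)
    t      : 1-based time step
    alpha  : (K,) Dirichlet hyperparameters (alpha_k >= 1)
    rho0   : scale of the forgetting rate rho_t = rho0 / t (an assumption)
    """
    pi_src = np.full(len(pi), 1.0 / len(pi))   # assume uniform source prior
    # Stochastic E-step: responsibility of the incoming sample
    gamma = x_post * (pi / pi_src)
    gamma /= gamma.sum()
    # Stochastic approximation of the sufficient statistics
    rho = rho0 / t
    stats = (1.0 - rho) * stats + rho * gamma
    # M-step: MAP update of pi from running statistics plus pseudo-counts
    pi = (stats + alpha - 1.0) / (stats.sum() + (alpha - 1.0).sum())
    return pi, stats
```

Feeding a stream dominated by one class pulls the estimated prior toward that class at a rate controlled by $\rho_t$, while the pseudo-counts $\alpha_k - 1$ keep the estimate in the interior of the simplex.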
5. Convergence–Accuracy Trade-Off
Under the LSF regime (large $C$), the step size of the online algorithm is governed by $C$:
- The iterative increment of the $\alpha$-update shrinks as $C$ grows, scaling on the order of $1/C$.
- Interpretation: Larger $C$ yields more accurate (less biased) stationary points, but each update becomes smaller, slowing convergence.
A practical implication is that $C$ must be selected to balance estimation accuracy and adaptation speed: large enough for reliable estimates, but not so large as to impede responsiveness, especially under concept drift or shifting priors.
6. Empirical Performance Evaluation
Extensive experiments were conducted on long-tail variants of CIFAR100 ($K = 100$ classes) and ImageNet-LT ($K = 1000$ classes):
- Training priors: Long-tail imbalanced, controlled by an imbalance ratio.
- Test priors: Either shuffled long-tail or drawn from a symmetric Dirichlet distribution.
- Metrics: KL divergence and post-shift classification accuracy.
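For reference, the KL divergence between the true and estimated target priors can be computed as follows (a minimal sketch; the smoothing constant is an implementation choice, not from the paper):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two class-prior vectors, as used for evaluation.

    p, q : length-K probability vectors (true and estimated priors).
    eps  : small smoothing constant to avoid log(0); implementation choice.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

The divergence is zero when the estimate matches the true prior and grows as the estimate concentrates mass on the wrong classes.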
Results, averaged over 100 runs, confirm:
- FMAPLS reduces KL divergence by up to 40% over MAPLS in settings of severe imbalance and high prior uncertainty.
- Absolute accuracy gains of 3 percentage points or more over MAPLS in challenging cases.
- Online-FMAPLS achieves substantial KL reduction over MAPLS, at the cost of only a small (roughly 0.5 percentage point) relative accuracy drop versus batch FMAPLS.
- Convergence (measured by KL) stabilizes within $2000$ iterations on CIFAR100 and $10,000$ iterations on ImageNet-LT.
| Method | Update Complexity | KL Reduction vs MAPLS | Typical Acc. Gain |
|---|---|---|---|
| FMAPLS + gradient ascent | Iterative (digamma-based) | up to 40% | 3+ pts absolute |
| FMAPLS + LSF | $O(K)$ closed form | up to 40% | 3+ pts absolute |
| Online-FMAPLS | $O(K)$ per sample | substantial | ~0.5 pt drop vs batch |
7. Implementation and Practical Guidance
FMAPLS is particularly robust in scenarios with pronounced class imbalance and uncertain or dynamically shifting target priors. The dynamic adaptation provides a significant advantage over static-hyperparameter MAPLS approaches.
- The LSF hyperparameter $C$ should be chosen in the range $10$–$100$; values toward $100$ achieve reliable stationary points with reasonable convergence speed.
- The forgetting rate $\rho_t$ for online-FMAPLS should typically be small, with larger values used for more rapid adaptation in highly non-stationary streams.
- For large test-set size $m$, batch FMAPLS is recommended due to its efficiency and statistical stability; online-FMAPLS is appropriate when $m$ is small or streaming data is encountered.
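As a rough illustration of this guidance, a hypothetical configuration helper might look like the following; all thresholds and default values here are illustrative assumptions, not prescriptions from the paper:

```python
def choose_config(m_test, streaming, drift=False):
    """Heuristic configuration sketch following the guidance above.

    m_test    : number of available target samples
    streaming : whether data arrives as a stream
    drift     : whether the target prior is expected to shift over time
    All thresholds and defaults are hypothetical.
    """
    # Batch FMAPLS for large fixed test sets; online otherwise
    mode = "online" if (streaming or m_test < 1000) else "batch"
    # Larger C favors accuracy; smaller C favors adaptation speed
    C = 100 if mode == "batch" else 10
    # Faster forgetting (larger rho_t scale) under concept drift
    rho0 = 2.0 if drift else 1.0
    return {"mode": mode, "C": C, "rho0": rho0}
```

A deployment would tune these knobs against held-out KL divergence rather than rely on fixed thresholds.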
FMAPLS offers a Bayesian-EM framework for label-shift correction, accommodating dynamic target priors, with both batch and online variants. Its combination of closed-form surrogate updates and scalable computation makes it suitable for large-scale, imbalanced, or temporally-evolving domains (Hu et al., 23 Nov 2025).