Open-DeBias: Efficient Adapter Fusion

Updated 10 December 2025

The paper introduces modular two-layer adapters integrated at each transformer layer to efficiently mitigate biases with minimal extra parameters.
It demonstrates parameter and data efficiency by fine-tuning small adapters on limited data (<5% of full dataset), achieving robust performance on diverse benchmarks.
The approach enables modular and reversible debiasing, preventing catastrophic forgetting by combining attribute-specific adapters through a learned fusion layer.

Parameter- and data-efficient adapter fusion refers to a family of methods for bias mitigation in LLMs that leverage modular adapter architectures and lightweight fusion mechanisms to achieve effective debiasing—particularly in open-set scenarios—while incurring minimal additional parameter and data costs. Methods such as Open-DeBias and DAM instantiate these principles by integrating dedicated adapter modules for bias mitigation at every layer of a deep network and combining their effects via a learned fusion layer. These strategies advance the mitigation of both known and emergent biases in modern NLP models, often preserving or enhancing generalization capabilities, modularity, and reversibility (Rani et al., 28 Sep 2025, Kumar et al., 2023).

1. Adapter Module Architecture and Integration

Each adapter module is a two-layer bottleneck MLP inserted at each transformer layer, immediately before and after the feed-forward sublayer. For a hidden representation $h_\ell \in \mathbb{R}^d$ at layer $\ell$ , the transformation by the $i$ -th adapter $A_i$ is:

$A_i(h_\ell) = W_i^{up} \, \sigma(W_i^{down} h_\ell)$

where $W_i^{down} \in \mathbb{R}^{m \times d}$ (down-projection), $W_i^{up} \in \mathbb{R}^{d \times m}$ (up-projection), $\sigma$ is a nonlinearity (e.g., GeLU), and $m \ll d$ (typically $m \approx d/16$ ). In Open-DeBias, an adapter is placed before and another after the FFN block in each transformer layer, following the "Houlsby-style" insertion. Each adapter introduces only $\sim 0.5$ – $1\%$ of the model’s parameters per adapter, supporting the scaling of multiple adapters per model (Rani et al., 28 Sep 2025).

In parallel approaches such as DAM, similar two-layer adapters are attached for both task and debiasing purposes, enabling per-attribute modular debiasing (Kumar et al., 2023).

2. Fusion Mechanisms for Adapter Outputs

After independently training $N$ adapters, each specialized for a distinct bias or attribute, a lightweight fusion layer is employed at inference and optionally for fusion fine-tuning. In Open-DeBias, this layer learns scalar fusion weights $\alpha_1, \ldots, \alpha_N$ over all adapters. The output at each transformer block is:

$h_\ell' = h_\ell + \mathrm{FFN}(h_\ell) + \sum_{i=1}^N \alpha_i A_i(h_\ell)$

The fusion parameters $\alpha_i$ are trained jointly across data from all categories, allowing the layer to dynamically prioritize adapter corrections depending on context. At test time, these weights enable out-of-category, open-set transfer without further fine-tuning (Rani et al., 28 Sep 2025).

DAM generalizes the fusion step using a single-headed multiplicative attention mechanism. The concatenated outputs of all adapters are stacked as columns of a matrix $V \in \mathbb{R}^{d \times (k+1)}$ , and the fusion representation is:

$\alpha = \mathrm{softmax}(Q^\top K/\sqrt{d_k}), \quad h_\mathrm{fused} = \sum_{j=0}^k \alpha_j V'_{:,j}$

where learned projections $W_q, W_v$ produce $Q$ and $V'$ , and the fusion attends over all adapters (task and debias adapters) (Kumar et al., 2023).

3. Parameter and Data Efficiency

Adapter-based debiasing exhibits strong parameter- and data-efficiency:

Parameter footprint: For DeBERTa-V3-Large ( $d \approx 1024$ , 24 layers), each adapter with $m=64$ uses $\sim 6.7$ M parameters (about $1.7\%$ of the base, per adapter). Open-DeBias with $N=5$ adapters adds only $\sim 33$ M parameters ( $\approx 8\%$ overhead). The fusion layer is negligible in size ( $\ll 0.1$ M parameters) (Rani et al., 28 Sep 2025).
Data requirements: Each adapter requires fine-tuning on a small subset of available data (e.g., $500$ samples per adapter, covering $5$ categories for a total of $2,500$ on BBQ; or as few as $1,500$ in OpenBiasBench). This typically represents $<5\%$ of full datasets, in contrast to methods that require comprehensive full-finetuning (Rani et al., 28 Sep 2025).
Comparison: DAM requires $\sim 24.8$ M trainable parameters ( $\sim 22\%$ of BERT-Base), supporting multiple debiasing functions, compared to full fine-tuning with $110$M trainable parameters (Kumar et al., 2023).

4. Training Objectives and Loss Functions

Bias-mitigating adapter fusion typically proceeds in sequential training phases:

Adapter training: Each adapter is trained on data related to its target bias using standard cross-entropy loss (with the backbone frozen), optionally with adversarial setups for attribute removal.
Fusion tuning: The fusion layer is then trained with all adapters frozen, learning to combine their outputs using either cross-entropy (disambiguated answers) or a hybrid loss with KL-divergence penalties to encourage uniformity on ambiguous cases:

$L = L_{CE} + \lambda L_{KL}$

where $L_{CE}$ is cross-entropy, $L_{KL}$ is the KL-divergence to a uniform distribution over non-neutral choices, applied with $\lambda=0.1$ for ambiguous and $\lambda=0$ otherwise (Rani et al., 28 Sep 2025).

In DAM, task and debias adapters are trained separately. Debias adapters use adversarial training with a gradient reversal layer to explicitly remove attribute information, while the fusion layer balances task and adversarial losses (Kumar et al., 2023).

5. Empirical Findings Across Benchmarks

Extensive benchmarking demonstrates the effectiveness of parameter- and data-efficient adapter fusion:

Benchmark	Open-DeBias Accuracy	Baselines
BBQ–Ambiguous	0.98 (vs. 0.50)	BMBI (Rani et al., 28 Sep 2025)
BBQ–Disambiguated	0.98 (vs. 0.93)	BMBI
OpenBiasBench (avg)	0.91 (vs. 0.52, 0.20)	RACE-finetuned, pretrained LM
Korean BBQ	0.93 ambiguous, 0.85 dis. (vs. 0.48, 0.55)	Pretrained

Multilingual generalization: XLM-RoBERTa with English-trained adapters achieves $0.84$ accuracy on Korean BBQ in zero-shot transfer, indicating the language-agnosticity of adapter fusion debiasing (Rani et al., 28 Sep 2025).
StereoSet/CrowS-Pairs: Adapter-fused DeBERTa and RoBERTa models yield substantially improved bias scores (closer to ideal $50$) compared to PT baselines.
Cost: Open-DeBias adds only $\sim 8\%$ to the parameter count and requires $<5\%$ of the data versus full fine-tuning, with inference cost almost unchanged except for auxiliary adapter projections (Rani et al., 28 Sep 2025).

DAM further demonstrates equivalent or superior bias mitigation to full or adversarial fine-tuning, with strong task performance and effective modular mitigation across multiple attributes (Kumar et al., 2023).

6. Modularity, Catastrophic Forgetting, and On-Demand Debiasing

By encapsulating each bias-mitigation function in a separate adapter, adapter fusion methods such as Open-DeBias and DAM offer strong modularity. Adapters may be 'plugged in' or 'out' at inference time, restoring the original or selectively debiased model states without retraining. This modularity enables:

Efficient reversible debiasing.
Avoidance of catastrophic forgetting, as multi-attribute biases (e.g., gender and age) are mitigated without overwriting each other, contrary to joint fine-tuning approaches where debiasing one attribute can diminish effects on another (Kumar et al., 2023).
Flexible extensibility to new or emergent bias categories by simply training additional adapters and updating fusion weights.

7. Limitations and Future Directions

Current approaches focus on bias mitigation in encoder-only architectures and multiple-choice QA tasks. Extension to open-ended generation models would require dynamic fusion mechanisms at each decoding step. Other open directions include:

Adapter stacking and hierarchical compositions.
Gated or context-dependent fusion weights $\alpha_i(h)$ instead of globally learned scalars.
Active learning or curriculum-based selection for open-set bias example acquisition.
Extension to continuous or intersectional protected attributes, and to decoder-based models for generative tasks (Rani et al., 28 Sep 2025, Kumar et al., 2023).

A plausible implication is that adapter-based fusion techniques will remain well-suited for scenarios demanding scalable, precise, and modular debiasing while preserving model utility and supporting dynamic, attribute-specific interventions.