Multilingual Unlearning
- Multilingual unlearning is the selective removal of language-specific or cross-lingual information from machine learning models while preserving overall utility.
- Recent methodologies include gradient-based, geometric subspace projection, and adaptive weighting techniques to balance forgetting and retention across diverse languages.
- Evaluation protocols reveal challenges such as partial forgetting, asymmetric transfer, and parameter entanglement, prompting new metrics and benchmarks.
Multilingual unlearning is the process of removing designated information or capabilities from machine learning models that operate across multiple languages, while preserving their general utility and cross-lingual competencies. This practice has become essential in the context of large-scale multilingual LLMs, speech-understanding systems, and compliance-driven applications, driven by privacy (e.g., “right to be forgotten”), security, and fairness requirements. The multilingual setting introduces new technical challenges due to cross-lingual knowledge transfer, entangled parameter representations, and the risk of incomplete or asymmetric forgetting across languages. Current research has led to theory, methodology, and evaluation protocols that are specific to the multilingual context, revealing vulnerabilities in naïve approaches and motivating new unlearning algorithms, metrics, and benchmarks.
1. Problem Formulation and Taxonomy
Multilingual unlearning is operationalized through two main settings: “data unlearning” (erasing data-driven memories) and “concept unlearning” (removing skills, stereotypes, or capabilities). It is typically formalized as the selective removal of information relating to a forget set D_f expressed in a language ℓ, while retaining performance on a retain set D_r for every language in a language set L.
The goal is to obtain an unlearned model θ′ such that, for every language ℓ ∈ L:
- Forgotten knowledge in D_f is no longer accessible via generation or probing;
- Retained knowledge in D_r is preserved;
- Overall downstream and generalization capability across L is minimally degraded.
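These requirements can be written compactly as a constrained optimization. The formalization below is illustrative rather than drawn from any single cited paper; here θ′ denotes the unlearned model, D_f and D_r the forget and retain sets, L the language set, 𝒜 an accessibility measure (e.g., generation or probing accuracy) on the forgotten material, and ε a tolerance:

```latex
\min_{\theta'} \; \sum_{\ell \in L} \mathcal{L}_{\mathrm{retain}}\!\left(\theta';\, D_r^{(\ell)}\right)
\quad \text{subject to} \quad
\mathcal{A}\!\left(\theta';\, D_f^{(\ell)}\right) \le \epsilon \quad \forall \ell \in L.
```

The per-language constraint is what distinguishes the multilingual setting: satisfying it for one ℓ does not imply it holds for the others.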
Contemporary approaches address (i) entity-level unlearning (removing all facts about an individual or entity in multiple languages), (ii) instance-level unlearning (removing selected facts across the language set), and (iii) skill or concept unlearning (removing a capability in a language-specific or cross-lingual manner) (Savelli et al., 17 Dec 2025, Li et al., 27 Mar 2025).
2. Algorithms and Methodologies
2.1. Loss-based Gradient Approaches
The majority of baseline methods adapt gradient-based objectives:
| Method | Forget Objective | Retain Control | Cross-lingual Property |
|---|---|---|---|
| GradDiff | Gradient ascent on forget set, descent on retain set | Weak, often monolingual | Limited |
| GradDiff-KL | Adds KL-divergence to original on retain set | Stronger retention | Some cross-lingual transfer |
| NPO | Preference: penalize on forget set w.r.t. reference | Yes | Varies |
These methods are instantiated on synthetic and translated benchmarks to examine both in-language and cross-lingual forgetting (Farashah et al., 9 Jan 2026, Lizzo et al., 10 Jan 2026). Empirically, they typically fail to ensure uniform forgetting across languages, particularly in low-resource or typologically distant languages.
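The GradDiff-KL objective above can be sketched at the token level. The snippet below is a simplified illustration over explicit per-token probability distributions, not the published implementation; function names and the `beta` weight are assumptions for exposition.

```python
import math

def nll(probs, targets):
    """Mean negative log-likelihood of the target tokens."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

def kl(p, q):
    """KL divergence KL(p || q) between two categorical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def graddiff_kl_loss(probs_f, targets_f, probs_r, targets_r, ref_probs_r, beta=1.0):
    """GradDiff-KL-style objective (illustrative): negated NLL on the forget set
    (gradient ascent), NLL on the retain set, plus a KL penalty keeping
    retain-set output distributions close to the original reference model."""
    forget_term = -nll(probs_f, targets_f)
    retain_term = nll(probs_r, targets_r)
    kl_term = sum(kl(q, p) for q, p in zip(probs_r, ref_probs_r)) / len(ref_probs_r)
    return forget_term + retain_term + beta * kl_term
```

Note that nothing in this loss ties the forget-set gradient in one language to the same fact expressed in another, which is consistent with the non-uniform cross-lingual forgetting reported empirically.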
2.2. Subspace-Projection and Geometric Approaches
The UNLEARN subspace-projection framework (Lizzo et al., 10 Jan 2026) identifies low-rank parameter subspaces associated with the forgotten knowledge, leveraging both forget-set and retain-set gradients. By projecting model weights orthogonally to these subspaces:
- Removing the “interlingua” component achieves cross-lingual forgetting across all languages with minimal utility loss.
- Removing language-specific residuals enables targeted (monolingual) forgetting.
This geometric approach is uniquely effective in the multilingual setting, as loss-based methods often cannot isolate shared versus language-specific representations.
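The core projection step can be illustrated at the level of a single weight vector. The sketch below assumes the forget subspace is given as a set of orthonormal direction vectors (in the paper these are derived from forget-set and retain-set gradients; the helper names here are hypothetical):

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def project_out(w, directions):
    """Project weight vector w onto the orthogonal complement of the span of
    `directions` (assumed orthonormal): w' = w - sum_i (w . u_i) u_i.
    Removing interlingua directions yields cross-lingual forgetting;
    removing language-specific residual directions yields monolingual forgetting."""
    w = list(w)
    for u in directions:
        c = dot(w, u)
        w = [wi - c * ui for wi, ui in zip(w, u)]
    return w
```

After projection, the weight vector has zero component along each removed direction, so the knowledge encoded in that subspace is no longer expressible through those parameters.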
2.3. Language-Adaptive Weighted Objectives
LINGTEA (Choi et al., 2024) employs a per-token, per-language weighted loss that blends teacher confidence in each language with the language modeling loss, balancing unlearning and retention adaptively across the language set. The scheme:
- Enhances alignment between forgetting and retention pressure in low- and high-resource languages.
- Shows robust performance transfer in both token-sequence and factual-knowledge unlearning.
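The adaptive weighting idea can be sketched as follows. This is a deliberately simplified blend of a forgetting term and a retention term, weighted per token by teacher confidence; the published LINGTEA weighting differs in detail, and the function name and `alpha` scale are assumptions.

```python
import math

def lingtea_style_loss(token_probs, targets, teacher_conf, alpha=1.0):
    """Per-token blend of a forgetting term (negated NLL) and a retention term
    (standard NLL), weighted by the teacher's per-token confidence.
    Illustrative only; not the published formulation."""
    total = 0.0
    for p, t, c in zip(token_probs, targets, teacher_conf):
        nll_t = -math.log(p[t])
        w = max(0.0, min(1.0, alpha * c))   # clamp forgetting weight to [0, 1]
        total += -w * nll_t + (1.0 - w) * nll_t
    return total / len(targets)
```

With confidence 0 the loss reduces to plain language modeling (pure retention); with confidence 1 it becomes pure gradient ascent on that token (pure forgetting), so per-language confidence profiles shift the balance language by language.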
2.4. Inference-time and Training-Free Interventions
Skill unlearning methods such as Neuron Adjust and Key Space Detection (Li et al., 27 Mar 2025) do not update the model parameters, but apply activation modifications or abstention logic at inference. KSD defines per-language FFL key-space hypercubes; queries falling within these are intercepted, effectively abrogating the corresponding skill (e.g., QA in Vietnamese) without impacting unrelated capabilities.
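The KSD mechanism can be sketched as an axis-aligned bounding box ("hypercube") over key activations, plus an inference-time guard. The sketch below is a minimal stand-in, assuming key activations are plain vectors; the helper names and dictionary layout are hypothetical.

```python
def fit_hypercube(keys):
    """Per-dimension min/max bounds over key activations for one skill/language."""
    dims = range(len(keys[0]))
    lo = [min(k[d] for k in keys) for d in dims]
    hi = [max(k[d] for k in keys) for d in dims]
    return lo, hi

def in_cube(key, cube):
    """True if every coordinate of the key lies within the cube's bounds."""
    lo, hi = cube
    return all(l <= x <= h for x, l, h in zip(key, lo, hi))

def guarded_generate(query_key, unlearned_cubes, generate):
    """Intercept queries whose key activation falls inside any unlearned
    skill's hypercube; otherwise defer to normal generation."""
    for skill, cube in unlearned_cubes.items():
        if in_cube(query_key, cube):
            return f"[abstained: unlearned skill '{skill}']"
    return generate(query_key)
```

Because the model parameters are untouched, the intervention is trivially reversible: deleting a cube restores the corresponding skill.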
2.5. Data Modality-Specific Pipelines
Spoken Language Understanding (SLU) presents additional complexity due to speaker variation and multimodal features. In UnSLU-BENCH (Koudounas et al., 21 May 2025), eight unlearning techniques (including fine-tuning (FT), Negative Gradients (NG), adversarial UNSIR, and distillation-based SCRUB, BT, and BT-L) are benchmarked across English, Italian, German, and French, with efficacy, utility, and efficiency jointly assessed by the Global Unlearning Metric (GUM).
3. Cross-Lingual Transfer, Asymmetry, and Failure Modes
Multiple studies establish that unlearning in a single language generally fails to remove the corresponding information from other languages—a vulnerability that is especially pronounced in low-resource or less-aligned languages (Lu et al., 2024, Choi et al., 2024, Farashah et al., 9 Jan 2026, Lizzo et al., 10 Jan 2026). Key phenomena include:
- Partial forgetting: Forgetting in a source language only partially diminishes recall in target languages.
- Asymmetric transfer: Forgetting propagates more strongly from low-resource to high-resource languages than vice versa, reflecting parameter entanglement and representational overlaps (Farashah et al., 9 Jan 2026).
- Language confusion: English-only unlearning may result in models “escaping” the forgetting objective by generating answers in another language (e.g., outputting a Chinese answer to an English query), leading to surface-form metric failure and false negatives (Hwang et al., 28 Oct 2025).
These results underscore the inadequacy of monolingual or naïve multilingual unlearning approaches.
4. Evaluation Metrics and Benchmarks
Robust evaluation protocols are necessary due to the risk of semantic leakage and interlingua encoding. The following design and metric choices are prominent:
| Metric/System | Principle | Limitation |
|---|---|---|
| Forget Quality (FQ) | Distribution divergence (e.g., KS test) | May miss semantic leakage |
| Retain Utility | Harmonic mean over control metrics | Sensitive to metric choice |
| N-Mix Score | N-gram language detection for confusion | Only detects language mixing |
| Semantic Retention | LLM-based equivalence checking | Requires reliable evaluators |
| GUM (SLU) | Blends utility, efficacy (MIA), efficiency | Dependent on “gold model” |
| FAME/FLORES/TOFU/SLURP/MLQA | Parallel multilingual and synthetic benchmarks | May have limited realism |
Established best practices (Lizzo et al., 10 Jan 2026, Hwang et al., 28 Oct 2025, Savelli et al., 17 Dec 2025) indicate that semantic-based and language-agnostic evaluation protocols are critical, as surface metrics (BLEU, exact match) fail under cross-lingual code-switching or response “escape.”
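The Forget Quality row above relies on a two-sample Kolmogorov-Smirnov test between score distributions of the unlearned model and a retrained reference. In practice one would use `scipy.stats.ks_2samp`; the stdlib-only sketch below computes just the KS statistic (no p-value) to make the comparison concrete:

```python
import bisect

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between the
    empirical CDFs of the two score samples."""
    a, b = sorted(a), sorted(b)

    def ecdf(s, x):
        # fraction of samples in s that are <= x
        return bisect.bisect_right(s, x) / len(s)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)
```

A statistic near 0 means the unlearned model's scores are distributionally indistinguishable from the retrained reference; note the table's caveat that such distributional tests can still miss semantic leakage.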
5. Empirical Findings and Design Insights
The behavior of multilingual unlearning is shaped by both model and task:
- Subspace-projection (“UNLEARN”) achieves the highest cross-lingual forgetting quality while preserving utility, confirming that shared interlingua representations are the locus of factual knowledge (Lizzo et al., 10 Jan 2026).
- Adaptive weighting in methods like LINGTEA yields better transfer, with particularly robust performance in both high- and low-resource languages (Choi et al., 2024).
- Gradient-based unlearning, including GA, GD, and KL minimization, exhibits catastrophic utility drops or incomplete cross-lingual erasure on FAME, TOFU, and synthetic QA benchmarks (Savelli et al., 17 Dec 2025, Farashah et al., 9 Jan 2026).
- In SLU, Negative Gradient (NG) methods provide extreme computational efficiency (≥600× speedup over retraining) with strong forgetting efficacy—the highest GUM in every evaluated language-model pair (Koudounas et al., 21 May 2025).
- Inference-time KSD delivers >80% forgetting accuracy with <10% impact on retained skills across seven languages, suggesting a scalable tool for skill removal (Li et al., 27 Mar 2025).
6. Open Problems and Future Directions
Despite recent advances, unresolved challenges remain:
- Scalability: Application to 10B+ parameter LLMs and beyond remains largely unstudied (Savelli et al., 17 Dec 2025).
- Low-resource/extensive language sets: Most results focus on high-resource or typologically similar languages. Subspace and adaptive methods require extension to support 100+ languages, including low-resource and non-Latin scripts (Lizzo et al., 10 Jan 2026, Choi et al., 2024).
- Realistic data: Synthetic benchmarks (FAME, TOFU) ensure contamination control but may not capture the full complexity of sensitive, personalized, or nuanced information found in deployment contexts (Savelli et al., 17 Dec 2025).
- Evaluation: Reliance on LLM-based evaluators for semantic metrics raises issues of robustness, language coverage, and possible adversarial examples (Hwang et al., 28 Oct 2025).
- On-device/continual unlearning: Current methods incur batch or epoch-level compute. Fast, streaming, or parameter-efficient protocols suitable for federated and continual settings are needed (Koudounas et al., 21 May 2025).
- Interdisciplinary fairness and compliance: Measuring and guaranteeing no bias amplification or fairness harm post-unlearning is a nascent area (Savelli et al., 17 Dec 2025).
7. Design Recommendations and Best Practices
Research consensus establishes the following guidance for multilingual unlearning:
- Always implement unlearning in both English and each target/source language, or in a 50/50 combined forget set. English-only procedures are inadequate due to parameter entanglement and cross-lingual overlap (Lu et al., 2024, Hwang et al., 28 Oct 2025).
- Use geometric or subspace-based removal to target shared (interlingua) facts for cross-lingual erasure, and language-specific projections for monolingual forget requests (Lizzo et al., 10 Jan 2026).
- Adopt per-token, per-language adaptive weighting schemes where possible (e.g., LINGTEA), harnessing teacher confidences to match model strengths and calibration (Choi et al., 2024).
- Evaluate using semantic and language-agnostic metrics, with explicit quantification of language confusion and escape. Apply N-Mix and LLM-based equivalence scoring to discover incomplete or deceptive forgetting (Hwang et al., 28 Oct 2025).
- Leverage parameter-efficient approaches (LoRA adapters, CF-k, KSD intervention) for computational efficiency and reversibility (Koudounas et al., 21 May 2025, Li et al., 27 Mar 2025).
- Monitor for fairness, robustness, and side effects in rare languages or skill-overlapping queries. Consider typological proximity as a determinant of cross-lingual leakage (Farashah et al., 9 Jan 2026).
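The 50/50 combined forget-set recommendation above can be sketched as a small data-preparation helper. This is an illustrative utility (the function name and seeding are assumptions, not from any cited paper):

```python
import random

def combined_forget_set(english_items, target_items, seed=0):
    """Build a 50/50 English + target-language forget set: sample equal
    counts from each side, then shuffle. Illustrative helper only."""
    n = min(len(english_items), len(target_items))
    rng = random.Random(seed)
    merged = rng.sample(english_items, n) + rng.sample(target_items, n)
    rng.shuffle(merged)
    return merged
```

Balancing the two sides ensures the forgetting gradient is applied to both the English and target-language surface forms of the same facts, rather than relying on cross-lingual transfer from English alone.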
Multilingual unlearning, therefore, constitutes a complex, multidimensional challenge that requires principled methodological advances and rigorous, language-aware evaluation. The recent shift toward geometric and adaptive techniques, coupled with deeper cross-lingual analysis, is establishing new standards for responsible, secure, and fair AI systems across global linguistic landscapes.