Continual Alignment in Sequential Models

Updated 4 July 2026

Continual Alignment is a sequential learning principle that preserves critical relations (e.g., cross-modal, linguistic, or latent) to mitigate catastrophic forgetting.
It employs techniques like replay-based regularization, geometric alignment, and modular adapters to maintain consistency between old and new data representations.
This approach is used in applications ranging from multilingual instruction tuning to medical segmentation, where it helps keep performance stable under continual adaptation.

Continual alignment denotes a family of sequential-learning formulations in which the principal object to be preserved is an alignment relation that would otherwise drift under continued adaptation. Across the recent literature, that relation may be crosslingual instruction-following behavior, latent representations in online self-supervised learning, historical and current features in medical segmentation, entity correspondence in growing knowledge graphs, classifier–backbone compatibility, or multimodal embedding structure linking text, video, audio, and images (Cahyawijaya et al., 2023, Cignoni et al., 14 Jul 2025, Ye et al., 4 Jul 2025, Wang et al., 2022, Tran et al., 10 Mar 2026, Wang et al., 28 Jan 2026). The term therefore does not identify a single algorithmic template. It names a recurring principle: when models are updated sequentially, some structured dependency between old and new data, tasks, modalities, or behaviors must be explicitly maintained, re-imposed, or routed to mitigate catastrophic forgetting, feature drift, or alignment tax.

1. Conceptual scope and research usages

In the current literature, “alignment” is used in several technically distinct senses. In multilingual post-training, it can mean aligning newly introduced low-resource languages to an instruction-tuned competence space rather than performing preference alignment (Cahyawijaya et al., 2023). In self-supervised continual learning, it can mean aligning current latent representations to past latents through an EMA teacher or stored replay embeddings (Cignoni et al., 14 Jul 2025). In medical segmentation, it can mean jointly aligning current and previous networks, and aligning historical and current representations inside the current network (Ye et al., 4 Jul 2025). In knowledge graphs, it refers to the discovery and revision of entity correspondences as graphs grow over time (Wang et al., 2022). In multimodal retrieval and generation, it refers to preserving a shared cross-modal embedding geometry or an alignment module that connects frozen visual and language backbones (Wang et al., 28 Jan 2026, Kong et al., 10 Jun 2026). In sequential safety post-training, it refers to preserving safe behavior and general capability under heterogeneous fine-tuning stages (Sun et al., 8 Feb 2026).

Setting	What is aligned	Representative papers
Multilingual instruction tuning	New-language inputs with prior instruction-following behavior	InstructAlign (Cahyawijaya et al., 2023)
Online SSL and rehearsal CL	Current latents with past latents or replay features	CLA (Cignoni et al., 14 Jul 2025), DualHSIC (Wang et al., 2023)
Medical and dense prediction	Cross-network and cross-representation dependencies	DAKR-HSIC (Ye et al., 4 Jul 2025), CA-SAM (Wang et al., 21 Nov 2025)
Knowledge graphs	Entity correspondences in growing KGs	ContEA (Wang et al., 2022)
Continual discovery and retrieval	Features with fixed geometric or cross-modal targets	GOAL (Han et al., 23 Feb 2026), StructAlign (Wang et al., 28 Jan 2026)
LLM safety and post-training	Safety constraints with retained general capability	OGPSA (Sun et al., 8 Feb 2026), Alignment Dynamics (Huang et al., 18 May 2026)

This multiplicity is not merely terminological. It reflects a substantive shift from viewing continual learning as parameter retention alone to viewing it as preservation of a relation: language-to-language, past-to-present latent, teacher-to-student feature, class-to-geometry, or prompt-to-safe behavior.

2. Recurrent failure modes that motivate continual alignment

The most common motivating failure mode is catastrophic forgetting, but the cited works sharpen that diagnosis in domain-specific ways. In continual text-to-video retrieval, forgetting is decomposed into intra-modal feature drift and non-cooperative feature drift across modalities, the latter causing text and video encoders to become misaligned even when each remains internally organized (Wang et al., 28 Jan 2026). In pre-trained class-incremental learning, the central problem is that task-specific classifiers can become incompatible with the new shared feature space after the backbone is adapted or merged, creating classifier–backbone mismatch rather than simple feature loss (Tran et al., 10 Mar 2026). In online continual self-supervised learning, the instability arises because data arrive in small minibatches, task boundaries are absent, and feature drift accumulates too quickly for ordinary replay alone (Cignoni et al., 14 Jul 2025).

In sequential post-training of LLMs, the literature increasingly casts the problem as heterogeneous continual learning. OGPSA explicitly describes safety alignment as objective-heterogeneous sequential optimization in which safety updates can overwrite reasoning, truthfulness, or instruction-following behaviors, producing alignment tax (Sun et al., 8 Feb 2026). A data-centric variant reaches a related conclusion from the opposite direction: high-gradient samples cause greater safety degradation and drive models toward pretrained distributions, while moderate-gradient samples enable task learning with minimal alignment loss (Bach et al., 19 Apr 2026). “Alignment Dynamics in LLM Fine-Tuning” further argues that fragility cannot be understood solely as either gradient geometry or output-distribution shift, because both interact through alignment dynamics (Huang et al., 18 May 2026).

Other domains expose additional pathologies. In growing knowledge graphs, continual entity alignment must cope with new entities and triples, revision of old predictions, and the presence of non-matchable entities for which ordinary one-way nearest-neighbor matching is unreliable (Wang et al., 2022). In multilingual adaptation, InstructAlign argues that directly adapting new languages to instruction-tuned LLMs can cause catastrophic forgetting, and that modular adapter approaches such as MAD-X can harm multilingual inference because language competence is isolated into modules instead of being integrated into a single prompting-capable model (Cahyawijaya et al., 2023). In continual audio-video pre-training, STELLA identifies two further problems: sparse spatio-temporal correlation between audio-video pairs, and multimodal correlation overwriting, in which learning from new data overwrites previously learned audio-video relations (Lee et al., 2023).
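The fragility of one-way matching can be illustrated with a small sketch. Assuming entities are already embedded as vectors, a one-way rule always returns some counterpart, even for an entity that has no true match. The mutual (bidirectional) nearest-neighbor filter below is a generic remedy shown for illustration only; it is not claimed to be ContEA's matching procedure, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest(query: Sequence[float], candidates: Sequence[Vector]) -> int:
    """Index of the candidate most cosine-similar to the query (one-way match)."""
    return max(range(len(candidates)), key=lambda j: cosine(query, candidates[j]))

def mutual_nn_matches(src: Sequence[Vector], tgt: Sequence[Vector],
                      min_sim: float = 0.0) -> Dict[int, int]:
    """Keep (i, j) only if j is i's nearest target AND i is j's nearest source.

    Source entities whose nearest target does not point back stay unmatched,
    which is one simple way to abstain on likely non-matchable entities.
    """
    matches = {}
    for i, s in enumerate(src):
        j = nearest(s, tgt)
        if nearest(tgt[j], src) == i and cosine(s, tgt[j]) >= min_sim:
            matches[i] = j
    return matches

src = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
tgt = [[1.0, 0.05], [0.0, 1.0]]
print(mutual_nn_matches(src, tgt))  # {0: 0, 1: 1}
```

The third source entity is closest to target 0, but target 0 prefers source 0, so the sketch abstains instead of forcing a one-way match.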

3. Main methodological families

One large family treats continual alignment as explicit replay-conditioned representation regularization. DualHSIC adds HSIC-Bottleneck for Rehearsal and HSIC Alignment to rehearsal-based class-incremental learning: the first lessens inter-task interference on replayed data, and the second maximizes statistical dependence between current-task and buffered final-layer representations to promote task-invariant knowledge sharing (Wang et al., 2023). CLA makes a closely related move in online continual self-supervised learning, but replaces task-boundary-dependent teachers with online targets from an EMA network or stored replay embeddings; its variants CLA-b, CLA-E, and CLA-R align present latents to temporally earlier ones while keeping the method compatible with a fixed computational budget (Cignoni et al., 14 Jul 2025). In continual UDA, “Multi-scale Feature Alignment for Continual Learning of Unlabeled Domains” combines generative feature-driven image replay with a dual-purpose discriminator and multi-scale feature aggregation, so that the same mechanism supports both replay realism and domain alignment across multiple feature depths (Thandiackal et al., 2023).
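The CLA-style alignment of present latents to earlier ones can be sketched in a few lines. This is a minimal illustration under stated assumptions: latents are plain Python lists, the teacher is a flat parameter list updated by an exponential moving average, and the alignment term is a cosine distance between present latents and earlier ones (which could come from an EMA network or a replay store). The momentum value, the `weight` coefficient, and all function names are hypothetical, not CLA's published configuration.

```python
import math
from typing import List, Sequence

Vector = List[float]

def ema_update(teacher: Sequence[float], student: Sequence[float],
               momentum: float = 0.99) -> Vector:
    """EMA teacher step: t <- momentum * t + (1 - momentum) * s, per parameter."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]

def cosine_alignment_loss(current: Sequence[Vector], past: Sequence[Vector]) -> float:
    """Mean (1 - cosine similarity) between paired present and past latents.

    `past` plays the role of a fixed target; in an autodiff framework it
    would be detached so that only the current encoder receives gradients.
    """
    total = 0.0
    for z, z_old in zip(current, past):
        dot = sum(a * b for a, b in zip(z, z_old))
        norm = math.sqrt(sum(a * a for a in z)) * math.sqrt(sum(b * b for b in z_old))
        total += 1.0 - dot / norm
    return total / len(current)

def total_loss(ssl_loss: float, current: Sequence[Vector], past: Sequence[Vector],
               weight: float = 1.0) -> float:
    """Hypothetical combined objective: SSL loss plus weighted latent alignment."""
    return ssl_loss + weight * cosine_alignment_loss(current, past)
```

Because the targets come from an EMA or a replay store rather than from a frozen end-of-task teacher, nothing in this sketch depends on knowing where task boundaries fall.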

A second family aligns historical and current structure inside the model. DAKR-HSIC for domain-continual medical segmentation couples Cross-Network Alignment, which aligns bottleneck features from current and previous networks on buffered data, with Cross-Representation Alignment, which maximizes nonlinear HSIC between bottleneck features extracted from buffered historical data and current-domain data after feature mapping and permutation-based pairing (Ye et al., 4 Jul 2025). Adapt & Align adopts a similar consolidation perspective for generative models: a local model is first trained on the new task, and then a translator aligns the local latent space with a shared global latent space so that a persistent decoder or generator can consolidate old and new knowledge without old real data (Deja et al., 2023).
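Cross-Representation Alignment combines two ingredients that can be sketched directly: an HSIC estimate between two sets of paired features, and a search over pairings. The sketch below uses the standard biased estimator $\mathrm{HSIC} = \mathrm{tr}(KHLH)/(n-1)^2$ with Gaussian kernels and centering matrix $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$. Choosing the permutation that maximizes HSIC is an assumption made for illustration; DAKR-HSIC's actual pairing criterion, feature mapping, and kernel bandwidth are not reproduced, and the function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Vector = List[float]

def gaussian_gram(xs: Sequence[Vector], sigma: float = 1.0) -> List[List[float]]:
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2.0 * sigma ** 2))
             for xj in xs] for xi in xs]

def center(k: List[List[float]]) -> List[List[float]]:
    """Double-center a symmetric Gram matrix: returns H K H with H = I - 11^T / n."""
    n = len(k)
    row = [sum(r) / n for r in k]
    grand = sum(row) / n
    # For a symmetric K, column means equal row means.
    return [[k[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

def hsic(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased empirical HSIC, tr(K H L H) / (n - 1)^2, for paired rows of xs and ys."""
    n = len(xs)
    kc = center(gaussian_gram(xs, sigma))
    l_mat = gaussian_gram(ys, sigma)
    return sum(kc[i][j] * l_mat[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2

def best_pairing(historical: Sequence[Vector], current: Sequence[Vector],
                 sigma: float = 1.0) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively reorder `current` to maximize HSIC with `historical`.

    Visits all n! permutations (math.factorial(n)), so it only suits tiny batches.
    """
    best_perm, best_val = tuple(range(len(current))), -math.inf
    for perm in itertools.permutations(range(len(current))):
        val = hsic(historical, [current[p] for p in perm], sigma)
        if val > best_val:
            best_perm, best_val = perm, val
    return best_perm, best_val
```

Because the loop visits every permutation, its cost grows factorially with batch size, which is why exhaustive pairing of this kind is only practical for very small batches.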

A third family imposes a fixed geometric target. GOAL replaces dynamic classifier updates in continual generalized category discovery with a fixed Equiangular Tight Frame classifier; supervised alignment anchors labeled base-session features to assigned ETF directions, and confidence-guided alignment maps confidently clustered novel samples to unused ETF prototypes, thereby integrating new classes without changing old class anchors (Han et al., 23 Feb 2026). StructAlign extends the same geometric intuition to multimodal retrieval: a cross-modal ETF alignment loss pulls both text and video features toward category-level ETF prototypes, while Cross-modal Relation Preserving loss distills similarity relations from the previous model to suppress intra-modal drift (Wang et al., 28 Jan 2026). Local Classifier Alignment, by contrast, is local rather than global: it samples Gaussian class features under the current backbone and trains classifiers with a class-conditioned local consistency term so that loss varies little between nearby same-class samples (Tran et al., 10 Mar 2026).
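The fixed geometric target used by GOAL and StructAlign can be made concrete. A simplex Equiangular Tight Frame for $K$ classes consists of $K$ unit vectors whose pairwise cosine is $-1/(K-1)$; the sketch builds one as $\sqrt{K/(K-1)}\,(e_k - \frac{1}{K}\sum_j e_j)$ from standard basis vectors. The greedy rule that gives a confidently clustered novel class its most similar unused prototype is a hypothetical simplification of confidence-guided alignment, not GOAL's published procedure.

```python
import math
from typing import List, Sequence, Set

Vector = List[float]

def simplex_etf(num_classes: int, dim: int) -> List[Vector]:
    """Return num_classes unit vectors in R^dim with pairwise cosine -1/(K-1).

    Built as sqrt(K/(K-1)) * (e_k - mean_j e_j) from the first K standard
    basis vectors, so this particular construction needs dim >= num_classes.
    """
    k = num_classes
    if k < 2 or dim < k:
        raise ValueError("need num_classes >= 2 and dim >= num_classes")
    scale = math.sqrt(k / (k - 1))
    protos = []
    for c in range(k):
        v = [0.0] * dim
        for j in range(k):
            v[j] = scale * ((1.0 if j == c else 0.0) - 1.0 / k)
        protos.append(v)
    return protos

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def assign_to_unused(centroid: Sequence[float], protos: Sequence[Vector],
                     used: Set[int]) -> int:
    """Hypothetical rule: map a novel-class centroid to its most similar free prototype."""
    free = [i for i in range(len(protos)) if i not in used]
    if not free:
        raise ValueError("ETF capacity exhausted")
    return max(free, key=lambda i: dot(centroid, protos[i]))
```

Prototypes already assigned to old classes stay in `used` and are never moved, which is the property that keeps old class anchors fixed as new classes arrive. The `ValueError` on exhaustion mirrors the fixed-capacity limitation of a pre-sized ETF.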

A fourth family isolates or routes alignment modules rather than full backbones. CA-SAM inserts a lightweight Alignment Layer between the frozen SAM encoder and decoder, learns one such layer per task, and uses a VAE-based task router with OOD fallback to the identity alignment, thereby preserving both prior task adapters and SAM’s zero-shot priors on unseen domains (Wang et al., 21 Nov 2025). ECA makes the alignment module itself the object of continual adaptation in BLIP-2-style image-to-text generation: Mixture of Query adapts task-specific query tokens, Fisher Dynamic Expansion adds new Parallel Adapters only when a Fisher-based conflict score exceeds $0.5$, and Dictionary Replay preserves the alignment function in embedding space without raw exemplars (Kong et al., 10 Jun 2026). STELLA likewise acts at the level of patch selection rather than full-model replay, using Localized Patch Importance Scoring and Replay-guided Correlation Assessment to perform probabilistic patch selection in continual audio-video pre-training (Lee et al., 2023).
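Routing with an identity fallback can be sketched as follows. As assumptions for illustration, each task's router is reduced to a scalar anomaly score (lower means more in-distribution) standing in for CA-SAM's VAE-based score, and each Alignment Layer is an arbitrary function on a feature vector; the task names, scorer, threshold, and layer functions are all hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]

def identity(z: Vector) -> Vector:
    """Identity alignment: features reach the frozen decoder unchanged."""
    return list(z)

def route_and_align(z: Vector,
                    scorers: Dict[str, Callable[[Vector], float]],
                    layers: Dict[str, Callable[[Vector], Vector]],
                    threshold: float) -> Vector:
    """Apply the alignment layer of the best-scoring task, or identity if OOD."""
    task, best = min(((t, score(z)) for t, score in scorers.items()),
                     key=lambda pair: pair[1])
    if best > threshold:
        # No task router claims the sample: fall back to the frozen model's own path.
        return identity(z)
    return layers[task](z)

def dist_to(mean: Vector) -> Callable[[Vector], float]:
    """Hypothetical stand-in router score: squared distance to a task mean."""
    return lambda z: sum((a - b) ** 2 for a, b in zip(z, mean))

# Hypothetical two-task setup for illustration.
scorers = {"task_a": dist_to([1.0, 0.0]), "task_b": dist_to([0.0, 1.0])}
layers = {"task_a": lambda z: [2.0 * v for v in z],
          "task_b": lambda z: [v + 1.0 for v in z]}
```

With this setup, a feature near the `task_a` mean is transformed by that task's layer, while a feature far from every task mean passes through unchanged. Each new task adds one scorer and one layer, which is the linear growth noted among the limitations.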

A fifth family works in gradient space or data space. OGPSA estimates a low-rank capability subspace from gradients on a small reference set and projects each safety gradient onto its orthogonal complement before the update, so that safety-directed changes minimally perturb general capabilities (Sun et al., 8 Feb 2026). “Continual Safety Alignment via Gradient-Based Sample Selection” instead keeps the model and objective fixed and filters the data: it computes per-sample gradient norms, keeps samples near the median gradient norm after loss pre-filtering, and trains only on those selected samples (Bach et al., 19 Apr 2026).
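The projection step in OGPSA amounts to removing the component of the safety gradient that lies in a capability subspace, $g' = g - UU^\top g$ for an orthonormal basis $U$. The sketch below assumes flattened gradients as Python lists and builds $U$ by Gram-Schmidt over a few reference gradients. That is a simplification: the source describes a low-rank estimate, and the exact estimator (truncated SVD would be one typical choice) is not reproduced here. Function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]

def orthonormal_basis(vectors: Sequence[Vector], tol: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis spanning the given reference gradients.

    Near-dependent vectors (residual norm <= tol) are skipped.
    """
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > tol:
            basis.append([x / norm for x in w])
    return basis

def project_out(grad: Sequence[float], basis: Sequence[Vector]) -> Vector:
    """Return grad - U U^T grad, i.e. the component orthogonal to span(basis)."""
    g = list(grad)
    for b in basis:  # basis is orthonormal, so sequential removal equals U U^T g
        c = sum(x * y for x, y in zip(g, b))
        g = [x - c * y for x, y in zip(g, b)]
    return g
```

A training loop would call `project_out(safety_grad, orthonormal_basis(reference_grads))` before each optimizer step, so the applied update is orthogonal to every direction in the estimated capability subspace.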

4. Evaluation regimes and empirical regularities

Continual alignment is evaluated with strongly domain-specific metrics rather than a single standardized protocol. Multilingual instruction tuning uses weighted F1 on $L1$ (languages already supported before adaptation), $L2$ (newly added languages), and unseen related $L3$ languages, with replay-size ablations over $r \in \{0,1000,10000,100000\}$ (Cahyawijaya et al., 2023). Online SSL uses Final Accuracy and Average Accuracy under matched Cumulative Backward Passes (Cignoni et al., 14 Jul 2025). Medical segmentation reports Dice, IoU, HD95, AVG, and BWT (Ye et al., 4 Jul 2025). Continual discovery evaluates All, Old, New, forgetting rate $\mathcal{M}_f$, and discovery rate $\mathcal{M}_d$ (Han et al., 23 Feb 2026). Continual retrieval uses Recall@1/5/10 and BWF (Wang et al., 28 Jan 2026). Safety-preservation work uses VISAGE, ASR, TruthfulQA, BWT, FM, and Max Drop (Bach et al., 19 Apr 2026). This diversity is itself informative: the field consistently measures retention of an aligned relation, but which relation is retained is task-specific.
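Several of these metrics derive from an accuracy matrix $R$, where $R_{i,j}$ is performance on task $j$ after training through task $i$. The sketch below implements the widely used definitions of average accuracy, $\frac{1}{T}\sum_{j} R_{T,j}$, and backward transfer, $\mathrm{BWT} = \frac{1}{T-1}\sum_{j<T}(R_{T,j} - R_{j,j})$, where negative BWT indicates forgetting. Individual papers may define AVG, BWT, BWF, or FM somewhat differently (for example, averaging over evaluation points along a stream), so this is a generic reference rather than any cited paper's evaluation code.

```python
from typing import List

def average_accuracy(r: List[List[float]]) -> float:
    """Mean performance over all T tasks after training on the last task.

    r[i][j] is performance on task j after training through task i (0-indexed).
    """
    t = len(r)
    return sum(r[t - 1][j] for j in range(t)) / t

def backward_transfer(r: List[List[float]]) -> float:
    """BWT = mean over j < T-1 of (r[T-1][j] - r[j][j]); negative means forgetting."""
    t = len(r)
    if t < 2:
        raise ValueError("BWT needs at least two tasks")
    return sum(r[t - 1][j] - r[j][j] for j in range(t - 1)) / (t - 1)
```

For a three-task matrix `[[0.9, 0.0, 0.0], [0.8, 0.85, 0.0], [0.7, 0.8, 0.9]]`, this gives an average accuracy of about 0.8 and a BWT of about -0.125: task 0 lost 0.2 and task 1 lost 0.05 by the end.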

Despite that heterogeneity, several empirical regularities recur. InstructAlign shows that replay is not merely preservative but enabling: with $r=0$, performance on $L1$ drops significantly and the newly added $L2$ languages also often drop, whereas increasing replay improves both retention and adaptation; moreover, improvement on $L2$ strongly correlates, by Pearson correlation, with improvement on the unseen related $L3$ languages (Cahyawijaya et al., 2023). CLA reports that explicit latent alignment can improve not only end-of-stream representation quality but also early-stage convergence, and even that continuing i.i.d. pretraining from a CLA-based initialization can outperform full i.i.d. training from scratch under the same total budget (Cignoni et al., 14 Jul 2025). DAKR-HSIC shows the same complementarity pattern in dense prediction: in its optic cup segmentation ablation, the Dice AVG/BWT results for REKD alone, REKD+CNA, and REKD+CRA+CNA indicate that cross-network and cross-representation alignment contribute additively (Ye et al., 4 Jul 2025).

Geometric continual alignment also produces consistent retention gains. GOAL lowers the average forgetting rate $\mathcal{M}_f$ and raises the average discovery rate $\mathcal{M}_d$ relative to the compared methods, with especially large gains in the 10-stage setting, where a fixed ETF scaffold accumulates less drift than dynamic classifier schemes (Han et al., 23 Feb 2026). StructAlign reports the lowest Backward Forgetting across all evaluated settings while improving or matching state-of-the-art retrieval quality (Wang et al., 28 Jan 2026). In continual image captioning, CLICITA’s strongest gains appear on semantic metrics: on the ContCap benchmark, its METEOR score exceeds those of the best ContCap variants, while forgetting on the RATT split improves relative to the pretrained base model (Taetz et al., 7 Oct 2025). STELLA reports relative performance gains in zero-shot retrieval tasks over strong continual learning baselines while also reducing memory consumption (Lee et al., 2023).

5. Post-training, safety, and foundation-model interfaces

In language-model post-training, continual alignment has become a way of describing sequential adaptation after instruction tuning or safety tuning. InstructAlign is explicit that, in its setting, alignment does not mean preference alignment or RLHF-style human-value alignment; it means crosslingual alignment of linguistic representations and instruction-following behavior, instantiated through bilingual denoising, machine translation, and crosslingual semantic similarity objectives interleaved with replay from old instruction data (Cahyawijaya et al., 2023). ECA makes a related architectural claim for pre-trained VLMs: in BLIP-2, the most important object to preserve under sequential adaptation is the cross-modal alignment mechanism, operationalized as the Q-Former, while the visual encoder and LLM remain frozen (Kong et al., 10 Jun 2026).
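The interleaving of alignment objectives with replay can be sketched as a data-mixing step. Assumptions, for illustration only: each item is an opaque training example (a bilingual denoising, translation, or semantic-similarity instance, or an old instruction sample), the replay budget counts replayed examples in the spirit of the replay size $r$, and the mixture is uniformly shuffled. InstructAlign's actual sampling schedule is not reproduced, and the function name is hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def interleave_with_replay(alignment_items: Sequence[T], replay_pool: Sequence[T],
                           replay_budget: int, seed: int = 0) -> List[T]:
    """Mix crosslingual alignment items with up to `replay_budget` old instruction samples.

    Replay samples are drawn without replacement, then the combined stream is
    shuffled so replay is interleaved rather than appended at the end.
    """
    rng = random.Random(seed)
    k = min(replay_budget, len(replay_pool))
    stream = list(alignment_items) + rng.sample(list(replay_pool), k)
    rng.shuffle(stream)
    return stream
```

Setting `replay_budget=0` reproduces the no-replay condition, in which the stream contains only alignment items.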

Safety alignment as continual learning is made fully explicit by OGPSA. There, the alignment tax is modeled as catastrophic forgetting under sequential SFT, DPO, or SFT $\rightarrow$ DPO, and the proposed solution is to estimate a capability subspace from reference gradients and project safety gradients orthogonally to it before updating (Sun et al., 8 Feb 2026). On Qwen2.5-7B-Instruct under SFT $\rightarrow$ DPO, this recovers SimpleQA and IFEval performance relative to the unprojected baseline while preserving strong safety (Sun et al., 8 Feb 2026). The sample-selection line reaches a similar goal without architectural changes: by filtering high-gradient samples and keeping moderate-gradient ones, it reduces checkpoint-averaged ASR on both Qwen2.5 and LLaMA-3.1 while retaining competitive task performance (Bach et al., 19 Apr 2026).
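The moderate-gradient selection rule can be sketched under explicit assumptions: per-sample gradient norms and losses are already computed, the loss pre-filter is a simple threshold `max_loss`, and `keep_frac` controls how many surviving samples closest to the median gradient norm are kept. The threshold semantics, the fraction, and the function name are hypothetical; the paper's exact pre-filter and selection band are not reproduced here.

```python
import statistics
from typing import List, Sequence

def select_moderate_gradient(grad_norms: Sequence[float], losses: Sequence[float],
                             max_loss: float, keep_frac: float = 0.5) -> List[int]:
    """Indices of samples whose gradient norm is closest to the median.

    Step 1 (hypothetical loss pre-filter): drop samples with loss > max_loss.
    Step 2: rank survivors by |norm - median norm| and keep the closest keep_frac.
    """
    survivors = [i for i, loss in enumerate(losses) if loss <= max_loss]
    if not survivors:
        return []
    med = statistics.median(grad_norms[i] for i in survivors)
    ranked = sorted(survivors, key=lambda i: abs(grad_norms[i] - med))
    n_keep = max(1, int(round(keep_frac * len(survivors))))
    return sorted(ranked[:n_keep])

norms = [0.1, 1.0, 1.1, 0.9, 9.0, 1.05]
losses = [0.5, 0.5, 0.5, 0.5, 0.5, 5.0]
selected = select_moderate_gradient(norms, losses, max_loss=2.0, keep_frac=0.6)
print(selected)  # [1, 2, 3]
```

In this example, sample 0 (unusually low gradient norm) and sample 4 (unusually high) are both excluded, and sample 5 is removed by the loss pre-filter before the median is taken.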

“Alignment Dynamics in LLM Fine-Tuning” provides the most explicit dynamical theory in this group. It defines a sequence-level alignment score, derives its first-order update under SFT, and decomposes the change into a Rebound Force, governed by current alignment state and posterior narrowness, and a Driving Force, governed by how the training distribution aligns with outcome-conditioned posteriors over aligned and non-aligned completions (Huang et al., 18 May 2026). The same framework predicts a Rehearsal Priming Effect: prior alignment leaves a latent posterior imprint that makes re-exposure unusually effective, a result validated in safety, emergent misalignment, and sentiment settings (Huang et al., 18 May 2026).

Foundation-model adaptation beyond language exhibits closely related structures. CA-SAM treats continual alignment as alignment of the latent interface between a frozen SAM encoder and frozen decoder; task-specific Alignment Layers are selected by a VAE router, and truly OOD samples are sent through the identity alignment so that frozen SAM handles them directly (Wang et al., 21 Nov 2025). This suggests that continual alignment in foundation models often migrates from full-parameter retention to interface preservation: the bridge, adapter, or router becomes the continual object.

6. Limitations, controversies, and open directions

Several limitations recur across the literature. Many methods depend on some form of replay, generated replay, or stored auxiliary structure. InstructAlign assumes access to small amounts of parallel data and to old supervised instruction-tuning data for replay (Cahyawijaya et al., 2023). ContEA relies on previously predicted trustworthy alignments and on seen neighbors for inductive reconstruction of new entities (Wang et al., 2022). ECA is exemplar-free in raw data but still stores an embedding dictionary, and its authors note that a fixed-size dictionary may become insufficient on very long task sequences (Kong et al., 10 Jun 2026). CA-SAM avoids exemplars but grows linearly with the number of tasks because each task adds one Alignment Layer and one VAE router (Wang et al., 21 Nov 2025).

Scalability and task structure remain unresolved. DAKR-HSIC’s Feature Pairing block is practical because the batch size is small enough that exhaustive search over all $B!$ permutations of a batch of size $B$ is feasible; the same design is not obviously scalable to larger batches or 3D medical imaging (Ye et al., 4 Jul 2025). GOAL assumes the number of novel classes per stage is given or estimated, and its ETF capacity is fixed in advance, which leaves adaptive ETF expansion as future work (Han et al., 23 Feb 2026). Several settings are only weakly sequential in the long-horizon sense: InstructAlign introduces its seven new languages together in one adaptation phase rather than one by one, and many benchmarks still rely on known stage orderings or bounded numbers of tasks (Cahyawijaya et al., 2023).

Theory is also uneven. DualHSIC states explicitly that a stronger theoretical link between HSIC optimization and catastrophic forgetting is still missing (Wang et al., 2023). Alignment Dynamics provides a much sharper account, but only under the Relatively Stable Kernel assumption and mainly for SFT rather than DPO or RLHF (Huang et al., 18 May 2026). The sample-selection method currently computes exact per-sample gradients, which the paper reports as adding to the baseline cost and introducing additional training overhead, leaving cheaper approximations as future work (Bach et al., 19 Apr 2026).

A broader interpretive issue is that “continual alignment” is not yet a unified subfield with a single benchmark, metric, or ontology. Some papers use alignment to denote task-invariant representation sharing, some use it for multimodal geometry, some for crosslingual instruction-following, and some for safety preservation. This suggests a productive but still unsettled research area. A plausible implication is that future work will increasingly focus on interfaces between these usages: whether safety alignment can profit from geometric or replay-conditioned latent alignment, whether multimodal continual alignment should be treated as posterior-structure preservation rather than only embedding matching, and whether foundation-model adaptation should preserve not merely old accuracy but the specific relational structure that makes new knowledge attachable to existing behavior.