
Language Steerability in LLMs

Updated 4 July 2026
  • Language-steerability is the capacity of models to modify outputs through controlled interventions, such as prompt changes or hidden state adjustments, to target specific linguistic or stylistic features.
  • Methods include linear activation steering, sparse feature modifications, and language-vector techniques, all of which provide measurable improvements in multilingual and style-specific generation.
  • Empirical findings reveal that steerability emerges during intermediate pretraining stages and depends on careful intervention scaling and robust evaluation protocols to minimize side effects.

Language-steerability is the capacity of an LLM, or of a language-conditioned predictive system, to alter its outputs toward a specified target by changing prompts, profiles, hidden representations, sparse features, or token distributions at inference time. In the current literature, the term covers at least three related phenomena: controllable generation of a target language in multilingual models, controllable expression of semantic or stylistic concepts such as emotion or figurative language, and controllable adaptation to user, community, or persona-specific preferences. A central finding is that steerability is not reducible to mere concept encoding: a model may “know” a concept long before it becomes reliably steerable through simple interventions (She et al., 3 Aug 2025). In multilingual settings, this has motivated a family of “language vector” methods that treat languages as directions in an internal semantic space and modify activations without parameter updates (Kirtane et al., 2 Feb 2026).

1. Formal definitions and representational viewpoint

A standard formalization of linear steerability treats a hidden state $h_\ell \in \mathbb{R}^m$ at layer $\ell$ as the object of intervention and applies a concept direction $v_\ell$ with steering strength $\alpha$:

$$h'_\ell = h_\ell + \alpha \cdot v_\ell.$$

Within the “Intervention Detector” framework, positive and negative stimuli for a concept are used to collect last-token hidden representations, form normalized difference matrices, extract the top principal component by PCA, and score alignment by

$$I_{l,i} = \langle R(M, s_i)[-1], v_\ell \rangle.$$

The resulting checkpoint-by-layer matrix is used to analyze where and when linear steerability emerges during pretraining (She et al., 3 Aug 2025).
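The additive update and the inner-product alignment score can be illustrated on toy vectors. The following is a minimal sketch, assuming a three-dimensional hidden state and a hand-picked unit direction; the function names `steer` and `alignment_score` are hypothetical, and the PCA step that produces the direction in the Intervention Detector framework is not reproduced here.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def steer(h: Vector, v: Vector, alpha: float) -> Vector:
    """Return h' = h + alpha * v for one layer's hidden state."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]


def alignment_score(last_token_rep: Vector, v: Vector) -> float:
    """Inner product between a last-token representation and a concept direction."""
    return dot(last_token_rep, v)


# Toy values (assumptions for illustration only).
h = [0.5, -1.0, 2.0]   # hidden state with m = 3
v = [1.0, 0.0, 0.0]    # unit-norm concept direction
h_steered = steer(h, v, alpha=2.0)

print(h_steered)
print(alignment_score(h, v), alignment_score(h_steered, v))
```

Because the toy direction has unit norm, steering with strength $\alpha$ raises the alignment score by exactly $\alpha$, which is the sense in which the score tracks the intervention.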

In multilingual steering, a language direction is often defined as an activation-difference vector between semantically matched source- and target-language prompts. One formulation computes a layer-$t$ steering vector

$$v^{(t)} = \mathbb{E}_{x \sim D_{\mathrm{compute}}}[h^{(t)}(x^t)] - \mathbb{E}_{x \sim D_{\mathrm{compute}}}[h^{(t)}(x^s)],$$

then injects it during inference by replacing token-position activations with $h_p^{(t)} \leftarrow h_p^{(t)} + \alpha \cdot v^{(t)}$. A related multilingual formulation, ReCoVeR, isolates language-specific vectors $r_\ell^{(i)} = v_\ell^{(i)} - c^{(i)}$ from a multi-parallel corpus and either adds the normalized target vector or adds the target vector while subtracting the normalized source vector in cross-lingual settings (Kirtane et al., 2 Feb 2026).
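The difference-of-means construction and the per-position injection can be sketched with plain lists. This is an illustrative sketch only: the activations are made-up two-dimensional values, and `mean_vector`, `language_vector`, and `inject` are hypothetical helper names rather than functions from any cited codebase.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Vector]) -> Vector:
    """Column-wise mean over a batch of activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def language_vector(target_acts: Sequence[Vector],
                    source_acts: Sequence[Vector]) -> Vector:
    """Mean target-language activation minus mean source-language activation."""
    mt, ms = mean_vector(target_acts), mean_vector(source_acts)
    return [t - s for t, s in zip(mt, ms)]


def inject(token_acts: Sequence[Vector], v: Vector, alpha: float) -> List[Vector]:
    """Apply h_p <- h_p + alpha * v at every token position p."""
    return [[h + alpha * d for h, d in zip(row, v)] for row in token_acts]


# Toy activations for semantically matched prompts (assumed values).
target_acts = [[1.0, 2.0], [3.0, 4.0]]
source_acts = [[0.0, 1.0], [2.0, 1.0]]

v = language_vector(target_acts, source_acts)
steered = inject([[0.0, 0.0], [1.0, 1.0]], v, alpha=0.5)
print(v, steered)
```

In a real model the activations would come from a chosen layer of the network, and the injection would typically be applied through a forward hook; both details are outside this sketch.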

A more localized version of the same idea appears in sparse feature steering. There, a pretrained sparse autoencoder maps a residual-stream activation […] to a sparse code […], and a single feature index […] is modified:

[…]

This replaces diffuse residual steering with a monosemantic or near-monosemantic feature intervention (Chou et al., 17 Jul 2025).
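A single-feature edit of this kind can be sketched with a toy sparse autoencoder. The sketch below makes several assumptions that are not taken from the source: a ReLU encoder, an identity-like dictionary, and an update that adds the change in one feature's activation along that feature's decoder direction. It shows one common way to realize such an edit, not necessarily the formulation used by Chou et al.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def encode(h: Vector, W_enc: Matrix, b_enc: Vector) -> Vector:
    """Toy SAE encoder: z = ReLU(W_enc h + b_enc)."""
    return [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(W_enc, b_enc)]


def steer_feature(h: Vector, W_enc: Matrix, b_enc: Vector,
                  W_dec: Matrix, j: int, new_value: float) -> Vector:
    """Move feature j to new_value by shifting h along decoder row W_dec[j]."""
    z = encode(h, W_enc, b_enc)
    delta = new_value - z[j]
    return [x + delta * d for x, d in zip(h, W_dec[j])]


# Toy dictionary with two features (assumed values).
W_enc = [[1.0, 0.0], [0.0, 1.0]]
b_enc = [0.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0]]  # row j is the decoder direction of feature j

h = [0.5, 2.0]
h_steered = steer_feature(h, W_enc, b_enc, W_dec, j=0, new_value=3.0)
print(encode(h, W_enc, b_enc), encode(h_steered, W_enc, b_enc))
```

In this toy setup only feature 0 changes after the edit, which is the localized behavior the paragraph above contrasts with diffuse residual steering. With a real, non-orthogonal dictionary, other features would generally move as well.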

Steerability is also formalized outside hidden-state editing. In natural-language recommenders, a steering intervention is a function […] on a natural-language user profile, and success is measured by a tag-specific ranking shift […]. In multilingual system prompting, cross-lingual prompt steerability is represented by the four-dimensional metric vector […], with an aggregated […] built from min–max normalized components (Zhou et al., 28 Jan 2026).

2. Emergence during training and internal geometry

A key empirical result is that linear steerability emerges during intermediate stages of pretraining rather than appearing uniformly from the start. In CrystalCoder (7B) checkpoints saved every […] steps, “anger” steerability remains near zero until […] of training and then rises rapidly to […] in higher layers. “Fear” emerges slightly earlier at about […], “happiness” around […]–[…], while “sadness,” “surprise,” and “disgust” become steerable only near the very end, at […] of training. The same study reports that the first PCA component of the concept-difference matrix explains only […] of variance at […]–[…] of training but exceeds […] by […]–[…], and that cosine similarity between adjacent checkpoint concept vectors drops sharply at the moment steerability appears. The authors interpret this as increasing linear separability and signal-to-noise ratio in the hidden space (She et al., 3 Aug 2025).

The same work treats linear steerability as a distinct emergent capability, separate from concept encoding or raw generation ability. Heatmaps of checkpoint-by-layer ID scores show that early training is characterized by […] across layers, whereas after emergence the top […] layers form a bright band of strong alignment. Entropy over normalized layer scores is high early, drops as a few layers concentrate the concept, and then rebounds slightly when many layers become aligned. A plausible implication is that pretraining induces a reorganization from diffuse representation to layer-localized control, after which simple additive interventions become effective (She et al., 3 Aug 2025).

Later multilingual work reports a related geometric picture. CLaS-Bench finds that language-specific structure emerges predominantly in later layers and that steering directions cluster by language family. “Cross-Lingual Steering for Figurative Language Generation” similarly reports a reusable but target-dependent cross-lingual signal: directions learned from figurative–literal activation differences transfer across six languages, and removing the shared component weakens native steering (Gurgurov et al., 13 Jan 2026).

3. Intervention families and their empirical performance

Several intervention families now coexist, differing mainly in the representation they edit and in how the steering direction is extracted.

Family | Representation edited | Core update
Linear activation steering | Residual or hidden state | […]
Sparse feature steering | SAE code | […]
Language-vector steering | Layerwise mean-pooled activations | […]
ReCoVeR | Hidden states with centered language vectors | Add target vector, subtract source vector in Cross-LC
DLM-SWAI | Token logits in diffusion denoising | […]
Neural FOXP2 | Sparse language-neuron support | Signed sparse shift in SAE feature space

Sparse feature steering shows that a single SAE feature can be sufficient for deterministic language control. On Gemma-2-9B, steering one feature yields FastText target-language accuracies of […] for Chinese, […] for Japanese, […] for Spanish, and […] for French, while preserving semantic fidelity measured by LaBSE similarity. The strongest interventions occur in mid-to-late layers, such as layers […]–[…] in Gemma-2-9B, and specific attention heads are disproportionately aligned with language-sensitive features; for example, Head […] in layer […] dominates for both Chinese and French (Chou et al., 17 Jul 2025).

Training-free language vectors have been applied to multilingual in-context learning. On Llama-3.1-8B-Instruct, language steering improves MGSM from […] to […], XNLI from […] to […], and MSVAMP from […] to […]. On Qwen-2.5-14B, MGSM rises from about […] to about […]. Hierarchical clustering of steering vectors yields Romance, Slavic, Indo-Aryan, and East Asian groupings, and five of six cross-task transfers among MGSM, MSVAMP, and XNLI improve over baseline, with one failure case in the MSVAMP→XNLI direction (Kirtane et al., 2 Feb 2026).

ReCoVeR addresses language confusion rather than few-shot transfer. Its fixed version adds normalized target-language vectors or target-minus-source vectors; its supervised version, ReCoVeR+, learns a small low-rank residual block while freezing the LLM. On cross-lingual language control in LCB, ReCoVeR+ raises LPR from […] to […] on Llama 3.1, from […] to […] on Qwen 2.5, and from […] to […] on Gemma 2. On MMLU, steering with ReCoVeR never drops accuracy by more than […] percentage points, whereas LSI drops up to about […] points (Sterz et al., 18 Sep 2025).

Benchmark-scale comparisons are less favorable to many sophisticated steering directions than to simple residual means. In CLaS-Bench, the average harmonic-mean steering score […] on Llama-3.1-8B-Instruct is […] for DiffMean, compared with […] for LAPE, […] for probe-derived directions, […] for PCA steering, […] for LDA steering, and […] for SAE-DiffMean. The two prompting baselines score […] and […]. This suggests that unsupervised difference-of-means directions can be more robust than probe-derived or low-dimensional reconstruction-based directions for multilingual language forcing (Gurgurov et al., 13 Jan 2026).

The scope of steerability methods has also broadened beyond autoregressive transformers. DLM-SWAI biases token distributions in diffusion LLMs at every denoising step using precomputed token-level style scores, with no auxiliary model and no hidden-state hooks. On OSE readability control, DLM-SWAI reaches […] accuracy and macro-[…] on LLaDA-8B and […] and […] on Dream-7B, while on RealTox it reaches […] non-toxic accuracy. Neural FOXP2, by contrast, identifies a sparse, low-rank “language-neuron” circuit and applies signed sparse activation shifts in low-to-mid layers; on LLaMA-3 8B it reports […], […], […], Spanish leakage of […], and […] (An et al., 28 May 2026).

4. Evaluation protocols and benchmark design

The diversity of steering methods has been matched by a rapid diversification of evaluation protocols. CLaS-Bench defines language forcing success […] as the fraction of outputs in the target language according to FastText LID, semantic relevance […] as a normalized […]–[…] multilingual judge score, and combines them with the harmonic mean

[…]

Its construction yields […] steering instances across […] languages and provides a standardized multilingual benchmark for prompt-based and representation-based interventions alike (Gurgurov et al., 13 Jan 2026).
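The effect of combining the two CLaS-Bench components with a harmonic mean can be illustrated numerically. The sketch below assumes both components lie in [0, 1]; the function and argument names are hypothetical, and the input values are invented for illustration rather than taken from the benchmark.

```python
def harmonic_mean_score(forcing: float, relevance: float) -> float:
    """Harmonic mean of language-forcing success and semantic relevance.

    Both inputs are assumed to lie in [0, 1]. The score is defined as 0
    when both are 0, to avoid division by zero.
    """
    total = forcing + relevance
    if total == 0:
        return 0.0
    return 2 * forcing * relevance / total


# Invented example values: strong language forcing, weaker relevance.
print(harmonic_mean_score(0.9, 0.5))
# Perfect forcing with no semantic relevance scores zero.
print(harmonic_mean_score(1.0, 0.0))
```

The harmonic mean penalizes imbalance: a method that forces the target language but destroys meaning cannot score well, which is why it suits a benchmark that wants both properties at once.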

SteerEval formalizes steerability for natural-language recommenders. Given a profile revision […] for a tag […], it computes a tag-specific ranking AUC and its change

[…]

The framework distinguishes increase and decrease interventions, measures changes in the position of the ground-truth next item within its relevant or irrelevant subset, and evaluates both broad tags such as movie genres and finer-grained tags such as trigger warnings. Genres are substantially easier to steer than triggers: […] and […] for genres, versus […] and […] for triggers. Oracle metadata sharply improves both, indicating that world-knowledge limitations are a major bottleneck (Zhou et al., 28 Jan 2026).

A different evaluation philosophy appears in “A Course Correction in Steerability Evaluation,” which models user goals and model outputs as vectors in a multi-dimensional goal space […]. It defines overall steering error as the expected […] distance between the achieved goal vector […] and the target […], and decomposes failures into miscalibration, which measures overshoot or undershoot along the desired change direction, and orthogonality, which measures unintended drift in non-target dimensions. On a four-dimensional text-rewriting task with reading difficulty, formality, lexical diversity, and length, side effects remain persistent even when prompt engineering, best-of-[…] sampling, or reinforcement learning is applied (Chang et al., 27 May 2025).
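The split into miscalibration and orthogonality can be sketched as a projection of the steering error onto the requested change direction. This is a minimal sketch under stated assumptions: Euclidean distance, a single example rather than an expectation, and hypothetical names (`decompose_error`, `z0`, `z_star`, `z_hat`). The paper's exact normalization may differ.

```python
import math
from typing import List, Tuple

Vector = List[float]


def decompose_error(z0: Vector, z_star: Vector, z_hat: Vector) -> Tuple[float, float, float]:
    """Return (total_error, miscalibration, orthogonality).

    z0 is the source text's goal vector, z_star the requested target, and
    z_hat the achieved output. Miscalibration is the signed error along the
    requested change direction (negative = undershoot, positive = overshoot).
    Orthogonality is the norm of the error outside that direction.
    """
    direction = [t - s for t, s in zip(z_star, z0)]
    error = [a - t for a, t in zip(z_hat, z_star)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("target equals source: no change direction is defined")
    unit = [d / norm for d in direction]
    along = sum(e * u for e, u in zip(error, unit))
    residual = [e - along * u for e, u in zip(error, unit)]
    ortho = math.sqrt(sum(r * r for r in residual))
    total = math.sqrt(sum(e * e for e in error))
    return total, along, ortho


# Toy four-dimensional goal space (assumed values): only dimension 0 was requested.
total, miscal, ortho = decompose_error(
    z0=[0.0, 0.0, 0.0, 0.0],
    z_star=[1.0, 0.0, 0.0, 0.0],
    z_hat=[0.5, 0.0, 0.3, 0.4],
)
print(total, miscal, ortho)
```

Under these assumptions the two parts are orthogonal, so the squared total error equals the squared miscalibration plus the squared orthogonality. In the toy case the output undershoots the requested change and also drifts in two dimensions that were not requested.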

Prompt-only steering has also acquired its own multilingual evaluation framework. “Cross-Lingual Prompt Steerability” defines […], […], […], and […], then combines them into […] with weights […], […], […], and […]. Across Qwen2.5-7B-Instruct, LLaMA-3.1-8B-Instruct, and Gemma-3-12B-IT, optimized prompts improve mean accuracy by […], […], and […], respectively, while also increasing cross-lingual consistency and reducing unnecessary language-switching; for Spanish, the share of reasoning units in the native language rises from […] to […] after optimization (Zhang et al., 2 Dec 2025).

Earlier work used psychometric or choice-based proxies. The OCEAN-based framework sums integer trait ratings to obtain a trait-specific steerability score […] and visualizes overlap between prompted personalities; it found pronounced peaks for Conscientiousness and Neuroticism and overlap between Extraversion and Agreeableness. STEER-BENCH instead evaluates community-specific steering as multiple-choice accuracy after conditioning on community-aligned examples, using […] contrasting subreddit pairs and […] validated questions (Noever et al., 2023).

5. Theoretical accounts, diagnostics, and failure modes

The strongest theoretical treatment of steering magnitude appears in “Towards Understanding Steering Strength.” It studies the dependence of token probabilities, concept presence, and cross-entropy on the scalar steering strength $\alpha$. The paper derives a “Bump Law,” under which most token-probability shifts increase and then decrease once $\alpha$ exceeds a token-specific threshold; a “Sigmoidal Law,” under which concept-level probability shifts follow an S-shaped curve; a “Quadratic Law,” under which cross-entropy grows like […] near […] with no linear term; and a “Saturation Law,” under which the distribution collapses onto top log-odds tokens as […]. The analysis is validated on eleven decoder-only transformers and implies that steering has a non-monotonic sweet spot rather than a monotone gain regime (Taimeskhanov et al., 2 Feb 2026).
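The bump and saturation behaviors can be reproduced in a toy softmax model. The sketch below assumes that steering shifts each token's logit linearly in the steering strength, which is a simplification adopted here for illustration and not the paper's full setting; the logits, the per-token shifts, and the function names are invented.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def steered_probs(base_logits: List[float], logit_shift: List[float],
                  alpha: float) -> List[float]:
    """Token probabilities when logit i is shifted by alpha * logit_shift[i]."""
    return softmax([b + alpha * d for b, d in zip(base_logits, logit_shift)])


# Toy three-token vocabulary (assumed values).
base = [-3.0, 0.0, 0.0]   # token 0 starts out unlikely
shift = [2.0, 1.0, 0.0]   # token 0 gains most per unit of steering strength

for alpha in (0.0, 1.5, 10.0, 30.0):
    probs = steered_probs(base, shift, alpha)
    print(alpha, [round(p, 4) for p in probs])
```

In this toy model the probability of token 1, which gains from steering but less than token 0, first rises and then falls as the strength grows, peaking at an intermediate value. At large strengths almost all probability mass moves to token 0, the token with the largest shift. This mirrors the non-monotonic sweet spot described above.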

A different explanation of steering instability is offered by the Cylindrical Representation Hypothesis. CRH retains linear concept directions but drops the assumption that concept directions can be made orthogonal without loss. It posits a sample-specific cylindrical geometry: a central axis […] captures the main concept difference, a normal plane […] controls steering sensitivity, and only certain angular sectors in that plane strongly facilitate concept activation. The paper argues that the magnitude of the normal-plane component is predictable, whereas the sensitive sector is not. Its empirical verification reports effectively zero correlation between difference-vector cosine similarity and sample-wise steering-strength difference, with Pearson correlation […] and […], and interprets this as intrinsic uncertainty at the sector level (Gao et al., 3 May 2026).

Work on geometric diagnostics complements these theories. “The Geometric Canary” distinguishes supervised and unsupervised geometric stability. Supervised Shesha variants predict linear steerability with Spearman […] on […] synthetic models, […] on SST-2, and […] on MNLI, while retaining substantial partial correlations after controlling for separability measures. By contrast, unsupervised stability fails for steering on real tasks, with […] on SST-2, yet excels at drift detection, measuring on average […] greater geometric change than CKA and as much as […] in the Llama family (Raju, 20 Apr 2026).

Large-scale empirical audits show that many steering methods remain brittle. “Steering off Course” evaluates DoLa, function vectors, and task vectors on up to […] models from […] families. It finds only modest or negative gains for DoLa on TruthfulQA and FACTOR, and large variability for activation patching: under default settings, function vectors recover at least […] of the five-shot baseline in only […] of model–task combinations, while task vectors do so in […]; even with extensive search, recoveries remain inconsistent across families and tasks. The paper attributes these failures to flawed assumptions about where knowledge is localized and how it is promoted across layers (Silva et al., 6 Apr 2025).

Prompt-based steering can fail even more directly in high-stakes settings. In the college-admissions essay study, LLM-generated essays are readily distinguishable from human essays, with F1 approximately […] using T5 embeddings and […] using TF-IDF for the LLM-versus-human comparison. Demographic prompting is “remarkably ineffective”: the prompted and unprompted synthetic essays are more similar to each other than to human text, and prompting causes lexical insertions such as “Asian,” “parent,” and “California” without changing deeper stylistic traits. This exposes a persistent gap between surface instruction following and authentic steerability (Lee et al., 25 Mar 2025).

6. Applications, cross-lingual transfer, and open problems

Language-steerability has become a practical tool for multilingual control. In figurative language generation, a direction estimated from figurative–literal activation differences in one language can be applied in another. Across five figurative categories, six languages, and four multilingual LLMs, […] of […] non-monolingual routes yield positive target-category gains, with metaphor and simile transferring most robustly. German is reported as the most receptive target language, Bengali as the weakest, and leave-target-out mean vectors “win” or “tie” native steering in […]–[…] of settings. The authors present this as direct evidence of a reusable, language-agnostic but target-dependent cross-lingual signal (Liu et al., 28 May 2026).

The same control logic extends beyond language identity. Data-driven personas derived by collaborative filtering improve macro prediction accuracy by […]–[…] over the best prompting baselines on OpinionQA, depending on model, when converted into soft prompts by a learned prefix model. STEER-BENCH shows that community-sensitive steerability is measurable at scale but still far from human performance: human experts reach […] accuracy with silver labels, the best models reach about […]–[…], and the weakest model reaches about […]. In recommendation, SteerEval finds that LLM rewriting of user profiles yields the strongest steering among tested interventions, while the relative position of the true next item changes by at most about […] on average, indicating limited loss of baseline preference information (Li et al., 2023).

The literature therefore converges on a mixed assessment. Steerability is real, often strong, and increasingly interpretable. It can emerge during pretraining, localize in later layers, transfer across tasks and languages, and sometimes be driven by a single sparse feature or a compact low-rank subspace. At the same time, robustness is conditional: steering strength is non-monotonic, geometry can be sample-specific, unsupervised diagnostics may fail to predict controllability, prompt-based identity steering can remain superficial, and widely used intervention recipes can degrade or fail across model families. The present state of the field suggests that reliable language-steerability depends on three ingredients in combination: a representation in which the target is linearly accessible, an intervention scale that remains inside the model’s quality-preserving regime, and an evaluation protocol that measures not only target attainment but also semantic preservation, side effects, and cross-context generalization (She et al., 3 Aug 2025).
