
Zero-Shot Personality Injection

Updated 15 December 2025
  • The paper demonstrates that personality traits can be injected into LLM outputs using parameter-free conditioning, preserving general reasoning in a modular fashion without fine-tuning.
  • It employs prompt engineering and latent vector injection techniques to steer style and psychometric profiles, achieving high accuracy in metrics like OCEAN prediction.
  • Zero-shot personality injection offers practical applications in personalized dialogue, digital humans, and survey simulation while addressing challenges such as instruction drift and fairness trade-offs.

Zero-shot personality injection is the parameter-free conditioning of an LLM's output to reflect a specified personality profile at inference time, without weight updates or additional fine-tuning. Techniques range from prompt engineering, which carefully maps numeric trait vectors to textual "persona seeds," to deterministic interventions in a model's latent space, achieving stylistic, psychometric, or behavioral steering for applications in dialogue, recommendation, survey simulation, and embodied agents. This approach contrasts with supervised fine-tuning: it is modular and preserves general reasoning by disentangling personality from cognitive processes.

1. Theoretical Foundations: Disentanglement and Linear Representation

The Soul Engine framework formalizes the Linear Representation Hypothesis: in a pre-trained Transformer, personality traits (such as the Big Five OCEAN factors) occupy mutually orthogonal subspaces within the base model's high-dimensional hidden space $\mathbb{R}^d$. Letting $e \in \mathbb{R}^d$ denote a final hidden embedding for a text chunk, each trait $i$ corresponds to a subspace $V_i$ (spanned by $\vec v_i \in \mathbb{R}^d$):

  • $\mathbb{R}^d = \left(\bigoplus_{i=1}^{5} V_i\right) \oplus V_{\mathrm{reasoning}}$
  • $\vec v_i^{\top} \vec v_j = 0, \;\forall\, i \ne j$
  • $P_{psy}(e) = W_{psy}\, e, \quad W_{psy} W_{psy}^{\top} = I_{5\times 5}$

This formalism ensures geometric separability: reasoning and personality lie in orthogonal complements. Extension to new personality dimensions (e.g., the "Dark Triad") is supported by orthogonalizing new basis vectors via a Frobenius-norm penalty $\|W_{psy} W_{psy}^{\top} - I\|_F^2$ (Wang, 8 Dec 2025).
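
The projection and penalty above reduce to a few lines of code. The following is a minimal PyTorch sketch, assuming a hypothetical hidden size, an illustrative penalty weight, and placeholder trait targets; it is not taken from the Soul Engine implementation.

```python
import torch
import torch.nn.functional as F

d = 4096          # hypothetical hidden size of the backbone
n_traits = 5      # Big Five OCEAN factors

# Learnable projection W_psy: R^d -> R^5 (rows span the five trait directions)
W_psy = torch.nn.Parameter(torch.randn(n_traits, d) / d ** 0.5)

def trait_scores(e: torch.Tensor) -> torch.Tensor:
    """P_psy(e) = W_psy e : project hidden embeddings onto the five trait axes."""
    return e @ W_psy.T                      # shape (..., 5)

def orthogonality_penalty(W: torch.Tensor) -> torch.Tensor:
    """Frobenius-norm penalty ||W W^T - I||_F^2 encouraging mutually orthogonal rows."""
    gram = W @ W.T                          # (5, 5)
    eye = torch.eye(W.shape[0], device=W.device)
    return ((gram - eye) ** 2).sum()

# Example usage: combine a placeholder trait-regression loss with the penalty
e = torch.randn(8, d)                       # batch of final hidden embeddings
targets = torch.rand(8, n_traits)           # placeholder OCEAN scores in [0, 1]
lambda_orth = 0.1                           # illustrative penalty weight
loss = F.mse_loss(trait_scores(e), targets) + lambda_orth * orthogonality_penalty(W_psy)
```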

2. Prompt-Based and Latent Steering Methodologies

Two principal classes of zero-shot personality injection are prevalent:

  • Prompt Engineering: Personality embeddings $p \in \mathbb{R}^d$ are mapped via $\phi(p)$ to textual persona descriptions, which are appended to the context window. For example, with OCEAN traits, $\phi$ yields "You are an outgoing, social movie lover," injected into recommendation or dialogue prompts (Brito et al., 22 Feb 2025, Sah, 20 Aug 2025, Ramirez et al., 2023). SoulBench and MBTI-inspired prompts similarly encode trait composition or cognitive function hierarchies (Liu et al., 25 Aug 2025).
  • Latent Vector Injection: Deterministic steering vectors are computed as $\vec v_{\mathrm{steer}} = \vec\mu_{\mathrm{target}} - \vec\mu_{\mathrm{neutral}}$ (mean embeddings for the target and neutral personae), then applied at a selected layer $\ell^*$ via $h' = h + \alpha\, \vec v_{\mathrm{steer}} / \|\vec v_{\mathrm{steer}}\|_2$, where $\alpha$ is a learned coefficient (a sketch of this update follows below). This approach provides modular, reversible control over style, validated empirically by preserved fluency and distinct personality manifolds in latent-space visualizations (Wang, 8 Dec 2025).

In both paradigms, the main objective is to align generated content with specified trait vectors without altering model parameters.
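
The steering update can be implemented with a standard forward hook that perturbs one block's hidden states. The sketch below assumes a generic PyTorch decoder whose blocks are reachable as `model.layers`, whose outputs are a tensor or a tuple with the hidden state first, and precomputed mean embeddings for the target and neutral personae; names such as `l_star` and the `alpha` value are illustrative, not the Soul Engine code.

```python
import torch

def make_steering_vector(mu_target: torch.Tensor, mu_neutral: torch.Tensor) -> torch.Tensor:
    """v_steer = mu_target - mu_neutral, normalized to unit L2 norm."""
    v = mu_target - mu_neutral
    return v / v.norm(p=2)

def inject_personality(block: torch.nn.Module, v_steer: torch.Tensor, alpha: float):
    """Register a forward hook on one transformer block that applies
    h' = h + alpha * v_steer / ||v_steer|| to every hidden state it emits."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * v_steer.to(hidden.dtype)
        return ((hidden,) + output[1:]) if isinstance(output, tuple) else hidden
    return block.register_forward_hook(hook)

# Usage sketch (names are assumptions): mean hidden states collected beforehand
# from layer l_star of some decoder `model` for target and neutral persona texts.
d = 4096
mu_target, mu_neutral = torch.randn(d), torch.randn(d)
v_steer = make_steering_vector(mu_target, mu_neutral)
# handle = inject_personality(model.layers[l_star], v_steer, alpha=6.0)
# ... generate text ...
# handle.remove()   # steering is reversible: removing the hook restores the base model
```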

3. Architectural and Pipeline Implementations

Zero-shot personality injection frameworks vary according to downstream requirements:

| Framework | Conditioning Mode | Main Components |
|---|---|---|
| Soul Engine | Latent vector injection | Frozen backbone, dual probe heads, $W_{psy}$ |
| PerFairX | Prompt-based | Prompt mapping for OCEAN, fairness audit |
| MARK | Multi-stage prompt chain | Demographic → MBTI inference → weighted vote |
| Digital Humans | Prompt + multimodal sync | Text, facial, gesture cues; cross-modal probe |

Soul Engine employs a frozen Qwen-2.5 backbone, fine-tuning only upper layers and probe heads. The MARK system executes a three-stage pipeline: stress scoring for demographic features, MBTI-based function inference, and weighted cognitive imitation, culminating in reasoning traces conditioned on inferred type (Liu et al., 25 Aug 2025). In virtual human and dialogue settings, personality templates are cycled in prompts and affect both verbal and nonverbal modalities (Brito et al., 22 Feb 2025), while TST-style prompts ("rewrite in style $P$") exhibit superior controllability across dialogue and data-to-text scenarios (Ramirez et al., 2023).
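
To make the staged structure concrete, here is a heavily simplified sketch of a MARK-like three-stage prompt chain. The `llm` callable, the prompt wording, and the sampling count are assumptions; the actual MARK prompts and its vote-weighting step are not reproduced here.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]   # any text-in/text-out model client (assumption)

def stress_score(llm: LLM, demographics: Dict[str, str]) -> str:
    """Stage 1: score demographic features for stress relevance (illustrative prompt)."""
    return llm(f"Rate the stress relevance of each demographic feature: {demographics}")

def infer_mbti(llm: LLM, demographics: Dict[str, str], stress: str) -> str:
    """Stage 2: infer an MBTI type and cognitive-function hierarchy from stage 1."""
    return llm(f"Given {demographics} and stress scores {stress}, infer the MBTI type "
               f"and its dominant/auxiliary cognitive functions.")

def sample_reasoning_traces(llm: LLM, mbti: str, question: str, n: int = 3) -> List[str]:
    """Stage 3: sample several type-conditioned reasoning traces; the weighted vote
    over these traces is omitted here."""
    prompt = f"Reason and answer as an {mbti} respondent: {question}"
    return [llm(prompt) for _ in range(n)]
```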

4. Evaluation Metrics and Empirical Performance

Quantitative benchmarks for zero-shot personality injection span multiple domains:

  • Psychometric Profiling: Soul Engine attains $\mathrm{MSE} \approx 0.0113$ in predicting OCEAN scores (≈99% accuracy), with t-SNE confirming the separation of trait manifolds (Wang, 8 Dec 2025).
  • Dialogue & NLG: TST-style prompts yield $78.46\%$ semantic accuracy and up to $100\%$ personality accuracy, outperforming direct data-to-text approaches by ≈12 points in semantic score (Ramirez et al., 2023).
  • Recommendation Systems: PerFairX introduces Personality Alignment Score (PAS, cosine similarity in trait-genre space), Genre-Personality Alignment (GPA), demographic fairness measures (DP, EO), and intra-list diversity (ILF@K), observing that DeepSeek achieves PAS of $0.848$ (MovieLens, sensitive prompt) but at the cost of increased demographic disparity (DP up to $0.726$) (Sah, 20 Aug 2025).
  • Survey Simulation: MARK boosts accuracy by +8–15 pp over prior baselines (e.g., GLM-4-air sampled ACC of $33.69\%$ vs. $25.49\%$–$26.98\%$ for prior methods), while also delivering the best 1–JSD and distributional-divergence scores (Liu et al., 25 Aug 2025).

In digital human applications, the Personality Coherence Score (PC, mean cosine between intended/realized traits) guides prompt calibration and re-injection schedules (Brito et al., 22 Feb 2025).
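
Both alignment metrics are mean cosine similarities and can be sketched as follows; the array shapes and values are illustrative, and the exact weighting used by PerFairX and the digital-human pipeline may differ.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def personality_alignment_score(user_traits: np.ndarray, item_profiles: np.ndarray) -> float:
    """PAS-style score: mean cosine between a user's OCEAN vector and the trait
    profiles of recommended items (trait-genre space)."""
    return float(np.mean([cosine(user_traits, item) for item in item_profiles]))

def personality_coherence(intended: np.ndarray, realized: np.ndarray) -> float:
    """PC-style score: mean cosine between intended and realized trait vectors per turn."""
    return float(np.mean([cosine(i, r) for i, r in zip(intended, realized)]))

# Example: one user, three recommended items, two dialogue turns (values illustrative)
user = np.array([0.8, 0.3, 0.9, 0.5, 0.2])             # OCEAN
items = np.random.rand(3, 5)
print(personality_alignment_score(user, items))
print(personality_coherence(np.random.rand(2, 5), np.random.rand(2, 5)))
```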

5. Challenges, Limitations, and Best Practices

Empirical assessments identify several recurring challenges in zero-shot personality injection:

  • Instruction Drift: Persona seeds lose efficacy over extended contexts; periodic re-injection and window sliding are required (Brito et al., 22 Feb 2025).
  • Calibration and Overfitting: LLMs can underplay/exaggerate traits; template scaling and prompt metric optimization are recommended (Brito et al., 22 Feb 2025, Sah, 20 Aug 2025).
  • Modality Synchronization: Nonverbal output disjunction is mitigated by adapters aligning sentiment across verbal/facial/gesture streams (Brito et al., 22 Feb 2025).
  • Personalization–Fairness Tradeoff: Personality prompting can exacerbate demographic disparity; guidelines include isolating psychographic (not demographic) cues, limiting trait stacking, running neutral-prompt parallels, and multi-objective tuning of aggregate metrics like FPx (Sah, 20 Aug 2025).

Prompt design and ranking strategies, such as selecting maximally diverse few-shot templates (lowest BLEURT similarity) and single-trait targeting, enhance domain adaptation and semantic control (Ramirez et al., 2023). Empirical sweet spots for cross-layer latent intervention (Soul Engine: layers 14–16, $\alpha = 6.0$–$8.0$) are critical; early- or late-layer modifications degrade either coherence or style adherence (Wang, 8 Dec 2025).
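
One way to realize the maximally-diverse-template heuristic is a greedy selection over a precomputed pairwise-similarity matrix (e.g., BLEURT scores). The sketch below is an assumption about the selection procedure, not the exact ranking algorithm of the cited work.

```python
from typing import List
import numpy as np

def select_diverse_templates(sim: np.ndarray, k: int) -> List[int]:
    """Greedily pick k few-shot templates with the lowest mutual similarity,
    given a symmetric pairwise similarity matrix `sim`."""
    chosen = [int(np.argmin(sim.sum(axis=1)))]   # start with the globally least-similar template
    while len(chosen) < k:
        rest = [i for i in range(len(sim)) if i not in chosen]
        # add the candidate whose maximum similarity to the chosen set is smallest
        chosen.append(min(rest, key=lambda i: max(sim[i, j] for j in chosen)))
    return chosen

# Example with a random similarity matrix over 10 candidate templates
rng = np.random.default_rng(0)
S = rng.random((10, 10)); S = (S + S.T) / 2; np.fill_diagonal(S, 1.0)
print(select_diverse_templates(S, k=3))
```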

6. Applications, Extensions, and Transferability

Zero-shot personality injection is central in:

  • Personalized dialogue systems: enabling controllable, interpretable persona stylization in customer support, entertainment, and gaming (Ramirez et al., 2023).
  • Virtual embodiment and VR agents: multi-modal synchronization for text, facial, and gesture outputs; enabling real-time and low-latency immersive interactions (Brito et al., 22 Feb 2025).
  • Recommender systems: psychographically aligned content suggestion with explicit fairness-accuracy trade-offs (Sah, 20 Aug 2025).
  • Survey simulation and social modeling: simulating population-level value distributions using demographically and cognitively grounded personality profiles (Liu et al., 25 Aug 2025).

For scaling to new model families or trait spaces, layer-wise linear probing identifies optimal intervention strata, and SoulBench-style dynamic sampling ensures cross-domain stylistic invariance (Wang, 8 Dec 2025). As model scale grows, personality and reasoning subspaces become increasingly orthogonal, enhancing transferability of these geometric methods (Wang, 8 Dec 2025).
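
Layer-wise linear probing can be approximated with per-layer ridge regressions over cached activations; the sketch below uses synthetic data and illustrative shapes rather than any published probe configuration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def best_intervention_layer(hidden_by_layer: np.ndarray, traits: np.ndarray) -> int:
    """Fit one linear probe per layer to predict trait scores from cached hidden states
    and return the layer with the lowest held-out MSE.
    hidden_by_layer: (n_layers, n_examples, d); traits: (n_examples, 5)."""
    best_layer, best_mse = -1, float("inf")
    for layer, H in enumerate(hidden_by_layer):
        H_tr, H_te, y_tr, y_te = train_test_split(H, traits, test_size=0.2, random_state=0)
        probe = Ridge(alpha=1.0).fit(H_tr, y_tr)
        mse = float(np.mean((probe.predict(H_te) - y_te) ** 2))
        if mse < best_mse:
            best_layer, best_mse = layer, mse
    return best_layer

# Example with synthetic activations: 24 layers, 200 examples, hidden size 256
hidden = np.random.randn(24, 200, 256)
ocean = np.random.rand(200, 5)
print(best_intervention_layer(hidden, ocean))
```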


Zero-shot personality injection, implemented via prompt engineering, latent space intervention, or staged cognitive prompting, enables safe, modular, and high-fidelity personality conditioning of LLMs without sacrificing core reasoning or necessitating costly fine-tuning. These methods are supported by rigorous metrics, established evaluation protocols, and empirically validated architectural schemata, ensuring technical robustness and practical versatility across domains including dialogue systems, human-computer interaction, personalized recommendation, and social simulation (Wang, 8 Dec 2025, Brito et al., 22 Feb 2025, Sah, 20 Aug 2025, Liu et al., 25 Aug 2025, Ramirez et al., 2023).
