
Soul Engine: LLM Personality Control

Updated 15 December 2025
  • Soul Engine is a framework that models personality in LLMs as orthogonal latent subspaces, enabling zero-shot persona injection without compromising reasoning.
  • Its dual-headed architecture and stratified freezing isolate psychometric traits from linguistic functions, ensuring stable and deterministic control.
  • Extensive ablation studies and the SoulBench corpus validate its high precision (MSE ≈ 0.011) and robust geometric disentanglement for safe AI personalization.

The Soul Engine is a framework designed for the precise, safe, and disentangled representation and control of personality in LLMs. Grounded in the Linear Representation Hypothesis, the Soul Engine models high-level psychometric traits as orthogonal linear subspaces within the latent space of transformer models. Its architecture employs a stratified freezing protocol and dual-headed readout, enabling zero-shot persona injection and deterministic behavioral steering without global weight updates or loss of reasoning capability. SoulBench, a related dynamic contextual sampling corpus, provides psychometrically consistent ground-truth for both training and benchmarking. Through extensive ablation and visualization studies, the Soul Engine achieves high-precision profiling (MSE ≈ 0.011) and clean geometric disentanglement, establishing a mathematically rigorous foundation for controllable AI personalization (Wang, 8 Dec 2025).

1. Motivation: Stability–Plasticity Dilemma in LLM Personalization

Traditional personalization protocols for LLMs—including Supervised Fine-Tuning (SFT) and parameter-efficient methods like LoRA—characterize persona as a narrow style distribution to be memorized via gradient updates. Such approaches often suffer from an "alignment tax": the phenomenon by which the tuning process induces catastrophic forgetting of general knowledge and significantly degrades performance on reasoning benchmarks (e.g., MMLU). In-Context Learning (ICL) and prompt-based personalization avoid weight updates but encounter substantial persona dilution and "catastrophic amnesia" over extended dialogues. The Soul Engine circumvents these limitations by eschewing global updates to reasoning circuits and instead projecting and modulating personality within orthogonal latent subspaces, affording stable persona activation/deactivation without compromising base intelligence.

2. Linear Representation Hypothesis and Geometric Disentanglement

The foundational hypothesis asserts that each high-level psychometric trait, notably the Big Five (OCEAN), is embedded in its own linear subspace of the model's latent space. Denote by $e \in \mathbb{R}^d$ the final hidden embedding of the transformer. The psychometric projection matrix $W_{psy} \in \mathbb{R}^{5 \times d}$ is constructed so that its rows span the five trait subspaces, and orthonormality constraints are enforced:

$$W_{psy} W_{psy}^\top = I_5$$

Trait prediction is performed via:

$$\hat{y} = W_{psy}\, e$$

Orthogonality regularization employs the Frobenius norm:

$$L_{Orth} = \left\| W_{psy} W_{psy}^\top - I_5 \right\|_F^2$$

This geometric configuration ensures each trait's direction in latent space is disentangled, enhancing interpretability, control, and stability.
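The projection and its orthogonality penalty above can be sketched in a few lines of PyTorch. This is an illustrative reconstruction from the formulas, not the released implementation; the hidden size `d = 768` and initialization are assumptions.

```python
# Hypothetical sketch of the psychometric projection W_psy and its
# orthogonality regularizer; dimensions are illustrative, not from the paper.
import torch

d = 768                                                   # assumed hidden size
W_psy = torch.nn.Parameter(torch.randn(5, d) / d ** 0.5)  # rows = trait directions

def predict_traits(e: torch.Tensor) -> torch.Tensor:
    """y_hat = W_psy e : map a final hidden embedding to five OCEAN scores."""
    return W_psy @ e

def orthogonality_loss(W: torch.Tensor) -> torch.Tensor:
    """L_Orth = || W W^T - I_5 ||_F^2, pushing trait directions to orthonormality."""
    gram = W @ W.T
    return torch.linalg.norm(gram - torch.eye(W.shape[0]), ord="fro") ** 2

e = torch.randn(d)                 # stand-in for a transformer's final embedding
y_hat = predict_traits(e)          # five trait scores
loss = orthogonality_loss(W_psy)   # added to the training objective
```

Minimizing this penalty alongside the profiling loss drives the Gram matrix of $W_{psy}$ toward the identity, which is what makes the five trait directions mutually orthogonal.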

3. Architecture: Dual-Head Design and Stratified Freezing

Soul Engine operates on a frozen Qwen-2.5 backbone, dividing layers as follows:

  • Layers 0–19: frozen to preserve syntactic and reasoning structures ($\theta_{frozen}$).
  • Layers 20–23 and the final normalization heads: unfrozen for psychometric probing ($\theta_{active}$).

The architecture deploys two heads:

  • Identity Head: $z_{id} = P_{id}(e) \in \mathbb{R}^{256}$ via a 2-layer MLP.
  • Psychometric Head: $\hat{y} = P_{psy}(e) = W_{psy}\, e \in \mathbb{R}^5$ (linear mapping).

For steering, a trait-score vector $y^*$ is mapped back to latent space:

$$v_{psy} = W_{psy}^\top y^*$$

This persona vector, orthogonal to reasoning circuits, can be injected into residual streams without modifying $\theta_{frozen}$, supporting activation and deactivation of specific persona features.
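The dual-headed readout and the reverse mapping used for steering can be sketched as a small module. The 256-dimensional identity head and the linear psychometric head follow the text; the MLP width and hidden size are assumptions for illustration.

```python
# Sketch of the dual-headed readout; sizes other than the 256-dim identity
# embedding and the 5-dim trait output are illustrative assumptions.
import torch
import torch.nn as nn

class DualHead(nn.Module):
    def __init__(self, d: int = 768):
        super().__init__()
        # Identity head: 2-layer MLP producing z_id in R^256.
        self.identity = nn.Sequential(
            nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, 256)
        )
        # Psychometric head: purely linear map W_psy (rows = OCEAN directions).
        self.W_psy = nn.Parameter(torch.randn(5, d) / d ** 0.5)

    def forward(self, e: torch.Tensor):
        z_id = self.identity(e)      # identity embedding, shape (..., 256)
        y_hat = e @ self.W_psy.T     # trait scores,       shape (..., 5)
        return z_id, y_hat

    def persona_vector(self, y_star: torch.Tensor) -> torch.Tensor:
        """v_psy = W_psy^T y* : lift a target trait-score vector to latent space."""
        return y_star @ self.W_psy

head = DualHead()
e = torch.randn(4, 768)              # a batch of final hidden embeddings
z_id, y_hat = head(e)
v_psy = head.persona_vector(torch.tensor([0.9, 0.1, 0.5, 0.2, 0.8]))
```

Because the trait readout is linear, the same matrix serves both directions: reading traits out ($W_{psy}\, e$) and writing a persona back in ($W_{psy}^\top y^*$).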

4. SoulBench: Dynamic Contextual Sampling and Psychometric Ground Truth

SoulBench is a corpus generated via dynamic contextual sampling, maximizing stylistic invariance extraction from character data. For character $c$ with sentence pool $D_c = \{s_1, \ldots, s_M\}$, each training step samples $k$ sentences ($k = 3$ is typical) and concatenates them:

$$A_t = \mathrm{Concat}(s_{i_1}, \ldots, s_{i_k}), \quad \{i_1, \ldots, i_k\} \sim \mathrm{Uniform}(1, \ldots, M)$$

The virtual dataset size $C(M, 3) \gg M$ forces reliance on stylistic rather than content features. OCEAN trait scores $y_{truth}$ are supplied by a Teacher Model (Doubao-Seed-1.6) prompted with full character profiles, yielding psychologically coherent ground truth.
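The sampling step above amounts to drawing $k$ distinct sentences uniformly without replacement and concatenating them, which is a one-liner to sketch. Function and variable names here are illustrative, not from SoulBench's released tooling.

```python
# Sketch of dynamic contextual sampling: each training step draws k distinct
# sentences uniformly from a character's pool, so the virtual dataset has
# C(M, k) possible samples rather than M. Names are illustrative.
import random

def sample_context(sentences: list[str], k: int = 3, rng=random) -> str:
    """Draw k distinct sentences uniformly at random and concatenate them."""
    picks = rng.sample(sentences, k)   # uniform, without replacement
    return " ".join(picks)

pool = [f"sentence {i}" for i in range(10)]        # D_c with M = 10
batch = [sample_context(pool) for _ in range(4)]   # four training samples A_t
```

Because any given sentence appears in many different combinations, content-specific cues wash out across samples while stylistic invariants persist, which is the stated point of the protocol.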

5. Experimental Results: Precision, Visualization, and Steering

The Soul Engine demonstrates a mean squared error (MSE) of 0.0113 on profiling against psychological ground-truth vectors:

$$L_{MSE} = \frac{1}{N} \sum_i \left\| \hat{y}_i - y_{truth,\,i} \right\|^2$$

t-SNE visualizations of 1,000 character embeddings reveal continuous, cluster-separated gradients for each trait, supporting the hypothesis that the rows of $W_{psy}$ form disentangled, orthogonal directions.

Zero-shot personality injection is achieved by calculating steering vectors from statistical means ($\mu_N$ for neutral, $\mu_T$ for the target persona):

$$v_{steer} = \mu_T - \mu_N$$

At inference, hidden activations are updated at layer $\ell$:

$$h' = h + \alpha \cdot \frac{v_{steer}}{\|v_{steer}\|}$$

with $\alpha$ controlling injection magnitude. Vector arithmetic enables reliable, deterministic persona modulation (e.g., Neutral → Villain: $h' = h + \alpha\, v_{Villain}$).
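The steering recipe above can be sketched end to end: average activations under each persona, take the difference, and add the normalized vector at the injection layer. Random tensors stand in for real hidden states here, and the function names are assumptions for illustration.

```python
# Sketch of zero-shot steering via activation addition, following the formulas
# in the text; the tensors are stand-ins for real layer-l hidden states.
import torch

def steering_vector(h_target: torch.Tensor, h_neutral: torch.Tensor) -> torch.Tensor:
    """v_steer = mu_T - mu_N, each mean taken over a pool of hidden states."""
    return h_target.mean(dim=0) - h_neutral.mean(dim=0)

def inject(h: torch.Tensor, v_steer: torch.Tensor, alpha: float = 6.0) -> torch.Tensor:
    """h' = h + alpha * v_steer / ||v_steer||, applied at the injection layer."""
    return h + alpha * v_steer / v_steer.norm()

pool_T = torch.randn(32, 768)   # activations collected under the target persona
pool_N = torch.randn(32, 768)   # activations collected under the neutral persona
v = steering_vector(pool_T, pool_N)
h = torch.randn(768)            # one hidden state at the injection layer
h_prime = inject(h, v, alpha=6.0)
```

Because the steering vector is normalized before scaling, $\alpha$ directly sets the Euclidean magnitude of the perturbation, which makes the intervention deterministic and tunable.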

6. Ablation Studies: Freezing, Orthogonality, and Injection Efficacy

A series of ablation experiments establish the protocol’s robustness:

  • Freezing Depth: Freezing at least 20 of 24 layers preserves both profiling accuracy ($\text{MSE} \approx 0.011$) and reasoning ability. Fine-tuning additional layers offers no profiling advantage and risks degradation of general intelligence.
  • Orthogonality Regularization: Omitting $L_{Orth}$ increases inter-trait latent correlation and worsens MSE by approximately 30%.
  • Injection Layer and Strength: Optimal persona adherence and semantic coherence are achieved by injecting in middle layers (14–16) with $\alpha \in [6.0, 8.0]$:
    • Villainy score increases by 70%.
    • Coherence retention remains at 95%.

7. Mathematical Rationale and Safety Implications

Personality is conceptualized as a linear manifold orthogonal to reasoning circuits, thereby avoiding destructive updates inherent to traditional fine-tuning. Deterministic latent intervention via vector arithmetic affords granular control absent in stochastic prompting.

For safety, latent vector guardrails can intercept undesirable behaviors by identifying and subtracting harmful directions (e.g., the "Dark Triad") within $W_{psy}$ space ("Safety Interceptor"), serving as robust preemptive filters that act on semantic intent rather than surface-level tokens. This suggests a pathway for operationalizing latent-level controls to mitigate malicious intent without sacrificing linguistic or cognitive capabilities.
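A minimal sketch of such a latent interceptor, assuming the harmful direction is already known: remove the component of a hidden state along that direction before generation. The random direction here is a placeholder; in the described system it would be identified within $W_{psy}$ space.

```python
# Hedged sketch of a "Safety Interceptor": project out the component of a
# hidden state along a known harmful direction. The direction is a random
# placeholder here, not a real Dark Triad axis.
import torch

def intercept(h: torch.Tensor, v_harm: torch.Tensor) -> torch.Tensor:
    """Subtract the projection of h onto the unit harmful direction."""
    u = v_harm / v_harm.norm()
    return h - (h @ u) * u

h = torch.randn(768)        # hidden state at the guarded layer
v_harm = torch.randn(768)   # placeholder for an identified harmful direction
h_safe = intercept(h, v_harm)
```

After interception the state carries no component along the flagged direction, which is what lets the filter act on latent semantic intent rather than on surface tokens.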

In sum, the Soul Engine framework advances the field of controlled LLM personalization by establishing personality as a geometric feature of latent space, independent of memorized weights. The combination of SoulBench’s data protocol, dual-head architecture, and deterministic vector steering produces high-precision, safe, and controllable persona modulation without the intrinsic trade-offs of conventional personalization (Wang, 8 Dec 2025).
