
Individual Presentation Paradigm (IPP)

Updated 3 July 2026
  • The IPP is a protocol for evaluating LLM self-recognition by asking a binary yes/no question: whether the model authored a single presented text.
  • It contrasts with the Pair Presentation Paradigm by eliminating comparative cues, resulting in near-random accuracy without intervention.
  • The CoSur framework recovers latent Implicit Territorial Awareness (ITA) using SVD-based subspace projection and cognitive editing to boost IPP accuracy.

The Individual Presentation Paradigm (IPP) is a protocol for evaluating the self-recognition capabilities of LLMs, particularly their ability to discern whether a given text was authored by themselves or by another source. In contrast to the Pair Presentation Paradigm (PPP), which presents models with two candidate texts for direct comparison, IPP involves prompting the model with a single text and requiring a binary decision—typically, “Did you write this text?”. Although LLMs demonstrate robust self-recognition in the PPP, their performance deteriorates markedly in the IPP, often aligning with or falling below random chance across a variety of architectures and datasets (Zhou et al., 20 Aug 2025).

1. Formal Definition and Contrast with Pair Presentation Paradigm

Under the Pair Presentation Paradigm (PPP), an LLM receives a pair of texts, $(t_1, t_2)$, where one is self-authored and the other is not, and outputs

$$\operatorname{argmax}_{i \in \{1, 2\}} P(\operatorname{label} = i \mid t_1, t_2).$$

This discriminative two-choice setting enables the model to leverage comparative cues, achieving high accuracy (often exceeding 80%).

The Individual Presentation Paradigm (IPP) reduces this setting to a single text $t$. The model answers a yes/no question (“Did you write $t$?”), and its answer is read from the probabilities it assigns to the “yes” and “no” tokens at the end of a fixed prompt:

$$P(y \mid t) = \operatorname{softmax}(W \cdot h(t) + b)_{y}$$

for $y \in \{\text{yes}, \text{no}\}$, where $h(t) \in \mathbb{R}^d$ is the last-token hidden representation. The model predicts “yes” if $P(\text{yes} \mid t) > P(\text{no} \mid t)$.
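The readout rule above can be illustrated with a minimal NumPy sketch. The function name `ipp_decision`, the row order of `W` (row 0 for “yes”, row 1 for “no”), and the random toy inputs are assumptions made for illustration; a real model would supply `h`, `W`, and `b` from its own final layer.

```python
import numpy as np

def ipp_decision(h, W, b):
    """Return ("yes" | "no", probabilities) from a last-token hidden state h.

    Assumed layout: W has shape (2, d), row 0 = "yes", row 1 = "no".
    """
    logits = W @ h + b
    logits = logits - logits.max()          # subtract max for a stable softmax
    p = np.exp(logits) / np.exp(logits).sum()
    return ("yes" if p[0] > p[1] else "no"), p

# Toy inputs (random, not taken from any real model)
rng = np.random.default_rng(0)
d = 8
h = rng.normal(size=d)
W = rng.normal(size=(2, d))
b = np.zeros(2)

answer, probs = ipp_decision(h, W, b)
```

Because only the two-way comparison of `p[0]` and `p[1]` matters, any authorship signal in `h` that does not align with the difference between the two readout rows has no effect on the answer.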

2. Empirical Findings: IPP Performance of Standard LLMs

Comprehensive evaluation of multiple 8B-parameter LLMs under the IPP reveals a consistent performance gap relative to PPP. The table summarizes the baseline (no intervention) IPP results for three models, using four “self–other” splits. Accuracy (ACC) and F1 are based on raw yes/no token decisions; averages are provided across splits.

| Model | self–human | self–ChatGPT | self–peer-LLM | self–other-LLM | Overall ACC/F1 |
|---|---|---|---|---|---|
| Qwen-base | 46.2/40.3 | 32.9/31.8 | 31.9/29.2 | 46.7/32.8 | 39.4/33.5 |
| Llama-base | 45.0/43.7 | 50.0/49.1 | 49.7/48.8 | 50.0/48.7 | 48.7/47.6 |
| Deepseek-base | 63.6/56.6 | 38.2/27.6 | 45.4/34.7 | 48.0/37.3 | 48.8/39.1 |

Despite strong PPP results, IPP accuracies for these models cluster around 50% (the random baseline) and fall well below it on some splits (for example, 31.9% for Qwen-base on self–peer-LLM), indicating that LLMs cannot reliably perform self-authorship discrimination in the single-presentation format (Zhou et al., 20 Aug 2025).
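For readers who want to reproduce this kind of scoring, here is a small sketch of how ACC and F1 could be computed from raw yes/no decisions, and how an overall figure follows as the mean across splits. Treating “yes” (self-authored) as the positive class for F1 is an assumption, the function name `acc_and_f1` is hypothetical, and the toy labels are not data from the paper; only the four Qwen-base split accuracies come from the table.

```python
import numpy as np

def acc_and_f1(y_true, y_pred):
    """Accuracy and F1, with 1 = "yes" (self-authored) as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = float(np.mean(y_true == y_pred))
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0
    return acc, f1

# Toy labels for illustration only
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 1]
acc, f1 = acc_and_f1(y_true, y_pred)

# Overall ACC as the mean over the four splits (Qwen-base row of the table)
qwen_overall_acc = float(np.mean([46.2, 32.9, 31.9, 46.7]))
```

The mean of the four Qwen-base split accuracies is 39.425, which matches the reported overall value of 39.4 after rounding.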

3. Latent Causes: Information Bottleneck and Implicit Territorial Awareness

The failure of LLMs under IPP is attributed to a representational bottleneck. The model produces a last-token hidden vector $h \in \mathbb{R}^d$, then computes yes/no probabilities via a softmax layer with readout weights $W \in \mathbb{R}^{2 \times d}$:

$$P(y \mid h) = \operatorname{softmax}(W h + b)_{y}$$

This mapping, especially with a low-rank $W$, compresses information. Mutual information calculations confirm that much of the authorship-relevant signal present in $h$ is lost in the output:

$$I(A; y) \le I(A; h),$$

where $A$ denotes the authorship label (self or other) and $y$ the yes/no output.

Analysis reveals that the hidden state $h$ encodes a separation (“territory”) between self- and other-generated texts, termed Implicit Territorial Awareness (ITA), but that this structure is not accessible in the output probabilities. Probes using PPP-style comparisons yield high Jensen–Shannon divergence (>0.2 across several layers) and separable t-SNE clusters for $h$, whereas IPP outputs show near-zero divergence and overlapping t-SNE clusters, confirming loss of separability post-softmax (Zhou et al., 20 Aug 2025). This supports the view that ITA is latent but functionally inert without further intervention.
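The Jensen–Shannon divergence used as a separability measure can be sketched as follows for two discrete distributions. The base-2 logarithm (which bounds the value in [0, 1]) and the toy distributions are assumptions; the source does not state which estimator or log base underlies the >0.2 figure.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) in bits, skipping terms where p is zero."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (base 2)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)                      # mixture; positive wherever p or q is
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy yes/no distributions: nearly identical outputs vs. clearly separated ones
js_overlapping = js_divergence([0.51, 0.49], [0.49, 0.51])
js_separated = js_divergence([0.95, 0.05], [0.05, 0.95])
```

Two nearly identical yes/no distributions give a divergence close to zero, which is the pattern reported for IPP outputs, while well-separated distributions give a value close to the upper bound.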

4. The Cognitive Surgery (CoSur) Framework for Awakening ITA

Cognitive Surgery (CoSur) is a modular framework engineered to recover latent ITA and enable accurate IPP self-recognition. It consists of four modules:

4.1 Representation Extraction

Given $N_s$ self-authored texts $\{t_i^{s}\}$ and $N_o$ other-generated texts $\{t_j^{o}\}$, run each under IPP and extract the final hidden states:

  • $h_i^{s} = h(t_i^{s})$, $h_j^{o} = h(t_j^{o})$, each in $\mathbb{R}^d$
  • Stack into $H_s \in \mathbb{R}^{N_s \times d}$, $H_o \in \mathbb{R}^{N_o \times d}$

4.2 Territory Construction via SVD

Apply singular value decomposition to the representation matrices:

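  • $H_s = U_s \Sigma_s V_s^{\top}$, $H_o = U_o \Sigma_o V_o^{\top}$
  • Extract the top-$k$ right singular vectors ($V_s^{(k)}$, $V_o^{(k)}$) for each class, defining two subspaces:
    • $\mathcal{T}_s = \operatorname{span}(V_s^{(k)})$, $\mathcal{T}_o = \operatorname{span}(V_o^{(k)})$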
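The territory construction step can be sketched with NumPy's SVD. The function name `build_territory`, the matrix sizes, the choice `k = 4`, and the synthetic random matrices standing in for stacked hidden states are all assumptions for illustration; whether the representations are centered beforehand is not specified here, so the sketch decomposes them as given.

```python
import numpy as np

def build_territory(H, k):
    """Return a (d, k) orthonormal basis: the top-k right singular vectors of H.

    H is an (n, d) matrix with one hidden state per row; requires k <= min(n, d).
    """
    _, _, Vt = np.linalg.svd(H, full_matrices=False)  # rows of Vt, sorted by singular value
    return Vt[:k].T

# Synthetic stand-ins for stacked self/other hidden states (not real model data)
rng = np.random.default_rng(0)
H_self = rng.normal(size=(50, 16))
H_other = rng.normal(size=(60, 16))

k = 4  # illustrative value only
V_self = build_territory(H_self, k)
V_other = build_territory(H_other, k)
```

Each returned basis has orthonormal columns, so projecting a vector onto the corresponding subspace reduces to a single matrix product.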

4.3 Authorship Discrimination

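Given a test text $t^{*}$, extract $h^{*} = h(t^{*})$ and compute projection energies onto each subspace, with $V_s^{(k)}$ and $V_o^{(k)}$ the self and other territory bases:

  • $E_s = \| V_s^{(k)\top} h^{*} \|_2^2$, $E_o = \| V_o^{(k)\top} h^{*} \|_2^2$
  • Decide authorship: self if $E_s > E_o$, else other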
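The decision rule can be sketched as a comparison of squared projection norms. The names `projection_energy` and `discriminate` are hypothetical, and the hand-built bases (coordinate axes in a 4-dimensional space) are toy assumptions chosen so the result is easy to check by hand.

```python
import numpy as np

def projection_energy(h, V):
    """Squared norm of the projection of h onto span(V); V has orthonormal columns."""
    return float(np.sum((V.T @ h) ** 2))

def discriminate(h, V_self, V_other):
    """Label h as "self" when it carries more energy in the self territory."""
    e_self = projection_energy(h, V_self)
    e_other = projection_energy(h, V_other)
    return "self" if e_self > e_other else "other"

# Toy territories: self spans the first two axes, other spans the last two
V_self = np.eye(4)[:, :2]
V_other = np.eye(4)[:, 2:]

h_test = np.array([3.0, 1.0, 0.5, 0.0])
label = discriminate(h_test, V_self, V_other)   # energies: 10.0 vs 0.25
```

In this toy case the test vector lies almost entirely in the first two coordinates, so its self-territory energy (10.0) dominates its other-territory energy (0.25).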

4.4 Cognitive Editing

To make the output token match the discriminated authorship, modify $h$ before the output layer:

  • Let $w_{\text{yes}}$, $w_{\text{no}}$ be the output weights for “yes” and “no”
  • Normalize: $\hat{w}_{\text{yes}} = w_{\text{yes}} / \|w_{\text{yes}}\|$, $\hat{w}_{\text{no}} = w_{\text{no}} / \|w_{\text{no}}\|$
  • Set the edit direction toward $\hat{w}_{\text{yes}}$ for self, toward $\hat{w}_{\text{no}}$ for other
  • Edit: shift $h$ along that direction, scaled by the editing strength, to obtain the edited state $h'$
  • Feed $h'$ through the final MLP to yield softmax output strongly favoring the correct authorship token
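One way such an edit could look is sketched below. This is an illustrative assumption, not the paper's exact update rule: the hidden state is shifted by a strength `alpha` along the difference of the normalized “yes” and “no” weight vectors, with the sign chosen by the authorship decision. The names `cognitive_edit` and `alpha`, and the direct use of the two readout vectors in place of the model's final layers, are hypothetical.

```python
import numpy as np

def cognitive_edit(h, w_yes, w_no, decision, alpha):
    """Shift h toward the readout direction matching the authorship decision.

    Assumed rule: h' = h + alpha * (u_yes - u_no) for "self",
                  h' = h + alpha * (u_no - u_yes) for "other",
    where u_yes and u_no are the unit-normalized readout weights.
    """
    u_yes = w_yes / np.linalg.norm(w_yes)
    u_no = w_no / np.linalg.norm(w_no)
    direction = (u_yes - u_no) if decision == "self" else (u_no - u_yes)
    return h + alpha * direction

# Toy readout weights and a hidden state that initially favors "no"
w_yes = np.array([1.0, 0.0])
w_no = np.array([0.0, 1.0])
h = np.array([0.0, 1.0])

h_edited = cognitive_edit(h, w_yes, w_no, decision="self", alpha=5.0)
yes_logit = float(w_yes @ h_edited)
no_logit = float(w_no @ h_edited)
```

Under this assumed rule the yes-minus-no logit margin changes by alpha times (1 − cos θ)(‖w_yes‖ + ‖w_no‖), where θ is the angle between the two weight vectors, so the edit moves the output toward the intended token whenever the two readout directions are not parallel.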

5. Experimental Outcomes and Ablation Analyses

Applying CoSur to the evaluated models yields substantial IPP accuracy improvements. The table below summarizes the main results:

| Model | Base ACC | CoSur ACC | Improvement (pp) |
|---|---|---|---|
| Qwen-8B | 39.4% | 83.25% | +43.9 |
| Llama-8B | 48.7% | 66.19% | +17.5 |
| Deepseek-8B | 48.8% | 88.01% | +39.2 |

Ablations on the Qwen model show that the full SVD-based approach performs best:

  • CoSur variant using class centroids and cosine similarity: ACC = 80.43%
  • CoSur variant using PCA instead of SVD: ACC = 75.70%
  • Full CoSur (SVD + projection + editing): ACC = 83.25%

Performance is sensitive to the territory dimensionality $k$, with accuracy peaking at a specific value, and to the editing strength, with accuracy plateauing beyond a threshold.

6. Theoretical and Practical Implications

Restoring latent self-versus-other discrimination under IPP via direct manipulation of $h$ demonstrates that LLMs possess internal “territorial” structures for self-authorship even when the output layer fails to retain this information. This finding implies that significant latent capabilities may be recoverable by representation-level intervention rather than retraining or architecture modification. Potential extensions include multimodal self-awareness, plagiarism detection, and multi-agent identity tasks (Zhou et al., 20 Aug 2025). Future research directions involve dynamic and unsupervised territory discovery, as well as integration of ITA awakening mechanisms into base model training pipelines for more robust out-of-the-box self-recognition under IPP.
