
Under-trained Tokens as Fingerprints (UTF)

Updated 26 November 2025
  • The paper introduces UTF, which exploits infrequently trained tokens as robust fingerprints for model provenance and authentication in LLMs and hardware.
  • It outlines a detection pipeline using tokenizer analysis, weight-based indicators, and prompting that yields effective statistical signatures with token incidences between 0.1% and 1%.
  • The methodology offers both passive and active fingerprinting, demonstrating high reliability (up to 94.4%) and efficient performance across varying model scales and embedded systems.

Under-trained Tokens as Fingerprints (UTF) refers to a family of methods that exploit the presence, distribution, or manipulation of tokens a system sees infrequently or never during training as robust, intrinsic fingerprints for provenance, authentication, and attribution. These approaches have found utility in domains spanning LLMs and embedded device authentication, leveraging the unique behaviors that arise from incomplete or biased training on a system's representational vocabulary or hardware idiosyncrasies.

1. Formal Definition and Phenomenology of Under-trained Tokens

Under-trained tokens (also called "glitch tokens") are entries in a system's representational vocabulary that are either completely absent from training or observed at severely reduced frequency. For LLMs, this reflects the inevitable mismatch between frozen tokenizers and the dynamic, incomplete sampling of the training corpus: certain tokens (e.g., rare character sequences, OOV placeholders, or artifacts of BPE) are absent from the training data or appear only a handful of times. For embedded hardware, "under-trained" refers to features or response patterns (e.g., analog measurements) that an adversary cannot reliably learn due to deliberate poisoning or insufficient sampling.

Formally, for a model with vocabulary $V$ and a given token $t$, let $\mathrm{count\_train}(t)$ denote its count in the training data. Define $f(t) = \mathrm{count\_train}(t)/N$ for corpus size $N$, and declare $t$ under-trained if $f(t) < \tau$ for some threshold $\tau$ (Land et al., 8 May 2024). However, in practical scenarios (closed-source models, black-box endpoints), direct frequency estimates are unavailable, and surrogate features (embedding statistics, output bias, or observed behaviors) are used instead.
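The frequency criterion can be sketched directly. A minimal example, assuming access to the tokenized training stream (function and variable names here are illustrative, not from the paper):

```python
from collections import Counter

def under_trained_tokens(token_stream, vocab, tau=1e-7):
    """Flag tokens whose relative training frequency f(t) falls below tau.

    token_stream: iterable of token ids observed in the training corpus.
    vocab: full vocabulary of token ids; tokens absent from the stream
           have f(t) = 0 and are therefore always flagged.
    """
    counts = Counter(token_stream)
    n = sum(counts.values())
    return {t for t in vocab if counts.get(t, 0) / n < tau}

# Toy corpus: token 3 never appears; token 2 appears once in 12 (f ~ 0.083).
stream = [0, 1, 0, 1, 0, 2, 0, 1, 0, 1, 0, 1]
flagged = under_trained_tokens(stream, vocab=range(4), tau=0.1)
# flagged -> {2, 3}
```

In a real deployment the stream would be far too large to hold in memory, so the counts would be accumulated in a streaming pass; the thresholding logic is unchanged.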

2. Detection and Characterization Methodologies

Detection and characterization of under-trained tokens proceeds by combining statistical analysis of tokenizers, interrogation of model parameter matrices, and behavioral probing. The canonical analysis pipeline comprises three stages (Land et al., 8 May 2024):

  • Tokenizer Analysis: Inspect the vocabulary for unreachable tokens (where $\mathrm{decode}(t) = s$ but $\mathrm{encode}(s) \neq t$), partial encodings (invalid UTF-8), and special/control tokens.
  • Weight-based Indicators: Compute proxies such as embedding norms or distances to the mean OOV vector in the model's unembedding ($U$) and/or input embedding matrices. For example, let $U \in \mathbb{R}^{|V| \times d}$; project out the dominant variance by subtracting the leading principal component, then compute a cosine indicator $I_{\cos}(t) = 1 - \frac{\langle U'_t, u'_{\mathrm{oov}} \rangle}{\|U'_t\|_2 \, \|u'_{\mathrm{oov}}\|_2}$.
  • Prompting-based Verification: Apply targeted prompts to the model, checking whether the output probability for a token $t$ under maximal self-triggering templates remains below a threshold (e.g., $S_{\mathrm{prompt}}(t) < 1\%$).
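The weight-based indicator above can be sketched with `numpy`, assuming a small set of known-unused token ids is available to anchor the reference vector (the paper's exact construction of the OOV reference may differ):

```python
import numpy as np

def cosine_indicator(U, oov_ids):
    """Weight-based under-training indicator from the unembedding matrix U.

    Centers U, projects out its leading principal component (the dominant
    shared direction), then scores each row by cosine similarity to the
    mean vector of known-unused tokens (oov_ids). I_cos near 0 means the
    token sits close to the 'never updated' region, i.e. likely
    under-trained; I_cos near 1 means it is unrelated to that region.
    """
    Uc = U - U.mean(axis=0)                       # center rows
    _, _, vt = np.linalg.svd(Uc, full_matrices=False)
    pc1 = vt[0]                                   # leading principal component
    Up = Uc - np.outer(Uc @ pc1, pc1)             # project it out
    u_oov = Up[oov_ids].mean(axis=0)              # reference 'unused' direction
    cos = (Up @ u_oov) / (np.linalg.norm(Up, axis=1)
                          * np.linalg.norm(u_oov) + 1e-12)
    return 1.0 - cos                              # I_cos(t) per token

# Synthetic demo: rows 10 and 11 share one tiny 'never updated' vector.
rng = np.random.default_rng(0)
U = rng.normal(size=(12, 16))
U[10] = U[11] = rng.normal(size=16) * 0.1
scores = cosine_indicator(U, oov_ids=[10, 11])   # scores[10], scores[11] ~ 0
```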

This pipeline efficiently identifies under-trained tokens, yielding incidence rates typically between $0.1\%$ and $1\%$ of the full vocabulary across modern LLMs (Land et al., 8 May 2024). The presence, distribution, and model-dependent clustering of these tokens provide a robust, reproducible signature.

3. Fingerprinting LLMs using UTF

Multiple fingerprinting strategies have emerged from the existence of under-trained tokens. A fundamental property is that the specific set and statistical idiosyncrasies of under-trained tokens are highly specific to model provenance and initialization choices (Land et al., 8 May 2024, Cai et al., 16 Oct 2024, Tong et al., 30 Sep 2025).

3.1 Passive Fingerprinting

By cataloging which glitch tokens are present (using the detection pipeline above) and collecting behavioral signatures, a model's unique, provable identity can be established. For example, $F_M = \{\, t : S_{\mathrm{prompt}}^M(t) < \theta \,\}$ serves as a fingerprint that is highly unlikely to be reproduced by models with different training data, initialization, or weight-decay parameters (Land et al., 8 May 2024).
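A passive fingerprint and a simple overlap comparison can be sketched as follows; the per-token probe scores here are hypothetical placeholders standing in for $S_{\mathrm{prompt}}^M(t)$ measurements:

```python
def passive_fingerprint(prompt_scores, theta=0.01):
    """F_M: the set of tokens whose self-trigger output probability < theta."""
    return frozenset(t for t, s in prompt_scores.items() if s < theta)

def jaccard(f_a, f_b):
    """Overlap of two fingerprints; high overlap suggests shared provenance."""
    union = f_a | f_b
    return len(f_a & f_b) / len(union) if union else 0.0

# Hypothetical probe scores for two models over a few token ids.
scores_a = {101: 0.001, 102: 0.004, 103: 0.2, 104: 0.008}
scores_b = {101: 0.002, 102: 0.003, 103: 0.15, 105: 0.5}

fa = passive_fingerprint(scores_a)   # {101, 102, 104}
fb = passive_fingerprint(scores_b)   # {101, 102}
sim = jaccard(fa, fb)                # 2 shared of 3 total -> 2/3
```

A high Jaccard score between two endpoints is evidence (not proof) of a shared base model; in practice one would calibrate the decision threshold against fingerprints of known-unrelated models.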

3.2 Active Fingerprinting via Supervised Fine-tuning

An alternative method explicitly embeds unique trigger–response pairs into a model by exploiting under-trained tokens' isolation in the token space (Cai et al., 16 Oct 2024). Here, one identifies a set $T$ of under-trained tokens (via cosine distance or embedding norm), samples sequences $(x, y)$ from $T$, and conducts supervised fine-tuning to maximize the probability $P_{\theta}(y|x)$, resulting in a model $M_{\theta_f}$ that, upon query with $x$, reliably outputs $y$ (effectiveness $100\%$, reliability $>94\%$ for Llama2-7B-chat; negligible accuracy drop, see the table below). This fine-tuning phase is extremely fast, requires no architectural modification, and is robust to further model fine-tuning.

Model / Method   Effectiveness (%)   Reliability (%)   Efficiency (min)
IF_SFT                 100               34.4                25
IF                     100                0.0                 6
dialogue               100               76.2                 6
UTF (ours)             100               94.4                 6

(Table: Effectiveness, reliability, and efficiency for fingerprint embedding in Llama2-7B-chat (Cai et al., 16 Oct 2024).)
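Candidate selection and trigger–response pair construction for active fingerprinting might be sketched as below. The norm-based selection rule and all names are illustrative, and the supervised fine-tuning step itself is omitted; this only shows how $(x, y)$ pairs could be drawn from the under-trained pool $T$:

```python
import numpy as np

def build_fingerprint_pair(E, k=20, x_len=6, y_len=4, seed=0):
    """Pick candidate under-trained tokens as the k lowest-norm rows of the
    input embedding matrix E (shape |V| x d), then sample a trigger x and a
    response y from that pool. A subsequent SFT pass would train the model
    to maximize P(y | x), implanting the fingerprint."""
    norms = np.linalg.norm(E, axis=1)
    pool = np.argsort(norms)[:k]                  # k lowest-norm token ids
    rng = np.random.default_rng(seed)
    x = tuple(rng.choice(pool, size=x_len, replace=False))
    y = tuple(rng.choice(pool, size=y_len, replace=False))
    return x, y

# Synthetic embedding table: token ids 0..9 are nearly untouched by training.
rng = np.random.default_rng(1)
E = rng.normal(size=(100, 8))
E[:10] *= 0.001
x, y = build_fingerprint_pair(E, k=10)   # both drawn from ids 0..9
```

Because under-trained tokens are isolated in embedding space, mapping a trigger built from them to an arbitrary response barely interferes with the rest of the model, which is consistent with the negligible accuracy drop reported above.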

3.3 Initialization-based Fingerprinting: SeedPrints

SeedPrints leverages the random biases present at initialization as persistent, seed-dependent "birthmarks" imprinted in the probability distribution over tokens, most clearly observed in under-trained or untrained models (Tong et al., 30 Sep 2025). Measuring the mean next-token bias $b_j(\theta)$ for each vocabulary index $j$ across random contexts $x$:

$$b_j(\theta) = \mathbb{E}_{x \sim \mathrm{UniformInputs}}\left[\, p_\theta(j \mid x) \,\right] - \frac{1}{V}$$

one observes reproducible "spikes" in $p_\theta$ that are seed-dependent. These fine-grained signatures persist through the full training pipeline and can be detected using a black-box statistical protocol (intersection of least-likely tokens, Kendall's $\tau$ rank correlations, hypothesis testing against a null distribution). This method achieves perfect seed-attribution robustness across domain shifts and parameter modifications (see Section 6).
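The SeedPrints-style protocol can be illustrated on a toy randomly initialized LM head. The model, probing procedure, and bottom-$k$ overlap statistic below are simplifications of the paper's full test, using the "intersection of least-likely tokens" idea rather than the complete hypothesis-testing machinery:

```python
import numpy as np

V, D = 64, 16  # toy vocabulary and hidden sizes

def init_model(seed):
    """Randomly initialized output head: logits = h @ W.T for context h."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(V, D)) / np.sqrt(D)

def token_bias(W, n_ctx=4000, seed=123):
    """Monte Carlo estimate of b_j = E_x[p(j|x)] - 1/V over random contexts."""
    rng = np.random.default_rng(seed)
    H = rng.normal(size=(n_ctx, D))                       # random contexts
    logits = H @ W.T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                     # row-wise softmax
    return p.mean(axis=0) - 1.0 / V

def bottom_k_overlap(b1, b2, k=10):
    """Fraction of shared least-likely tokens; ~k/V expected for unrelated seeds."""
    s1, s2 = set(np.argsort(b1)[:k]), set(np.argsort(b2)[:k])
    return len(s1 & s2) / k

b_a  = token_bias(init_model(0))             # seed-0 model, probe set A
b_a2 = token_bias(init_model(0), seed=456)   # same weights, fresh probes
b_b  = token_bias(init_model(1))             # different init seed

same  = bottom_k_overlap(b_a, b_a2)  # high: bias spikes are weight-determined
cross = bottom_k_overlap(b_a, b_b)   # low: different seed, different birthmark
```

The same-weights overlap is reproducible across independent probe sets because the spikes come from the weights, not the probes, whereas a different initialization seed yields an essentially unrelated bottom-$k$ set.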

4. UTF for Hardware-Rooted Device Authentication

Under-trained Tokens as Fingerprints have also been generalized to embedded environments where "tokens" correspond to sets of measurable hardware fingerprints (Xiao et al., 22 Mar 2024). Here, UTF acts as a lightweight, hardware-rooted authentication mechanism that defends against replay, ML-based mimicry, and token compromise attacks on MCU-based IoT devices.

For each request, the client runtime maps the operation, nonce, and payload to arguments for $N$ fingerprinting tasks distributed across diverse on-chip modules (e.g., ADC/DAC loops, SRAM pattern reads, RTC frequency). Each task produces a short fingerprint $fp_i$, after which a randomized subset (typically $50\%$) is deliberately "poisoned" with additive and multiplicative noise. The resulting token is appended to the payload and verified by a backend regression model trained on clean (non-poisoned) data.

The poisoning ensures that adversaries, even those in possession of keys and able to collect large $\langle$payload, token$\rangle$ datasets, cannot effectively train models to replicate fresh fingerprints. Empirical results indicate false-positive rates below $5\%$, true-positive rates above $97\%$, median end-to-end authentication latency of $31$–$115$ ms, and power overhead under $4\%$ (Xiao et al., 22 Mar 2024).
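The poisoning step might look roughly as follows; the noise ranges, the 50% default, and all names are assumptions for illustration, not the paper's exact parameters:

```python
import numpy as np

def build_token(fingerprints, poison_ratio=0.5, seed=None):
    """Assemble an authentication token from N raw hardware fingerprints,
    deliberately corrupting a random subset with additive and multiplicative
    noise so that leaked (payload, token) pairs cannot train a faithful
    mimic. Returns the token and the poison mask; the mask (or the seed that
    generated it) stays device-side so the backend knows which entries are
    clean."""
    rng = np.random.default_rng(seed)
    fp = np.asarray(fingerprints, dtype=float)
    n = fp.size
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(n * poison_ratio), replace=False)] = True
    token = fp.copy()
    token[mask] = (token[mask] * rng.uniform(0.5, 1.5, mask.sum())
                   + rng.normal(0.0, fp.std() + 1e-9, mask.sum()))
    return token, mask

# Backend scores only the clean entries against its regression model.
fps = [1.02, 0.98, 1.10, 0.95, 1.05, 1.00]
token, mask = build_token(fps, poison_ratio=0.5, seed=42)
clean = np.asarray(fps)[~mask]   # the entries the verifier actually checks
```

An attacker training on full tokens fits a mixture of clean and poisoned samples and so cannot match the backend model, which was trained on clean data only.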

5. Statistical and Security Guarantees

The UTF framework is supported by rigorous statistical and empirical guarantees across both software and hardware domains.

  • LLM Provenance: Fingerprints based on under-trained tokens exhibit seed-level uniqueness and remain robust across all pre-training and fine-tuning stages (Tong et al., 30 Sep 2025). The system attains an Area Under the Curve (AUC) of $0.992$ on the LeafBench benchmark, outperforming contemporaneous methods such as REEF (AUC $\approx 0.914$) and matching or exceeding the Intrinsic and ICS baselines (Tong et al., 30 Sep 2025).
  • Behavioral Persistence: Both explicitly implanted and passively inherited UTF signatures remain detectable and effective after heavy domain adaptation, quantization, and model merging or distillation (Cai et al., 16 Oct 2024, Tong et al., 30 Sep 2025).
  • Adversarial Robustness (Hardware): Attackers' models, trained on mixed clean and poisoned token sets, are insufficiently predictive, and impersonation success drops below $2\%$ even with ML filtering (Xiao et al., 22 Mar 2024).

6. Empirical Results and Deployment Observations

Key empirical observations for UTF across published studies are summarized below:

Setting                          Logits (t-test)   Hidden (U-test)   Intrinsic   REEF      PCS       ICS
TinyStories (1000)               p ≈ 0             p ≈ 7.77e-89      1.000 ✔     0.759 ✗   0.999 ✔   0.996 ✔
TinyStories (123) [distractor]   p ≈ 1.000 ✔       p ≈ 0.902 ✔       0.950 ✗     0.658 ✔   0.332 ✔   0.012 ✔
The Stack (1000)                 p ≈ 0             p ≈ 3.09e-69      0.489 ✗     0.557 ✗   0.585 ✗   0.123 ✗
The Stack (123) [distractor]     p ≈ 0.616 ✔       p ≈ 0.831 ✔       0.445 ✔     0.580 ✔   0.301 ✔   0.026 ✔
  • Only UTF achieves flawless lineage detection across all domain shift conditions; alternative baselines exhibit frequent errors (Tong et al., 30 Sep 2025).
  • In LLMs, the fraction of under-trained tokens detected aligns with tokenizer idiosyncrasies and training regimes, typically $0.1$–$1\%$ for major models (Land et al., 8 May 2024).
  • All-stage verifiability is maintained: fingerprint persistence is observed from initialization ($0\%$ of pretraining) through full convergence ($100\%$ of pretraining) (Tong et al., 30 Sep 2025).

7. Limitations, Guidelines, and Extensions

UTF is subject to several caveats and practical design recommendations:

  • Scalability: Current methods are evaluated mainly on $\sim 7$B-parameter models (LLMs); extension to much larger models (e.g., $70$B+) remains open (Cai et al., 16 Oct 2024).
  • Security Boundaries: Brute-force probing of the under-trained token set is a theoretical attack vector for actively fingerprinted models, though the combinatorial search space is large (Cai et al., 16 Oct 2024).
  • Design: Careful selection of candidate tokens or hardware tasks and careful tuning of poisoning ratios are required to balance false positives against detection robustness (Xiao et al., 22 Mar 2024).
  • Model and Tokenizer Drift: Variations in tokenizer construction, weight tying, and software stack alter the under-trained token landscape, thus fingerprint catalogs should be maintained and updated accordingly (Land et al., 8 May 2024).
  • Operational Considerations: Defensive input sanitization can mitigate exploitation of model weaknesses in production; hardware deployments should prefer features with high inter-device variance (Land et al., 8 May 2024, Xiao et al., 22 Mar 2024).

UTF methods are expandable, with prospective developments including parameter-efficient adapter tuning, formalized fingerprint collision statistics, and multi-stage, multi-key fingerprints for both LLMs and embedded contexts (Cai et al., 16 Oct 2024).


In summary, Under-trained Tokens as Fingerprints (UTF) systems exploit the structural, statistical, or adversarially imposed underutilization of tokens to provide model or device fingerprinting, provenance, and authentication. These fingerprints are quantifiable, robust, and, when properly implemented, confer strong guarantees against a broad range of impersonation, copying, and misattribution attacks across both large-scale LLMs and resource-limited hardware platforms (Tong et al., 30 Sep 2025, Cai et al., 16 Oct 2024, Land et al., 8 May 2024, Xiao et al., 22 Mar 2024).
