
Under-trained Tokens as Fingerprints (UTF)

Updated 26 November 2025
  • The paper introduces UTF, which exploits infrequently trained tokens as robust fingerprints for model provenance and authentication in LLMs and hardware.
  • It outlines a detection pipeline using tokenizer analysis, weight-based indicators, and prompting that yields effective statistical signatures with token incidences between 0.1% and 1%.
  • The methodology offers both passive and active fingerprinting, demonstrating high reliability (up to 94.4%) and efficient performance across varying model scales and embedded systems.

Under-trained Tokens as Fingerprints (UTF) refers to a family of methods that exploit the presence, distribution, or manipulation of tokens a system sees infrequently or never during training as robust, intrinsic fingerprints for provenance, authentication, and attribution. These approaches have found utility in domains spanning LLMs and embedded device authentication, leveraging the unique behaviors that arise from incomplete or biased training on a system's representational vocabulary or hardware idiosyncrasies.

1. Formal Definition and Phenomenology of Under-trained Tokens

Under-trained tokens (also called "glitch tokens") are entries in a system's representational vocabulary that are either completely absent from training or observed at severely reduced frequency. For LLMs, this reflects the inevitable mismatch between frozen tokenizers and the dynamic, incomplete sampling of the training corpus: certain tokens (e.g., rare character sequences, OOV placeholders, or artifacts of BPE) are absent from the training data or appear only a handful of times. For embedded hardware, "under-trained" refers to features or response patterns (e.g., analog measurements) that an adversary cannot reliably learn due to deliberate poisoning or insufficient sampling.

Formally, for a model with vocabulary $V$ and a given token $t$, let $\mathrm{count\_train}(t)$ denote its count in the training data. Define $f(t) = \mathrm{count\_train}(t)/N$ for corpus size $N$, and declare $t$ under-trained if $f(t) < \tau$ for some threshold $\tau$ (Land et al., 8 May 2024). However, in practical scenarios (closed-source models, black-box endpoints), direct frequency estimates are unavailable, and surrogate features (embedding statistics, output bias, or observed behaviors) are used instead.
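The frequency criterion can be sketched directly. A minimal example, assuming access to the tokenized training stream (function and variable names here are illustrative, not from the paper):

```python
from collections import Counter

def under_trained_tokens(token_stream, vocab, tau=1e-7):
    """Flag tokens whose relative training frequency f(t) falls below tau.

    token_stream: iterable of token ids observed in the training corpus.
    vocab: full vocabulary of token ids; tokens absent from the stream
           have f(t) = 0 and are therefore always flagged.
    """
    counts = Counter(token_stream)
    n = sum(counts.values())
    return {t for t in vocab if counts.get(t, 0) / n < tau}

# Toy corpus: token 3 never appears; token 2 appears once in 12 (f ~ 0.083).
stream = [0, 1, 0, 1, 0, 2, 0, 1, 0, 1, 0, 1]
flagged = under_trained_tokens(stream, vocab=range(4), tau=0.1)
# flagged -> {2, 3}
```

In a real deployment the stream would be far too large to hold in memory, so the counts would be accumulated in a streaming pass; the thresholding logic is unchanged.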

2. Detection and Characterization Methodologies

Detection and characterization of under-trained tokens proceeds by combining statistical analysis of tokenizers, interrogation of model parameter matrices, and behavioral probing. The canonical analysis pipeline comprises three stages (Land et al., 8 May 2024):

  • Tokenizer Analysis: Inspect the vocabulary for unreachable tokens (where $\mathrm{decode}(t) = s$ but $\mathrm{encode}(s) \neq t$), partial encodings (invalid UTF-8), and special/control tokens.
  • Weight-based Indicators: Compute proxies such as embedding norms or distances to the mean OOV vector in the model's unembedding ($U$) and/or input embedding matrices. For example, let $U \in \mathbb{R}^{|V| \times d}$; project out the dominant variance by subtracting the leading principal component, then compute a cosine indicator $I_{\cos}(t) = 1 - \frac{\langle U'_t, u'_{\mathrm{oov}} \rangle}{\|U'_t\|_2 \, \|u'_{\mathrm{oov}}\|_2}$.
  • Prompting-based Verification: Apply targeted prompts to the model, checking whether the output probability for a token $t$ under maximal self-triggering templates remains below a threshold (e.g., $S_{\mathrm{prompt}}(t) < 1\%$).
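The weight-based indicator above can be sketched with `numpy`, assuming a small set of known-unused token ids is available to anchor the reference vector (the paper's exact construction of the OOV reference may differ):

```python
import numpy as np

def cosine_indicator(U, oov_ids):
    """Weight-based under-training indicator from the unembedding matrix U.

    Centers U, projects out its leading principal component (the dominant
    shared direction), then scores each row by cosine similarity to the
    mean vector of known-unused tokens (oov_ids). I_cos near 0 means the
    token sits close to the 'never updated' region, i.e. likely
    under-trained; I_cos near 1 means it is unrelated to that region.
    """
    Uc = U - U.mean(axis=0)                       # center rows
    _, _, vt = np.linalg.svd(Uc, full_matrices=False)
    pc1 = vt[0]                                   # leading principal component
    Up = Uc - np.outer(Uc @ pc1, pc1)             # project it out
    u_oov = Up[oov_ids].mean(axis=0)              # reference 'unused' direction
    cos = (Up @ u_oov) / (np.linalg.norm(Up, axis=1)
                          * np.linalg.norm(u_oov) + 1e-12)
    return 1.0 - cos                              # I_cos(t) per token

# Synthetic demo: rows 10 and 11 share one tiny 'never updated' vector.
rng = np.random.default_rng(0)
U = rng.normal(size=(12, 16))
U[10] = U[11] = rng.normal(size=16) * 0.1
scores = cosine_indicator(U, oov_ids=[10, 11])   # scores[10], scores[11] ~ 0
```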

This pipeline efficiently identifies under-trained tokens, yielding incidence rates typically between $0.1\%$ and $1\%$ of the full vocabulary across modern LLMs (Land et al., 8 May 2024). The presence, distribution, and model-dependent clustering of these tokens provide a robust, reproducible signature.

3. Fingerprinting LLMs using UTF

Multiple fingerprinting strategies have emerged from the existence of under-trained tokens. A fundamental property is that the specific set and statistical idiosyncrasies of under-trained tokens are highly specific to model provenance and initialization choices (Land et al., 8 May 2024, Cai et al., 16 Oct 2024, Tong et al., 30 Sep 2025).

3.1 Passive Fingerprinting

By cataloging which glitch tokens are present (using the detection pipeline above) and collecting behavioral signatures, a model's unique, provable identity can be established. For example, $F_M = \{\, t : S_{\mathrm{prompt}}^M(t) < \theta \,\}$ serves as a fingerprint that is highly unlikely to be reproduced by models with different training data, initialization, or weight-decay parameters (Land et al., 8 May 2024).
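A passive fingerprint and a simple overlap comparison can be sketched as follows; the per-token probe scores here are hypothetical placeholders standing in for $S_{\mathrm{prompt}}^M(t)$ measurements:

```python
def passive_fingerprint(prompt_scores, theta=0.01):
    """F_M: the set of tokens whose self-trigger output probability < theta."""
    return frozenset(t for t, s in prompt_scores.items() if s < theta)

def jaccard(f_a, f_b):
    """Overlap of two fingerprints; high overlap suggests shared provenance."""
    union = f_a | f_b
    return len(f_a & f_b) / len(union) if union else 0.0

# Hypothetical probe scores for two models over a few token ids.
scores_a = {101: 0.001, 102: 0.004, 103: 0.2, 104: 0.008}
scores_b = {101: 0.002, 102: 0.003, 103: 0.15, 105: 0.5}

fa = passive_fingerprint(scores_a)   # {101, 102, 104}
fb = passive_fingerprint(scores_b)   # {101, 102}
sim = jaccard(fa, fb)                # 2 shared of 3 total -> 2/3
```

A high Jaccard score between two endpoints is evidence (not proof) of a shared base model; in practice one would calibrate the decision threshold against fingerprints of known-unrelated models.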

3.2 Active Fingerprinting via Supervised Fine-tuning

An alternative method explicitly embeds unique trigger–response pairs into a model by exploiting under-trained tokens' isolation in the token space (Cai et al., 16 Oct 2024). Here, one identifies a set $T$ of under-trained tokens (via cosine distance or embedding norm), samples sequences $(x, y)$ from $T$, and conducts supervised fine-tuning to maximize the probability $P_{\theta}(y|x)$, resulting in a model $M_{\theta_f}$ that, upon query with $x$, reliably outputs $y$ (effectiveness $100\%$, reliability $>94\%$ for Llama2-7B-chat; negligible accuracy drop, see the table below). This fine-tuning phase is extremely fast, requires no architectural modification, and is robust to further model fine-tuning.

Model / Method   Effectiveness (%)   Reliability (%)   Efficiency (min)
IF_SFT                 100               34.4                25
IF                     100                0.0                 6
dialogue               100               76.2                 6
UTF (ours)             100               94.4                 6

(Table: Effectiveness, reliability, and efficiency for fingerprint embedding in Llama2-7B-chat (Cai et al., 16 Oct 2024).)
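Candidate selection and trigger–response pair construction for active fingerprinting might be sketched as below. The norm-based selection rule and all names are illustrative, and the supervised fine-tuning step itself is omitted; this only shows how $(x, y)$ pairs could be drawn from the under-trained pool $T$:

```python
import numpy as np

def build_fingerprint_pair(E, k=20, x_len=6, y_len=4, seed=0):
    """Pick candidate under-trained tokens as the k lowest-norm rows of the
    input embedding matrix E (shape |V| x d), then sample a trigger x and a
    response y from that pool. A subsequent SFT pass would train the model
    to maximize P(y | x), implanting the fingerprint."""
    norms = np.linalg.norm(E, axis=1)
    pool = np.argsort(norms)[:k]                  # k lowest-norm token ids
    rng = np.random.default_rng(seed)
    x = tuple(rng.choice(pool, size=x_len, replace=False))
    y = tuple(rng.choice(pool, size=y_len, replace=False))
    return x, y

# Synthetic embedding table: token ids 0..9 are nearly untouched by training.
rng = np.random.default_rng(1)
E = rng.normal(size=(100, 8))
E[:10] *= 0.001
x, y = build_fingerprint_pair(E, k=10)   # both drawn from ids 0..9
```

Because under-trained tokens are isolated in embedding space, mapping a trigger built from them to an arbitrary response barely interferes with the rest of the model, which is consistent with the negligible accuracy drop reported above.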

3.3 Initialization-based Fingerprinting: SeedPrints

SeedPrints leverages the random biases present at initialization as persistent, seed-dependent "birthmarks" imprinted in the probability distribution over tokens, most clearly observed in under-trained or untrained models (Tong et al., 30 Sep 2025). Measuring the mean next-token bias $b_j(\theta)$ for each vocabulary index $j$ across random contexts $x$:

$$b_j(\theta) = \mathbb{E}_{x \sim \mathrm{UniformInputs}}\left[\, p_\theta(j \mid x) \,\right] - \frac{1}{V}$$

one observes reproducible "spikes" in $p_\theta$ that are seed-dependent. These fine-grained signatures persist through the full training pipeline and can be detected using a black-box statistical protocol (intersection of least-likely tokens, Kendall's $\tau$ rank correlations, hypothesis testing against a null distribution). This method achieves perfect seed-attribution robustness across domain shifts and parameter modifications (see Section 6).
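The SeedPrints-style protocol can be illustrated on a toy randomly initialized LM head. The model, probing procedure, and bottom-$k$ overlap statistic below are simplifications of the paper's full test, using the "intersection of least-likely tokens" idea rather than the complete hypothesis-testing machinery:

```python
import numpy as np

V, D = 64, 16  # toy vocabulary and hidden sizes

def init_model(seed):
    """Randomly initialized output head: logits = h @ W.T for context h."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(V, D)) / np.sqrt(D)

def token_bias(W, n_ctx=4000, seed=123):
    """Monte Carlo estimate of b_j = E_x[p(j|x)] - 1/V over random contexts."""
    rng = np.random.default_rng(seed)
    H = rng.normal(size=(n_ctx, D))                       # random contexts
    logits = H @ W.T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                     # row-wise softmax
    return p.mean(axis=0) - 1.0 / V

def bottom_k_overlap(b1, b2, k=10):
    """Fraction of shared least-likely tokens; ~k/V expected for unrelated seeds."""
    s1, s2 = set(np.argsort(b1)[:k]), set(np.argsort(b2)[:k])
    return len(s1 & s2) / k

b_a  = token_bias(init_model(0))             # seed-0 model, probe set A
b_a2 = token_bias(init_model(0), seed=456)   # same weights, fresh probes
b_b  = token_bias(init_model(1))             # different init seed

same  = bottom_k_overlap(b_a, b_a2)  # high: bias spikes are weight-determined
cross = bottom_k_overlap(b_a, b_b)   # low: different seed, different birthmark
```

The same-weights overlap is reproducible across independent probe sets because the spikes come from the weights, not the probes, whereas a different initialization seed yields an essentially unrelated bottom-$k$ set.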

4. UTF for Hardware-Rooted Device Authentication

Under-trained Tokens as Fingerprints have also been generalized to embedded environments where "tokens" correspond to sets of measurable hardware fingerprints (Xiao et al., 22 Mar 2024). Here, UTF acts as a lightweight, hardware-rooted authentication mechanism that defends against replay, ML-based mimicry, and token compromise attacks on MCU-based IoT devices.

For each request, the client runtime maps the operation, nonce, and payload to arguments for $N$ fingerprinting tasks distributed across diverse on-chip modules (e.g., ADC/DAC loops, SRAM pattern reads, RTC frequency). Each task produces a short fingerprint $fp_i$, after which a randomized subset (typically $50\%$) is deliberately "poisoned" with additive and multiplicative noise. The resulting token is appended to the payload and verified by a backend regression model trained on clean (non-poisoned) data.

The poisoning ensures that adversaries, even those in possession of keys and able to collect large $\langle$payload, token$\rangle$ datasets, cannot effectively train models to replicate fresh fingerprints. Empirical results indicate false-positive rates below $5\%$, true-positive rates above $97\%$, median end-to-end authentication latency of $31$–$115$ ms, and power overhead under $4\%$ (Xiao et al., 22 Mar 2024).
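The poisoning step might look roughly as follows; the noise ranges, the 50% default, and all names are assumptions for illustration, not the paper's exact parameters:

```python
import numpy as np

def build_token(fingerprints, poison_ratio=0.5, seed=None):
    """Assemble an authentication token from N raw hardware fingerprints,
    deliberately corrupting a random subset with additive and multiplicative
    noise so that leaked (payload, token) pairs cannot train a faithful
    mimic. Returns the token and the poison mask; the mask (or the seed that
    generated it) stays device-side so the backend knows which entries are
    clean."""
    rng = np.random.default_rng(seed)
    fp = np.asarray(fingerprints, dtype=float)
    n = fp.size
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(n * poison_ratio), replace=False)] = True
    token = fp.copy()
    token[mask] = (token[mask] * rng.uniform(0.5, 1.5, mask.sum())
                   + rng.normal(0.0, fp.std() + 1e-9, mask.sum()))
    return token, mask

# Backend scores only the clean entries against its regression model.
fps = [1.02, 0.98, 1.10, 0.95, 1.05, 1.00]
token, mask = build_token(fps, poison_ratio=0.5, seed=42)
clean = np.asarray(fps)[~mask]   # the entries the verifier actually checks
```

An attacker training on full tokens fits a mixture of clean and poisoned samples and so cannot match the backend model, which was trained on clean data only.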

5. Statistical and Security Guarantees

The UTF framework is supported by rigorous statistical and empirical guarantees across both software and hardware domains.

  • LLM Provenance: Fingerprints based on under-trained tokens exhibit seed-level uniqueness and remain robust across all pre-training and fine-tuning stages (Tong et al., 30 Sep 2025). The system attains an Area Under the Curve (AUC) of $0.992$ on the LeafBench benchmark, outperforming contemporaneous methods such as REEF (AUC $\approx 0.914$) and matching or exceeding the Intrinsic and ICS baselines (Tong et al., 30 Sep 2025).
  • Behavioral Persistence: Both explicitly implanted and passively inherited UTF signatures remain detectable and effective after heavy domain adaptation, quantization, and model merging or distillation (Cai et al., 16 Oct 2024, Tong et al., 30 Sep 2025).
  • Adversarial Robustness (Hardware): Attackers' models, trained on mixed clean and poisoned token sets, are insufficiently predictive, and impersonation success drops below $2\%$ even with ML filtering (Xiao et al., 22 Mar 2024).

6. Empirical Results and Deployment Observations

Key empirical observations for UTF across published studies are summarized below:

Setting                          Logits (t-test)   Hidden (U-test)   Intrinsic   REEF      PCS       ICS
TinyStories (1000)               p ≈ 0             p ≈ 7.77e-89      1.000 ✔     0.759 ✗   0.999 ✔   0.996 ✔
TinyStories (123) [distractor]   p ≈ 1.000 ✔       p ≈ 0.902 ✔       0.950 ✗     0.658 ✔   0.332 ✔   0.012 ✔
The Stack (1000)                 p ≈ 0             p ≈ 3.09e-69      0.489 ✗     0.557 ✗   0.585 ✗   0.123 ✗
The Stack (123) [distractor]     p ≈ 0.616 ✔       p ≈ 0.831 ✔       0.445 ✔     0.580 ✔   0.301 ✔   0.026 ✔
  • Only UTF achieves flawless lineage detection across all domain shift conditions; alternative baselines exhibit frequent errors (Tong et al., 30 Sep 2025).
  • In LLMs, the fraction of under-trained tokens detected aligns with tokenizer idiosyncrasies and training regimes, typically $0.1$–$1\%$ for major models (Land et al., 8 May 2024).
  • All-stage verifiability is maintained: fingerprint persistence is observed from initialization ($0\%$ of pretraining) through full convergence ($100\%$ of pretraining) (Tong et al., 30 Sep 2025).

7. Limitations, Guidelines, and Extensions

UTF is subject to several caveats and practical design recommendations:

  • Scalability: Current methods are evaluated mainly on $\sim 7$B-parameter models (LLMs); extension to much larger models (e.g., $70$B+) remains open (Cai et al., 16 Oct 2024).
  • Security Boundaries: Brute-force probing of the under-trained token set is a theoretical attack vector for actively fingerprinted models, though the combinatorial search space is large (Cai et al., 16 Oct 2024).
  • Design: Careful selection of candidate tokens or hardware tasks and careful tuning of poisoning ratios are required to balance false positives against detection robustness (Xiao et al., 22 Mar 2024).
  • Model and Tokenizer Drift: Variations in tokenizer construction, weight tying, and software stack alter the under-trained token landscape, thus fingerprint catalogs should be maintained and updated accordingly (Land et al., 8 May 2024).
  • Operational Considerations: Defensive input sanitization can mitigate exploitation of model weaknesses in production; hardware deployments should prefer features with high inter-device variance (Land et al., 8 May 2024, Xiao et al., 22 Mar 2024).

UTF methods are expandable, with prospective developments including parameter-efficient adapter tuning, formalized fingerprint collision statistics, and multi-stage, multi-key fingerprints for both LLMs and embedded contexts (Cai et al., 16 Oct 2024).


In summary, Under-trained Tokens as Fingerprints (UTF) systems exploit the structural, statistical, or adversarially imposed underutilization of tokens to provide model or device fingerprinting, provenance, and authentication. These fingerprints are quantifiable, robust, and, when properly implemented, confer strong guarantees against a broad range of impersonation, copying, and misattribution attacks across both large-scale LLMs and resource-limited hardware platforms (Tong et al., 30 Sep 2025, Cai et al., 16 Oct 2024, Land et al., 8 May 2024, Xiao et al., 22 Mar 2024).
