
Refinement Provenance Inference (RPI)

Updated 12 January 2026
  • Refinement Provenance Inference (RPI) is the task of auditing fine-tuned LLMs to determine, per instance, whether a training prompt was used in its raw form or after LLM-based refinement.
  • Detection leverages teacher-forced token statistics such as normalized NLL, top-$k$ inclusion, and confidence margin, which expose systematic distributional shifts.
  • The RePro framework employs contrastive representation learning and a two-stage inference process to achieve robust, transferable provenance detection.

Refinement Provenance Inference (RPI) is the task of auditing fine-tuned LLMs at the instance level to determine whether a given prompt–response pair used for supervised fine-tuning was based on an original ("raw") user prompt or a version refined by an external LLM-based operator. In modern instruction tuning pipelines, prompt refinement—where raw prompts are rewritten for clarity and adherence to canonical instruction-following style by high-capacity LLMs (such as GPT-4o or Llama-3.3)—is widespread. RPI is central to dataset governance and dispute resolution, as it enables parties to verify the provenance of data used to train a released model, especially in the presence of mixed corpora where both raw and refined prompts are interleaved with unknown mixture ratios (Yin et al., 5 Jan 2026).

1. Formalization of the RPI Problem

RPI is defined with respect to a set of semantic instances, each comprising a raw prompt $x_i^{\rm raw}$ and a fixed output $y_i$. An external deterministic LLM refiner $R(\cdot)$ rewrites raw prompts to produce $x_i^{\rm ref} = R(x_i^{\rm raw})$. During supervised fine-tuning, each instance is randomly included either as raw or refined according to a Bernoulli indicator $z_i \sim \mathrm{Bernoulli}(\rho)$, with the effective training example being $x_i^{\rm tr} = x_i^{\rm ref}$ when $z_i = 1$ and $x_i^{\rm raw}$ otherwise. The fine-tuned model (the "victim" $M_a$, initialized from a base $M_0$) thus sees a mixture of prompt forms. The auditor, given $M_a$ and a prompt–response pair $(x_i, y_i)$, seeks to infer the latent binary label $z_i$ indicating whether $x_i$ was raw or refined in training. Auditing is conducted by extracting a feature vector $\phi(M_a; x_i, y_i) \in \mathbb{R}^d$ from teacher-forced next-token distributions, applying a learned scoring function $g$, and predicting $\hat z_i = \mathbf{I}[s_i \ge \tau]$ with $s_i = g(\phi(\cdot))$. RPI performance is evaluated using AUC or TPR at low-FPR thresholds (Yin et al., 5 Jan 2026).
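
This setup admits a minimal Python sketch; `refine`, `extract_features`, and `score` are hypothetical stand-ins for the refiner $R(\cdot)$, the feature map $\phi$, and the learned scorer $g$:

```python
# Minimal sketch of the RPI setup: a Bernoulli mixture over raw/refined
# prompts at training time, and a thresholded score at audit time.
import random

def build_training_prompt(x_raw: str, refine, rho: float = 0.5):
    """Include the refined prompt with probability rho (z_i ~ Bernoulli(rho))."""
    z = 1 if random.random() < rho else 0
    x_tr = refine(x_raw) if z == 1 else x_raw
    return x_tr, z  # z is the latent label the auditor later infers

def audit(model, x, y, extract_features, score, tau: float = 0.0) -> int:
    """Predict z_hat = 1[s >= tau] from teacher-forced features phi(M_a; x, y)."""
    phi = extract_features(model, x, y)  # feature vector in R^d
    s = score(phi)                       # learned scoring function g
    return int(s >= tau)
```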

2. Distributional Shifts Induced by LLM-Based Prompt Refinement

Refinement of prompts produces systematic, detectable shifts in the fine-tuned model's teacher-forced token behavior. This occurs because canonicalized, LLM-refined prompts more closely match the model's instruction-following priors, thereby inducing differentiated gradients during fine-tuning as compared to raw prompts. When the victim model $M_a$ is evaluated on $(x_i, y_i)$ via teacher forcing, the following per-instance statistics capture the distributional changes:

  • Normalized Negative Log-Likelihood (NLL): $\mathrm{NLL}_M(i) = -\frac{1}{|y_i|} \sum_{t=1}^{|y_i|} \ell^{(M)}_{i,t}$, where $\ell^{(M)}_{i,t}$ is the log-likelihood at timestep $t$.
  • Top-$k$ Inclusion: $\mathrm{TopK}_M(i) = \frac{1}{|y_i|} \sum_{t=1}^{|y_i|} \mathbf{I}[y_{i,t} \in \mathrm{TopK}_t^{(M)}]$, where $\mathrm{TopK}_t^{(M)}$ contains the $k$ highest logits (with $k = 1, 5, 10$ in practice).
  • Confidence Margin: $\mathrm{Gap}_M(i) = \frac{1}{|y_i|} \sum_{t=1}^{|y_i|} \bigl(s_{t,(1)}^{(M)} - s_{t,(2)}^{(M)}\bigr)$, the mean logit gap between the top two predictions.

To isolate fine-tuning effects, corresponding "uplift" features are computed as $\Delta S(i) = S_{M_0}(i) - S_{M_a}(i)$ for $S \in \{\mathrm{NLL}, \mathrm{TopK}, \mathrm{Gap}\}$. These statistics, in both their base and uplift forms, are empirically observed to exhibit stable shifts distinguishing raw from refined provenance, even when $x_i^{\rm raw}$ and $x_i^{\rm ref}$ differ only minimally at the surface level (Yin et al., 5 Jan 2026). A code sketch of these computations follows.
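
A sketch of the three statistics under stated assumptions: `logits` is a $(T, V)$ tensor of teacher-forced next-token logits over the $T$ target tokens of $y_i$, and `targets` holds the gold token ids; the helper name is illustrative:

```python
import torch
import torch.nn.functional as F

def token_stats(logits: torch.Tensor, targets: torch.Tensor, ks=(1, 5, 10)):
    """logits: (T, V) teacher-forced next-token logits; targets: (T,) gold ids."""
    T = targets.shape[0]
    logp = F.log_softmax(logits, dim=-1)
    # Normalized NLL: mean negative log-likelihood of the gold tokens.
    nll = -logp[torch.arange(T), targets].mean().item()
    # Top-k inclusion: fraction of steps whose gold token is among the k highest logits.
    topk = {}
    for k in ks:
        top_ids = logits.topk(k, dim=-1).indices              # (T, k)
        topk[k] = (top_ids == targets.unsqueeze(-1)).any(-1).float().mean().item()
    # Confidence margin: mean gap between the top-1 and top-2 logits.
    top2 = logits.topk(2, dim=-1).values                      # (T, 2)
    gap = (top2[:, 0] - top2[:, 1]).mean().item()
    return nll, topk, gap

# Uplift features contrast base and fine-tuned models on the same (x, y):
# delta_S(i) = S_{M0}(i) - S_{Ma}(i) for S in {NLL, TopK, Gap}.
```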

3. The RePro Framework for Provenance Inference

RePro is a logit-centric provenance inference framework that aggregates the above teacher-forced features into a single vector and applies a two-stage learning procedure:

Feature Construction:

$$\phi(M; x_i, y_i) = [\mathrm{NLL},\; \mathrm{TopK}_{k=1,5,10},\; \mathrm{Gap},\; \Delta\mathrm{NLL},\; \Delta\mathrm{TopK},\; \Delta\mathrm{Gap}]$$

with per-dimension standardization.
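
A minimal sketch of this standardization step, assuming the per-instance statistics for $n$ audited examples are stacked into an $(n, d)$ matrix:

```python
import numpy as np

def standardize(features: np.ndarray) -> np.ndarray:
    """Z-score each feature dimension of an (n, d) matrix independently."""
    mu = features.mean(axis=0, keepdims=True)
    sigma = features.std(axis=0, keepdims=True) + 1e-8  # guard against zero variance
    return (features - mu) / sigma
```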

Stage 1: Shadow Contrastive Representation Learning

  • A shadow fine-tuning dataset, disjoint from the victim's data, is constructed with the same raw/refined proportion.
  • A shadow model $M_c$ is fine-tuned, and for each instance its feature vector $\phi_i$ is projected via an MLP encoder $h_\psi$ to an embedding $u_i$.
  • A supervised contrastive loss is used (see the sketch after this list): $\mathcal{L}_{\rm SupCon} = \sum_{i \in \mathcal{B}} \Bigl[ -\frac{1}{|\mathcal{P}(i)|} \sum_{p \in \mathcal{P}(i)} \log \frac{\exp(\mathrm{sim}(u_i, u_p)/\tau)}{\sum_{a \neq i} \exp(\mathrm{sim}(u_i, u_a)/\tau)} \Bigr]$, where $\mathcal{P}(i)$ indexes positive examples (same $z$-label) in the batch $\mathcal{B}$.
  • After contrastive pre-training, $h_\psi$ is frozen and a lightweight linear classifier $g$ is fit to these embeddings using cross-entropy loss.
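
A sketch of the supervised contrastive objective in PyTorch, operating on a batch of encoder embeddings $u_i = h_\psi(\phi_i)$ and labels $z_i$; the masking details are one reasonable implementation choice, not necessarily the paper's:

```python
import torch
import torch.nn.functional as F

def supcon_loss(u: torch.Tensor, z: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """u: (B, d) batch of embeddings u_i; z: (B,) binary provenance labels z_i."""
    u = F.normalize(u, dim=-1)                       # cosine similarity via dot product
    sim = (u @ u.T) / tau                            # (B, B) scaled similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=u.device)
    sim = sim.masked_fill(self_mask, float('-inf'))  # exclude a = i from the denominator
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = (z.unsqueeze(0) == z.unsqueeze(1)) & ~self_mask   # P(i): same-label pairs
    # Average log-probability over positives per anchor, summed over the batch.
    loss = -(log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1))
    return loss.sum()
```

Freezing $h_\psi$ after this pre-training and fitting the linear head $g$ on its embeddings completes Stage 1.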

Stage 2: Victim Model Inference

  • For the target victim $M_a$ and a candidate pair $(x_j, y_j)$, $\phi_j$ is extracted and embedded by $h_\psi$; $g$ then predicts $z_j$ (sketched below).
  • The pipeline generalizes across models and refiners, requiring no access to the victim's training data or to stochastic decoding.
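
A hedged end-to-end sketch of the audit step; `extract_phi` is a hypothetical helper wrapping the Section 2 statistics and their standardization:

```python
import torch

@torch.no_grad()
def infer_provenance(victim, base, x, y, h_psi, g, extract_phi, tau: float = 0.0) -> int:
    phi = extract_phi(victim, base, x, y)  # base + uplift features (Section 2)
    u = h_psi(phi)                         # frozen MLP encoder from Stage 1
    s = g(u)                               # linear provenance score
    return int(s.item() >= tau)            # 1 -> refined, 0 -> raw
```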

This architecture enables robust inference of provenance in unseen victim models and with previously unseen refiners, as it targets distributional shifts that are common to LLM refinement rather than style artifacts (Yin et al., 5 Jan 2026).

4. Empirical Evaluation

RePro is empirically validated on the GSM8K (math word problems) and HumanEval (code generation) benchmarks. For GSM8K, $\sim$8,000 problems are split, with refined prompt variants generated by GPT-4o and Llama-3.3-70B. Fine-tuned victims are trained on a 50/50 mix ($\rho = 0.5$) with 500 LoRA updates (rank 16, learning rate $2 \times 10^{-4}$). At inference, the auditor has no access to the victim's training data.
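
A configuration sketch matching the reported hyperparameters, assuming the Hugging Face peft/transformers stack; the model checkpoint, target modules, LoRA alpha, and batch size are illustrative assumptions rather than values from the paper:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")  # one victim family
lora = LoraConfig(
    r=16,                                 # rank 16, as reported
    lora_alpha=32,                        # assumed; a common 2x-rank default
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

args = TrainingArguments(
    output_dir="victim-lora",
    learning_rate=2e-4,                   # as reported
    max_steps=500,                        # 500 LoRA updates
    per_device_train_batch_size=8,        # assumed
)
```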

RePro consistently outperforms three learning-free baselines (raw NLL, uplift $\Delta$NLL, and pairwise likelihood difference) in AUC and TPR@1%FPR. On GSM8K, learning-free methods yield AUC in $[0.52, 0.62]$, while RePro achieves $[0.66, 0.71]$ with TPR@1%FPR in $[0.14, 0.22]$. On HumanEval, RePro yields AUC $0.63$–$0.68$ (vs. $0.50$–$0.60$ for baselines). Ablation studies identify the uplift features as most critical ($\sim$0.04 AUC drop when removed), with ranking and margin cues also contributing. Contrastive representation learning yields a $\sim$0.03–0.05 AUC improvement over linear probing. Detection improves with an increasing refined fraction $\rho$ and with longer fine-tuning (Yin et al., 5 Jan 2026).
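
Both reported metrics can be computed from per-instance auditor scores with scikit-learn's ROC utilities; a minimal sketch:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def auc_and_tpr_at_fpr(z_true, scores, target_fpr=0.01):
    """AUC and TPR at the largest operating point with FPR <= target_fpr."""
    auc = roc_auc_score(z_true, scores)
    fpr, tpr, _ = roc_curve(z_true, scores)
    ok = fpr <= target_fpr
    tpr_at = tpr[ok].max() if np.any(ok) else 0.0
    return auc, tpr_at
```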

Empirical results indicate that RePro exploits refiner-agnostic distributional shifts in likelihoods rather than model- or refiner-specific artifacts; transfer across victim model families (Qwen2.5, Llama-3.1, Mistral) and across refiners (GPT-4o, Llama-3.3) incurs only a minimal ($0.02$–$0.03$) AUC loss.

5. Transferability and Generalization

RePro demonstrates robust transferability. Models trained with shadow data refined by one LLM (e.g., GPT-4o) transfer to victims refined by a different LLM (Llama-3.3) with negligible AUC loss. Similar generalization is observed across diverse victim architectures. This transferability arises because the framework targets universal shifts in the model's likelihood landscape induced by the act of instruction-style canonicalization, not the surface artifacts of any particular refiner's style. Thus, RePro is applicable to heterogeneous and black-box fine-tuning pipelines where the details of refinement are unavailable (Yin et al., 5 Jan 2026).

6. Limitations and Practical Considerations

RPI and RePro have several noted limitations:

  • API Accessibility: Inference requires per-token log probabilities and top-$k$ logits under teacher-forced evaluation; these are inaccessible for some black-box models.
  • Reference Outputs: Access to gold reference outputs $y$ is required; fully open-ended auditing (without references) is not addressed.
  • Single-Pass Refinement: The analysis is restricted to single-pass prompt rewriting with outputs held fixed. Pipelines that jointly refine both prompts and responses, or that employ multi-stage or multi-turn refinement, may exhibit different and possibly reduced leakage.
  • Defense Strategies: Practitioners may reduce detectability by interleaving multiple refiners, randomizing refinement styles, or injecting adversarial paraphrases to mask distributional shifts.

Potential directions for extension include adapting RPI to weaker interfaces (e.g., generation-only queries), integrating it within multi-stage curation workflows, and developing obfuscation strategies to mitigate provenance leakage (Yin et al., 5 Jan 2026).

7. Broader Impact and Significance

RPI introduces a principled data-provenance audit mechanism in LLM fine-tuning regimes increasingly reliant on prompt refinement. By formalizing and empirically validating refinement provenance inference, RPI highlights the persistent statistical traces left by canonicalization steps in the training data workflow. The universal “fingerprint” of LLM-based prompt refinement has implications for dataset transparency, copyright and data ownership verification, and the design of robust instruction-tuning protocols. A plausible implication is that future data curation pipelines will need to balance the benefits of prompt refinement against increased provenance detectability, particularly in regulated or adversarial settings (Yin et al., 5 Jan 2026).
