Private Mask Pre-Training (PMP)
- PMP is a framework that embeds a hidden sparse binary mask during pre-training to restrict unauthorized fine-tuning in foundation models.
- It uses an early-bird lottery ticket algorithm to identify and stabilize the mask, ensuring only the optimized subnetwork is updated in subsequent training.
- Empirical findings show that PMP maintains base performance while reducing adaptation gains from unauthorized fine-tuning by up to 20 points.
Private Mask Pre-Training (PMP) is a pre-training framework developed to embed intrinsic barriers against unauthorized downstream fine-tuning in open-sourced foundation models. PMP achieves this by identifying and privatizing a sparse subnetwork during pre-training via a binary mask, which is hidden prior to model release. As a result, unauthorized fine-tuning—absent knowledge of the mask—provably yields limited gains and is destabilized by a geometrical mismatch between the pre-training and adaptation subspaces. The PMP framework is architecture-agnostic and preserves base model usability, while granting model owners the ability to retain adaptation control without requiring architectural or policy obfuscation (Wang et al., 31 Jan 2026).
1. Formalization and Core Principles
Let $W \in \mathbb{R}^d$ denote the dense parameter vector of a foundation model. During PMP, a sparse binary mask $m \in \{0,1\}^d$ is selected, with sparsity ratio $s = \|m\|_0 / d$ ($s < 1$), partitioning $W$ into $W_m$ (the active "ticket") and $W_{\bar m}$ (the inactive complement). Pre-training proceeds such that, after a brief warm-up, all learning updates are confined to $W_m$, with $W_{\bar m}$ frozen at its initial values. The final released model is $W^* = (W_m^*, W_{\bar m}^0)$, and only the dense weights are published; the mask $m$ remains secret.
The central PMP principle is that unauthorized fine-tuning without access to $m$ must indiscriminately update both $W_m$ and $W_{\bar m}$. Since only $W_m$ was optimized during pre-training, downstream SGD in the orthogonal frozen subspace encounters a loss surface with high curvature and misaligned gradients, inducing instability and bounding any net fine-tuning gain.
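The partition and the masked update rule can be sketched in a few lines of numpy; the toy dimensions, seed, and `masked_sgd_step` helper are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 10          # toy parameter count (illustrative)
s = 0.3         # sparsity ratio: fraction of weights in the private ticket

W = rng.normal(size=d)          # dense parameters W
m = np.zeros(d, dtype=bool)     # private binary mask m
m[rng.choice(d, size=int(s * d), replace=False)] = True

W0_frozen = W[~m].copy()        # W_mbar stays at its initial values

def masked_sgd_step(W, grad, m, lr=0.1):
    """Pre-training update confined to the ticket W_m; W_mbar is frozen."""
    W = W.copy()
    W[m] -= lr * grad[m]
    return W

grad = rng.normal(size=d)
W_new = masked_sgd_step(W, grad, m)

# The frozen complement is untouched; only the masked subnetwork moved.
assert np.allclose(W_new[~m], W0_frozen)
```

Releasing `W_new` alone reveals a dense weight vector; without `m`, an outside party cannot tell which coordinates form the optimized ticket.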
2. Mask Identification Procedure
PMP adopts an early-bird lottery-ticket algorithm to derive the private mask $m$. The procedure begins with a warm-up phase (typically 500 steps) on a pre-training data subset. At each warm-up step, the absolute parameter gradients are computed, and the top-$K$ entries by magnitude (with $K$ determined by the target sparsity ratio $s$) define a candidate support. Mask stability is assessed using the intersection-over-union (IoU) between consecutive candidates together with a hit counter; once the candidate stabilizes for the required number of consecutive steps, $m$ is fixed. The main pre-training then proceeds, updating only $W_m$.
The process is summarized in the following table:
| Step | Operation | Hyperparameters |
|---|---|---|
| Gradient evaluation | Compute $\lvert \nabla_W L \rvert$ for all parameters | warm-up steps, warm-up subset |
| Top-$K$ mask assignment | $m_i = 1$ if $\lvert \nabla_{W_i} L \rvert$ is among the top-$K$ magnitudes, else $0$ | $K$ (set by sparsity ratio $s$) |
| Mask stabilization | Fix $m$ once the IoU between consecutive candidates exceeds the threshold for enough hits | IoU threshold, hit count |
After mask selection, all subsequent training is strictly in the masked subspace.
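The selection loop above can be sketched as follows; `iou_thresh` and `patience` are illustrative stand-ins for the paper's threshold and hit-count settings, which are not reproduced here:

```python
import numpy as np

def topk_mask(grads, k):
    """Candidate mask: the k parameters with largest gradient magnitude."""
    idx = np.argsort(np.abs(grads))[-k:]
    m = np.zeros(grads.shape, dtype=bool)
    m[idx] = True
    return m

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    return np.logical_and(a, b).sum() / max(np.logical_or(a, b).sum(), 1)

def early_bird_mask(grad_stream, k, iou_thresh=0.95, patience=3):
    """Fix the mask once consecutive candidates agree (IoU >= iou_thresh)
    for `patience` steps in a row; both values are illustrative."""
    prev, hits = None, 0
    for grads in grad_stream:
        cand = topk_mask(grads, k)
        if prev is not None and iou(prev, cand) >= iou_thresh:
            hits += 1
            if hits >= patience:
                return cand
        else:
            hits = 0
        prev = cand
    return prev  # fall back to the last candidate if never stabilized
```

The early-exit design is what makes the mask "early-bird": once the top-$K$ support stops churning during warm-up, further gradient evaluation adds nothing, so selection stops and sparse-subspace pre-training begins.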
3. Theoretical Analysis of Fine-Tuning Instability
PMP provides formal guarantees that unauthorized fine-tuning is globally destabilized by the mask secrecy. The analysis rests on a local geometrical assumption: in the vicinity of the trained parameters, the loss landscape along $W_m$ is flat (null Hessian block), while along $W_{\bar m}$ the Hessian is strictly positive definite, representing steep directions untouched during pre-training.
Let the downstream objective be $L_{\rm down}(W) = L_{\rm pre}(W) + \Delta(W)$, where $\Delta$ encodes the task shift. For any standard SGD step with step size $\eta$ whose update is not restricted to $m$: $\mathbb{E}[L_{\rm pre}(W_m', W_{\bar m}')] \geq L_{\rm pre}(W_m^*, W_{\bar m}^0) + c\eta^2,$ for some $c > 0$, so long as the gradient in the frozen subspace is non-zero with positive probability. Thus, arbitrary adaptation not guided by $m$ systematically increases the pre-training loss, bounding downstream gains over many steps. The proof adapts a second-order Taylor expansion to show that the quadratic curvature along $W_{\bar m}$ dominates.
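The shape of the bound can be seen from a second-order expansion under the stated curvature assumptions; the block notation $H_{mm}$, $H_{\bar m \bar m}$ and the curvature lower bound $\mu$ are introduced here for illustration and are not the paper's exact notation:

```latex
% Expand L_pre around the released optimum W^* = (W_m^*, W_mbar^0), with
% H_{mm} = 0 (flat ticket directions) and H_{mbar mbar} >= mu I, mu > 0
% (steep frozen directions), and zero gradient at W^*:
\begin{align*}
L_{\rm pre}(W^* + \Delta)
  &\approx L_{\rm pre}(W^*)
   + \nabla L_{\rm pre}(W^*)^\top \Delta
   + \tfrac{1}{2}\,\Delta^\top H \Delta \\
  &= L_{\rm pre}(W^*)
   + \tfrac{1}{2}\,\Delta_{\bar m}^\top H_{\bar m \bar m}\,\Delta_{\bar m}
   \qquad (\nabla L_{\rm pre}(W^*) = 0,\ H_{mm} = 0) \\
  &\geq L_{\rm pre}(W^*) + \tfrac{\mu}{2}\,\|\Delta_{\bar m}\|^2 .
\end{align*}
```

For an unrestricted SGD step $\Delta = -\eta g$, the frozen-subspace displacement is $\Delta_{\bar m} = -\eta g_{\bar m}$, so the penalty term scales as $\tfrac{\mu}{2}\eta^2\|g_{\bar m}\|^2$, giving the $c\eta^2$ lower bound whenever $g_{\bar m}$ is non-zero with positive probability.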
4. Implementation Aspects
The PMP pipeline comprises three principal stages:
- Mask Discovery: Early-bird mask selection (500 steps).
- Sparse-Subspace Pre-Training: Training proceeds with gradients confined to $W_m$, using standard optimizer settings (AdamW, cosine learning-rate decay, a fixed batch size, gradient-norm clipping).
- Release: Only the dense weights $W^*$ are released; the mask $m$ is withheld.
Experiments adopt TinyLlama-1.1B (22 layers, hidden size 2048, 32 heads) and GPT-2 architectures, pre-trained on SlimPajama-6B (sequences tokenized to 256 tokens, causal LM loss). Authorized fine-tuning can be enabled for select users by providing $m$, so that only $W_m$ updates. Storage and release do not expose $m$. Empirically, adversaries are unable to reconstruct $m$ from observed gradients or outputs, as the gradient-magnitude distributions of masked and unmasked weights overlap.
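The contrast between authorized (mask-aware) and unauthorized (dense) fine-tuning can be simulated on a toy loss surface built to match the paper's curvature assumption; the quadratic `L_pre`, the curvature value, and the task-shift vector are all illustrative constructions, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d, s = 50, 0.2
m = np.zeros(d, dtype=bool)
m[: int(s * d)] = True          # private mask (kept secret by the owner)

# Toy loss matching the assumed geometry: flat along W_m (curvature 0),
# strictly convex (curvature 4) along the frozen complement W_mbar.
h = np.where(m, 0.0, 4.0)
W_star = np.zeros(d)            # released weights sit at a pre-training optimum

def L_pre(W):
    return 0.5 * np.sum(h * (W - W_star) ** 2)

# Downstream gradient = pre-training gradient + a task-shift direction.
shift = rng.normal(size=d)

def downstream_grad(W):
    return h * (W - W_star) + shift

def fine_tune(mask, steps=100, lr=0.05):
    W = W_star.copy()
    for _ in range(steps):
        g = downstream_grad(W)
        if mask is not None:
            g = g * mask        # authorized users update only W_m
        W -= lr * g
    return W

unauthorized = fine_tune(mask=None)   # dense updates, no knowledge of m
authorized = fine_tune(mask=m)        # updates confined to the ticket

# Unauthorized steps leave the flat valley and raise the pre-training loss;
# authorized steps stay in the flat subspace and leave L_pre unchanged.
assert L_pre(unauthorized) > L_pre(authorized)
```

In a real deployment the same gating is achieved by zeroing gradients outside $m$ before the optimizer step (e.g., via per-parameter gradient hooks), which is why withholding $m$ suffices to withhold this capability.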
5. Empirical Findings
PMP is evaluated on base (zero-shot, pre-finetuning) capabilities and unauthorized fine-tuning across GLUE tasks (CoLA, SST-2, MRPC, QQP, STS-B, MNLI, QNLI, RTE). The experimental protocol holds all training and adaptation hyperparameters constant across non-PMP and PMP settings.
Key outcomes:
- Base performance is essentially unchanged with PMP (TinyLlama zero-shot scores match with and without PMP).
- Unauthorized fine-tuning accuracy is sharply reduced for the PMP-trained TinyLlama relative to the non-PMP baseline.
- Varying the mask ratio modulates the barrier: base accuracy remains stable across ratios, while unauthorized fine-tuning accuracy shifts markedly with the ratio.
- Across a grid of learning rates and epoch counts, PMP consistently suppresses adaptation gains by up to 20 points, while non-PMP models fine-tune successfully under the same settings.
- Authorized fine-tuning (with access to $m$) recovers high post-adaptation accuracy on GLUE.
6. Discussion and Open Problems
PMP establishes non-fine-tunability by imposing a geometry mismatch at the pre-training level, supported graphically by loss-landscape sweeps showing flat valleys along the masked directions $W_m$ and steep walls in orthogonal directions. The mask ratio operates as a tunable control on the model's resilience to adaptation, trading adaptation difficulty against training speed.
Empirical findings suggest that the mask cannot be reliably inferred from black-box queries, due to the statistical overlap of gradient magnitudes between masked and unmasked weights.
Several limitations and open directions are recognized:
- Theoretical guarantees focus on single-step adaptation; multi-step SGD dynamics, especially for sophisticated adversaries, warrant deeper investigation.
- The secrecy of $m$ is central; partial leakage or white-box access could present vulnerabilities, necessitating future robustness analyses.
- Current experiments focus on GLUE; effectiveness on instruction-tuning, multimodal, or adversarial tasks remains to be evaluated.
- Interactions with other release strategies, such as quantization or differential privacy, are unexplored.
PMP provides an architecture-agnostic, low-overhead method to regulate foundation model adaptation post-release, preserving base utility and authorized fine-tuning while bounding unauthorized reuse through theoretically and empirically supported mechanisms (Wang et al., 31 Jan 2026).