Proof-of-Learning (PoL)
- Proof-of-Learning (PoL) is a cryptographic protocol that certifies genuine machine learning model training via verifiable commitments and secure checkpoints.
- It employs techniques like zero-knowledge proofs, checkpoint logging, and watermark chaining to enhance security, privacy, and model ownership attestation.
- PoL protocols are applied in blockchain consensus, distributed training, and AI marketplaces to align computational effort with verifiable training outcomes.
Proof-of-Learning (PoL) is a class of protocols and cryptographic mechanisms whereby a computational agent provides a convincing, efficiently verifiable certificate that it has genuinely performed the expensive computation required for machine learning model training. PoL schemes address inefficiencies and waste in traditional proof-of-work (PoW) blockchain consensus, enable verifiable outsourcing of model training, and underpin emerging decentralized machine learning marketplaces. Modern PoL designs aim for strong soundness, even against active adversaries who know all task parameters except the secret randomness of an honest run, while providing efficient verification, privacy protection, incentive alignment, and sometimes ownership attestation.
1. Formal Definition and Protocol Architecture
PoL protocols instantiate a proof game between a prover (training agent) and a verifier (judge or blockchain full node). Abstractly, a PoL tuple consists of training data $D$, optionally a test set $D_{\text{test}}$, a model architecture $f$ parameterized by weights $\theta$, and a target metric $M$ subject to a threshold $\tau$:
- PoL instance: $\mathcal{I} = (D, D_{\text{test}}, f, M, \tau)$
- Prover's submission: final weights $\theta^{*}$, claimed performance $M(\theta^{*}) \geq \tau$, and a cryptographic commitment $c = H(\theta^{*})$ (sketched as container types below).
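These objects can be captured as simple container types; a hedged sketch follows, in which every field name is illustrative rather than drawn from any cited protocol:

```python
# Hypothetical container types for the PoL instance and submission defined
# above; field names are illustrative, not taken from any cited protocol.
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass(frozen=True)
class PoLInstance:
    train_data: Sequence            # D
    test_data: Optional[Sequence]   # D_test, often withheld until reveal
    architecture: str               # identifies f (e.g., "resnet18")
    metric: Callable                # M, e.g., test accuracy
    threshold: float                # tau

@dataclass(frozen=True)
class PoLSubmission:
    final_weights: bytes            # serialized theta*
    claimed_metric: float           # M(theta*)
    commitment: str                 # c = H(theta*), hex-encoded hash digest
```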
A canonical PoL block protocol proceeds through the following phases (Salhab et al., 2023, Jia et al., 2021); a minimal end-to-end sketch follows the phase list:
- Task Announcement: Publisher distributes (possibly via blockchain) a signed description of the training task, data, architecture, and threshold metric.
- Commit Phase: Provers perform model training, usually SGD with prescribed hyperparameters, logging checkpoints and associated batch/metadata.
- Header/Commitment: Provers publish a hash of the final weights (and optionally claimed accuracy) as a commitment.
- Test Release & Model Reveal: After commitment, the test set is released, and provers publish the full model and proof.
- Verification & Selection: Verifiers validate the hash, recompute the test metric, and, depending on protocol, selectively replay logged training segments.
- Block Finalization: The prover with the best valid metric above threshold and matching commitment wins, and its block/model is recorded; reward logic is triggered.
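Putting the phases together, the following is a minimal runnable sketch of the commit-and-reveal flow; training and the metric are stubbed placeholders, and all function names are illustrative rather than taken from any cited protocol:

```python
# Minimal sketch of the commit-and-reveal PoL block flow described above.
# Training and the metric are placeholders; real protocols hash serialized
# model weights and recompute a task-specific test metric.
import hashlib
import json
import random

def h(obj) -> str:
    """Deterministic SHA-256 commitment over a JSON-serializable object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def train(task_seed: int, steps: int = 1000):
    """Stand-in for SGD training; returns 'weights' plus a checkpoint log."""
    rng = random.Random(task_seed)
    weights, checkpoints = [rng.random() for _ in range(8)], []
    for t in range(steps):
        weights = [w - 0.001 * rng.uniform(-1, 1) for w in weights]
        if t % 100 == 0:                        # periodic checkpoint logging
            checkpoints.append((t, h(weights)))
    return weights, checkpoints

def metric(weights, test_set) -> float:
    """Placeholder for the task's test metric (e.g., accuracy)."""
    return 1.0 - abs(sum(weights)) / (len(weights) + len(test_set))

# Commit phase: prover trains and publishes only a hash of the final weights.
weights, checkpoints = train(task_seed=42)
commitment = h(weights)

# Test release & reveal: the test set appears only after the commitment,
# which prevents training against it; prover then reveals model and proof.
test_set = [1, 2, 3]
revealed = {"weights": weights, "checkpoints": checkpoints,
            "claimed_metric": metric(weights, test_set)}

# Verification & selection: hash must match the commitment and the metric
# must reproduce; block finalization compares metrics across valid provers.
assert h(revealed["weights"]) == commitment, "commitment mismatch"
assert abs(metric(revealed["weights"], test_set)
           - revealed["claimed_metric"]) < 1e-9, "metric mismatch"
threshold = 0.5
print("valid" if revealed["claimed_metric"] >= threshold else "below threshold")
```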
The verification process can range from recomputation of SGD steps (requiring full logs of batches and checkpoint signatures (Jia et al., 2021, Salhab et al., 2023)) to zero-knowledge proofs or fine-grained watermark chains (Deng et al., 18 May 2025) when privacy or succinctness is paramount.
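The recomputation-based end of this spectrum can be sketched as follows; the toy update rule, the use of seeds as stand-ins for logged batch metadata, and the tolerance value are all assumptions, not any paper's exact procedure:

```python
# Hedged sketch of segment-replay verification (in the spirit of Jia et al.,
# 2021): the verifier re-executes k logged SGD steps from a checkpoint and
# accepts if the reproduced weights land within a tolerance of the next
# logged checkpoint.
import random

def sgd_step(weights, batch_seed, lr=0.01):
    """Deterministic toy 'SGD' update, seeded by logged batch metadata."""
    rng = random.Random(batch_seed)
    return [w - lr * rng.uniform(-1, 1) for w in weights]

def replay_segment(start_weights, batch_seeds):
    w = list(start_weights)
    for seed in batch_seeds:
        w = sgd_step(w, seed)
    return w

def verify_segment(ckpt_i, ckpt_next, batch_seeds, eps=1e-6):
    """Accept iff the recomputed endpoint is eps-close to the logged one."""
    reproduced = replay_segment(ckpt_i, batch_seeds)
    err = max(abs(a - b) for a, b in zip(reproduced, ckpt_next))
    return err <= eps

# Prover logs checkpoints every k steps together with the batch seeds used.
seeds = list(range(5))                       # k = 5 steps in this segment
ckpt0 = [0.1, -0.2, 0.3]
ckpt1 = replay_segment(ckpt0, seeds)         # honest prover's next checkpoint
print(verify_segment(ckpt0, ckpt1, seeds))                # True: reproduces
print(verify_segment(ckpt0, [0.0, 0.0, 0.0], seeds))      # False: spoofed log
```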
2. Security Models and Adversarial Threats
Soundness, or "proof-of-work equivalence," is central: an adversary that knows the task details and final weights and commands unlimited computational resources, but lacks the honest prover's secrets (such as SGD randomness, batch order, or cryptographic keys), should not be able to forge a valid proof without expending at least as much work as the original training. Major attack classes (Jia et al., 2021, Fang et al., 2022, Zhang et al., 2021) include:
- Retraining spoof: Adversary independently retrains the model from existing data to match final parameters.
- Stochastic spoof: Adversary seeks an alternate transcript matching the claimed model.
- Directed/structural spoof: Fabricating a plausible transcript using only a small number of SGD steps, or exploiting faults in the verification-subset logic.
- Distillation spoof: Producing a surrogate model that matches outputs but did not follow the claimed trajectory.
- Adversarial example spoof: Gradient-based optimization of trajectories or batches to minimize per-step verification error given the final model (Zhang et al., 2021, Fang et al., 2022).
Theoretical analysis in (Jia et al., 2021) relates PoL transcript entropy to inherent stochasticity in SGD, arguing that honest SGD transcripts carry linearly accumulating entropy—making precise spoofing combinatorially hard unless almost as much computation is expended as honest training. However, practical attacks exploiting "blind spots" in verification parameters, or non-uniqueness of trajectories, can sharply reduce this cost, as demonstrated empirically (Zhang et al., 2021, Fang et al., 2022).
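One way to make this accounting explicit, under the strong simplifying assumption that each SGD step injects roughly $h$ fresh, independent bits of randomness (batch sampling, augmentation, dropout):

```latex
% Entropy accounting for an SGD transcript, by the chain rule, under the
% assumption that each step adds about h fresh bits independent of the past.
H(\theta_1, \dots, \theta_T \mid \theta_0)
  = \sum_{t=1}^{T} H\!\left(\theta_t \mid \theta_{t-1}, \dots, \theta_0\right)
  \approx T \cdot h
```

Under this assumption, an adversary who must reproduce every checkpoint exactly searches a space of size roughly $2^{Th}$; the attacks cited above succeed by instead matching checkpoints only within the verifier's error tolerance.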
3. Protocol Variants and Advanced Mechanisms
PoL has evolved from basic checkpoint-logging protocols to sophisticated systems incorporating advanced cryptography, watermarks, privacy, incentive engineering, and consensus integration.
3.1. Blockchains: PoDL, PoCL, PoFLSC, PoFL, SEDULity
- Proof-of-Deep-Learning (PoDL): Blockchain miners train DL models; consensus is based on model accuracy over a freshly revealed test set, using a two-phase commit-and-reveal protocol to prevent overfitting and plagiarism (Chenli et al., 2019, Salhab et al., 2023).
- Proof-of-Collaborative Learning (PoCL): Federated-learning-based protocols with multiple "winner" miners per round, decentralized model voting, federated averaging, and weighted incentive sharing (Sokhankhosh et al., 17 Jul 2024).
- Proof-of-Federated-Learning-Subchain (PoFLSC): Subchain orchestration for FL blocks, Shapley-value partner selection, and challenge/audit layers for decentralized data and model integrity (Li et al., 2023).
- Proof-of-Federated Learning (PoFL): Pools perform FL on private data, combined with privacy-preserving data trading and HE+2PC based privacy for model and test set (Qu et al., 2019).
- SEDULity: Distributed PoL system with committee selection, chunked SGD stages as puzzles, randomization, and incentive-compatible verification through group sampling and stake-slashing (Cao et al., 15 Dec 2025).
3.2. Ownership and Watermarking: PoLO
- PoLO: Chained watermarking, where cryptographically-linked watermarks are embedded per training shard; the full chain proves effort, and the terminal watermark attests ownership (Deng et al., 18 May 2025). This architecture ensures privacy and makes forging individual segments extremely costly.
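A hedged sketch of the chaining idea follows; the key-derivation rule and shard identifiers are illustrative, and the watermark embedding itself (which in a real scheme perturbs weights or trains on trigger inputs) is stubbed out:

```python
# Illustrative sketch of chained watermark keys in the style of PoLO
# (Deng et al., 18 May 2025): each training shard's watermark key is derived
# from the previous key and the shard's identifier, so forging one link
# requires redoing every later shard.
import hashlib

def derive_key(prev_key: bytes, shard_id: bytes) -> bytes:
    return hashlib.sha256(prev_key + shard_id).digest()

def embed_watermark(model_state: dict, key: bytes) -> dict:
    """Placeholder: a real scheme perturbs weights or trains on triggers."""
    return {**model_state, "wm": key.hex()[:8]}

key, model = b"genesis-secret", {"weights": "..."}
chain = []
for shard in [b"shard-0", b"shard-1", b"shard-2"]:
    key = derive_key(key, shard)        # cryptographically links the shards
    model = embed_watermark(model, key)
    chain.append(key.hex())

# Verifier replays the chain from the (revealed) genesis key and checks that
# the terminal watermark extracted from the model matches the final chain key.
print(chain[-1][:8] == model["wm"])     # True for an honestly chained model
```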
3.3. Cryptographic and Zero-Knowledge Approaches
- ZK-STARKs/IOP-based PoL: Training procedures are encoded as polynomial constraints, with Merkle-tree commitments and low-degree proofs. Verifier work is logarithmic in transcript length, and no training data is ever revealed (Ray et al., 13 Jun 2025); a Merkle-commitment sketch follows this list.
- Vanilla PoL: Hash-based trajectory logging; verification can be sublinear in training cost, but leaks information (the full batch sequence is revealed) (Jia et al., 2021).
- Functional encryption, secure mapping: Block-specific cryptography (e.g., Secure Mapping Layer) to bind models to blockchain history and prevent reuse or optimistic forgeries (Lan et al., 2020).
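As a concrete illustration of the Merkle-tree commitments mentioned in the first item, here is a minimal sketch; real systems commit to constraint-encoded execution traces rather than raw checkpoint strings:

```python
# Minimal Merkle-tree commitment over a checkpoint transcript, letting a
# verifier spot-check individual checkpoints against a single published
# root with logarithmic-size openings. Sketch only.
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves):
    level = [H(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

transcript = [f"checkpoint-{t}".encode() for t in range(8)]
root = merkle_root(transcript)             # published commitment
print(root.hex())
# An opening for leaf i consists of the log2(n) sibling hashes on its root
# path; the verifier recomputes the path and compares to the published root.
```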
4. Performance, Security Guarantees, and Empirical Results
4.1. Verification and Overhead
The major practical concern is verification overhead—measured as the proportion of verifier computation relative to prover training.
- Early PoL designs required significant recomputation: e.g., with checkpoint interval $k$ and $Q$ randomly audited segments, verification re-executes on the order of $kQ$ training steps (Zhao et al., 13 Apr 2024), whereas cryptographic PoL can reduce verifier effort to $O(\log T)$ in the total step count $T$ via succinct arguments (Ray et al., 13 Jun 2025).
- Watermark-chaining schemes like PoLO generate proofs in a small fraction of the time required by gradient-trace PoL (Deng et al., 18 May 2025); a back-of-envelope comparison follows this list.
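A back-of-envelope comparison under assumed parameters (the values of T, k, and Q below are illustrative, not drawn from the cited papers):

```python
# Rough overhead comparison under assumed parameters: replay-based
# verification re-executes Q audited segments of k steps each out of T
# total steps, while a succinct-argument verifier does ~O(log T) work.
import math

T = 100_000        # total training steps (assumed)
k, Q = 100, 20     # checkpoint interval and number of audited segments
replay_cost = Q * k                  # gradient steps the verifier re-runs
print(f"replay fraction: {replay_cost / T:.1%}")          # -> 2.0%
print(f"succinct verifier, ~log2(T) units: {math.ceil(math.log2(T))}")
```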
4.2. Empirical Security and Performance
Practical deployments verify that, under honest operation, reproduction errors between prover-logged and recomputed checkpoints are well below reference thresholds (Jia et al., 2021). Empirical analyses of attacks demonstrate that, if verification is performed naively, various forms of spoofing succeed at a small fraction (less than one tenth) of the honest training cost (Zhang et al., 2021, Fang et al., 2022). Systems like SEDULity and PoLO demonstrate resistance to such attacks through randomization and cryptographic commitment structuring, with robust empirical detection rates for proof-of-ownership and learning (Cao et al., 15 Dec 2025, Deng et al., 18 May 2025).
The following table summarizes key security mechanisms and their empirical/analyzed resistance:
| Scheme | Key Security Features | Empirical/Analytical Guarantee |
|---|---|---|
| PoDL/PoCL/PoFLSC/PoFL | Commitment, challenge-response, FL aggregation | Honest-majority chain security, FL challenge-resistance |
| Vanilla PoL | Checkpoint logging, batch hashing | Vulnerable to adversarial optimization (Fang et al., 2022) |
| PoLO | Chained watermarks, DP noise, hash chaining | Forging costs 1.1–4× honest training; high watermark detection rate |
| SEDULity | Stage randomization, group sampling, slashing | Honest strategy is payoff-maximizing; high resource utilization |
| ZK Proofs (Ray et al., 13 Jun 2025) | Polynomial IOP/STARK, zero-knowledge | Negligible soundness error; verification logarithmic in transcript length |
5. Limitations, Open Problems, and Future Directions
5.1. Fundamental Barriers
Recent analyses show that optimistic security assumptions in early PoL designs break down due to:
- Non-uniqueness of SGD trajectories (many plausible transcripts reach the same endpoint) (Fang et al., 2022).
- The feasibility of constructing infinitesimal-update or blindfold (partial-real) spoofs when per-step or subset verification is poorly parameterized (Zhang et al., 2021).
Provably robust PoL verification for DNNs reduces to difficult open problems in learning theory—such as characterizing SGD noise distributions, statistical uniqueness of trajectories, and minimal representative verification subsets (Fang et al., 2022). Moreover, purely classical PoL for deep networks likely requires either heavy cryptography or hybrid protocols (Caro et al., 31 Oct 2024).
5.2. Challenges and Research Directions
- Tuning Verification Overhead vs. Security: Achieving sublinear verification cost without introducing exploitable blind spots.
- Decentralized and Market-based PoL: Ensuring incentive compatibility, preventing collusion, and supporting dynamic task allocation (Zhao et al., 13 Apr 2024, Cao et al., 15 Dec 2025).
- Privacy and Data Integrity: Preventing batch/data exposure in proofs, and supporting MPC/HE-based protocols for batch privacy (Deng et al., 18 May 2025, Qu et al., 2019).
- Quantum and Advanced Proof-of-Learning: Exploiting quantum communication, streaming proofs, and interactive certificates for broader classes of learning problems (Caro et al., 31 Oct 2024).
- Integration with Ownership: Combining PoL and proof-of-ownership via watermark chains without sacrificing verification efficiency (Deng et al., 18 May 2025).
- Scalability: Extending block-wide or federated-learning “subchain” approaches to scale PoL to hundreds or thousands of agents (Li et al., 2023, Sokhankhosh et al., 17 Jul 2024).
5.3. Open Problems
- Characterizing the SGD-induced distribution over training checkpoints and deriving optimal verification tolerance parameters.
- Designing succinct, privacy-preserving PoL mechanisms for large-scale or non-convex deep networks.
- Developing practical and provably secure hybrid PoL-blockchain systems for real-world AI computation markets.
6. Applications and Ecosystem
PoL has been deployed or proposed for:
- Blockchain Consensus: Converting mining energy into useful model training, supporting both centralized tasks (PoDL) and decentralized federated learning (PoCL, PoFL, SEDULity).
- Model Ownership Resolution: Certifying that a particular agent expended the requisite effort to train a published model (Jia et al., 2021, Deng et al., 18 May 2025).
- Byzantine-resilient Distributed Training: Verifying worker updates (proof transcripts) in federated or parameter-server regimes (Jia et al., 2021, Sokhankhosh et al., 17 Jul 2024).
- Outsourced Learning-as-a-Service: Verifying correctness and privacy of third-party MLaaS training (Ray et al., 13 Jun 2025, Deng et al., 18 May 2025).
- AI Model Marketplaces: Ensuring proof-of-effort and attribution for machine learning tasks exchangeable as on-chain economic assets (Salhab et al., 2023).
In all settings, the central thrust is the alignment of computational incentive (energy or reward) with genuinely productive machine learning work, with the additional guarantees of verifiability, attribution, and—where cryptographically fortified—privacy and security against premature model disclosure, data poisoning, or ownership forgery.