SlideChain: Provenance & Image Synthesis

Updated 31 December 2025

SlideChain is a dual-purpose framework combining blockchain-based semantic provenance for lecture slides with slider-based image synthesis for controlled attribute manipulation.
It employs multimodal extraction, deterministic normalization, and on-chain cryptographic anchoring to guarantee reproducibility and tamper-evidence.
The CompSlider variant enables disentangled multi-attribute control in text-to-image diffusion models, achieving precise, scalable image generation.

SlideChain refers to two distinct technical systems in recent academic literature: (i) a blockchain-backed semantic provenance framework designed for verifiable multimodal extraction from lecture slides in educational contexts, and (ii) a compositional slider-based image generation architecture (“CompSlider”) enabling disentangled multi-attribute control in text-to-image diffusion models. Both share a core commitment to rigorous provenance and precise control, but their domains, mechanisms, and impact are orthogonal.

1. Blockchain-Backed Semantic Provenance for Educational Slides

SlideChain (Manik et al., 25 Dec 2025) is engineered to ensure the auditability, reproducibility, and integrity of semantic outputs produced by state-of-the-art vision–LLMs (VLMs) on lecture slides. Its architecture integrates multimodal extraction, cryptographic anchoring, and blockchain registration to address systemic semantic variability and provenance deficiencies in AI-driven instructional pipelines. The principal motivators are:

Semantic inconsistency: Models (InternVL3, Qwen-VL, LLaVA) yield discordant lists of concepts and relational triples under different inference configurations.
Lack of provenance: There is no immutable link between a semantic output and the exact model, prompt, or environment used.
Reproducibility breakdown: Pipeline updates or stochastic variance can silently alter outputs, with conventional version control offering no cryptographic guarantees.

SlideChain extracts concepts and $(s,p,o)$ triples from four competitive VLMs for each slide in a curated medical imaging dataset (1,117 slides from 23 lectures), normalizes outputs into unified JSON schemas, hashes each record using Keccak-256, and registers only these cryptographic commitments on a local EVM-compatible blockchain. On-chain constants enable tamper-evidence, immutable timestamps, and third-party verifiability.

2. Multimodal Data Pipeline and Semantic Extraction

The SlideChain data pipeline operates in four principal stages:

Slide Ingestion: High-resolution slide images and transcript snippets form multimodal prompts.
Multi-Model Extraction: Each slide is processed by four VLMs. Extracted outputs include concept lists, relational triples, and textual evidence.
Provenance Construction: Deterministic normalization (null-safe parsing, whitespace regularization, deduplication, lowercasing) yields canonicalized JSON records:

$\text{provenance}(s) = \bigl\{\,\text{models}:\{M_i\mapsto\{C_i,T_i\}\},\,\text{paths},\,\text{metadata}\bigr\}$

Hashing & On-Chain Registration: Each slide JSON is serialized with sorted keys and hashed:

$h_{s} = \mathrm{Keccak256}\left(\mathrm{serialize}(\text{provenance}(s))\right).$

The contract function $\mathtt{registerSlide}(L,s,h_s,\text{uri})$ stores entries indexed by $\mathrm{keccak256}(L,s)$ , guaranteeing constant gas usage per slide and strong tamper resistance.

Semantic disagreement per slide is quantified by: - $D_{\mathrm{concept}}(s)=|\cup_i C_i(s)|$ - $D_{\mathrm{triple}}(s)=|\cup_i T_i(s)|$

Similarity is measured using the Jaccard index:

$J(S_a,S_b)=\frac{|S_a\cap S_b|}{|S_a\cup S_b|}\in[0,1]$

3. Cryptographic Design and Blockchain Performance

SlideChain employs a lightweight Solidity smart contract on the Hardhat Ethereum simulator for local registration. Only the compact hash $H_s$ is stored on-chain; the full JSON remains off-chain for inspection. Key characteristics:

Gas efficiency: $\overline{g}=231,430$ gas/slide at $p_{\rm gas}=30$ gwei yields $\overline{c} \approx \$0.92 $/slide.</li> <li>Linear scalability:$ G(N)=N\cdot\overline{g} $; million-slide deployments projected at$ <\$10^{3 $ total cost at L2 rates.</li> <li>Throughput:$ T\approx1.00 $slide/s under zero-delay local conditions.</li> <li>Tamper-detection: Any modification to off-chain JSON provokes a hash mismatch (100% detection on synthetic perturbations).</li> <li>Reproducibility: Independent runs under identical conditions produce perfect hash agreement ($ J=1.0 $over all pairs).</li> </ul> The contract storage layout uses a mapping from composite keys to SlideRecord structs, and enforces record uniqueness. Decentralized trust assumptions (in contrast to virtual logs or versioned storage) guarantee verifiability and silent overwrite detection. <h2 class='paper-heading' id='empirical-findings-semantic-variability-and-provenance-value'>4. Empirical Findings: Semantic Variability and Provenance Value</h2> Comprehensive analysis of the SlideChain Slides dataset (<a href="/papers/2512.21684" title="" rel="nofollow" data-turbo="false" class="assistant-link" x-data x-tooltip.raw="">Manik et al., 25 Dec 2025</a>) reveals: <ul> <li>Concept/Triple disagreement: Across 4,468 runs (1,117 slides × 4 VLMs),$ D_{\mathrm{concept}} $frequently exceeds 8–12 unique terms per slide;$ D_{\mathrm{triple}} $reflects nontrivial variability.</li> <li>Per-model output density:</li> </ul> | Model | Avg. concepts ($ \overline{|C|} $) | Avg. triples ($ \overline{|T|} $) | |----------------------|-------------------------------|----------------------------| | InternVL3-14B | 12.3 | 3.1 | | Qwen2-VL-7B | 11.7 | 2.8 | | Qwen3-VL-4B | 7.9 | 1.6 | | <a href="https://www.emergentmind.com/topics/llava-onevision" title="" rel="nofollow" data-turbo="false" class="assistant-link" x-data x-tooltip.raw="">LLaVA-OneVision</a> | 7.8 | 1.1 | <ul> <li>Cross-model similarity: Jaccard indices for concepts average 0.26 (SD 0.10); for triples$ <0.05 $—indicating low overlap even among best-in-class VLMs.</li> <li>Lecture variability: Visually dense lectures (e.g., CT reconstruction, Fourier analysis) show elevated$ D_{\mathrm{concept}} $($ \approx14 $); text-centric lectures remain below$ \approx6$.}
Single vs. multi-model coverage: InternVL3 misses 62% of concepts and 40% of triples present in the union of all models’ outputs, supporting the ensemble extraction design.

5. Comparison of Provenance Mechanisms

SlideChain provides properties previously unavailable in existing provenance systems. The following table summarizes their characteristics:

Property	Git/DVC	Centralized Log	SlideChain
Third-party verifiability	No	No	Yes
Tamper resistance	Limited	Limited	Strong (crypto.)
Immutable timestamps	No	No	Yes
Silent overwrite detection	Partial	Partial	Guaranteed
Trust assumptions	Repo owner	Service provider	Decentralized
Long-term auditability	Limited	Limited	Strong

These attributes make SlideChain a suitable foundation for educational and scientific applications requiring transparent, verifiable semantic records.

6. CompSlider Architecture for Disentangled Image Generation ("SlideChain" in T2I)

A second usage of SlideChain appears in the context of compositional slider-based attribute control for text-to-image generation, as implemented in CompSlider (Zhu et al., 31 Aug 2025). Here, SlideChain denotes a compact Diffusion Transformer (DiT) module, which generates conditional priors governing fine-grained, independent manipulation of multiple image attributes (e.g., age, smile) without retraining the foundation model. The pipeline consists of:

Architecture: SlideChain sits upstream of the latent diffusion foundation model.
- Inputs: normalised vector of $N$ continuous slider values $v^{\mathcal S}\in[0,1]^N$ and text prompt embedding $c^{\mathcal T}$ .
- Process: DiT block generates conditional prior $c^{\mathcal I}$ for the denoiser.
Conditional-prior training: Uses standard diffusion loss, novel disentanglement and structure losses (Eqs. above), MLP classifier, and random slider pairs for attribute independence.
Key hyperparameters: DiT: 10 layers, 16 slider tokens (N=16), $d=1024$ ; $B=20$ buckets for $\Delta v$ , $\tau=0.1$ threshold. Training runs for $\approx16$ h on 8 $\times$ A100 GPUs.
Empirical results: Superior continuity (81.1%), consistency (91.0%), scope (59.0%), and minimal entanglement (14.0%) versus ConceptSlider and PromptSlider. LPIPS $\approx0.12$ , CLIP $\approx6.20$ (best).
Operational significance: SlideChain enables truly disentangled, scalable generation of multi-attribute images and videos from arbitrary slider configurations in a single inference pass.

7. Impact, Limitations, and Future Directions

SlideChain (Manik et al., 25 Dec 2025, Zhu et al., 31 Aug 2025) advances the state of provenance and control across two separate AI domains. In educational pipelines, blockchain anchoring supplies deterministic, tamper-evident semantic records enabling long-term auditability and drift detection. In generative modeling, SlideChain (via CompSlider) achieves rapid, precise conditional prior generation for robust attribute disentanglement and smooth interpolation.

Identified limitations include:

No guarantee of semantic correctness or consensus is offered; provenance records capture disagreement for subsequent human adjudication.
Deterministic reproducibility demands fixed inference settings and reliable off-chain storage.
Blockchain deployments in production face latency, cost, and upgrade constraints; hybrid L2/L1 strategies and integration with decentralized storage (IPFS) are recommended.
The T2I SlideChain does not address foundational model biases; rather, it mitigates attribute entanglement via loss engineering.
A plausible implication is that ensemble extraction and cryptographic anchoring may become standard protocol for any AI-driven semantic pipeline subject to update and audit requirements. Future work may explore Merkle-tree aggregation, cross-model consensus algorithms, and user dashboards surfacing uncertainty and drift to stakeholders.

SlideChain provides a principled, scalable paradigm for both semantic provenance in STEM education and disentangled, multi-attribute image synthesis, supporting the development and maintenance of trustworthy, reproducible, and auditable AI-driven systems (Manik et al., 25 Dec 2025, Zhu et al., 31 Aug 2025).

Markdown Report Issue Upgrade to Chat

References (2)

SlideChain: Semantic Provenance for Lecture Understanding via Blockchain Registration (2025)

CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation (2025)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to SlideChain.

SlideChain: Provenance & Image Synthesis

1. Blockchain-Backed Semantic Provenance for Educational Slides

2. Multimodal Data Pipeline and Semantic Extraction

3. Cryptographic Design and Blockchain Performance

5. Comparison of Provenance Mechanisms

6. CompSlider Architecture for Disentangled Image Generation ("SlideChain" in T2I)

7. Impact, Limitations, and Future Directions

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Don't miss out on important new AI/ML research

SlideChain: Provenance & Image Synthesis

1. Blockchain-Backed Semantic Provenance for Educational Slides

2. Multimodal Data Pipeline and Semantic Extraction

3. Cryptographic Design and Blockchain Performance

5. Comparison of Provenance Mechanisms

6. CompSlider Architecture for Disentangled Image Generation ("SlideChain" in T2I)

7. Impact, Limitations, and Future Directions

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Related Topics

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research