SlideChain: Provenance & Image Synthesis
- SlideChain names two distinct systems in recent literature: a blockchain-based semantic provenance framework for lecture slides, and a slider-based image synthesis architecture for controlled attribute manipulation.
- It employs multimodal extraction, deterministic normalization, and on-chain cryptographic anchoring to guarantee reproducibility and tamper-evidence.
- The CompSlider variant enables disentangled multi-attribute control in text-to-image diffusion models, achieving precise, scalable image generation.
SlideChain refers to two distinct technical systems in recent academic literature: (i) a blockchain-backed semantic provenance framework designed for verifiable multimodal extraction from lecture slides in educational contexts, and (ii) a compositional slider-based image generation architecture (“CompSlider”) enabling disentangled multi-attribute control in text-to-image diffusion models. Both share a core commitment to rigorous provenance and precise control, but their domains, mechanisms, and impact are orthogonal.
1. Blockchain-Backed Semantic Provenance for Educational Slides
SlideChain (Manik et al., 25 Dec 2025) is engineered to ensure the auditability, reproducibility, and integrity of semantic outputs produced by state-of-the-art vision–language models (VLMs) on lecture slides. Its architecture integrates multimodal extraction, cryptographic anchoring, and blockchain registration to address systemic semantic variability and provenance deficiencies in AI-driven instructional pipelines. The principal motivators are:
- Semantic inconsistency: Models (InternVL3, Qwen-VL, LLaVA) yield discordant lists of concepts and relational triples under different inference configurations.
- Lack of provenance: There is no immutable link between a semantic output and the exact model, prompt, or environment used.
- Reproducibility breakdown: Pipeline updates or stochastic variance can silently alter outputs, with conventional version control offering no cryptographic guarantees.
SlideChain extracts concepts and triples from four competitive VLMs for each slide in a curated medical imaging dataset (1,117 slides from 23 lectures), normalizes outputs into unified JSON schemas, hashes each record using Keccak-256, and registers only these cryptographic commitments on a local EVM-compatible blockchain. On-chain commitments enable tamper evidence, immutable timestamps, and third-party verifiability.
2. Multimodal Data Pipeline and Semantic Extraction
The SlideChain data pipeline operates in four principal stages:
- Slide Ingestion: High-resolution slide images and transcript snippets form multimodal prompts.
- Multi-Model Extraction: Each slide is processed by four VLMs. Extracted outputs include concept lists, relational triples, and textual evidence.
- Provenance Construction: Deterministic normalization (null-safe parsing, whitespace regularization, deduplication, lowercasing) yields canonicalized JSON records.
- Hashing & On-Chain Registration: Each slide's canonical JSON record $J_s$ is serialized with sorted keys and hashed, $h_s = \mathrm{Keccak256}(\mathrm{serialize}(J_s))$.
The contract function stores entries indexed by slide identifier, guaranteeing constant gas usage per slide and strong tamper resistance.
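The normalization-and-commitment steps above can be sketched in Python. The record schema and field names below are illustrative assumptions, and stdlib `sha3_256` (NIST SHA-3) stands in for Keccak-256, which `hashlib` does not provide:

```python
import hashlib
import json

def canonicalize(record: dict) -> dict:
    """Deterministic normalization: null-safe parsing, whitespace
    regularization, deduplication, and lowercasing."""
    def norm_list(items):
        seen, out = set(), []
        for x in items or []:          # null-safe: tolerate a missing field
            if x is None:
                continue
            s = " ".join(str(x).lower().split())   # lowercase + collapse whitespace
            if s and s not in seen:                # deduplicate, drop empties
                seen.add(s)
                out.append(s)
        return out
    return {
        "slide_id": record["slide_id"],
        "model": record["model"],
        "concepts": norm_list(record.get("concepts")),
        "triples": norm_list(record.get("triples")),
    }

def commitment(record: dict) -> str:
    """Serialize the canonical record with sorted keys, then hash it.
    NOTE: SlideChain uses Keccak-256; stdlib sha3_256 is a stand-in here.
    Swap in a true Keccak implementation for on-chain parity."""
    payload = json.dumps(canonicalize(record), sort_keys=True, separators=(",", ":"))
    return hashlib.sha3_256(payload.encode("utf-8")).hexdigest()

# Two messy-but-equivalent extractions yield the same commitment.
r1 = {"slide_id": "lec01_007", "model": "InternVL3",
      "concepts": ["CT  Scan", "ct scan", None, "Contrast Agent"],
      "triples": ["(ct scan, uses, contrast agent)"]}
r2 = {"model": "InternVL3", "slide_id": "lec01_007",
      "concepts": ["ct scan", "contrast agent"],
      "triples": ["(ct scan, uses, contrast agent)"]}
```

Because serialization is key-sorted and normalization is deterministic, any two pipelines that see the same semantic content produce byte-identical payloads and hence identical commitments.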
Semantic disagreement per slide $s$ is quantified as the complement of the mean pairwise similarity across the $M = 4$ models:

$$D_{\mathrm{concept}}(s) = 1 - \frac{2}{M(M-1)} \sum_{i<j} J\big(C_i(s), C_j(s)\big),$$

with $D_{\mathrm{triple}}(s)$ defined analogously over the models' triple sets $T_i(s)$. Similarity is measured using the Jaccard index:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}.$$
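A direct computation of the Jaccard index and a mean-pairwise disagreement score (one natural instantiation of the per-slide metric), with hypothetical concept sets for four models:

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """J(A, B) = |A ∩ B| / |A ∪ B|; define J = 1 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def disagreement(sets_by_model: list) -> float:
    """D = 1 - mean pairwise Jaccard similarity across the M models."""
    pairs = list(combinations(sets_by_model, 2))
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical concept sets extracted by four VLMs for one slide.
concepts = [
    {"ct scan", "contrast agent", "radiology"},   # model 1
    {"ct scan", "radiology"},                     # model 2
    {"ct scan", "mri"},                           # model 3
    {"ct scan", "contrast agent"},                # model 4
]
d = disagreement(concepts)   # 0 = all models agree, 1 = fully disjoint outputs
```

The same function applied to the models' triple sets yields $D_{\mathrm{triple}}(s)$.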
3. Cryptographic Design and Blockchain Performance
SlideChain employs a lightweight Solidity smart contract on the Hardhat Ethereum simulator for local registration. Only the compact hash is stored on-chain; the full JSON remains off-chain for inspection. Key characteristics:
- Gas efficiency: per-slide gas usage is effectively constant, so total registration cost scales linearly as $G(N) = N \cdot \overline{g}$, with a mean per-slide cost of $\overline{c} \approx \$0.92$ at the evaluated gas price.
4. Empirical Findings on Semantic Divergence
Across the corpus, models extract on average $\overline{|C|} \approx 14$ concepts and $\overline{|T|} \approx 6$ triples per slide, yet cross-model overlap is low: pairwise Jaccard similarity can fall below $0.05$ (where $J = 1.0$ would denote identical outputs), driving the disagreement scores $D_{\mathrm{concept}}$ and $D_{\mathrm{triple}}$ toward high values.
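The linear cost model is trivial but worth making explicit; the $0.92 per-slide figure and the 1,117-slide dataset size are taken from the text above:

```python
def total_cost_usd(n_slides: int, mean_cost_per_slide: float = 0.92) -> float:
    """G(N) = N * g-bar: constant per-slide gas makes total cost linear in N."""
    return n_slides * mean_cost_per_slide

# Anchoring the full curated dataset at ~$0.92/slide:
cost = total_cost_usd(1117)
```

Because only a fixed-size hash is stored per slide, this linearity holds regardless of how large the off-chain JSON records grow.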
- Single vs. multi-model coverage: InternVL3 misses 62% of concepts and 40% of triples present in the union of all models’ outputs, supporting the ensemble extraction design.
5. Comparison of Provenance Mechanisms
SlideChain provides properties previously unavailable in existing provenance systems. The following table summarizes their characteristics:
| Property | Git/DVC | Centralized Log | SlideChain |
|---|---|---|---|
| Third-party verifiability | No | No | Yes |
| Tamper resistance | Limited | Limited | Strong (crypto.) |
| Immutable timestamps | No | No | Yes |
| Silent overwrite detection | Partial | Partial | Guaranteed |
| Trust assumptions | Repo owner | Service provider | Decentralized |
| Long-term auditability | Limited | Limited | Strong |
These attributes make SlideChain a suitable foundation for educational and scientific applications requiring transparent, verifiable semantic records.
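Third-party verifiability reduces to recomputing the commitment from the published off-chain record and comparing it to the value read from the chain. A sketch, again using stdlib `sha3_256` as a stand-in for Keccak-256 and a hypothetical record:

```python
import hashlib
import json

def verify_record(off_chain_json: str, on_chain_hash: str) -> bool:
    """Recompute the commitment from the off-chain record and compare
    byte-for-byte against the hash anchored on-chain."""
    recomputed = hashlib.sha3_256(off_chain_json.encode("utf-8")).hexdigest()
    return recomputed == on_chain_hash

# Canonical off-chain record (sorted keys, compact separators).
record = json.dumps({"slide_id": "lec01_007", "concepts": ["ct scan"]},
                    sort_keys=True, separators=(",", ":"))
anchor = hashlib.sha3_256(record.encode("utf-8")).hexdigest()  # value read from chain

ok = verify_record(record, anchor)                          # untampered record
bad = verify_record(record.replace("ct", "mri"), anchor)    # any edit breaks the match
```

No trust in the record's publisher is needed: a single changed byte anywhere in the JSON produces a different digest, so silent overwrites are always detectable.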
6. CompSlider Architecture for Disentangled Image Generation ("SlideChain" in T2I)
A second usage of SlideChain appears in the context of compositional slider-based attribute control for text-to-image generation, as implemented in CompSlider (Zhu et al., 31 Aug 2025). Here, SlideChain denotes a compact Diffusion Transformer (DiT) module, which generates conditional priors governing fine-grained, independent manipulation of multiple image attributes (e.g., age, smile) without retraining the foundation model. The pipeline consists of:
- Architecture: SlideChain sits upstream of the latent diffusion foundation model.
- Inputs: a normalized vector of continuous slider values together with the text prompt embedding.
- Process: the DiT block generates a conditional prior that steers the denoiser.
- Conditional-prior training: uses the standard diffusion loss together with novel disentanglement and structure losses, an MLP classifier, and randomly sampled slider pairs to enforce attribute independence.
- Key hyperparameters: a 10-layer DiT with 16 slider tokens (N = 16); training on 8 A100 GPUs.
- Empirical results: superior continuity (81.1%), consistency (91.0%), scope (59.0%), and minimal entanglement (14.0%) versus ConceptSlider and PromptSlider, with the best LPIPS and CLIP scores among the compared methods.
- Operational significance: SlideChain enables truly disentangled, scalable generation of multi-attribute images and videos from arbitrary slider configurations in a single inference pass.
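As a toy illustration of this interface (not the actual DiT prior), the sketch below maps a slider vector and a prompt embedding to N = 16 conditioning tokens in a single forward pass; all dimensions, the fixed random weights, and the tanh map are illustrative assumptions:

```python
import math
import random

def slider_prior(sliders, prompt_emb, n_tokens=16, dim=8, seed=0):
    """Toy stand-in for the DiT prior: map (slider vector, prompt embedding)
    to n_tokens conditioning tokens in one pass. Real CompSlider uses a
    10-layer DiT; this is a single fixed random linear layer + tanh."""
    rng = random.Random(seed)                        # fixed weights for reproducibility
    n_in = len(sliders) + len(prompt_emb)
    w = [[rng.uniform(-1, 1) for _ in range(n_in)]
         for _ in range(n_tokens * dim)]
    x = list(sliders) + list(prompt_emb)             # concatenate conditioning inputs
    flat = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]
    return [flat[i * dim:(i + 1) * dim] for i in range(n_tokens)]

# Two sliders (e.g. age, smile) plus a small prompt embedding.
tokens = slider_prior(sliders=[0.2, -0.7], prompt_emb=[0.1] * 4)
```

The key property mirrored here is that moving a slider changes the conditioning tokens continuously, without retraining or touching the foundation model itself.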
7. Impact, Limitations, and Future Directions
SlideChain (Manik et al., 25 Dec 2025, Zhu et al., 31 Aug 2025) advances the state of provenance and control across two separate AI domains. In educational pipelines, blockchain anchoring supplies deterministic, tamper-evident semantic records enabling long-term auditability and drift detection. In generative modeling, SlideChain (via CompSlider) achieves rapid, precise conditional prior generation for robust attribute disentanglement and smooth interpolation.
Identified limitations include:
- No guarantee of semantic correctness or consensus is offered; provenance records capture disagreement for subsequent human adjudication.
- Deterministic reproducibility demands fixed inference settings and reliable off-chain storage.
- Blockchain deployments in production face latency, cost, and upgrade constraints; hybrid L2/L1 strategies and integration with decentralized storage (IPFS) are recommended.
- The T2I SlideChain does not address foundational model biases; rather, it mitigates attribute entanglement via loss engineering.
- A plausible implication is that ensemble extraction and cryptographic anchoring may become standard protocol for any AI-driven semantic pipeline subject to update and audit requirements. Future work may explore Merkle-tree aggregation, cross-model consensus algorithms, and user dashboards surfacing uncertainty and drift to stakeholders.
SlideChain provides a principled, scalable paradigm for both semantic provenance in STEM education and disentangled, multi-attribute image synthesis, supporting the development and maintenance of trustworthy, reproducible, and auditable AI-driven systems (Manik et al., 25 Dec 2025, Zhu et al., 31 Aug 2025).