Auto-Slides: Automated Academic Slide Generation
- Auto-Slides is an automated system that transforms academic papers into structured, multimodal slide decks, integrating cognitive principles for enhanced learning.
- It employs a multi-agent pipeline—comprising Parser, Planner, Verification, Adjustment, Generator, and Editor Agents—to ensure content fidelity and a coherent pedagogical flow.
- The system facilitates interactive user refinement through human-in-the-loop controls while leveraging cognitive theories such as PMRC and multimedia learning principles.
Auto-Slides is an automated system for converting research papers into pedagogically structured, multimodal presentation slides. It aims to facilitate comprehension and engagement by integrating cognitive principles with interactive customization. Auto-Slides generates slide decks comprising concise text, diagrams, tables, and other academic elements, providing systematic organization beyond conventional LLM-powered dialogue interfaces. The process emphasizes interactive refinement and verification to improve the educational value and factual accuracy of generated presentations.
1. Multi-Agent System Architecture
Auto-Slides utilizes a multi-agent framework that orchestrates the transformation of academic papers into slide decks via several specialized agents:
- Parser Agent: Extracts text, figures, tables, and mathematical expressions from the PDF, producing a structured Markdown representation capturing layout and key elements.
- Planner Agent: Reorganizes parsed content into a pedagogically informed “blueprint,” converting the IMRaD (Introduction, Methods, Results, Discussion) structure to PMRC (Problem–Motivation–Results–Conclusion). Outputs a JSON plan encoding slide-wise texts, images (with captions), tables, and speaker notes.
- Verification Agent: Semantically compares the blueprint against the original manuscript to detect omissions or misstatements.
- Adjustment Agent: Repairs identified deficiencies by retrieving and integrating missing details as concise bullet points.
- Generator Agent: Converts the validated slide plan into LaTeX Beamer code, synthesizing formatted slides with faithful inclusion of images, tables, and equations.
- Editor Agent: Enables human-in-the-loop iterative refinements through natural language commands interpreted via a ReAct-style loop, supporting operations such as locate, search, modify, insert, and delete.
This agent-based pipeline ensures systematic extraction, transformation, and synthesis while preserving document structure and domain-specific content fidelity.
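The control flow between these agents can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Blueprint` container, the `run_pipeline` function, the callable signatures, and the `max_rounds` bound are hypothetical and are not taken from the Auto-Slides implementation. Each agent is passed in as a plain function so the sketch stays independent of any particular LLM or PDF library.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Blueprint:
    """Hypothetical container for the slide plan passed between agents."""
    slides: list[dict] = field(default_factory=list)


def run_pipeline(
    pdf_path: str,
    parse: Callable[[str], str],
    plan: Callable[[str], Blueprint],
    verify: Callable[[Blueprint, str], list[str]],
    adjust: Callable[[Blueprint, list[str], str], Blueprint],
    generate: Callable[[Blueprint], str],
    max_rounds: int = 3,
) -> str:
    """Run Parser -> Planner -> (Verification <-> Adjustment) -> Generator."""
    markdown = parse(pdf_path)      # Parser Agent: PDF -> structured Markdown
    blueprint = plan(markdown)      # Planner Agent: Markdown -> slide plan
    for _ in range(max_rounds):     # bounded loop; the bound is an assumption
        issues = verify(blueprint, markdown)   # Verification Agent
        if not issues:
            break
        blueprint = adjust(blueprint, issues, markdown)  # Adjustment Agent
    return generate(blueprint)      # Generator Agent: plan -> Beamer source
```

The Editor Agent is left out because it operates on the generated deck after this pipeline returns.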
2. Pedagogical Structuring and Cognitive Principles
Auto-Slides is informed by established cognitive theories to optimize presentations for learning:
- PMRC Narrative Framework: Generates slide decks aligned with the Problem–Motivation–Results–Conclusion sequence, which is suited to academic discourse and facilitates conceptual progression.
- Cognitive Load Theory: Limits each slide to one key message, minimizing extraneous detail and lowering cognitive burden.
- Multimedia Learning Principles: Employs Mayer’s guidelines for dual coding, integrating textual bullet points with relevant visual aids (figures, equations, tables) in a spatially coherent layout.
- Incremental Complexity: Orders slides to scaffold understanding, building from basic principles toward more advanced results and implications.
This structure is intended to improve clarity, retention, and systematic learning compared with text-only interaction.
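Two of the principles above, PMRC ordering and the one-message-per-slide rule, lend themselves to a small sketch. It assumes each slide carries a hypothetical `stage` tag and a `bullets` list, and it uses a bullet-count cap as a crude proxy for "one key message"; neither the field names nor the cap come from the source.

```python
PMRC_ORDER = ["problem", "motivation", "results", "conclusion"]


def order_by_pmrc(slides: list[dict]) -> list[dict]:
    """Stable-sort slides by PMRC stage, keeping original order within a stage."""
    rank = {stage: i for i, stage in enumerate(PMRC_ORDER)}
    unknown = [s for s in slides if s.get("stage") not in rank]
    if unknown:
        raise ValueError(f"slides without a PMRC stage: {unknown}")
    return sorted(slides, key=lambda s: rank[s["stage"]])


def one_message_violations(slides: list[dict], max_bullets: int = 4) -> list[int]:
    """Return indices of slides whose bullet count exceeds the cap (a crude proxy)."""
    return [i for i, s in enumerate(slides) if len(s.get("bullets", [])) > max_bullets]
```

Because Python's `sorted` is stable, slides within the same stage keep their relative order, so the restructuring does not scramble the sequence of results.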
3. Interactive Slide Editor and Human Customization
Auto-Slides provides an interactive editor via the Editor Agent, supporting natural language dialogue for iterative refinement:
- Locate: Finds slides or LaTeX code segments (e.g., \frame blocks) for targeted edits, referencing contextual markers from the blueprint.
- Search: Retrieves supplemental material from references using API queries (arXiv, Semantic Scholar) in response to user requests for background or clarification.
- Modify/Insert/Delete: Applies user commands to rewrite, add, or remove content, updating the LaTeX source through LLM-generated suggestions.
- ReAct Loop: Decomposes user requests into a sequence of actions, executing modifications until the presentation aligns with specified learning goals.
This facility supports epistemic agency by letting learners control scope, depth, and focus while maintaining technical rigor.
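The locate, modify, insert, and delete operations can be sketched as plain string edits on Beamer source. This is a simplified illustration rather than the Editor Agent's actual code: the function names are hypothetical, frames are matched by title with a regular expression that ignores frame options such as `[fragile]`, and the action list is supplied by hand where the real system would have an LLM decompose the user's request.

```python
import re

# Matches \begin{frame}{Title} ... \end{frame}; frames with options are not handled.
FRAME_RE = re.compile(
    r"\\begin\{frame\}\{(?P<title>[^}]*)\}.*?\\end\{frame\}", re.DOTALL
)


def locate(tex: str, title: str) -> tuple[int, int]:
    """Return the (start, end) character span of the frame with the given title."""
    for m in FRAME_RE.finditer(tex):
        if m.group("title") == title:
            return m.span()
    raise KeyError(f"no frame titled {title!r}")


def modify(tex: str, title: str, new_frame: str) -> str:
    """Replace the located frame with new_frame."""
    start, end = locate(tex, title)
    return tex[:start] + new_frame + tex[end:]


def insert_after(tex: str, title: str, new_frame: str) -> str:
    """Insert new_frame on a new line directly after the located frame."""
    _, end = locate(tex, title)
    return tex[:end] + "\n" + new_frame + tex[end:]


def delete(tex: str, title: str) -> str:
    """Remove the located frame and the newline(s) that followed it."""
    start, end = locate(tex, title)
    return tex[:start] + tex[end:].lstrip("\n")


def apply_actions(tex: str, actions: list[tuple]) -> str:
    """Execute an already-decomposed list of (operation, *arguments) actions."""
    ops = {"modify": modify, "insert": insert_after, "delete": delete}
    for name, *args in actions:
        tex = ops[name](tex, *args)
    return tex
```

A request such as "add a motivation slide after the problem slide" would then reduce to a single `("insert", "Problem", new_frame)` action.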
4. Verification and Knowledge Retrieval Mechanisms
Quality assurance is achieved through two coordinated components, supplemented by on-demand knowledge retrieval:
- Verification Agent: Implements loose semantic matching to compare the generated blueprint against the parsed manuscript, ensuring that key methodological details, quantitative results, and conclusions are accurately captured.
- Adjustment Agent: Where discrepancies or omissions are detected, this agent pinpoints the missing information and integrates it, maintaining high factual fidelity.
- Knowledge Retrieval: On contextual demand, keywords from the slide are used for external queries to augment material (e.g., pulling related background from literature databases).
This two-stage process reduces hallucination and context truncation, and helps ensure slide completeness and technical validity.
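The verify-then-adjust cycle can be illustrated with a toy example. Note the substitution: the system described above uses LLM-based semantic matching, whereas this sketch swaps in simple token overlap so that it runs without any model. The function names, the `threshold` value, and the repair strategy (appending each missing claim as a bullet on a new slide) are assumptions for illustration only.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word and number tokens; a crude stand-in for semantic content."""
    return set(re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", text.lower()))


def coverage(claim: str, slide_text: str) -> float:
    """Fraction of the claim's tokens that also appear in the slide text."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 1.0
    return len(claim_tokens & tokens(slide_text)) / len(claim_tokens)


def verify(key_claims: list[str], slides: list[str], threshold: float = 0.6) -> list[str]:
    """Return the key claims that no slide covers at or above the threshold."""
    return [c for c in key_claims
            if all(coverage(c, s) < threshold for s in slides)]


def adjust(slides: list[str], missing: list[str]) -> list[str]:
    """Append each missing claim as a bullet on a new slide (simplest possible repair)."""
    return slides + [f"- {claim}" for claim in missing]
```

Running `verify` again on the adjusted slides should return an empty list, which is the stopping condition for the loop.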
5. User Study Findings and Comparative Evaluation
Auto-Slides was validated through user studies and expert assessments:
- Learning Enhancement: Undergraduate users rated Auto-Slides significantly above neutral on dimensions such as learning gain and perceived control/agency, citing improved focus and clarity.
- Comparative Organization: Structured slide interfaces were preferred over conventional LLM dialogs for providing rapid overviews and clear visual summaries. Engagement was comparable, but users favored a hybrid approach for deeper exploration.
- Expert Metrics: Slides built using PMRC structuring outperformed naive LLM-generated decks in Content Accuracy and Narrative Flow; Information Density was not adversely affected.
- Multimodal Parsing: Automated evaluations demonstrated improved fidelity for complex tables and formulas, with verification–adjustment loops enhancing correctness.
- Sample JSON and LaTeX: Slides are coded in structured JSON and compiled in Beamer:
```latex
\begin{frame}{Key Concept}
  \begin{itemize}
    \item Bullet point summary
    \item \begin{equation} E = mc^2 \end{equation}
  \end{itemize}
  \includegraphics[width=\textwidth]{fig1.jpeg}
\end{frame}
```
This evidence supports the system’s pedagogical and organizational advantages.
6. Technical Characteristics
The system employs:
- Structured JSON Plans: Each slide is characterized by fields containing text, figures, notes, tables; e.g.,
```json
{
  "slides": [
    { "slide_number": 1, "text": "Introduction to the problem", "figure": "fig1", "notes": "...", "tables": [] }
    // ...
  ]
}
```
- Agent-Oriented Editing: The Editor Agent disambiguates requests and orchestrates modification operations through prompting LLMs and updating LaTeX accordingly.
- Verification Algorithms: Semantic matching leverages LLMs to check that the blueprint covers the content of the source, ensuring preservation of central claims, numerical results, and methodologies.
Incremental design rules, such as “one key message per slide” and the spatial integration of visual elements, are strictly enforced by the Planner Agent in accordance with cognitive theories.
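To show how a JSON plan of this shape might be turned into Beamer source, here is a minimal sketch. It is not the Generator Agent's implementation: the function names are hypothetical, the optional `title` field is an assumption (the sample plan has none, so the sketch falls back to "Slide N"), only a handful of LaTeX special characters are escaped, and the `tables` field is ignored for brevity.

```python
import json


def escape_tex(text: str) -> str:
    """Escape a few characters that are special in LaTeX text mode."""
    for ch in "&%$#_":
        text = text.replace(ch, "\\" + ch)
    return text


def slide_to_frame(slide: dict) -> str:
    """Render one slide dict as a Beamer frame (tables are not handled here)."""
    # "title" is a hypothetical field; fall back to the slide number.
    title = escape_tex(slide.get("title", f"Slide {slide['slide_number']}"))
    lines = [f"\\begin{{frame}}{{{title}}}"]
    lines += [
        "  \\begin{itemize}",
        f"    \\item {escape_tex(slide['text'])}",
        "  \\end{itemize}",
    ]
    if slide.get("figure"):
        lines.append(f"  \\includegraphics[width=\\textwidth]{{{slide['figure']}}}")
    if slide.get("notes"):
        lines.append(f"  \\note{{{escape_tex(slide['notes'])}}}")
    lines.append("\\end{frame}")
    return "\n".join(lines)


def plan_to_beamer(plan_json: str) -> str:
    """Convert a JSON plan string into a sequence of frames ordered by slide_number."""
    plan = json.loads(plan_json)
    slides = sorted(plan["slides"], key=lambda s: s["slide_number"])
    return "\n\n".join(slide_to_frame(s) for s in slides)
```

The output is a sequence of frames only; a compilable document would still need a preamble (`\documentclass{beamer}`) and a `document` environment around it.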
7. Practical Implications and Significance
Auto-Slides offers a robust solution for academic presentation generation:
- Enables rapid synthesis of pedagogically optimized slide decks from technical manuscripts.
- Supports iterative, user-driven customization accommodating varied expertise and learning goals.
- Maintains accuracy and completeness via multi-agent verification and retrieval processes.
- Demonstrates enhanced comprehension and clarity in empirical validation compared to conventional interfaces.
A plausible implication is the system's utility for scaling educational content creation, supporting customized learning, and promoting systematic understanding of complex research.
Summary Table: Key Pipeline Agents and Their Roles
Agent | Function | Output |
---|---|---|
Parser | Structured extraction (text, media) | Markdown + layouts |
Planner | Pedagogical restructuring | PMRC-ordered JSON blueprint |
Verification | Content fidelity check | Correction triggers |
Adjustment | Repair of errors and omissions | Revised blueprint |
Generator | Slide compilation | LaTeX Beamer slides |
Editor | Interactive human refinement | Updated slide deck |
Auto-Slides makes substantial contributions by architecting a multi-agent system rooted in cognitive science, establishing verification and retrieval procedures, and integrating interactive customization to transform academic manuscripts into layered, multimodal educational presentations (Yang et al., 14 Sep 2025).