
Auto-Slides: Automated Academic Slide Generation

Updated 18 September 2025
  • Auto-Slides is an automated system that transforms academic papers into structured, multimodal slide decks, integrating cognitive principles for enhanced learning.
  • It employs a multi-agent pipeline—comprising Parser, Planner, Verification, Adjustment, Generator, and Editor Agents—to ensure content fidelity and a coherent pedagogical flow.
  • The system facilitates interactive user refinement through human-in-the-loop controls while leveraging cognitive theories such as PMRC and multimedia learning principles.

Auto-Slides refers to automated systems for converting research papers into pedagogically structured, multimodal presentation slides. These systems aim to facilitate comprehension and engagement by integrating cognitive principles with interactive customization. Auto-Slides generates slide decks comprising concise text, diagrams, tables, and other academic elements, providing systematic organization beyond conventional LLM-powered dialogue interfaces. The process emphasizes interactive refinement and verification to optimize the educational value and factual accuracy of generated presentations.

1. Multi-Agent System Architecture

Auto-Slides utilizes a multi-agent framework that orchestrates the transformation of academic papers into slide decks via several specialized agents:

  • Parser Agent: Extracts text, figures, tables, and mathematical expressions from the PDF, producing a structured Markdown representation capturing layout and key elements.
  • Planner Agent: Reorganizes parsed content into a pedagogically informed “blueprint,” converting the IMRaD (Introduction, Methods, Results, Discussion) structure to PMRC (Problem–Motivation–Results–Conclusion). Outputs a JSON plan encoding slide-wise texts, images (with captions), tables, and speaker notes.
  • Verification Agent: Semantically compares the blueprint against the original manuscript to detect omissions or misstatements.
  • Adjustment Agent: Repairs identified deficiencies by retrieving and integrating missing details as concise bullet points.
  • Generator Agent: Converts the validated slide plan into LaTeX Beamer code, synthesizing formatted slides with faithful inclusion of images, tables, and equations.
  • Editor Agent: Enables human-in-the-loop iterative refinements through natural language commands interpreted via a ReAct-style loop, supporting operations such as locate, search, modify, insert, and delete.

This agent-based pipeline ensures systematic extraction, transformation, and synthesis while preserving document structure and domain-specific content fidelity.
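The hand-off between agents can be pictured as a chain of functions over a shared state. The following is only a minimal sketch under stated assumptions: `PipelineState`, its field names, and the stub stages are hypothetical stand-ins, and each stub replaces what would be an LLM-backed agent in the described system. The Editor Agent is omitted because it runs interactively after generation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineState:
    """Data handed from one agent to the next (field names are illustrative)."""
    pdf_path: str
    markdown: str = ""                                 # Parser output
    blueprint: dict = field(default_factory=dict)      # Planner output
    issues: List[str] = field(default_factory=list)    # Verification output
    latex: str = ""                                    # Generator output


Stage = Callable[[PipelineState], PipelineState]


def parser(state: PipelineState) -> PipelineState:
    # Stub: a real parser would extract text, figures, tables, and math.
    state.markdown = "# Problem\nSlides are slow to make.\n# Results\nIt works."
    return state


def planner(state: PipelineState) -> PipelineState:
    # Stub: deliberately plans only the first section so the verifier finds a gap.
    first = state.markdown.split("# ")[1]
    title, body = first.split("\n", 1)
    state.blueprint = {"slides": [{"title": title, "text": body.strip()}]}
    return state


def verifier(state: PipelineState) -> PipelineState:
    # Flag every source section that has no slide in the blueprint.
    planned = {s["title"] for s in state.blueprint["slides"]}
    sections = [l[2:] for l in state.markdown.splitlines() if l.startswith("# ")]
    state.issues = [s for s in sections if s not in planned]
    return state


def adjuster(state: PipelineState) -> PipelineState:
    # Repair each flagged omission by appending a slide for it.
    for title in state.issues:
        state.blueprint["slides"].append({"title": title, "text": "(restored)"})
    state.issues = []
    return state


def generator(state: PipelineState) -> PipelineState:
    # Emit one Beamer frame per planned slide.
    frames = [
        "\\begin{frame}{%s}\n%s\n\\end{frame}" % (s["title"], s["text"])
        for s in state.blueprint["slides"]
    ]
    state.latex = "\n".join(frames)
    return state


def run_pipeline(state: PipelineState, stages: List[Stage]) -> PipelineState:
    """Apply the stages in order, passing the state along."""
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline(
    PipelineState("paper.pdf"),
    [parser, planner, verifier, adjuster, generator],
)
print(result.latex)
```

Keeping every stage behind the same signature makes it straightforward to swap a stub for a model-backed agent or to rerun verification after an adjustment.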

2. Pedagogical Structuring and Cognitive Principles

Auto-Slides is informed by established cognitive theories to optimize presentations for learning:

  • PMRC Narrative Framework: Generates slide decks aligned with the Problem–Motivation–Results–Conclusion sequence, which is suited to academic discourse and facilitates conceptual progression.
  • Cognitive Load Theory: Limits each slide to one key message, minimizing extraneous detail and lowering cognitive burden.
  • Multimedia Learning Principles: Employs Mayer’s guidelines for dual coding, integrating textual bullet points with relevant visual aids (figures, equations, tables) in a spatially coherent layout.
  • Incremental Complexity: Orders slides to scaffold understanding, building from basic principles toward more advanced results and implications.

Compared with purely text-based interaction, this structure is intended to improve clarity, retention, and systematic learning.
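Two of the principles above lend themselves to a small illustration: ordering slides by PMRC stage and flagging slides that carry too much. This is a rough sketch, not the system's planner. The `stage`, `title`, and `bullets` fields are hypothetical, and the bullet-count threshold is an assumed proxy for "one key message per slide".

```python
PMRC_ORDER = ["problem", "motivation", "results", "conclusion"]


def order_by_pmrc(slides):
    """Sort slides by PMRC stage; sorted() is stable, so order within a stage is kept."""
    rank = {stage: i for i, stage in enumerate(PMRC_ORDER)}
    return sorted(slides, key=lambda s: rank[s["stage"]])


def overloaded_slides(slides, max_bullets=3):
    """Return titles of slides whose bullet count exceeds the assumed limit."""
    return [s["title"] for s in slides if len(s["bullets"]) > max_bullets]


draft = [
    {"title": "Takeaways", "stage": "conclusion", "bullets": ["a"]},
    {"title": "Why it matters", "stage": "motivation", "bullets": ["a", "b"]},
    {"title": "Open problem", "stage": "problem", "bullets": ["a"]},
    {"title": "Main result", "stage": "results", "bullets": ["a", "b", "c", "d"]},
]

deck = order_by_pmrc(draft)
print([s["title"] for s in deck])
print(overloaded_slides(deck))
```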

3. Interactive Slide Editor and Human Customization

Auto-Slides provides an interactive editor via the Editor Agent, supporting natural language dialogue for iterative refinement:

  • Locate: Finds slides or LaTeX code segments (e.g., \frame blocks) for targeted edits, referencing contextual markers from the blueprint.
  • Search: Retrieves supplemental material from references using API queries (arXiv, Semantic Scholar) in response to user requests for background or clarification.
  • Modify/Insert/Delete: Applies user commands to rewrite, add, or remove content, updating the LaTeX source through LLM-generated suggestions.
  • ReAct Loop: Decomposes user requests into a sequence of actions, executing modifications until the presentation aligns with specified learning goals.

This facility supports epistemic agency by letting learners control scope, depth, and focus while maintaining technical rigor.
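The locate, modify, insert, and delete operations can be sketched as plain functions over the Beamer source. In this sketch the action dictionaries stand in for what an LLM would emit at each step of the ReAct loop. Their shape (`op`, `index`, `frame`) is an assumption made for illustration, and the search operation is left out because it depends on external APIs.

```python
import re

# Matches one complete frame environment, including any title or options.
FRAME_RE = re.compile(r"\\begin\{frame\}.*?\\end\{frame\}", re.DOTALL)


def split_frames(latex):
    """Return the list of frame blocks found in a Beamer source string."""
    return FRAME_RE.findall(latex)


def locate(frames, keyword):
    """Return the index of the first frame mentioning keyword, or -1 if none does."""
    for i, frame in enumerate(frames):
        if keyword.lower() in frame.lower():
            return i
    return -1


def apply_action(frames, action):
    """Apply one edit action to a copy of the frame list and return the copy."""
    frames = list(frames)
    op = action["op"]
    if op == "insert":
        frames.insert(action["index"], action["frame"])
    elif op == "modify":
        frames[action["index"]] = action["frame"]
    elif op == "delete":
        del frames[action["index"]]
    else:
        raise ValueError("unknown operation: %s" % op)
    return frames


source = (
    "\\begin{frame}{Problem}\nSlides take time.\n\\end{frame}\n"
    "\\begin{frame}{Results}\nIt works.\n\\end{frame}\n"
)
frames = split_frames(source)
target = locate(frames, "results")

# A hypothetical two-step plan: rewrite the located frame, then add a summary.
plan = [
    {"op": "modify", "index": target,
     "frame": "\\begin{frame}{Results}\nIt works on three benchmarks.\n\\end{frame}"},
    {"op": "insert", "index": len(frames),
     "frame": "\\begin{frame}{Summary}\nKey points.\n\\end{frame}"},
]
for action in plan:
    frames = apply_action(frames, action)
print("\n".join(frames))
```

Working on a copy of the frame list in `apply_action` keeps each step reversible, which suits an iterative loop where a user may reject a change.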

4. Verification and Knowledge Retrieval Mechanisms

Quality assurance combines a two-stage verification and adjustment process with on-demand knowledge retrieval:

  • Verification Agent: Implements loose semantic matching to compare the generated blueprint against the parsed manuscript, ensuring that key methodological details, quantitative results, and conclusions are accurately captured.
  • Adjustment Agent: Where discrepancies or omissions are detected, this agent pinpoints the missing information and integrates it, maintaining high factual fidelity.
  • Knowledge Retrieval: On contextual demand, keywords from the slide are used for external queries to augment material (e.g., pulling related background from literature databases).

This two-stage verify-then-adjust process reduces hallucination and the effects of context truncation, and helps ensure slide completeness and technical validity.
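A verify-then-adjust cycle can be illustrated with a deliberately simple matcher. Note the substitution: the described system relies on LLM-based semantic matching, whereas this sketch uses token overlap purely as a stand-in. The function names, the `threshold` value, and the list of key claims are all assumptions.

```python
import re


def tokens(text):
    """Lowercase word and number tokens; decimals such as 92.5 stay intact."""
    return set(re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", text.lower()))


def coverage(claim, deck_text):
    """Fraction of the claim's tokens that also appear in the deck text."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 1.0
    return len(claim_tokens & tokens(deck_text)) / len(claim_tokens)


def verify(key_claims, blueprint, threshold=0.6):
    """Return the claims that the blueprint does not cover well enough."""
    deck_text = " ".join(s["text"] for s in blueprint["slides"])
    return [c for c in key_claims if coverage(c, deck_text) < threshold]


def adjust(blueprint, missing):
    """Append each missing claim as a new slide entry (mutates the blueprint)."""
    for claim in missing:
        blueprint["slides"].append({"text": claim, "notes": "added by adjustment"})
    return blueprint


claims = [
    "Accuracy improves to 92.5 percent",
    "The method uses a verification loop",
]
blueprint = {"slides": [{"text": "The method uses a verification loop to check content"}]}

missing = verify(claims, blueprint)
print("missing before adjustment:", missing)
adjust(blueprint, missing)
print("missing after adjustment:", verify(claims, blueprint))
```

Token overlap misses paraphrases and would need to be replaced by a semantic comparison in practice; the control flow of detect, repair, and re-check is the part this sketch is meant to show.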

5. User Study Findings and Comparative Evaluation

Auto-Slides was validated through user studies and expert assessments:

  • Learning Enhancement: Undergraduate users rated Auto-Slides significantly above neutral on dimensions such as learning gain and perceived control/agency, citing improved focus and clarity.
  • Comparative Organization: Structured slide interfaces were preferred over conventional LLM dialogs for providing rapid overviews and clear visual summaries. Engagement was comparable, but users favored a hybrid approach for deeper exploration.
  • Expert Metrics: Slides built using PMRC structuring outperformed naive LLM-generated decks in Content Accuracy and Narrative Flow; Information Density was not adversely affected.
  • Multimodal Parsing: Automated evaluations demonstrated improved fidelity for complex tables and formulas, with verification–adjustment loops enhancing correctness.
  • Sample LaTeX: Slides are planned in structured JSON and compiled to Beamer, for example:
    
    \begin{frame}{Key Concept}
      \begin{itemize}
        \item Bullet point summary
        \item \begin{equation} E = mc^2 \end{equation}
      \end{itemize}
      \includegraphics[width=\textwidth]{fig1.jpeg}
    \end{frame}

This evidence supports the system’s pedagogical and organizational advantages.

6. Technical Characteristics

The system employs:

  • Structured JSON Plans: Each slide is characterized by fields containing text, figures, notes, tables; e.g.,
    
    {
      "slides": [
        { "slide_number": 1, "text": "Introduction to the problem", "figure": "fig1", "notes": "...", "tables": [] }
        // ...
      ]
    }
  • Agent-Oriented Editing: The Editor Agent disambiguates requests and orchestrates modification operations through prompting LLMs and updating LaTeX accordingly.
  • Verification Algorithms: Semantic matching leverages LLMs to compare content inclusivity between blueprint and source, ensuring preservation of central claims, numerical results, and methodologies.

Incremental design rules—“one key message per slide,” spatial integration of visual elements—are strictly enforced by the planning agent in accordance with cognitive theories.
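To connect the JSON plan with the Beamer output, the following sketch renders plan entries shaped like the sample above into frames. It is an illustration under assumptions rather than the Generator Agent itself: the helper names are hypothetical, the escaping covers only a few special characters, and the `tables` field is ignored.

```python
import json


def escape_latex(text):
    """Escape a few characters that commonly break compilation (not exhaustive)."""
    for ch in "&%$#_":
        text = text.replace(ch, "\\" + ch)
    return text


def render_frame(slide):
    """Render one plan entry as a Beamer frame, with optional figure and notes."""
    lines = [
        r"\begin{frame}",
        r"  \begin{itemize}",
        r"    \item " + escape_latex(slide["text"]),
        r"  \end{itemize}",
    ]
    if slide.get("figure"):
        lines.append(r"  \includegraphics[width=\textwidth]{%s}" % slide["figure"])
    lines.append(r"\end{frame}")
    if slide.get("notes"):
        # Speaker notes go into Beamer's \note command after the frame.
        lines.append(r"\note{%s}" % escape_latex(slide["notes"]))
    return "\n".join(lines)


def render_deck(plan):
    """Render all slides in slide_number order, separated by blank lines."""
    ordered = sorted(plan["slides"], key=lambda s: s["slide_number"])
    return "\n\n".join(render_frame(s) for s in ordered)


plan_json = """
{
  "slides": [
    {"slide_number": 2, "text": "Results: 50% fewer edits", "figure": "",
     "notes": "Mention the user study", "tables": []},
    {"slide_number": 1, "text": "Introduction to the problem", "figure": "fig1",
     "notes": "", "tables": []}
  ]
}
"""

deck = render_deck(json.loads(plan_json))
print(deck)
```

Sorting by `slide_number` before rendering means the plan can be edited or extended in any order without changing the compiled sequence.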

7. Practical Implications and Significance

Auto-Slides offers a robust solution for academic presentation generation:

  • Enables rapid synthesis of pedagogically optimized slide decks from technical manuscripts.
  • Supports iterative, user-driven customization accommodating varied expertise and learning goals.
  • Maintains accuracy and completeness via multi-agent verification and retrieval processes.
  • Demonstrates enhanced comprehension and clarity in empirical validation compared to conventional interfaces.

A plausible implication is the system's utility for scaling educational content creation, supporting customized learning, and promoting systematic understanding of complex research.

Summary Table: Key Pipeline Agents and Their Roles

Agent        | Function                            | Output
Parser       | Structured extraction (text, media) | Markdown + layouts
Planner      | Pedagogical restructuring           | PMRC-ordered JSON blueprint
Verification | Content fidelity check              | Correction triggers
Adjustment   | Error/omission integration          | Revised blueprint
Generator    | Slide compilation                   | LaTeX Beamer slides
Editor       | Interactive human refinement        | Updated slide deck

Auto-Slides makes substantial contributions by architecting a multi-agent system rooted in cognitive science, establishing verification and retrieval procedures, and integrating interactive customization to transform academic manuscripts into layered, multimodal educational presentations (Yang et al., 14 Sep 2025).
