GraphiMind: LLM-centric Interface for Information Graphics Design

Published 24 Jan 2024 in cs.HC (arXiv:2401.13245v1)

Abstract: Information graphics are pivotal in effective information dissemination and storytelling. However, creating such graphics is extremely challenging for non-professionals, since the design process requires multifaceted skills and comprehensive knowledge. Thus, despite the many available authoring tools, a significant gap remains in enabling non-experts to produce compelling information graphics seamlessly, especially from scratch. Recent breakthroughs show that LLMs, especially when tool-augmented, can autonomously engage with external tools, making them promising candidates for enabling innovative graphic design applications. In this work, we propose an LLM-centric interface with the agent GraphiMind for automatic generation, recommendation, and composition of information graphics design resources, based on user intent expressed through natural language. Our GraphiMind integrates a Textual Conversational Interface, powered by a tool-augmented LLM, with a traditional Graphical Manipulation Interface, streamlining the entire design process from raw resource curation to composition and refinement. Extensive evaluations highlight our tool's proficiency in simplifying the design process, opening avenues for its use by non-professional users. Moreover, we spotlight the potential of LLMs in reshaping the domain of information graphics design, offering a blend of automation, versatility, and user-centric interactivity.


Summary

  • The paper introduces GraphiMind, a dual-interface system that pairs a tool-augmented LLM with graphical editors to automate infographic creation.
  • It employs natural language processing for data collection and Stable Diffusion for image generation, with a DSL enabling dynamic layout customization.
  • User studies demonstrate significant time-savings and enhanced efficiency compared to traditional tools like PowerPoint.

GraphiMind: LLM-Centric Interface for Information Graphics Design

The paper "GraphiMind: LLM-centric Interface for Information Graphics Design" presents an innovative approach to simplifying the creation of information graphics for non-professionals by leveraging LLMs. The system, termed GraphiMind, combines a Textual Conversational Interface with a Graphical Manipulation Interface, streamlining design tasks from information collection to final adjustments.

System Architecture

GraphiMind integrates a textual conversational interface powered by a tool-augmented LLM and a traditional graphical manipulation interface. This dual-interface system allows users to engage in natural language dialogue with an intelligent agent while manipulating graphical elements directly on a canvas. The system's architecture is designed to support seamless user-agent collaboration throughout the design process (Figure 1).

Figure 1: The Interface of GraphiMind System: the system integrates a Textual Conversational Interface (on the left), enhanced by a tool-augmented LLM as an agent, with a Graphical Manipulation Interface (on the right).
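GraphiMind's implementation is not published with this summary, but the agent's central role, routing natural-language requests to the appropriate design tool, can be sketched in miniature. The registry, intent labels, and tool functions below are all hypothetical stand-ins:

```python
# Minimal sketch of a tool-augmented agent's dispatch layer.
# All names here are hypothetical; GraphiMind's actual code is not public.

from typing import Callable, Dict


class ToolRegistry:
    """Maps intent labels to the design tools an agent can invoke."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, intent: str, tool: Callable[[str], str]) -> None:
        self._tools[intent] = tool

    def dispatch(self, intent: str, request: str) -> str:
        # Fall back to a plain conversational answer when no tool matches.
        tool = self._tools.get(intent, lambda r: f"answer: {r}")
        return tool(request)


registry = ToolRegistry()
registry.register("icon_search", lambda r: f"svg icons for '{r}'")
registry.register("layout", lambda r: f"layout DSL for '{r}'")

print(registry.dispatch("icon_search", "coffee"))  # routed to the icon tool
print(registry.dispatch("chitchat", "hello"))      # falls back to plain text
```

In the real system, the intent label would come from the LLM's interpretation of the dialogue rather than from a caller-supplied string.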

Key Design Tasks

Information Collection and Visual Element Design

GraphiMind automates the gathering of relevant data and visual elements using ChatGPT for information processing and an API for SVG icon retrieval. Users provide natural language inputs, and the system returns structured data objects suitable for infographic design.
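The exact schema of these structured data objects is not given in this summary. A plausible shape, with invented field names, pairs each extracted fact with a keyword for SVG icon retrieval:

```python
# Hypothetical structured data object for collected infographic content.
# Field names are illustrative; the paper does not specify the schema.

import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class InfoItem:
    text: str        # a fact or statistic extracted by the LLM
    icon_query: str  # keyword used to retrieve a matching SVG icon


@dataclass
class CollectedInfo:
    topic: str
    items: List[InfoItem] = field(default_factory=list)


info = CollectedInfo(
    topic="coffee consumption",
    items=[InfoItem("Finland leads per-capita consumption", "coffee cup")],
)
# asdict recurses into nested dataclasses, yielding a JSON-ready dict.
print(json.dumps(asdict(info), indent=2))
```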

Pivot Figure and Background Design

Utilizing Stable Diffusion, the system generates thematic images (pivot figures and backgrounds) based on user-provided prompts. The agent distinguishes between these two tasks by interpreting context-specific language nuances.
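The paper attributes this routing to the LLM's reading of context-specific language nuances; as a crude stand-in, a keyword heuristic illustrates the decision the agent must make before handing a prompt to Stable Diffusion:

```python
# Illustrative heuristic for routing image requests to one of two
# generation tasks. The real system relies on the LLM agent, not
# keyword matching; this rule exists only to show the branching.

BACKGROUND_CUES = ("background", "backdrop", "behind", "texture")


def classify_image_request(prompt: str) -> str:
    """Return which Stable Diffusion task a prompt should drive."""
    p = prompt.lower()
    if any(cue in p for cue in BACKGROUND_CUES):
        return "background"
    return "pivot_figure"


print(classify_image_request("a soft watercolor backdrop"))
print(classify_image_request("a cartoon coffee cup mascot"))
```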

Layout Customization

A Domain-Specific Language (DSL) facilitates the automatic generation of layouts, enabling GPT-4 to design complex infographic structures. After parsing text descriptions, the system renders the layout directly onto the canvas, providing users with customizable templates (Figure 2).

Figure 2: The Pipeline of the Layout Customization Tool: the process spans user-agent interaction and GPT-4 invocation, followed by parsing and drawing stages, culminating in the final layout generation.
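GraphiMind's DSL grammar is not reproduced in this summary. A toy grammar, one region per line as `name x y width height`, invented purely for illustration, is enough to show the parse-and-draw idea behind Figure 2's pipeline:

```python
# Toy layout DSL parser in the spirit of the parse-and-draw pipeline.
# The grammar here is invented; GraphiMind's actual DSL is unspecified.

from typing import List, Tuple

Region = Tuple[str, int, int, int, int]  # (name, x, y, width, height)


def parse_layout(dsl: str) -> List[Region]:
    """Parse one region per line: '<name> <x> <y> <width> <height>'."""
    regions: List[Region] = []
    for line in dsl.strip().splitlines():
        name, *nums = line.split()
        x, y, w, h = map(int, nums)
        regions.append((name, x, y, w, h))
    return regions


dsl_text = """
title   0   0 800 100
chart   0 100 500 400
caption 0 500 800  60
"""
for region in parse_layout(dsl_text):
    print(region)  # a renderer would draw each region onto the canvas
```

In the described system, GPT-4 would emit text in such a DSL, and the drawing stage would place the parsed regions on the canvas as an editable template.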

Evaluation and User Study

A user study comparing GraphiMind with PowerPoint revealed significant time-saving benefits and a streamlined workflow for information graphics design. Participants using GraphiMind reported higher efficiency, particularly in information collection and beginning the creative process. This system demonstrates substantial potential in democratizing graphic design, making it accessible to users without professional expertise (Figure 3).

Figure 3: An Example of the Design Process in GraphiMind: users communicate their design intent to the LLM agent in natural language, which generates a wide range of core design assets, including pivot figures, layouts, visual elements, and more.

Conclusion

GraphiMind provides a compelling, user-friendly solution for creating information graphics by integrating LLMs into the design workflow. By automating complex tasks and facilitating natural language interactions, it lowers the barrier to entry for novice designers. Future work could focus on enhancing personalization, expanding design resource recommendations, and further integrating AI-driven context awareness. As advancements in LLMs continue, GraphiMind stands to benefit, offering a scalable platform for innovative graphic design solutions.
