
Layout Generation Agents with Large Language Models (2405.08037v1)

Published 13 May 2024 in cs.HC and cs.AI

Abstract: In recent years, there has been an increasing demand for customizable 3D virtual spaces. Due to the significant human effort required to create these virtual spaces, there is a need for efficiency in virtual space creation. While existing studies have proposed methods for automatically generating layouts such as floor plans and furniture arrangements, these methods only generate text indicating the layout structure based on user instructions, without utilizing the information obtained during the generation process. In this study, we propose an agent-driven layout generation system using the GPT-4V multimodal LLM and validate its effectiveness. Specifically, the LLM manipulates agents to sequentially place objects in the virtual space, thus generating layouts that reflect user instructions. Experimental results confirm that our proposed method can generate virtual spaces reflecting user instructions with a high success rate. Additionally, through an ablation study, we identified the elements that contribute to improved behavior generation performance.
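The sequential placement idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the `Grid` class, the `generate_layout` loop, and `stub_policy` are hypothetical names, and `stub_policy` is a deterministic stand-in for the multimodal LLM call (it simply picks the first free cell). The point it shows is that the current scene state is passed back to the policy at every step, so each placement can depend on what has already been placed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Pos = Tuple[int, int]


@dataclass
class Grid:
    """Hypothetical 2D floor grid; maps occupied cells to object names."""
    width: int
    depth: int
    placed: Dict[Pos, str] = field(default_factory=dict)

    def is_free(self, pos: Pos) -> bool:
        x, z = pos
        return 0 <= x < self.width and 0 <= z < self.depth and pos not in self.placed

    def place(self, name: str, pos: Pos) -> bool:
        """Place an object if the cell is valid and empty; report success."""
        if not self.is_free(pos):
            return False
        self.placed[pos] = name
        return True


# A policy sees the user instruction, the current grid, and the next object.
Policy = Callable[[str, Grid, str], Pos]


def stub_policy(instruction: str, grid: Grid, obj: str) -> Pos:
    """Stand-in for an LLM call (assumption): first free cell, row by row."""
    for z in range(grid.depth):
        for x in range(grid.width):
            if grid.is_free((x, z)):
                return (x, z)
    raise RuntimeError("no free cell left")


def generate_layout(instruction: str, objects: List[str], grid: Grid,
                    policy: Policy, max_retries: int = 3) -> List[Tuple[str, Pos]]:
    """Place objects one at a time, re-querying the policy on a failed placement."""
    log: List[Tuple[str, Pos]] = []
    for obj in objects:
        for _ in range(max_retries):
            pos = policy(instruction, grid, obj)
            if grid.place(obj, pos):
                log.append((obj, pos))
                break
    return log


layout = generate_layout("a small bedroom", ["bed", "desk", "lamp"],
                         Grid(width=2, depth=2), stub_policy)
print(layout)
```

In a real system the policy would be replaced by a model call that receives the instruction together with a rendering or description of the current scene; the loop structure stays the same.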

Authors (2)
  1. Yuichi Sasazawa
  2. Yasuhiro Sogawa