- The paper introduces ComfyUI-Copilot, an intelligent plugin that automates workflow generation and node recommendations on the ComfyUI platform using a hierarchical multi-agent framework.
- It builds on extensive curated knowledge bases (over 7K nodes, 62K models, and 9K workflows) to support recommendations, documentation, and installation and configuration guidance.
- Offline and online evaluations reveal high recall rates (over 88.5%) and strong user acceptance (65.4% for nodes, 85.9% for workflows), highlighting its practical impact.
This paper introduces ComfyUI-Copilot (arXiv:2506.05010), an intelligent assistant built as a plugin for the ComfyUI platform. It targets the difficulties users commonly face: complex workflow design, limited documentation, and model configuration issues. ComfyUI-Copilot uses LLMs within a hierarchical multi-agent framework to offer automated workflow generation, intelligent node and model recommendations, and ComfyUI-related question answering.
The core of the system relies on a central LLM-based assistant agent that delegates tasks to specialized worker agents supported by curated knowledge bases. The system's key functionalities include:
- Automated Workflow Generation: ComfyUI-Copilot can interpret user instructions, retrieve or synthesize relevant workflows, and load them into the ComfyUI canvas with a single click. If required nodes are missing, it provides installation guidance. Workflows are represented in graph, JSON, or code formats and can be converted between them. The system also explores generating workflows from scratch with code LLMs, including fine-tuning models such as Qwen2.5-Coder-7B [hui2024qwen25codertechnicalreport] on collected workflows to improve performance.
- Node and Model Recommendation: The assistant recommends suitable nodes based on user instructions and suggests compatible checkpoint and LoRA models, considering dependencies between components (e.g., recommending LoRAs compatible with the user's diffusion model). Recommendations follow a three-stage pipeline: LLM expansion of user intent, semantic and lexical similarity-based retrieval from knowledge bases, and re-ranking based on re-ranker scores and popularity factors (upvotes, downloads, stars).
- ComfyUI-related Question Answering: The copilot provides detailed information on nodes and models, including usage, installation steps, parameter explanations, and suggestions for downstream subgraphs (e.g., recommending face swapping or image upscaling subgraphs after a KSampler node). It supports multilingual queries and responses.
- Enhanced Features: For experienced users, ComfyUI-Copilot offers prompt writing assistance to refine text prompts for image generation and a parameter search functionality that enables parallel experiments with varying parameters to find optimal settings for workflows.
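The three-stage recommendation pipeline described in the list above (intent expansion, retrieval, re-ranking) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Node` record, the `EXPANSIONS` table standing in for the LLM expansion step, the Jaccard overlap standing in for semantic and lexical retrieval, and the `popularity_weight` value. The summary does not specify how the real system combines re-ranker scores with popularity.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    description: str
    stars: int  # hypothetical popularity signal (upvotes, downloads, or stars)


# Stage 1 stand-in: a real system would ask an LLM to expand the user intent.
EXPANSIONS = {"upscale": ["super-resolution", "enlarge"]}


def expand_query(query):
    """Return the query tokens plus any hard-coded expansions."""
    tokens = set(query.lower().split())
    for token in list(tokens):
        tokens.update(EXPANSIONS.get(token, []))
    return tokens


def lexical_score(tokens, node):
    """Jaccard overlap between query tokens and description tokens."""
    doc = set(node.description.lower().split())
    if not tokens or not doc:
        return 0.0
    return len(tokens & doc) / len(tokens | doc)


def recommend(query, nodes, top_k=2, popularity_weight=0.05):
    tokens = expand_query(query)
    # Stage 2: keep only candidates with some overlap with the expanded query.
    candidates = [(node, lexical_score(tokens, node)) for node in nodes]
    candidates = [(node, score) for node, score in candidates if score > 0]
    # Stage 3: re-rank by similarity plus a damped popularity bonus.
    ranked = sorted(
        candidates,
        key=lambda pair: pair[1] + popularity_weight * math.log1p(pair[0].stars),
        reverse=True,
    )
    return [node.name for node, _ in ranked[:top_k]]


catalog = [
    Node("UpscaleA", "enlarge image with super-resolution model", 10),
    Node("UpscaleB", "upscale image", 5000),
    Node("FaceSwap", "swap faces in portrait", 9000),
]
print(recommend("upscale image", catalog))
```

The logarithm keeps very popular entries from dominating the ranking; an unrelated but popular node such as the hypothetical `FaceSwap` is dropped at the retrieval stage and never reaches re-ranking.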
A crucial component is the construction of extensive knowledge bases covering 7K nodes, 62K models, and 9K workflows. Data is sourced from platforms, GitHub repositories, and the ComfyUI website, with automated filtering for NSFW content. For nodes that lack documentation, the system generates it automatically: it sets up a sandbox environment, clones the repository, analyzes the code to extract metadata (class type, parameters), chunks and embeds the code, retrieves relevant snippets, and prompts an LLM (such as GPT-4o) to write the documentation. For workflows and models, LLMs such as GPT-4o enhance the documentation by applying multimodal understanding to community-sourced texts, images, and workflow JSONs. The knowledge bases are updated weekly to incorporate new modules and tasks.
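The metadata-extraction step (class type and parameters) might look roughly like the sketch below. It assumes that a custom node follows the common ComfyUI convention of a class exposing an `INPUT_TYPES` classmethod that returns a literal dictionary; the sample source and the function names `extract_node_metadata` and `_literal_return` are hypothetical, and the paper's actual analysis code may work differently.

```python
import ast

# Hypothetical custom-node source, used only to exercise the extractor.
SAMPLE_SOURCE = '''
class InvertImage:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"image": ("IMAGE",),
                             "strength": ("FLOAT", {"default": 1.0})}}

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "invert"

    def invert(self, image, strength):
        return (image,)
'''


def _literal_return(func):
    """Map each input section to its parameter names, if the return is a literal dict."""
    for stmt in ast.walk(func):
        if isinstance(stmt, ast.Return) and stmt.value is not None:
            try:
                value = ast.literal_eval(stmt.value)
            except (ValueError, TypeError, SyntaxError):
                return {}  # return value is computed at runtime, not a literal
            if not isinstance(value, dict):
                return {}
            return {
                section: list(params)
                for section, params in value.items()
                if isinstance(params, dict)
            }
    return {}


def extract_node_metadata(source):
    """Collect class names and declared inputs without executing the code."""
    tree = ast.parse(source)
    results = []
    for cls in tree.body:
        if not isinstance(cls, ast.ClassDef):
            continue
        for item in cls.body:
            if isinstance(item, ast.FunctionDef) and item.name == "INPUT_TYPES":
                results.append(
                    {"class_type": cls.name, "parameters": _literal_return(item)}
                )
    return results


print(extract_node_metadata(SAMPLE_SOURCE))
```

Parsing with `ast` rather than importing the module avoids running untrusted repository code, which fits the sandboxed setup described above. The extracted records could then be passed to an LLM together with retrieved code snippets.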
Implemented using LangChain, the framework allows the assistant agent to autonomously select worker agents based on user input and conversation history. Responses are synthesized by integrating outputs from the relevant agents.
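As a rough illustration of this delegation pattern, the sketch below routes a request to worker agents and joins their outputs. It deliberately does not use the LangChain API: the keyword table `ROUTING_KEYWORDS` is a stand-in for the LLM's routing decision, and the worker functions are hypothetical placeholders rather than the plugin's real agents.

```python
# Hypothetical worker agents; real ones would query the knowledge bases.
def workflow_agent(query):
    return f"[workflow] candidates for: {query}"


def node_agent(query):
    return f"[node] recommendations for: {query}"


def qa_agent(query):
    return f"[qa] answer to: {query}"


WORKERS = {"workflow": workflow_agent, "node": node_agent, "qa": qa_agent}

# Stand-in for the assistant LLM's choice of workers (an assumption).
ROUTING_KEYWORDS = {
    "workflow": ("workflow", "pipeline"),
    "node": ("node", "lora", "checkpoint"),
}


def select_workers(user_input, history):
    """Pick workers from the current input plus the last two history turns."""
    context = " ".join(history[-2:] + [user_input]).lower()
    selected = [
        name
        for name, keywords in ROUTING_KEYWORDS.items()
        if any(keyword in context for keyword in keywords)
    ]
    return selected or ["qa"]  # fall back to question answering


def assistant(user_input, history):
    """Run the selected workers and synthesize a single response."""
    names = select_workers(user_input, history)
    return "\n".join(WORKERS[name](user_input) for name in names)


print(assistant("and which LoRA fits?", ["build a workflow for anime portraits"]))
```

Including recent history in the routing context is what lets a follow-up question such as "and which LoRA fits?" still reach the workflow worker.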
The effectiveness of ComfyUI-Copilot is evaluated both offline and online. Offline quantitative evaluations show high recall rates (over 88.5%) for nodes and workflows on a constructed test set using both GPT-4o and DeepSeek-V3. The workflow generation capabilities are evaluated based on pass rate (executability), average number of nodes, and node selection metrics (precision, recall, F1). Fine-tuned open-source models show comparable performance to closed-source models like Claude-3.7-Sonnet, achieving a high node selection F1 score (0.95), although overall workflow generation accuracy has room for improvement. Online user feedback since its release on GitHub demonstrates a 65.4% acceptance rate for recommended nodes and 85.9% for proposed workflows. The project has garnered significant community interest, indicated by GitHub stars, query volume, and user base size across many countries.
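The node selection metrics can be made concrete with a small computation. This sketch assumes a set-based comparison between the node types in a generated workflow and those in a reference workflow; the summary does not say whether the paper counts duplicates or matches nodes differently, and the example node lists are illustrative only.

```python
def node_selection_scores(predicted, reference):
    """Return (precision, recall, F1) over the sets of node types."""
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative example: the generated workflow omits one reference node.
generated = ["CheckpointLoaderSimple", "KSampler", "VAEDecode", "SaveImage"]
reference = ["CheckpointLoaderSimple", "CLIPTextEncode", "KSampler",
             "VAEDecode", "SaveImage"]
precision, recall, f1 = node_selection_scores(generated, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A high node selection F1 does not by itself imply an executable workflow, since the selected nodes must also be wired and parameterized correctly; this is consistent with pass rate being reported as a separate metric.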
In conclusion, ComfyUI-Copilot is presented as the first open-source ComfyUI plugin designed to automate workflow creation and provide intelligent assistance. Its multi-agent framework and curated knowledge bases effectively lower the entry barrier for new users and improve efficiency for experienced ones in AIGC workflow development. Future work aims to integrate community feedback and enhance features like automatic workflow and parameter optimization.