HyperAI

One-sentence Summary

The authors introduce ComfyGPT, a self-optimizing multi-agent system comprising ReformatAgent, FlowAgent, RefineAgent, and ExecuteAgent that automatically generates ComfyUI workflows from task descriptions by focusing on precise node connections rather than entire workflows and leveraging reinforcement learning, while establishing the FlowDataset, FlowBench, and four novel evaluation metrics to demonstrate significant improvements over existing LLM-based methods.

Key Contributions

The paper introduces ComfyGPT, a self-optimizing multi-agent framework that automatically translates natural language task descriptions into functional ComfyUI workflows. The architecture chains four specialized agents: ReformatAgent converts JSON workflows into compact link diagrams, FlowAgent generates a workflow diagram from the instruction, RefineAgent corrects invalid or outdated node names, and ExecuteAgent compiles the final JSON output.
The methodology improves generation accuracy by targeting precise node connections rather than synthesizing complete workflows in a single step. Reinforcement learning via the GRPO algorithm continuously optimizes the agent reasoning process, which reduces error accumulation and enhances pipeline adaptability.
The work provides FlowDataset, a large-scale collection of 13,571 workflow-description pairs, and establishes FlowBench alongside four novel evaluation metrics to standardize performance assessment. Experimental results demonstrate that the system outperforms existing LLM-based approaches in automated workflow generation.

Introduction

The authors introduce ComfyGPT, a self-optimizing multi-agent system that automates the generation of ComfyUI workflows from natural language instructions. ComfyUI provides a flexible node-based interface for constructing complex image generation pipelines, yet manual workflow design remains a significant bottleneck due to the intricate topology of node connections and the difficulty of adapting existing templates to diverse tasks. Existing LLM-based solutions often rely on open-loop architectures or restricted fine-tuning, which leads to error accumulation, context limitations, and insufficient coverage for advanced multi-stage generation requirements. To overcome these challenges, the authors propose a modular approach that focuses on generating precise node connections rather than entire workflows, leveraging reinforcement learning with GRPO to enable autonomous error correction and iterative refinement. This method allows the system to produce topologically consistent and highly accurate workflows that significantly outperform prior baselines across comprehensive evaluation metrics.

Dataset

Dataset Composition and Sources: The authors compile FlowDataset by crawling workflow-description pairs from major ComfyUI community platforms, including OpenArt, LibLib, ComfyWorkflows, and Civitai. The final collection contains 13,571 entries organized into six core categories and six specialized subcategories.
Subset Details and Sizing: The dataset is partitioned into a training set of 12,571 samples and a dedicated evaluation benchmark called FlowBench. FlowBench comprises 1,000 samples proportionally sampled from each category while preserving workflow length distribution. The six core categories include Text-to-Image Generation, Image Editing, Style Transfer, 3D Generation, Video Editing or Generation, and Others, with Image Editing further divided into HD Upscaling, Redrawing, Out-painting, Character-Based Guidance, Face Swap, and Background Change or Remove.
Data Usage and Training Strategy: The authors use the 12,571-sample training set to develop and fine-tune the FlowAgent model. FlowBench serves as a standardized benchmark to evaluate ComfyGPT, measuring performance through Format Validation, Pass Accuracy, Pass Instruction Alignment, and Pass Node Diversity.
Processing Pipeline and Metadata Construction: Data preparation follows a six-stage pipeline. Initial crawling extracts metadata such as titles, tags, descriptions, and JSON workflows. The cleaning phase enforces strict JSON schema validation, resolves ambiguous node connections, strips redundant elements like Reroute and Note nodes, filters disconnected graph structures, and verifies parameter compatibility. The authors then deploy ChatGPT-4o-mini to transform noisy descriptions into concise functional instructions and automate category assignment. Before final partitioning, they validate execution success, retaining only workflows that achieve at least a 70% pass rate on the ComfyUI server. Each finalized entry pairs a polished natural language instruction with its corresponding JSON workflow and category labels.
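
The partition described above (1,000 FlowBench samples drawn proportionally per category, leaving 12,571 for training) can be sketched as a stratified split. This is a minimal illustration, not the authors' code: the per-category counts below are made up so that they sum to 13,571, the `stratified_split` helper and the largest-remainder rounding are assumptions, and the sketch does not attempt to preserve the workflow length distribution.

```python
import random
from collections import defaultdict

def stratified_split(entries, n_eval, seed=0):
    """Split entries into (train, eval) with eval drawn proportionally per category."""
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for entry in entries:
        by_cat[entry["category"]].append(entry)

    total = len(entries)
    # Ideal (fractional) share of the eval set for each category.
    quotas = {c: n_eval * len(items) / total for c, items in by_cat.items()}
    alloc = {c: int(q) for c, q in quotas.items()}
    # Largest-remainder rounding so the allocations sum exactly to n_eval.
    leftover = n_eval - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1

    train_set, eval_set = [], []
    for c, items in by_cat.items():
        rng.shuffle(items)
        eval_set.extend(items[:alloc[c]])
        train_set.extend(items[alloc[c]:])
    return train_set, eval_set

# Hypothetical category sizes (NOT from the paper); they only sum to 13,571.
HYPOTHETICAL_COUNTS = {
    "Text-to-Image Generation": 6000,
    "Image Editing": 4000,
    "Style Transfer": 1500,
    "3D Generation": 571,
    "Video Editing or Generation": 1000,
    "Others": 500,
}
entries = [
    {"id": f"{cat}-{i}", "category": cat}
    for cat, n in HYPOTHETICAL_COUNTS.items()
    for i in range(n)
]

train_set, flow_bench = stratified_split(entries, n_eval=1000)
print(len(train_set), len(flow_bench))  # 12571 1000
```

Largest-remainder rounding is used here only so that the per-category quotas add up to exactly 1,000; any other rounding scheme that preserves the total would serve the same purpose.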

Method

The authors leverage ComfyUI's modular architecture, which decomposes model inference into a set of interconnected nodes, to design ComfyGPT, a self-optimizing multi-agent system for generating ComfyUI workflows from natural language instructions. The system operates as a pipeline of four specialized agents: ReformatAgent, FlowAgent, RefineAgent, and ExecuteAgent, each responsible for a distinct stage of the workflow generation process.

ComfyUI's native workflow representation uses a JSON format to store node information, where each node $n_k$ has multiple logical inputs $\mathbb{I}^k$ and outputs $\mathbb{O}^k$. These nodes are interconnected to form a topological workflow structure. However, this JSON format is often lengthy and contains redundant information, making it difficult for large language models (LLMs) to process within their context length constraints. To address this, the ReformatAgent transforms the complex JSON workflow into a simplified and more intuitive logic diagram, denoted $\mathbb{D}$, which is represented as a collection of links $l_i$ between nodes. Each link is defined as $l_i = [n_{out}, O_{j}^{out}, n_{in}, I_{k}^{in}]$, capturing the connection from a specific output of an outgoing node to a specific input of an incoming node. This transformation, illustrated in the figure below, focuses on the relationships between nodes rather than their full configuration, thereby reducing complexity and improving readability for the subsequent agents.
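
As a rough illustration of this JSON-to-diagram step, the sketch below flattens a small workflow into a list of links. It assumes ComfyUI's API-style JSON, in which each node has a `class_type` and an `inputs` mapping and a connected input is stored as `[source_node_id, output_index]`. The `to_links` helper, the `ClassName_id` labels, and the use of an output index in place of the output name $O_{j}^{out}$ are illustrative choices, not the paper's exact encoding.

```python
from typing import Any

# Hypothetical three-node workflow in API-style JSON (already parsed into a dict).
workflow: dict[str, dict[str, Any]] = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a cat", "clip": ["1", 1]}},
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 0, "model": ["1", 0], "positive": ["2", 0]}},
}

def to_links(wf: dict[str, dict[str, Any]]) -> list[list[Any]]:
    """Keep only node-to-node connections as [n_out, O_out, n_in, I_in]."""
    links = []
    for node_id, node in wf.items():
        for input_name, value in node["inputs"].items():
            # A connection is a two-element list whose first item is a known node id;
            # literal parameters (strings, numbers) are dropped from the diagram.
            is_link = isinstance(value, list) and len(value) == 2 and str(value[0]) in wf
            if not is_link:
                continue
            src_id, out_index = str(value[0]), value[1]
            links.append([
                f"{wf[src_id]['class_type']}_{src_id}",  # outgoing node
                out_index,                               # which output of that node
                f"{node['class_type']}_{node_id}",       # incoming node
                input_name,                              # which input it feeds
            ])
    return links

links = to_links(workflow)
for link in links:
    print(link)
```

In this toy case the three nodes collapse to three links, and parameter values such as the prompt text and seed no longer appear, which is the source of the length reduction the text describes.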

The FlowAgent is the core component responsible for generating the workflow diagram from a user's natural language instruction. It is trained in two stages to overcome the challenges of hallucination and context limitations. In the first stage, Supervised Fine-Tuning (SFT), the model is trained on pairs of workflow descriptions $desc$ and their corresponding diagram representations $d$. The SFT objective maximizes the likelihood of generating the token sequence that composes the correct diagram. In the second stage, Reinforcement Learning (RL) with the GRPO algorithm, the model undergoes self-correction and iterative optimization. The RL objective incorporates a reward model that penalizes the generation of nodes not present in a predefined valid set $N^T$, which keeps the generated diagrams structurally sound and free from fictitious nodes. This two-stage training enables the FlowAgent to generate accurate and reliable workflow diagrams.
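
The reward idea in the RL stage can be sketched as follows. This is an assumption-laden illustration rather than the paper's reward model: the binary reward, the tiny `VALID_NODES` set standing in for $N^T$, and the helper names are hypothetical. The group-relative normalization (reward minus group mean, divided by group standard deviation) is the standard GRPO advantage; whether the authors use exactly this form is not stated here.

```python
from statistics import mean, pstdev

# Stand-in for the valid node set N^T (hypothetical, tiny on purpose).
VALID_NODES = {"CheckpointLoaderSimple", "CLIPTextEncode", "KSampler"}

def node_reward(diagram, valid_nodes):
    """Assumed binary reward: 1.0 if every node in the diagram is valid, else 0.0."""
    names = {link[0] for link in diagram} | {link[2] for link in diagram}
    return 1.0 if names <= valid_nodes else 0.0

def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Three sampled diagrams for one instruction; links are [n_out, O_out, n_in, I_in].
group = [
    [["CheckpointLoaderSimple", 0, "KSampler", "model"]],
    [["CheckpointLoaderSimple", 1, "CLIPTextEncode", "clip"]],
    [["MagicUpscaler", 0, "KSampler", "model"]],  # fictitious node, gets penalized
]

rewards = [node_reward(d, VALID_NODES) for d in group]
advantages = group_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

The sample containing the fictitious node receives a negative advantage relative to its group, so a policy update would push probability mass away from it.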

Despite the improvements from SFT and RL, the generated diagrams may still contain outdated or incorrect node names due to updates in the ComfyUI ecosystem. The RefineAgent acts as a secondary inspection and correction mechanism to address this issue. It integrates a large language model with a knowledge retrieval capability, utilizing a continuously updated node database $\mathcal{K}$ containing information about 6,362 unique nodes. For an incorrect node $n_{ic}$, the RefineAgent calculates its semantic similarity to all nodes in $\mathcal{K}$ using embedding vectors and cosine similarity. It then retrieves the top $k$ most similar nodes and prompts the LLM to select the most suitable replacement $n_c$ based on the workflow diagram, user instruction, and the candidate nodes. This process ensures the system remains up to date and can handle changes in the underlying platform.
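
The retrieval half of this step amounts to a cosine-similarity top-$k$ search, sketched below. The three-dimensional vectors are toy values standing in for real embeddings, `NODE_DB` is a four-entry stand-in for $\mathcal{K}$, and the final LLM selection among the candidates is omitted; all helper names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy node database: node name -> made-up embedding vector.
NODE_DB = {
    "ImageUpscaleWithModel": [0.9, 0.1, 0.0],
    "ImageScale": [0.7, 0.3, 0.1],
    "KSampler": [0.0, 0.2, 0.9],
    "VAEDecode": [0.1, 0.9, 0.2],
}

def top_k(query_vec, db, k):
    """Return the k node names whose embeddings are most similar to query_vec."""
    ranked = sorted(db.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Made-up embedding of an invalid node name produced by the FlowAgent.
query = [0.85, 0.15, 0.05]
candidates = top_k(query, NODE_DB, k=2)
print(candidates)  # ['ImageUpscaleWithModel', 'ImageScale']
```

In the pipeline described above, these candidates would then be passed to the LLM together with the diagram and the user instruction so it can choose the replacement $n_c$.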

Finally, the ExecuteAgent converts the refined workflow diagram back into a ComfyUI-compatible JSON format. This step reverses the transformation performed by the ReformatAgent, reconstructing the full node configuration. The resulting JSON workflow is then uploaded to the ComfyUI server, where it is executed to generate the desired output image, completing the end-to-end process. The entire pipeline, from user instruction to output, is designed to be robust, flexible, and capable of handling a wide variety of image generation tasks.
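
A minimal sketch of the diagram-to-JSON direction is shown below, under the same assumptions as an API-style ComfyUI JSON layout (`class_type` plus `inputs`, with connections stored as `[source_node_id, output_index]`). Because the link diagram carries no literal parameters, the sketch fills them from a hypothetical `DEFAULTS` table; how the ExecuteAgent actually restores parameter values is not specified in this summary.

```python
import json

# Refined diagram: links of the form [n_out, O_out, n_in, I_in] with "ClassName_id" labels.
links = [
    ["CheckpointLoaderSimple_1", 1, "CLIPTextEncode_2", "clip"],
    ["CheckpointLoaderSimple_1", 0, "KSampler_3", "model"],
    ["CLIPTextEncode_2", 0, "KSampler_3", "positive"],
]

# Hypothetical default parameters per node class (an assumption for illustration).
DEFAULTS = {
    "CheckpointLoaderSimple": {"ckpt_name": "model.safetensors"},
    "CLIPTextEncode": {"text": ""},
    "KSampler": {"seed": 0},
}

def to_workflow(links, defaults):
    """Rebuild a node-keyed workflow dict from a list of links."""
    wf = {}

    def ensure(label):
        # "KSampler_3" -> class "KSampler", id "3"; create the node on first sight.
        class_type, node_id = label.rsplit("_", 1)
        if node_id not in wf:
            wf[node_id] = {
                "class_type": class_type,
                "inputs": dict(defaults.get(class_type, {})),
            }
        return node_id

    for src, out_index, dst, input_name in links:
        src_id = ensure(src)
        dst_id = ensure(dst)
        wf[dst_id]["inputs"][input_name] = [src_id, out_index]
    return wf

workflow = to_workflow(links, DEFAULTS)
print(json.dumps(workflow, indent=2))
```

The resulting dictionary has the same shape as the input assumed for the forward conversion, so the two sketches round-trip on connections while parameters come from the defaults table.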

Experiment

The evaluation benchmarks ComfyGPT against established multi-agent frameworks, closed-source models, and open-source baselines to validate its automated workflow generation capabilities. Quantitative and qualitative assessments confirm that its specialized agent architecture and reinforcement learning pipeline significantly improve instruction alignment, generation accuracy, and robustness against error propagation. Ablation studies and user feedback further demonstrate that each component meaningfully contributes to system stability, while the framework reliably adapts to complex prompts and supports flexible human-in-the-loop adjustments. Overall, the findings establish ComfyGPT as a highly accurate and practical standard for constructing ComfyUI workflows.

In the main comparison, ComfyGPT outperforms both closed-source models and open-source methods across the reported metrics, including Format Validation, Pass Accuracy, and Pass Node Diversity. The higher node diversity and pass accuracy indicate robust and varied workflow generation. Each component of the system contributes to the result, and reinforcement learning further improves workflow accuracy and validation.

The comparison covers two benchmarks, FlowBench and ComfyBench. ComfyGPT outperforms all compared methods on every metric on both benchmarks, with the largest improvements in pass rate and format validation, which the authors attribute to strong task understanding and workflow accuracy. Reinforcement learning provides incremental gains in accuracy on top of the other components.

An ablation study on FlowBench analyzes the contribution of each agent in the multi-agent system. Removing any agent lowers performance across all metrics, and the full system achieves the highest scores in Format Validation, Pass Accuracy, and Pass Instruction Alignment, indicating that the combination of all components is needed for the best results.

A second ablation varies the retrieval hyperparameter k and measures Pass Accuracy (PA). PA increases as k rises from 1 to 5 and then drops slightly at k=7, so k=5 gives the best retrieval configuration among the values tested.

In a further benchmark comparison, ComfyGPT achieves the highest pass rate among the compared methods and is notably stronger than the baselines across the evaluated metrics, demonstrating its ability to generate accurate workflows that align with user instructions.

The authors evaluate ComfyGPT against various closed-source and open-source baselines on the FlowBench and ComfyBench benchmarks to assess its overall workflow generation capabilities. Main comparisons and ablation studies validate that the model consistently surpasses existing methods, confirming that every multi-agent component and the reinforcement learning integration are crucial for maintaining robust accuracy and strict format compliance. Additional hyperparameter tuning experiments identify an optimal retrieval setting that balances performance effectively. Collectively, these results demonstrate that ComfyGPT reliably produces diverse, accurate, and instruction-aligned workflows while outperforming current state-of-the-art approaches.
