
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models (2411.05451v1)

Published 8 Nov 2024 in cs.SE, cs.AI, and cs.CL

Abstract: Recent advancements in LLMs have driven a revolutionary paradigm shift in process automation from Robotic Process Automation to Agentic Process Automation by automating the workflow orchestration procedure based on LLMs. However, existing LLMs (even the advanced OpenAI GPT-4o) are confined to achieving satisfactory capability in workflow orchestration. To address this limitation, we present WorkflowLLM, a data-centric framework elaborately designed to enhance the capability of LLMs in workflow orchestration. It first constructs a large-scale fine-tuning dataset WorkflowBench with 106,763 samples, covering 1,503 APIs from 83 applications across 28 categories. Specifically, the construction process can be divided into three phases: (1) Data Collection: we collect real-world workflow data from Apple Shortcuts and RoutineHub, transcribing them into Python-style code. We further equip them with generated hierarchical thought via ChatGPT. (2) Query Expansion: we prompt ChatGPT to generate more task queries to enrich the diversity and complexity of workflows. (3) Workflow Generation: we leverage an annotator model trained on collected data to generate workflows for synthesized queries. Finally, we merge the synthetic samples that pass quality confirmation with the collected samples to obtain the WorkflowBench. Based on WorkflowBench, we fine-tune Llama-3.1-8B to obtain WorkflowLlama. Our experiments show that WorkflowLlama demonstrates a strong capacity to orchestrate complex workflows, while also achieving notable generalization performance on previously unseen APIs. Additionally, WorkflowBench exhibits robust zero-shot generalization capabilities on an out-of-distribution task planning dataset, T-Eval. Our data and code are available at https://github.com/OpenBMB/WorkflowLLM.

Enhancing Workflow Orchestration Capabilities of LLMs

The paper presents WorkflowLLM, a data-centric framework designed to augment the workflow orchestration capabilities of LLMs. The work is situated within Agentic Process Automation (APA), a paradigm that shifts away from traditional Robotic Process Automation (RPA) by delegating workflow construction to LLMs, and it addresses the limitations that current LLMs exhibit when orchestrating complex workflows.

Dataset Construction: WorkflowBench

A cornerstone of the WorkflowLLM framework is the construction of a fine-tuning dataset, referred to as WorkflowBench. This dataset comprises 106,763 instances, encapsulating 1,503 APIs from 83 different applications across 28 categories. The dataset construction is meticulously divided into three phases:

  1. Data Collection: High-quality shortcuts are collected from Apple Shortcuts via RoutineHub, together with human annotations, functional descriptions, and API documentation. The shortcuts are transcribed into Python-style code to better capture parameter handling and control logic, and are augmented with hierarchical thought generated by ChatGPT.
  2. Query Expansion: The complexity and diversity of workflows are enriched by generating additional task queries using ChatGPT, expanding beyond the initially collected data set.
  3. Workflow Generation: A workflow annotator model is trained to generate workflows for synthesized queries, ensuring high-quality outputs through an iterative refinement process enabled by ChatGPT.
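To make the transcription step concrete, here is a minimal sketch of what a Shortcuts-style workflow might look like once rendered as Python-style code with a hierarchical "plan" comment. The action names (`get_battery_level`, `set_low_power_mode`, `show_notification`) are illustrative stand-ins, not the paper's actual API schema:

```python
def get_battery_level() -> int:
    """Stub for a Shortcuts 'Get Battery Level' action."""
    return 15  # pretend the device reports 15%

def set_low_power_mode(enabled: bool) -> None:
    """Stub for a Shortcuts 'Set Low Power Mode' action."""
    print(f"Low Power Mode -> {enabled}")

def show_notification(message: str) -> None:
    """Stub for a Shortcuts 'Show Notification' action."""
    print(f"[notification] {message}")

def battery_saver_workflow() -> bool:
    # Plan (hierarchical thought): check the battery level first;
    # if it is below 20%, enable Low Power Mode and notify the user.
    level = get_battery_level()
    if level < 20:
        set_low_power_mode(True)
        show_notification(f"Battery at {level}% - Low Power Mode enabled")
        return True
    return False

battery_saver_workflow()
```

Representing workflows as code rather than flat action lists is what lets the dataset express branching, loops, and nested calls that RPA-style recordings struggle with.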

This dataset not only broadens the scope of APIs and workflow categories but also maintains a high degree of complexity to realistically simulate real-world applications.
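The query-expansion phase above can be pictured as assembling a prompt from seed queries and asking a chat model for harder, more diverse variants. The prompt wording below is illustrative, not the paper's actual template, and the model call itself is omitted:

```python
def build_expansion_prompt(seed_queries: list[str], n_new: int) -> str:
    """Assemble a prompt asking a chat model for n_new novel task
    queries in the style of the seeds (hypothetical wording)."""
    seeds = "\n".join(f"- {q}" for q in seed_queries)
    return (
        "Here are example automation task queries:\n"
        f"{seeds}\n"
        f"Write {n_new} new queries that are more complex and cover "
        "different apps and APIs. One per line."
    )

prompt = build_expansion_prompt(
    ["Resize every photo in an album to 1080p",
     "Text my ETA to a contact when I leave work"],
    n_new=3,
)
print(prompt)
```

The model's replies would then be paired with workflows produced by the trained annotator model, as described in the Workflow Generation phase.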

Model Development: WorkflowLlama

WorkflowLlama is the product of fine-tuning Llama-3.1-8B on the WorkflowBench dataset. The model exhibits enhanced performance in orchestrating complex workflows and generalizes robustly even to previously unseen APIs. The empirical evaluation employs both CodeBLEU and Pass Rate metrics, on which WorkflowLlama significantly outperforms existing models, including GPT-4o with in-context learning.
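A pass-rate metric over generated workflows can be sketched as follows. The pass criterion here, "the generated code parses as valid Python", is a deliberate simplification for illustration; the paper's actual Pass Rate criterion may be stricter (e.g. execution-based checks):

```python
import ast

def passes(generated_code: str) -> bool:
    """Hypothetical pass check: the workflow must parse as Python."""
    try:
        ast.parse(generated_code)
        return True
    except SyntaxError:
        return False

def pass_rate(samples: list[str]) -> float:
    """Fraction of generated workflows that pass the check."""
    if not samples:
        return 0.0
    return sum(passes(s) for s in samples) / len(samples)

outputs = [
    "x = get_value()\nif x > 0:\n    notify('ok')",  # valid syntax
    "if x > 0\n    notify('ok')",                    # missing colon
]
print(pass_rate(outputs))  # 0.5
```

Syntactic metrics like CodeBLEU complement this by comparing n-gram, AST, and data-flow overlap against reference workflows rather than checking validity alone.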

Implications and Future Prospects

This research has both theoretical and practical implications. Theoretically, it challenges existing paradigms within APA, demonstrating the efficacy of a data-centric approach in refining LLM capabilities. Practically, the enhanced orchestration ability of LLMs opens the door for more sophisticated and automated business process management applications, reducing reliance on manual input and increasing efficiency.

Moreover, the model's ability to handle unseen instructions and APIs suggests potential for adaptive learning environments where continuous data introduction could further evolve LLM capabilities.

Limitations and Future Research

While promising, WorkflowLLM's reliance on Apple Shortcuts data may limit its applicability to other domains. Future work could incorporate broader data sources and extend evaluation to actual workflow execution, which would require handling API changes and user permissions.

In conclusion, WorkflowLLM positions itself as a promising development in the field of workflow orchestration, providing a solid foundation for future explorations and advancements in the intersection of process automation and LLMs.

Authors (10)
  1. Shengda Fan (1 paper)
  2. Xin Cong (46 papers)
  3. Yuepeng Fu (2 papers)
  4. Zhong Zhang (42 papers)
  5. Shuyan Zhang (13 papers)
  6. Yuanwei Liu (342 papers)
  7. Yesai Wu (11 papers)
  8. Yankai Lin (125 papers)
  9. Zhiyuan Liu (433 papers)
  10. Maosong Sun (337 papers)