- The paper introduces a hybrid control architecture that integrates LLM task planning with formal automata-based safety verification to ensure safe task allocation in dynamic manufacturing.
- It employs a depth-first search reachability analysis on the synchronous product of execution and safety automata, achieving significant improvements in safety rule compliance in multi-robot assembly scenarios.
- The approach enables automated, contextual plan repair and highlights scalability trade-offs, paving the way for future hierarchical and compositional verification strategies.
Logic-Based Verification for LLM-Driven Task Allocation in Multi-Agent Manufacturing Systems
Introduction
The proliferation of adaptive, personalized manufacturing demands flexible control architectures capable of safe, dynamic reconfiguration. While multi-agent systems (MAS) and discrete event systems (DES) provide modularity and structured resource coordination, their conventional instantiations rely on static, predefined task modules, limiting adaptability. Large language models (LLMs) have recently enabled on-the-fly task planning from natural language (NL) requirements, promising unprecedented system flexibility. However, LLM-generated plans can easily violate critical safety constraints due to limited operational context, imprecise instruction parsing, or an inability to account for execution-level hazards.
The work titled "Logic-Based Verification of Task Allocation for LLM-Enabled Multi-Agent Manufacturing Systems" (2604.17142) addresses these concerns through a hybrid control architecture that integrates LLM-based task plan synthesis with formal automata-based safety verification. The framework guarantees compliance with user-defined, formally specified temporal logic safety requirements, bridging the gap between adaptive LLM planning and DES-based system-level validation. The efficacy of the proposed architecture is demonstrated through multi-robot assembly scenarios, systematically evaluating safety-rule satisfaction, repair costs, and scalability.
Control Architecture and Task Planning Workflow
The proposed control loop begins with a Product Agent (PA) that leverages an LLM to translate NL product specifications into structured requirements and then generates a candidate task plan as a directed acyclic graph (DAG). This process explicitly captures concurrent tasks, resource requirements, and temporal dependencies, mapping high-level production goals to manufacturable, agent-level subtasks.
Figure 1: NL product requirements are converted into a DAG-based task plan and compiled to a Finite State Automaton (FSA) representation.
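The DAG-to-FSA compilation described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the subtask names, resource fields, and the frozenset-of-completed-tasks state encoding are all assumptions. Each FSA state records which subtasks have finished, and a transition fires a subtask whose dependencies are complete.

```python
# Hypothetical task plan as a DAG: each subtask lists the resources it needs
# and the subtasks that must complete first (illustrative names only).
plan = {
    "pick_base":  {"resources": {"robot1"}, "deps": []},
    "pick_gear":  {"resources": {"robot2"}, "deps": []},
    "mate_parts": {"resources": {"robot1", "robot2"}, "deps": ["pick_base", "pick_gear"]},
}

def fsa_states(plan):
    """Compile the DAG into an execution FSA: states are sets of completed
    subtasks; an event fires any subtask whose dependencies are all done."""
    start = frozenset()
    states, frontier = {start}, [start]
    transitions = []
    while frontier:
        s = frontier.pop()
        for task, info in plan.items():
            if task not in s and set(info["deps"]) <= s:
                nxt = s | {task}
                transitions.append((s, task, nxt))
                if nxt not in states:
                    states.add(nxt)
                    frontier.append(nxt)
    return states, transitions
```

Note how the two independent pick operations interleave freely (two orders reach the same state), while `mate_parts` is only enabled once both are complete, which is exactly the concurrency the DAG representation is meant to capture.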
A Central Controller Agent (CCA) then oversees safety verification, acting as a global coordinator with full access to all agent states, resource models, and current manufacturing context. The CCA translates NL safety constraints into Linear Temporal Logic on Finite Traces (LTLf) formulas. These are in turn compiled into safety automata, which encode ordering, exclusion, and other safety relationships over observable atomic events induced by the manufacturing process.
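A safety automaton for a single ordering constraint can be sketched as a small DFA over observable events. The events and state names below are assumptions for illustration; the LTLf formula "mate_parts does not occur until fixture_clamp has occurred" compiles to a DFA with one safe waiting state, one satisfied state, and an absorbing violation state.

```python
# Illustrative safety DFA for the ordering constraint
# "fixture_clamp must occur before mate_parts" (event names are assumed).
# Any (state, event) pair not listed here self-loops, so irrelevant events
# leave the automaton's state unchanged; "violated" is an absorbing trap.
SAFETY_DFA = {
    ("waiting", "fixture_clamp"): "done",
    ("waiting", "mate_parts"):    "violated",
}

def dfa_step(state, event):
    """Advance the safety automaton by one observed event."""
    return SAFETY_DFA.get((state, event), state)

def run(events, start="waiting"):
    """Run the automaton over a full event trace and return the final state."""
    state = start
    for e in events:
        state = dfa_step(state, e)
    return state
```

Mutual-exclusion constraints follow the same pattern with a state per "resource held" configuration, which is why a set of such automata can encode heterogeneous safety relationships uniformly.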
The verification problem is formulated as a reachability analysis over the synchronous product of the plan's execution automaton and the set of safety automata derived from LTLf constraints. For each candidate plan, the CCA employs a depth-first search (DFS) to traverse all feasible joint state transitions, systematically evaluating whether any safety violation states can be reached.
Figure 2: CCA translates NL safety constraints into LTLf specifications and compiles them for automata-based validation of LLM-generated plans.
If a plan is determined unsafe, the framework closes the loop by extracting the minimal violating trace (i.e., the witnessed sequence of events responsible for the violation) and generating structured feedback for the PA. The PA, in turn, uses the LLM to propose a revised plan based on this precise context (the unsafe plan fragments, the violated constraints, and the essential resources), enabling semantically meaningful, context-aware repairs beyond the capacity of standard template-based approaches.
Figure 3: CCA architecture for DFS-based validation and violation accumulation, integrating structured feedback for iterative plan repair.
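One plausible shape for that structured feedback is sketched below. The field names and prompt wording are assumptions for illustration (the paper does not publish its feedback schema); the point is that the witness trace and violated constraint are passed back verbatim rather than paraphrased.

```python
# Hypothetical feedback payload from the CCA to the Product Agent after a
# failed verification; every field name here is an illustrative assumption.
feedback = {
    "verdict": "unsafe",
    "violating_trace": ["mate_parts"],  # witness event sequence from the DFS
    "violated_constraint": "fixture_clamp must precede mate_parts",
    "unsafe_fragment": ["mate_parts"],  # plan steps implicated in the trace
    "required_resources": ["robot1", "fixture"],
}

def to_prompt(fb):
    """Render the feedback as a compact repair prompt for the planner LLM."""
    return (
        "The plan was rejected. Violated constraint: "
        f"{fb['violated_constraint']}. Witness trace: "
        f"{' -> '.join(fb['violating_trace'])}. Propose a revised plan."
    )
```

Grounding the repair prompt in the exact witness trace is what lets the LLM make a targeted fix (reordering or inserting a step) instead of regenerating the plan blindly.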
Case Study: Multi-Robot Assembly Task Allocation
Three multi-robot assembly scenarios (S1, S2, S3) of increasing complexity were analyzed using this architecture. Each scenario provided NL product goals and safety rules, encompassing both ordering and mutual exclusion constraints. The framework's safety-rule satisfaction was compared against an LLM-only baseline (planning using NL constraints but without formal validation or repair).
Figure 4: S1 setup with two robots, the automata showcasing a safety violation, and the repaired version enforcing all safety constraints.
The framework exhibited substantial improvements in safety compliance over pure-LLM approaches:
- S1: Rule satisfaction improved from 50.0% (LLM) to 92.5% (proposed framework)
- S2: 75.9% → 91.7%
- S3: 50.0% → 86.3%
Concurrently, the architecture quantified the cost of safety enforcement: average repair attempts and total verification time grew with increasing scenario complexity, ranging from 1.8 repairs/0.1s in S1 to 3.9 repairs/25.5s in S3. The product state space expanded rapidly as concurrent resource operations were introduced, underscoring the potential need for symbolic or compositional verification approaches in large-scale deployments.
Numerical outcomes demonstrate strong improvements in rule satisfaction with only modest repair overhead; however, they also indicate the fundamental scalability trade-offs inherent in exhaustive reachability-based methods.
Theoretical and Practical Implications
This research makes several concrete contributions:
- Decoupling Planning Flexibility from Safety: LLMs are leveraged exclusively for adaptive plan generation, not for enforcing safety. All safety properties are encoded and validated using temporal logic and automata, providing robust correctness guarantees even as product mix or operational context shifts.
- Automated, Contextual Plan Repair: Structured, automata-based feedback to LLMs supports plan adaptation in situations where simple template fixes would fail, crucial for environments with evolving safety specifications and system topologies.
- Scalability Considerations: The observed state explosion and repair iteration costs at higher scales highlight the necessity of hierarchical, compositional, or incremental verification strategies as system complexity grows.
- Minimal Human Reliance Post-Certification: Provided that NL-to-LTLf constraint translation is initially validated by domain experts, subsequent plan verification and repair can proceed autonomously—a critical step toward deployable AI-driven manufacturing.
Future Directions
While the current framework demonstrates reliable safety enforcement in canonical assembly scenarios, several avenues for further development are evident. Scaling to large facilities will require hierarchical CCA implementations, partitioned state spaces, and compositional or symbolic verification to keep verification times tractable. Integration of online, runtime verification (beyond pre-execution planning) is necessary for robust operation under uncertainty and dynamic disruptions. Improving the LLM's internal understanding of formal safety contexts, potentially through fine-tuning or architectural adaptations, may reduce both repair iterations and reliance on out-of-band specification validation.
Conclusion
This logic-based verification architecture marks a significant convergence of LLM-enabled flexible planning and formal methods within multi-agent manufacturing systems. By tightly integrating automata-based safety validation with adaptive LLM replanning, the framework achieves high safety-rule satisfaction without constraining the expressiveness or adaptability of natural language-based plan generation. The systematic handling of unsafe allocations and the rigorous enforcement of user-defined safety constraints underscore its utility for next-generation, autonomous manufacturing systems. Future work in compositional verification and runtime adaptation will be instrumental to broad industrial adoption and robust operation in highly heterogeneous environments.