AutoFSM: Automated Finite State Machine Framework
- AutoFSM is a suite of frameworks and algorithms that automate the extraction, synthesis, and modification of finite state machines from descriptions, examples, or data.
- It integrates multi-agent pipelines with formal intermediate representations to enhance synthesis accuracy and reduce syntax errors in hardware design and protocol reverse engineering.
- Applied in robotics, symbolic optimization, and online adaptation, AutoFSM demonstrates significant gains in performance and debugging efficiency through varied synthesis methodologies.
AutoFSM refers to the family of frameworks, algorithms, and toolchains that automate the construction, modification, inference, and use of finite state machines (FSMs) from descriptions, examples, or data. AutoFSM approaches span multi-agent collaborative code generation, online state discovery and adaptation, program synthesis via SAT or sequence labeling, and automaton-based representations in symbolic computation and inference. These systems have been adopted in hardware design automation, robotics, protocol reverse engineering, controller synthesis, and symbolic optimization, with reported improvements in synthesis efficiency, correctness, and scalability.
1. Multi-Agent FSM Code Generation: The AutoFSM Framework
Recent frameworks labeled "AutoFSM" employ distributed, multi-agent architectures for automated FSM code generation, driven by advances in LLMs. The canonical instance is a six-agent pipeline targeting RTL FSM synthesis (Luo et al., 12 Dec 2025). The agents, each responsible for a distinct subtask, interact as follows:
- FSMExtractor parses English descriptions and generates a JSON-based FSM intermediate representation (IR).
- Verifier checks IR completeness and consistency against the specification.
- Coder translates verified IR to Verilog via YAML and fsm2sv; it reports syntax errors upstream.
- Tester synthesizes SystemC models and testbenches, invoking Verilator for co-simulation.
- Judger locates faults by analyzing simulation or compilation outcomes.
- Fixer amends IR or testbench code in response to Judger diagnostics, triggering re-synthesis.
This division localizes errors, reduces context drift, and facilitates iterative correction. Integration of a structured IR (see Section 2) with a minimal IR-to-Verilog translation pipeline sharply reduces LLM-induced syntax errors and decouples documentation from implementation.
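The agent roles above can be sketched as a simple orchestration loop. This is a minimal illustration of the division of labor, not the published implementation: each "agent" here is a plain Python stub, whereas the actual pipeline delegates each subtask to an LLM and external tools (fsm2sv, Verilator).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PipelineState:
    spec: str                       # natural-language FSM description
    ir: Optional[dict] = None       # JSON-style intermediate representation
    verilog: Optional[str] = None
    errors: list = field(default_factory=list)

def fsm_extractor(state: PipelineState) -> None:
    """FSMExtractor stand-in: spec -> IR (hard-coded toy IR here)."""
    state.ir = {"states": ["IDLE", "RUN"], "inputs": ["start"],
                "outputs": ["busy"],
                "transitions": [{"src": "IDLE", "cond": "start",
                                 "dst": "RUN", "action": "busy=1"}]}

def verifier(state: PipelineState) -> None:
    """Verifier stand-in: IR completeness and consistency checks."""
    names = set(state.ir["states"])
    for t in state.ir["transitions"]:
        if t["src"] not in names or t["dst"] not in names:
            state.errors.append(f"unknown state in {t}")

def coder(state: PipelineState) -> None:
    """Coder stand-in: IR -> RTL (placeholder text, not real Verilog)."""
    state.verilog = f"// generated from IR with {len(state.ir['states'])} states"

def run_pipeline(spec: str, max_rounds: int = 3) -> PipelineState:
    state = PipelineState(spec)
    for _ in range(max_rounds):
        fsm_extractor(state)
        verifier(state)
        if state.errors:            # Judger/Fixer would repair the IR here
            state.errors.clear()
            continue
        coder(state)
        return state                # Tester co-simulation omitted in sketch
    return state
```

The loop structure (extract, verify, generate, and retry on diagnostics) mirrors the iterative correction described above.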
2. Intermediate Representation and Automatic Translation
The central enabling technology in AutoFSM code generation frameworks is a formal, machine-friendly intermediate representation of FSM logic. The IR is typically expressed as a JSON schema (Luo et al., 12 Dec 2025):
- "states": enumerates all state identifiers.
- "inputs" and "outputs": enumerate interface signals.
- "transitions": each an object ⟨src, cond, dst, action⟩, explicitly capturing source and destination states, guard conditions, and output behavior.
Formally, this structure admits a compact BNF grammar over the four fields.
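An EBNF-style grammar, reconstructed for illustration from the field descriptions above rather than taken verbatim from the cited paper:

```
<fsm>         ::= <states> <inputs> <outputs> <transitions>
<states>      ::= "states"      ":" "[" <id-list> "]"
<inputs>      ::= "inputs"      ":" "[" <id-list> "]"
<outputs>     ::= "outputs"     ":" "[" <id-list> "]"
<transitions> ::= "transitions" ":" "[" <transition> { "," <transition> } "]"
<transition>  ::= "⟨" <src> "," <cond> "," <dst> "," <action> "⟩"
<id-list>     ::= <id> { "," <id> }
```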
Automatic translation of the IR to Verilog is accomplished by a toolchain that performs schema and semantic validation, IR-to-YAML conversion (for compatibility with tools like fsm2sv), and error propagation to the user or pipeline (Luo et al., 12 Dec 2025). Replacing direct Verilog generation with this IR→YAML→fsm2sv path reduces reported syntax errors from 15% to 1%.
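To make the IR and its validation concrete, here is a minimal sketch: the four field names follow the schema described above, but the IR instance is invented for illustration and the checks are simplified stand-ins for the toolchain's validators.

```python
import json

# A minimal FSM IR instance following the four-field schema described above.
ir_text = """
{
  "states": ["IDLE", "LOAD", "DONE"],
  "inputs": ["start", "ready"],
  "outputs": ["valid"],
  "transitions": [
    {"src": "IDLE", "cond": "start", "dst": "LOAD", "action": "valid=0"},
    {"src": "LOAD", "cond": "ready", "dst": "DONE", "action": "valid=1"},
    {"src": "DONE", "cond": "1",     "dst": "IDLE", "action": "valid=0"}
  ]
}
"""

def validate_ir(ir: dict) -> list:
    """Simplified structural checks: required fields, known transition
    endpoints, and reachability of every state from the first listed one."""
    errs = []
    for key in ("states", "inputs", "outputs", "transitions"):
        if key not in ir:
            errs.append(f"missing field: {key}")
    if errs:
        return errs
    states = set(ir["states"])
    adj = {s: set() for s in states}
    for t in ir["transitions"]:
        if t["src"] not in states or t["dst"] not in states:
            errs.append(f"transition references unknown state: {t}")
        else:
            adj[t["src"]].add(t["dst"])
    # traverse from the initial (first-listed) state to find reachable states
    seen, frontier = set(), [ir["states"][0]]
    while frontier:
        s = frontier.pop()
        if s not in seen:
            seen.add(s)
            frontier.extend(adj[s])
    errs.extend(f"unreachable state: {s}" for s in states - seen)
    return errs

print(validate_ir(json.loads(ir_text)))  # an empty list means all checks passed
```

An empty error list would allow the pipeline to proceed to YAML conversion; a non-empty one would be propagated upstream for repair.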
3. Model Extraction, Synthesis, and Modification
AutoFSM methodologies encompass a range of approaches for FSM extraction, synthesis, and iterative modification from diverse inputs:
- Model Extraction from Text: Sequence tagging (LinearCRF, BERT+CRF) maps English-language protocol descriptions to protocol-independent information language (PIIL) tags, enabling rule-based FSM assembly (Pacheco et al., 2022). This enables downstream formal verification or attack synthesis directly from RFC prose.
- Program Synthesis from Traces and Properties: SAT-based systems (e.g., fbSAT (Chukharev et al., 2019)) infer minimal FSMs from execution trace sets and LTL specifications via reduction to propositional satisfiability, counterexample-guided synthesis loops (CEGIS), and symbolic minimization of state and guard complexity.
- LLM-Guided Code Edits: LLMs, prompted with FSM code and natural-language edit requests, synthesize revised FSM implementations (e.g., robotics FSMs in SMACH/Python (Gan et al., 7 Dec 2024)). Structural correctness is confirmed via parsing and JSON diffing.
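The structural check mentioned in the last bullet can be approximated by a small diff over two FSM JSON snapshots. The representation below (state list plus (src, cond, dst) transition triples) is an illustrative simplification, not the SMACH serialization used in the cited work:

```python
def fsm_diff(old: dict, new: dict) -> dict:
    """Compare two FSM dicts with 'states' (list of names) and
    'transitions' (list of (src, cond, dst) triples); report what
    an edit added and removed."""
    def tset(fsm):
        return {tuple(t) for t in fsm["transitions"]}
    return {
        "states_added":   sorted(set(new["states"]) - set(old["states"])),
        "states_removed": sorted(set(old["states"]) - set(new["states"])),
        "trans_added":    sorted(tset(new) - tset(old)),
        "trans_removed":  sorted(tset(old) - tset(new)),
    }

old = {"states": ["A", "B"],
       "transitions": [("A", "go", "B")]}
new = {"states": ["A", "B", "C"],
       "transitions": [("A", "go", "B"), ("B", "done", "C")]}

diff = fsm_diff(old, new)
# An edit is a perfect structural match ("NoDiff") when every field is empty.
print(diff["states_added"], diff["trans_added"])
```

Comparing an LLM-produced FSM against a reference this way yields the NoDiff/SmallDiff verdicts used in the evaluation discussed in Section 5.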
Table: Representative AutoFSM Approaches
| Approach Type | Input | Output |
|---|---|---|
| Multi-agent LLM pipeline | English spec | Verilog RTL, testbench |
| Sequence-labeling + rules | Protocol docs (RFC) | FSM (Promela, JSON) |
| SAT+CEGIS synthesis | Traces + LTL | Minimal Moore automaton |
| LLM-guided code edit | Code + NL instruction | New code, FSM diff |
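To make the SAT+CEGIS row concrete: the core objective is the smallest automaton consistent with a set of traces. The sketch below replaces the SAT encoding with naive enumeration over DFA sizes, which only scales to toy alphabets but exhibits the same minimality criterion (smallest size first, so the first consistent machine found is minimal):

```python
from itertools import product

def consistent_dfa(pos, neg, alphabet, max_states=3):
    """Enumerate complete DFAs in increasing size; return the first (hence
    state-minimal) one accepting every string in `pos` and rejecting every
    string in `neg`. A DFA is (n, delta, accepting) with state 0 initial."""
    def run(delta, s):
        q = 0
        for ch in s:
            q = delta[(q, ch)]
        return q
    for n in range(1, max_states + 1):
        keys = [(q, ch) for q in range(n) for ch in alphabet]
        for targets in product(range(n), repeat=len(keys)):
            delta = dict(zip(keys, targets))
            for acc in product((False, True), repeat=n):
                if all(acc[run(delta, s)] for s in pos) and \
                   not any(acc[run(delta, s)] for s in neg):
                    return n, delta, acc
    return None

# Target language: strings over {a, b} with an even number of 'a's.
pos = ["", "b", "aa", "aba"]
neg = ["a", "ab", "ba", "aaa"]
result = consistent_dfa(pos, neg, "ab")
print(result[0])  # the minimal consistent DFA has 2 states
```

SAT-based tools like fbSAT replace this exponential enumeration with a propositional encoding and add CEGIS iterations to also enforce LTL properties.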
4. AutoFSM in Symbolic Graphical Models and Optimization
Finite state machine representations (AutoFSMs) can concisely encode tabular factors in probabilistic graphical models, enabling scalable inference and optimization (Bistaffa, 2021). Each factor f is represented as a weighted deterministic finite automaton A_f such that f(x₁, …, xₙ) equals the weight accumulated along the unique accepting run of A_f on the assignment string x₁ ⋯ xₙ. This allows factor combination via automaton intersection, and elimination (marginalization or maximization) via alphabet layer removal and minimization.
This automata-based "FABE" bucket elimination provides runtime and memory improvements—up to 5 orders of magnitude on high-redundancy problems—relative to tabular and ADD-based solvers. The approach is efficient whenever in-factor redundancy enables compact automaton minimization, and is particularly suited to resource-constrained or high-dimensional inference tasks (Bistaffa, 2021).
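The factor-as-automaton encoding can be sketched as follows. The layered structure and weight semantics follow the description above; the concrete factor, state names, and weights are invented for illustration:

```python
# A layered weighted DFA over two binary variables X1, X2: each layer
# consumes one variable's value, and the factor value f(x1, x2) is the
# product of edge weights along the unique run. Redundancy (here, f is
# independent of X2 when X1 = 0) lets several assignments share edges,
# which is exactly what automaton minimization exploits.
# States: "q0" (initial) -> layer-1 states "a"/"b" -> "qf" (accepting).
delta = {
    ("q0", ("X1", 0)): ("a", 2.0),   # f(0, x2) = 2.0 for either x2
    ("q0", ("X1", 1)): ("b", 1.0),
    ("a",  ("X2", 0)): ("qf", 1.0),
    ("a",  ("X2", 1)): ("qf", 1.0),
    ("b",  ("X2", 0)): ("qf", 3.0),  # f(1, 0) = 3.0
    ("b",  ("X2", 1)): ("qf", 5.0),  # f(1, 1) = 5.0
}

def factor_value(assignment):
    """Follow the unique run for an assignment [(var, value), ...] and
    multiply the edge weights encountered along the way."""
    state, weight = "q0", 1.0
    for var_val in assignment:
        state, w = delta[(state, var_val)]
        weight *= w
    assert state == "qf"
    return weight

print(factor_value([("X1", 1), ("X2", 1)]))  # 5.0
```

Eliminating X2 would correspond to removing the last alphabet layer while summing (or maximizing) over its outgoing weights, followed by re-minimization.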
5. Benchmarking, Experimental Results, and Metrics
AutoFSM systems have been evaluated in diverse domains, each with domain-specific metrics, datasets, and validation procedures:
- Hardware FSM Synthesis (Luo et al., 12 Dec 2025): Assessed on the SKT-FSM benchmark (67 hierarchical FSMs, stratified by lines, states, and transitions). Pass rate and syntax error rate serve as the primary metrics. AutoFSM, under DeepSeek-V3 and GPT-4o LLMs, outperformed MAGE by up to 11.94 percentage points (pp) in pass rate and 17.62 pp in syntax error reduction, with larger gains on "hard" synthesis cases.
- Robotic FSM Modification (Gan et al., 7 Dec 2024): Structural correctness (NoDiff, SmallDiff), edit-distance, and developer time saved are measured. LLM-driven tools reduced manual modification time by over 50% (GPT-4) and 97% (LLaMA), achieving 5/6 perfect structural matches on real-world FSM datasets.
- Protocol FSM Extraction (Pacheco et al., 2022): PIIL label extraction is quantified using accuracy, weighted/macro F1, and span-matching metrics. Neural models with rule-correction achieved weighted F1 ≈ 64% and strict span match rates ≈ 69% (NeuralCRF). Even partial FSMs enabled attacker-synthesis when mapped to Promela.
- SAT-based Model Inference (Chukharev et al., 2019): State- and guard-minimality are guaranteed via iterative minimization; mean solve time is under 15 s for 100 random DFAs (size up to 8). Compared to two-phase and LTL-synthesis tools, SAT+CEGIS is efficient, reliably yielding small models that satisfy both traces and LTL properties.
6. Online Adaptation and Learning: Evolving FSMs
Certain AutoFSM frameworks enable the online discovery and adaptation of the state space via streaming data clustering and continuous matrix updates. The e-FSM framework (Han et al., 2019) represents FSMs as evolving tuples in which the state set and the transition probability matrices grow alongside new observations. New clusters (states) are created as needed by the evolving Takagi–Sugeno algorithm. Probabilities of next-state occupancy, recognition rates, and transition-model fidelity (Jensen–Shannon divergence ≤ 0.15) are tracked, demonstrating rapid adaptation, e.g., in collision risk detection for autonomous vehicles.
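A heavily simplified sketch of the online state-discovery idea: a new state is spawned whenever an observation lies far from every existing cluster centre, and transition counts are updated on the fly. The distance threshold and running-average centroid update below are placeholders, not the evolving Takagi–Sugeno procedure of the cited work:

```python
from collections import defaultdict

class EvolvingFSM:
    """Online FSM whose states are discovered from a scalar data stream."""
    def __init__(self, radius=1.0):
        self.radius = radius            # placeholder distance threshold
        self.centroids = []             # one centroid per discovered state
        self.counts = defaultdict(int)  # (prev_state, state) -> count
        self.prev = None

    def observe(self, x):
        """Assign x to the nearest state, spawning a new state if needed,
        and update the transition count matrix."""
        dists = [abs(x - c) for c in self.centroids]
        if not dists or min(dists) > self.radius:
            self.centroids.append(x)    # new state discovered online
            state = len(self.centroids) - 1
        else:
            state = dists.index(min(dists))
            # crude running-average centroid update (illustrative only)
            self.centroids[state] = 0.9 * self.centroids[state] + 0.1 * x
        if self.prev is not None:
            self.counts[(self.prev, state)] += 1
        self.prev = state
        return state

    def transition_prob(self, s, t):
        """Empirical probability of moving from state s to state t."""
        total = sum(c for (a, _), c in self.counts.items() if a == s)
        return self.counts[(s, t)] / total if total else 0.0

fsm = EvolvingFSM(radius=1.0)
stream = [0.0, 0.1, 5.0, 5.2, 0.05, 5.1]
states = [fsm.observe(x) for x in stream]
print(states)  # two states emerge from the two clusters
```

The transition probability matrix grows with the centroid list, mirroring how e-FSM's evolving tuple expands as new behaviors appear in the stream.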
7. Insights, Scalability, and Future Directions
AutoFSM methods exhibit several recurring principles:
- Separation of logic and implementation via IR or symbolic automata enables modularity, scalability, and flexibility (e.g., rapid pattern extension by updating JSON schema or compiled templates (Luo et al., 12 Dec 2025)).
- Error mitigation through intermediate formalism and systematic error localization enables higher pass rates and lower debugging times in code generation tasks.
- Automated adaptation and inference: Both online learning (e-FSM) and synthesis from traces/properties (SAT+CEGIS) advance the state of the art in self-building and self-adapting controllers, with clear quantitative performance metrics.
- Resource and redundancy exploitation via automata-based factorizations compresses representation and accelerates inference on structured, highly redundant domains.
Limitations remain: some methods are restricted to flat or shallowly hierarchical FSMs, depend on external synthesis tools (e.g., fsm2sv, NuSMV), or are sensitive to the complexity of guard conditions and state cardinality. A plausible implication is that progress in richer IRs, first-class integration of symbolic manipulation, and broader EDA toolchain compatibility will further expand the scalability and impact of AutoFSM solutions. Promising directions include parameterized/state-vector FSMs, advanced coverage metrics (assertion/toggling), and plug-ins to mainstream EDA flows for fully automated hardware pipeline synthesis (Luo et al., 12 Dec 2025).