
Finite State Machine Learning: ED-Batch

Updated 6 May 2026
  • Finite State Machine Learning (ED-Batch) groups two independent, batch-oriented approaches: dynamic neural network batching via Q-learning (ED-Batch) and greedy state-merging for automata inference (batch EDSM).
  • The ED-Batch method formulates batching as a finite-horizon MDP using frontier-encoded states and PQ-tree memory planning to minimize kernel launches and improve data layout.
  • Batch EDSM employs an evidence-driven state-merging algorithm on trace sets to infer deterministic finite automata with scalable APTA construction and heuristic-based merges.

Finite State Machine Learning (ED-Batch) refers to two distinct formalisms for automatic construction or application of finite state machines, unified by their batch-oriented operation and direct relevance for scientific and engineering workflows: (1) ED-Batch for efficient batching in dynamic neural networks via learned FSM scheduling and PQ-tree-based memory layout (Chen et al., 2023), and (2) batch EDSM (Evidence-Driven State-Merging), a well-known greedy approach to inferring finite automata or Mealy machines from trace sets (Hammerschmidt et al., 2017). Both methods formalize learning over combinatorial state spaces, but differ fundamentally in intent, model class, and algorithmic details.

1. Formalization of the FSM Learning Problem

The application of ED-Batch in dynamic neural networks reinterprets batching policy selection as a finite-horizon MDP. Given a mini-batch of NN input instances, each with a dynamic dataflow graph $G$ whose nodes are labeled by operation types $T$ (e.g., LSTMCell, Add), the state at every step is an encoding of the current execution frontier:

  • $S_t = E(G_t)$, where the encoding $E_{sort}(G)$ is the sorted multiset of operation types ready to execute, ordered by decreasing counts.
  • The action space is $A = T$; at each step, $a_t \in A$ indicates executing all ready nodes of a given type.
  • State transitions are deterministic: removal of all frontier nodes of the chosen type $a_t$ yields a new frontier.
  • The reward is designed to minimize kernel launches:

$$r_t = R(S_t, a_t) = -1 + \alpha \cdot \frac{|\mathrm{Frontier}(G_t^{a_t})|}{|\mathrm{Frontier}_{a_t}(G_t)|}$$

This formulation acknowledges the NP-hardness of optimal batching, proven via reduction from Shortest Common Supersequence.
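The state, action, and transition definitions above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy graph `GRAPH`, the helper names `frontier`, `encode`, and `step`, and the alphabetical tie-breaking are all assumptions made here. The encoding also keeps only the ordering of ready types and drops the counts themselves, which is an assumption chosen so that states do not depend on batch size.

```python
from collections import Counter

# Hypothetical toy graph: node -> (operation type, list of predecessor nodes).
GRAPH = {
    "x1": ("Embed", []),
    "x2": ("Embed", []),
    "h1": ("LSTMCell", ["x1"]),
    "h2": ("LSTMCell", ["x2", "h1"]),
    "y": ("Add", ["h1", "h2"]),
}


def frontier(graph, done):
    """Nodes not yet executed whose predecessors have all been executed."""
    return [n for n, (_, preds) in graph.items()
            if n not in done and all(p in done for p in preds)]


def encode(graph, done):
    """State encoding: ready operation types ordered by decreasing count."""
    counts = Counter(graph[n][0] for n in frontier(graph, done))
    # Assumption: ties are broken alphabetically to keep the encoding deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return tuple(op_type for op_type, _ in ordered)


def step(graph, done, op_type):
    """Action: execute every ready node of `op_type` as a single batch."""
    batch = [n for n in frontier(graph, done) if graph[n][0] == op_type]
    if not batch:
        raise ValueError(f"no ready node of type {op_type}")
    return frozenset(done) | set(batch), batch


start = frozenset()
print(encode(GRAPH, start))                 # ('Embed',)
after_embed, batch = step(GRAPH, start, "Embed")
print(sorted(batch), encode(GRAPH, after_embed))
```

After the `Embed` batch only `h1` is ready, so the next state contains a single type, `LSTMCell`; `h2` stays blocked until `h1` has run.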

By contrast, batch EDSM as introduced in automata learning tasks operates on a sample multiset $S$ of traces over an alphabet $\Sigma$. It constructs the augmented prefix-tree acceptor (APTA) of $S$. The learning goal is to find a deterministic finite automaton $\mathcal{A}$ (or Mealy machine) that is consistent with $S$ and minimized according to a scoring heuristic.
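As a rough illustration of the starting point for state merging, the sketch below builds an APTA from labeled traces. The function name `build_apta`, the `(trace, accepted)` input format, and the integer state numbering are assumptions made for this example; real tools differ in representation.

```python
def build_apta(samples):
    """Build an augmented prefix-tree acceptor from labeled traces.

    `samples` is an iterable of (trace, accepted) pairs, where `trace` is a
    sequence of symbols. Returns (trans, label, count):
      trans[q][sym] -> child state,
      label[q]      -> True/False for states where some trace ends,
      count[q]      -> number of traces passing through state q.
    """
    trans = {0: {}}          # state 0 is the root (empty prefix)
    label = {}
    count = {0: 0}
    for trace, accepted in samples:
        state = 0
        count[0] += 1
        for sym in trace:
            if sym not in trans[state]:
                new = len(trans)          # next unused state id
                trans[state][sym] = new
                trans[new] = {}
                count[new] = 0
            state = trans[state][sym]
            count[state] += 1
        # The same trace must not be labeled both accepted and rejected.
        if label.get(state, accepted) != accepted:
            raise ValueError(f"inconsistent labels for trace {trace!r}")
        label[state] = accepted
    return trans, label, count


samples = [("ab", True), ("abb", True), ("a", False), ("b", False)]
trans, label, count = build_apta(samples)
print(len(trans), label, count)
```

Every distinct prefix becomes one state, so the four traces above produce five states, and three of the four traces pass through the state reached by `a`.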

2. ED-Batch Learning for Dynamic Neural Network Batching

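ED-Batch applies tabular Q-learning with an $n$-step bootstrap to train the batching policy over the finite frontier-encoding state space (Chen et al., 2023):

  • A Q-table $Q(S, a)$ over frontier-encoded states and operation types, updated with learning rate $\eta$ and discount factor $\gamma$ via

$$Q(S_t, a_t) \leftarrow Q(S_t, a_t) + \eta \left[ \sum_{k=0}^{n-1} \gamma^k r_{t+k} + \gamma^n \max_{a} Q(S_{t+n}, a) - Q(S_t, a_t) \right]$$

  • An $\epsilon$-greedy policy is adopted during training.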
  • Training stops when the batch count per episode converges to the lower bound or stabilizes.

The greedy policy $\pi(S) = \arg\max_a Q(S, a)$ then defines a deterministic FSM. At inference, this FSM maps each frontier encoding to the next operation type to batch, generalizing across arbitrary batch sizes for the same computational graph template. The resulting FSM requires little memory (tens to hundreds of states), since the number of unique frontier encodings is modest for typical network classes.
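The training loop and the extraction of an FSM from the Q-table can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: it uses a one-step bootstrap instead of $n$ steps, a reward of $-1$ per batch (that is, $\alpha = 0$), no discounting, and a hypothetical three-node graph in which batching type `B` too early costs an extra kernel launch. All function names are invented for the example.

```python
import random
from collections import Counter, defaultdict

# Hypothetical graph: node -> (type, predecessors). b1 depends on a1.
GRAPH = {"a1": ("A", []), "b0": ("B", []), "b1": ("B", ["a1"])}


def frontier(graph, done):
    return [n for n, (_, preds) in graph.items()
            if n not in done and all(p in done for p in preds)]


def encode(graph, done):
    """Ready types by decreasing count (ties alphabetical); () when finished."""
    counts = Counter(graph[n][0] for n in frontier(graph, done))
    return tuple(t for t, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


def train(graph, episodes=500, lr=0.5, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)               # (state, action) -> value, starts at 0
    for _ in range(episodes):
        done = set()
        state = encode(graph, done)
        while state:                     # empty encoding: every node has run
            if rng.random() < eps:       # epsilon-greedy exploration
                action = rng.choice(state)
            else:
                action = max(state, key=lambda a: q[(state, a)])
            done |= {n for n in frontier(graph, done) if graph[n][0] == action}
            nxt = encode(graph, done)
            best_next = max((q[(nxt, a)] for a in nxt), default=0.0)
            # One-step target: reward -1 per launched batch, no discounting.
            q[(state, action)] += lr * (-1.0 + best_next - q[(state, action)])
            state = nxt
    states = {s for s, _ in q}
    # The greedy policy is a lookup table: frontier encoding -> type to batch.
    return {s: max(s, key=lambda a: q[(s, a)]) for s in states}


def run_fsm(graph, policy):
    """Replay the learned FSM and return the batches it launches."""
    done, batches = set(), []
    state = encode(graph, done)
    while state:
        action = policy[state]
        batch = sorted(n for n in frontier(graph, done) if graph[n][0] == action)
        batches.append((action, batch))
        done |= set(batch)
        state = encode(graph, done)
    return batches


policy = train(GRAPH)
print(run_fsm(GRAPH, policy))
```

On this graph the learned table picks `A` first, so both `B` nodes become ready together and run as one batch: two launches instead of three. Because the table is keyed by frontier encodings rather than node identities, the same lookup applies to any number of input instances sharing the graph template.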

3. Batch EDSM: Evidence-Driven State-Merging Automata Learning

Batch EDSM begins with a set $S$ of sample traces and constructs the APTA. States are initially two-colored (red/blue). At each step, all pairs $(r, b)$ of (red, blue) states are considered for merging. Legal merges are scored according to evidence, computed from the number of traces passing through the child states of the merged pair, and the raw score is then normalized.

The highest-scoring legal merge above a fixed threshold is executed, states are recolored, and the process iterates until no qualified merges remain or a state bound is reached (Hammerschmidt et al., 2017). The algorithm is greedy, and its computational cost is polynomial in the size of the APTA, but early stopping and filtering make it practical for many data-driven automata inference applications.
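The legality check and scoring of a single candidate merge can be sketched as below. Note the substitution: this example uses the label-agreement count commonly associated with EDSM (one point for each pair of merged states that carry the same accept/reject label), not the trace-count-based score described above. The function name `merge_score`, the dictionary representation, and the hand-written APTA are assumptions for illustration.

```python
def merge_score(trans, label, red, blue):
    """Score merging state `blue` into state `red`.

    Returns None if the merge is illegal (an accepting state would be merged
    with a rejecting one), otherwise the number of label agreements found
    while folding. The inputs are not modified.
    """
    parent = {}                                   # union-find: merged -> representative
    lab = dict(label)
    succ = {q: dict(t) for q, t in trans.items()}

    def find(x):
        while x in parent:
            x = parent[x]
        return x

    score = 0
    stack = [(red, blue)]
    while stack:
        x, y = stack.pop()
        x, y = find(x), find(y)
        if x == y:
            continue
        lx, ly = lab.get(x), lab.get(y)
        if lx is not None and ly is not None:
            if lx != ly:
                return None                       # accept/reject conflict
            score += 1                            # one more piece of evidence
        elif lx is None and ly is not None:
            lab[x] = ly
        parent[y] = x
        # Determinize: shared symbols force the successors to merge as well.
        for sym, tgt in succ[y].items():
            if sym in succ[x]:
                stack.append((succ[x][sym], tgt))
            else:
                succ[x][sym] = tgt
    return score


# APTA for accepted traces "ab", "abb" and rejected traces "a", "b".
TRANS = {0: {"a": 1, "b": 4}, 1: {"b": 2}, 2: {"b": 3}, 3: {}, 4: {}}
LABEL = {1: False, 2: True, 3: True, 4: False}

print(merge_score(TRANS, LABEL, 0, 1))   # None: would equate "b" with "ab"
print(merge_score(TRANS, LABEL, 2, 3))   # 1
```

A full batch learner would call such a function for every (red, blue) pair, apply the best legal merge, recolor, and repeat; that outer loop is omitted here.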

4. PQ-Tree Memory Planning and Adjacency Constraints

ED-Batch incorporates memory planning for batched kernel execution by using a PQ-tree-based algorithm. After the FSM policy selects batching steps, the memory planner ensures that the operands of each batch kernel are contiguous and correctly ordered, so that kernels can read them without gather copies. The PQ-tree formalism expresses all permutations of a variable set in which each constrained subset (corresponding to a batch's operands) is consecutive.

The algorithm operates in two passes:

  1. Propagation of adjacency constraints: Build a PQ-tree over the variable set with all batch operand constraints, propagating adjacency requirements breadth-first, and dropping infeasible batches.
  2. Alignment and ordering: Annotate PQ-tree nodes (Q-nodes with direction, P-nodes with permutations), solve for consistent assignments via union-find structures, then extract the final layout by constrained depth-first traversal.

The overall complexity is governed by the batch operand count and the maximum per-batch variable set size. This enables single-pass, global memory layout, substantially reducing data movement for static operator invocation (Chen et al., 2023).
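The adjacency requirement itself is easy to state in code. The sketch below is a brute-force stand-in, not a PQ-tree: it simply tries permutations until every batch's operands sit in consecutive positions, which takes factorial time and is only usable for tiny examples. It also checks contiguity only and ignores operand ordering within a batch. The function name and the sample variables are assumptions.

```python
from itertools import permutations


def contiguous_layout(variables, batches):
    """Return an ordering of `variables` in which every batch's operand set
    occupies consecutive positions, or None if no such ordering exists.

    Each batch is a set of distinct variables drawn from `variables`.
    """
    for order in permutations(variables):
        pos = {v: i for i, v in enumerate(order)}
        # A set of k distinct positions is consecutive iff max - min == k - 1.
        if all(max(pos[v] for v in b) - min(pos[v] for v in b) == len(b) - 1
               for b in batches):
            return list(order)
    return None


# Two batches sharing operand "c": "c" must sit between "a" and "d".
print(contiguous_layout(["a", "b", "c", "d"], [{"a", "c"}, {"c", "d"}]))
# Three mutually adjacent pairs cannot all be consecutive in one line.
print(contiguous_layout(["a", "b", "c"], [{"a", "b"}, {"b", "c"}, {"a", "c"}]))
```

A PQ-tree represents the whole family of valid orderings compactly instead of enumerating them, which is what makes the planner practical on real graphs; infeasible constraint sets such as the second example correspond to batches the planner has to drop.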

5. Empirical Analysis and Theoretical Properties

ED-Batch achieves substantial speedups over prior dynamic batching frameworks across diverse DNN architectures:

  • Chain models (e.g., BiLSTM-Tagger, LSTM-NMT): Matches minimal batch count; end-to-end gain of 1.11–1.20×; static cell memory layout via PQ-tree is 1.52–1.54× faster than DyNet’s memory allocator.
  • Tree models (TreeLSTM, TreeGRU, MV-RNN): Up to 37% reduction in batch count; throughput speedup of 1.46–1.63× (CPU), 1.23–1.29× (GPU).
  • Lattice models (LatticeLSTM, LatticeGRU): Up to 3.27× reduction in batch count; latency reduction of 34–35%; end-to-end throughput gain of 1.32–2.97× (CPU), 2.54–3.71× (GPU).

Average speedup by model type:

| Model Type | Speedup |
|------------|---------|
| Chain | 1.15× |
| Tree | 1.39× |
| Lattice | 2.45× |

The majority of performance gains derive from reduced kernel launch count and the elimination of memory gathers; graph construction and scheduling overheads are unchanged (Chen et al., 2023).

6. Limitations, Extensions, and Connections

Batch EDSM offers no mechanism to recover from early suboptimal merges if they meet the merge-score threshold, which can lead to over-generalization, particularly in sparse or noisy data settings (Hammerschmidt et al., 2017). Interactive extensions expose merge choices for user intervention, but pure batch operation remains greedy. Formal convergence guarantees exist when the target automaton lies within the search space and the trace set is sufficiently representative. Extensions involving global model selection (e.g., AIC, BIC), active queries, or spectral APTA initialization may strengthen empirical performance.

ED-Batch for neural networks learns compact, deterministic FSM policies tuned per network topology and reuses the same logic for arbitrary batch sizes with the same graph template. The PQ-tree layout offers a one-time cost amortized across all subsequent inference runs. Both techniques illustrate the versatility of FSM learning in symbolic modeling and resource optimization. While batch EDSM and ED-Batch are independent in origin and usage, they share a foundation in learning policies or structures over a discrete, combinatorial space via batch-efficient procedures.
