Finite State Machine Learning: ED-Batch
- Finite State Machine Learning (ED-Batch) is a dual-framework approach that unifies dynamic neural network batching via Q-learning with greedy state-merging for automata inference.
- The ED-Batch method formulates batching as a finite-horizon MDP using frontier-encoded states and PQ-tree memory planning to minimize kernel launches and improve data layout.
- Batch EDSM employs an evidence-driven state-merging algorithm on trace sets to infer deterministic finite automata with scalable APTA construction and heuristic-based merges.
Finite State Machine Learning (ED-Batch) refers to two distinct formalisms for automatic construction or application of finite state machines, unified by their batch-oriented operation and direct relevance for scientific and engineering workflows: (1) ED-Batch for efficient batching in dynamic neural networks via learned FSM scheduling and PQ-tree-based memory layout (Chen et al., 2023), and (2) batch EDSM (Evidence-Driven State-Merging), a well-known greedy approach to inferring finite automata or Mealy machines from trace sets (Hammerschmidt et al., 2017). Both methods formalize learning over combinatorial state spaces, but differ fundamentally in intent, model class, and algorithmic details.
1. Formalization of the FSM Learning Problem
The application of ED-Batch in dynamic neural networks reinterprets batching policy selection as a finite-horizon MDP. Given a mini-batch of input instances, each with a dynamic dataflow graph whose nodes are labeled by operation types (e.g., LSTMCell, Add), the state at every step is an encoding of the current execution frontier:
- The state is the frontier encoding: the sorted multiset of operation types that are ready to execute, ordered by decreasing count.
- The action space is the set of operation types present in the frontier; the action chosen at each step executes all ready nodes of that type.
- State transitions are deterministic: removal of all frontier nodes of the chosen type yields a new frontier.
- The reward is designed to minimize the number of kernel launches.
This formulation acknowledges the NP-hardness of optimal batching, proven via reduction from Shortest Common Supersequence.
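The state, action, and transition definitions above can be illustrated with a minimal sketch. The toy graph, the node names, and the exact encoding (pairs of type and count, with ties broken by type name) are assumptions made for illustration, not the representation used by Chen et al. (2023). The policy here is a naive placeholder that always picks the most frequent ready type.

```python
from collections import Counter

# Hypothetical toy dataflow graph: node -> (operation type, predecessor nodes).
GRAPH = {
    "x1": ("Embed", []),
    "x2": ("Embed", []),
    "h1": ("LSTMCell", ["x1"]),
    "h2": ("LSTMCell", ["x2", "h1"]),
    "y": ("Add", ["h1", "h2"]),
}


def frontier(graph, done):
    """Nodes not yet executed whose predecessors have all been executed."""
    return {n for n, (_, preds) in graph.items()
            if n not in done and all(p in done for p in preds)}


def encode(graph, ready):
    """Assumed frontier encoding: (type, count) pairs by decreasing count."""
    counts = Counter(graph[n][0] for n in ready)
    return tuple(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


def step(graph, done, op_type):
    """Deterministic transition: run every ready node of one type as a batch."""
    batch = {n for n in frontier(graph, done) if graph[n][0] == op_type}
    if not batch:
        raise ValueError(f"no ready node of type {op_type}")
    return done | batch


done = frozenset()
trace = []
while len(done) < len(GRAPH):
    state = encode(GRAPH, frontier(GRAPH, done))
    action = state[0][0]  # naive placeholder policy: most frequent ready type
    trace.append((state, action))
    done = step(GRAPH, done, action)
```

Each entry of `trace` pairs a frontier encoding with the operation type batched at that step, so its length is the number of kernel launches for this schedule.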
By contrast, batch EDSM, as used in automata learning, operates on a sample multiset $S$ of traces over an alphabet $\Sigma$. It constructs the augmented prefix-tree acceptor (APTA) of $S$. The learning goal is to find a deterministic finite automaton $A$ (or Mealy machine) that is consistent with $S$ and minimized according to a scoring heuristic.
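As an illustration of the APTA step, the following sketch builds a prefix tree from labeled traces and records how many sample traces pass through each state. The data layout (a list of dictionaries), the function name `build_apta`, and the sample itself are assumptions chosen for brevity, not the structures used in any particular tool.

```python
def build_apta(samples):
    """Build a prefix-tree acceptor from (trace, label) pairs.

    Each state is a dict holding a transition map, an optional
    accept/reject label, and the number of traces passing through it.
    """
    states = [{"next": {}, "label": None, "count": 0}]  # state 0 is the root
    for trace, label in samples:
        q = 0
        states[q]["count"] += 1
        for symbol in trace:
            if symbol not in states[q]["next"]:
                states.append({"next": {}, "label": None, "count": 0})
                states[q]["next"][symbol] = len(states) - 1
            q = states[q]["next"][symbol]
            states[q]["count"] += 1
        if states[q]["label"] not in (None, label):
            raise ValueError(f"inconsistent labels for trace {trace!r}")
        states[q]["label"] = label
    return states


# Hypothetical sample over the alphabet {a, b} with accept/reject labels.
SAMPLE = [("ab", True), ("abb", True), ("a", False), ("b", False)]
apta = build_apta(SAMPLE)
```

The tree has one state per distinct prefix in the sample, so it accepts and rejects exactly the labeled traces and nothing more; state merging then generalizes from this starting point.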
2. ED-Batch Learning for Dynamic Neural Network Batching
ED-Batch applies tabular Q-learning with an $n$-step bootstrap to train the batching policy over the finite frontier-encoding state space (Chen et al., 2023):
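- A table of action values $Q(s, a)$ over frontier encodings $s$ and operation types $a$, updated via the standard $n$-step rule with learning rate $\alpha$, discount $\gamma$, and per-step reward $r$:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ \sum_{k=0}^{n-1} \gamma^k r_{t+k} + \gamma^n \max_{a} Q(s_{t+n}, a) - Q(s_t, a_t) \right]$$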
- An $\epsilon$-greedy policy is adopted during training.
- Training stops when the batch count per episode converges to the lower bound or stabilizes.
The greedy policy $\pi(s) = \arg\max_a Q(s, a)$ then defines a deterministic FSM. At inference, this FSM maps each frontier encoding to the next operation type to batch, generalizing across arbitrary batch sizes for the same computational graph template. The resulting FSM requires little memory (tens to hundreds of states), since the number of unique frontier encodings is modest for typical network classes.
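The training loop and the extraction of an FSM from the learned table can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a one-step update rather than the multi-step bootstrap described above, a reward of -1 per batch as a stand-in for the actual reward, and a hypothetical toy graph. None of the names or hyperparameters come from Chen et al. (2023).

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy graph: node -> (operation type, predecessors).
GRAPH = {
    "a1": ("A", []), "b1": ("B", []),
    "a2": ("A", ["b1"]), "b2": ("B", ["a1"]),
    "a3": ("A", ["a2"]), "c1": ("C", ["a3", "b2"]),
}


def frontier(done):
    return [n for n, (_, preds) in GRAPH.items()
            if n not in done and all(p in done for p in preds)]


def encode(ready):
    counts = Counter(GRAPH[n][0] for n in ready)
    return tuple(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


def run_episode(q, rng, epsilon, alpha=0.5, gamma=1.0, learn=True):
    """Play one episode and return the number of batches (kernel launches)."""
    done, batches = set(), 0
    while len(done) < len(GRAPH):
        ready = frontier(done)
        state = encode(ready)
        actions = [t for t, _ in state]
        if learn and rng.random() < epsilon:
            action = rng.choice(actions)          # explore
        else:
            action = max(actions, key=lambda a: q[(state, a)])  # exploit
        done |= {n for n in ready if GRAPH[n][0] == action}
        batches += 1
        if learn:
            nxt = encode(frontier(done))
            best_next = max((q[(nxt, a)] for a, _ in nxt), default=0.0)
            target = -1.0 + gamma * best_next     # assumed reward: -1 per batch
            q[(state, action)] += alpha * (target - q[(state, action)])
    return batches


rng = random.Random(0)
q_table = defaultdict(float)
for _ in range(200):
    run_episode(q_table, rng, epsilon=0.2)

# The greedy policy over visited encodings is the FSM used at inference.
fsm = {}
for state in {s for s, _ in list(q_table)}:
    fsm[state] = max((t for t, _ in state), key=lambda a: q_table[(state, a)])

greedy_batches = run_episode(q_table, rng, epsilon=0.0, learn=False)
```

Because states are frontier encodings rather than concrete node sets, the table stays small and the same `fsm` lookup applies to any instance that produces the same encodings.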
3. Batch EDSM: Evidence-Driven State-Merging Automata Learning
Batch EDSM begins with a set $S$ of sample traces and constructs the APTA. States are initially two-colored (red/blue). At each step, all pairs $(r, b)$ of a red state $r$ and a blue state $b$ are considered for merging. Legal merges are scored according to evidence, computed from the number of traces passing through the child states involved in the merge, and the score is then normalized.
The highest-scoring legal merge above a score threshold is executed, states are recolored, and the process iterates until no qualified merges remain or a state bound is reached (Hammerschmidt et al., 2017). The algorithm is greedy, and its computational cost is polynomial in the size of the APTA, but early stopping and filtering make it practical for many data-driven automata inference applications.
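The legality check and scoring of candidate merges can be sketched on a small prefix tree. Note the assumption: the score below is the classic EDSM evidence count (the number of folded state pairs that carry the same known label), not the count-based normalized score described above, and it only handles tree-shaped states. The `Node` class, function names, and sample are hypothetical.

```python
class Node:
    def __init__(self):
        self.label = None    # True = accepting, False = rejecting, None = unknown
        self.children = {}   # symbol -> Node


def build_tree(samples):
    root = Node()
    for trace, label in samples:
        node = root
        for symbol in trace:
            node = node.children.setdefault(symbol, Node())
        node.label = label
    return root


def merge_score(red, blue):
    """Evidence score for merging blue into red, or None if the merge is illegal.

    Folds the blue subtree onto the red one. A pair with conflicting labels
    makes the merge illegal; a pair with equal known labels adds one point.
    """
    if red.label is not None and blue.label is not None:
        if red.label != blue.label:
            return None
        score = 1
    else:
        score = 0
    for symbol, blue_child in blue.children.items():
        red_child = red.children.get(symbol)
        if red_child is None:
            continue  # blue branch is attached as-is: no evidence either way
        sub = merge_score(red_child, blue_child)
        if sub is None:
            return None
        score += sub
    return score


# Hypothetical sample; the empty trace labels the root.
SAMPLE = [("", True), ("a", False), ("aa", True),
          ("aaa", False), ("b", True), ("ba", False)]
root = build_tree(SAMPLE)

# Treat the root as the only red state and its children as blue states.
scores = {sym: merge_score(root, child) for sym, child in root.children.items()}
legal = {sym: s for sym, s in scores.items() if s is not None}
best = max(legal, key=legal.get)
```

Here merging the `a` child into the root is illegal because the labels conflict, while the `b` child agrees with the root on two labeled states, so a greedy learner would pick it first.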
4. PQ-Tree Memory Planning and Adjacency Constraints
ED-Batch incorporates efficient memory planning for batched kernel execution by using a PQ-tree-based algorithm. After the FSM policy selects batching steps, the memory planner ensures that operands for each batch kernel are contiguous and ordered correctly to allow efficient memory access. The PQ-tree formalism expresses all permutations of a variable set $V$ such that each constrained subset (corresponding to a batch's operands) is consecutive.
The algorithm operates in two passes:
- Propagation of adjacency constraints: Build a PQ-tree over the variable set with all batch operand constraints, propagating adjacency requirements breadth-first, and dropping infeasible batches.
- Alignment and ordering: Annotate PQ-tree nodes (Q-nodes with direction, P-nodes with permutations), solve for consistent assignments via union-find structures, then extract the final layout by constrained depth-first traversal.
The overall complexity depends on the batch operand count and the maximum per-batch variable set size. This enables single-pass, global memory layout, substantially reducing data movement for static operator invocation (Chen et al., 2023).
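To make the contiguity requirement concrete, the sketch below checks whether a layout places each batch's operands in one consecutive run and finds such a layout by brute force. This is deliberately not a PQ-tree: it enumerates permutations and is exponential in the number of variables, whereas a PQ-tree represents all valid layouts compactly. It also ignores operand ordering within a batch. All names and example sets are hypothetical.

```python
from itertools import permutations


def is_consecutive(layout, operands):
    """True if `operands` occupy one contiguous run of positions in `layout`."""
    positions = sorted(layout.index(v) for v in operands)
    return positions[-1] - positions[0] + 1 == len(positions)


def find_layout(variables, batches):
    """Brute-force search for a layout where every batch is contiguous.

    Returns the first valid permutation as a list, or None if none exists.
    """
    for layout in permutations(variables):
        if all(is_consecutive(layout, b) for b in batches):
            return list(layout)
    return None


# Hypothetical operand sets for three batched kernels.
VARIABLES = ["x1", "x2", "x3", "x4", "x5"]
BATCHES = [{"x1", "x3"}, {"x3", "x4", "x5"}, {"x2", "x5"}]
layout = find_layout(VARIABLES, BATCHES)

# Three pairwise-adjacency constraints on three variables cannot all hold.
infeasible = find_layout(
    ["x1", "x2", "x3"],
    [{"x1", "x2"}, {"x2", "x3"}, {"x1", "x3"}],
)
```

When no layout satisfies every constraint, as in the second call, some batch has to be dropped from the constraint set, which corresponds to the "dropping infeasible batches" step in the first pass.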
5. Empirical Analysis and Theoretical Properties
ED-Batch achieves substantial speedups over prior dynamic batching frameworks across diverse DNN architectures:
- Chain models (e.g., BiLSTM-Tagger, LSTM-NMT): Matches minimal batch count; end-to-end gain of 1.11–1.20×; static cell memory layout via PQ-tree is 1.52–1.54× faster than DyNet’s memory allocator.
- Tree models (TreeLSTM, TreeGRU, MV-RNN): Up to 37% reduction in batch count; throughput speedup 1.46–1.63× (CPU), 1.23–1.29× (GPU).
- Lattice models (LatticeLSTM, LatticeGRU): Up to 3.27× reduction in batch count; latency reduction 34–35%; end-to-end throughput gain 1.32–2.97× (CPU), 2.54–3.71× (GPU).
The global averages:

| Model Type | Speedup |
|------------|---------|
| Chain | 1.15× |
| Tree | 1.39× |
| Lattice | 2.45× |
The majority of performance gains derive from reduced kernel launch count and the elimination of memory gathers; graph construction and scheduling overheads are unchanged (Chen et al., 2023).
6. Limitations, Extensions, and Connections
Batch EDSM offers no mechanism to recover from early suboptimal merges if they meet the merge-score threshold, which can lead to over-generalization, particularly in sparse or noisy data settings (Hammerschmidt et al., 2017). Interactive extensions expose merge choices for user intervention, but pure batch operation remains greedy. Formal convergence guarantees exist when the target automaton lies within the search space and the trace set is sufficiently representative. Extensions involving global model selection (e.g., AIC, BIC), active queries, or spectral APTA initialization may strengthen empirical performance.
ED-Batch for neural networks learns compact, deterministic FSM policies tuned per network topology and reuses the same logic for arbitrary batch sizes with the same graph template. The PQ-tree layout offers a one-time cost amortized across all subsequent inference runs. Both techniques illustrate the versatility of FSM learning in symbolic modeling and resource optimization. While batch EDSM and ED-Batch are independent in origin and usage, they share a foundation in learning policies or structures over a discrete, combinatorial space via batch-efficient procedures.