Universal FSA Emulation by Neural Finite-State Machines
- Universal FSA Emulation is a method where finite-depth feedforward ReLU and threshold networks simulate any deterministic finite automaton by encoding state transitions for bounded-length inputs.
- It employs explicit layerwise constructions, including one-hot and binary encodings with two-layer transition modules, to implement regular language recognition.
- The approach demonstrates exponential state compression, latent embeddings of Myhill–Nerode equivalence classes, and a formal expressivity boundary for fixed-depth networks.
Universal finite-state automaton (FSA) emulation refers to the capacity of certain neural network architectures—specifically, finite-depth feedforward ReLU and threshold networks—to exactly simulate any deterministic finite automaton (DFA) on bounded-length inputs, thus acting as "neural finite-state machines" (N-FSMs). This is achieved through explicit layerwise constructions that encode DFA state transitions in the network’s depth, enabling precise realization of regular languages and delineating a formal expressivity boundary for such networks. The central results formalize layer and parameter requirements, provide state compression strategies, establish embeddings of Myhill–Nerode equivalence classes into continuous latent spaces, and rigorously show that fixed-depth networks cannot recognize non-regular languages (Dhayalkar, 16 May 2025).
1. Definition and Theoretical Framework
A deterministic finite automaton is defined as a 5-tuple $A = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set of states, $\Sigma$ a finite input alphabet of size $|\Sigma|$, $\delta: Q \times \Sigma \to Q$ the deterministic transition function, $q_0 \in Q$ the initial state, and $F \subseteq Q$ the set of accepting states. On input $w = w_1 w_2 \cdots w_T \in \Sigma^T$, the DFA recursively applies $q_t = \delta(q_{t-1}, w_t)$ for $t = 1, \dots, T$ and accepts if $q_T \in F$.
An N-FSM corresponding to $A$ is a feedforward network $f$ satisfying $f(w) = 1$ if $w \in L(A)$ and $0$ otherwise. At each network layer, the hidden representation encodes the current DFA state $q_t$, and the final layer tests for membership in $F$ (Dhayalkar, 16 May 2025).
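For concreteness, the following minimal Python sketch (an illustrative assumption on representation, not code from the paper) stores the 5-tuple directly and runs it on a bounded-length input; the constructions below compile exactly this object into network weights.

```python
from dataclasses import dataclass

@dataclass
class DFA:
    """A deterministic finite automaton A = (Q, Sigma, delta, q0, F)."""
    states: list       # Q
    alphabet: list     # Sigma
    delta: dict        # transition table: (state, symbol) -> state
    q0: object         # initial state
    accepting: set     # F, a subset of Q

    def run(self, w):
        """Apply q_t = delta(q_{t-1}, w_t) for t = 1..T and accept iff q_T is in F."""
        q = self.q0
        for symbol in w:
            q = self.delta[(q, symbol)]
        return q in self.accepting

# Example: the regular language "even number of a's" over {a, b}.
even_as = DFA(states=[0, 1], alphabet=["a", "b"],
              delta={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
              q0=0, accepting={0})
assert even_as.run("abab") is True   # two a's  -> accept
assert even_as.run("ab") is False    # one a    -> reject
```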
2. Explicit Construction of Feedforward Emulators
Given a DFA $A = (Q, \Sigma, \delta, q_0, F)$, the emulation proceeds through explicit neural architectures:
2.1. One-Hot Encodings
- Symbols: Each $\sigma \in \Sigma$ is represented by a one-hot vector $e_\sigma \in \{0,1\}^{|\Sigma|}$.
- States: Each $q \in Q$ is mapped to a one-hot vector $e_q \in \{0,1\}^{|Q|}$.
- At step $t$, the hidden state $h_{t-1} = e_{q_{t-1}}$ is concatenated with $e_{w_t}$ to form the input to the transition module.
2.2. Two-Layer Transition Modules
- Hidden Layer: For each pair $(q, \sigma) \in Q \times \Sigma$, an “AND-unit” ensures activation if and only if the network is in state $q$ and receives symbol $\sigma$.
- Output Layer: The next one-hot state $e_{q_t}$ is computed as a sum over the activated AND-units, with weight $1$ from unit $(q, \sigma)$ to state $q'$ iff $\delta(q, \sigma) = q'$, and $0$ otherwise.
2.3. Readout Layer
Once all $T$ symbols are processed, the network produces $h_T = e_{q_T}$. The indicator vector $v \in \{0,1\}^{|Q|}$ (with $v_q = 1$ iff $q \in F$) determines acceptance via the linear-threshold readout $\mathbb{1}[v^\top h_T \geq 1]$.
2.4. Depth and Width Bounds
| Construction | Depth | Width |
|---|---|---|
| One-hot + ReLU | $2T + 1$ | $O(\lvert Q\rvert \cdot \lvert\Sigma\rvert)$ |
| Binary + threshold | $2T + 1$ | $O(\lvert Q\rvert \cdot \lvert\Sigma\rvert)$ |
The construction ensures exact simulation for all inputs $w$ of length $T$: each of the $T$ symbols is processed by two layers (the AND layer and the summation layer), and a single readout layer follows, giving depth $2T+1$. For inputs restricted to at most $T$ symbols, $f(w) = \mathbb{1}[w \in L(A)]$ for every such $w$ (see the sketch below).
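The following numpy sketch is a minimal illustration of the one-hot construction of Sections 2.1–2.3, not the paper's reference implementation; the names `emulate_dfa`, `W_and`, and `W_sum` are assumptions. Per symbol it applies the ReLU AND layer over all $|Q| \cdot |\Sigma|$ state-symbol pairs and the summation layer producing the next one-hot state, then applies the threshold readout over $F$.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def emulate_dfa(states, alphabet, delta, q0, accepting, w):
    """Unrolled feedforward one-hot emulation of a DFA on the input string w.

    Per symbol: a ReLU 'AND' layer with |Q|*|Sigma| units (unit (q, sigma) is
    active iff the network is in state q and reads sigma) and a summation layer
    producing the next one-hot state; a final threshold readout tests F."""
    nQ, nS = len(states), len(alphabet)
    q_idx = {q: i for i, q in enumerate(states)}
    s_idx = {s: i for i, s in enumerate(alphabet)}

    # AND layer: unit (q, sigma) computes relu(h[q] + x[sigma] - 1).
    W_and = np.zeros((nQ * nS, nQ + nS))
    b_and = -np.ones(nQ * nS)
    # Summation layer: route unit (q, sigma) to state delta(q, sigma).
    W_sum = np.zeros((nQ, nQ * nS))
    for i, q in enumerate(states):
        for j, s in enumerate(alphabet):
            u = i * nS + j
            W_and[u, i] = 1.0         # reads the current state bit
            W_and[u, nQ + j] = 1.0    # reads the current symbol bit
            W_sum[q_idx[delta[(q, s)]], u] = 1.0

    h = np.eye(nQ)[q_idx[q0]]         # one-hot initial state e_{q0}
    for symbol in w:                  # two layers per symbol (depth 2T + 1 readout)
        x = np.concatenate([h, np.eye(nS)[s_idx[symbol]]])
        z = relu(W_and @ x + b_and)   # AND layer
        h = relu(W_sum @ z)           # next one-hot state
    v = np.array([1.0 if q in accepting else 0.0 for q in states])
    return int(v @ h >= 0.5)          # threshold readout over F

# Cross-check against direct simulation on the "even number of a's" DFA.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
for w in ["", "a", "ab", "abab", "baab"]:
    direct = 1 if sum(c == "a" for c in w) % 2 == 0 else 0
    assert emulate_dfa([0, 1], ["a", "b"], delta, 0, {0}, w) == direct
```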
3. Exponential State Compression
State encodings can be exponentially compressed by representing DFA states as $\lceil \log_2 |Q| \rceil$-bit binary codes:
- Binary State Encoding: Map each $q \in Q$ to a code $\mathrm{bin}(q) \in \{0,1\}^{b}$ with $b = \lceil \log_2 |Q| \rceil$.
- Transition Realization: Each output bit $i$ is given by a Boolean function $f_i$ of the current state code and input symbol, specifying the $i$-th bit of $\mathrm{bin}(\delta(q, \sigma))$.
- Threshold Circuits: Classical results guarantee a depth-2 threshold circuit for any finite Boolean function. Each bit $f_i$ is realized via a two-layer block of threshold gates, with at most one gate per state-symbol pair, i.e., $O(|Q| \cdot |\Sigma|)$ gates.
This approach achieves hidden-state width $O(\log |Q|)$. Depth and overall layer width are preserved at $2T+1$ and $O(|Q| \cdot |\Sigma|)$, respectively (Dhayalkar, 16 May 2025).
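A hedged numpy sketch of this compression route follows; the names `build_bit_circuits`, `next_state_bits`, and `to_bits` are illustrative assumptions rather than the paper's API. States are packed into $\lceil \log_2 |Q| \rceil$ bits, and each next-state bit is computed by a depth-2 threshold circuit read directly off its truth table: one pattern-detecting gate per satisfying state-symbol pair, followed by an OR gate.

```python
import numpy as np
from math import ceil, log2

def step(x):
    # Threshold (Heaviside) gate: outputs 1.0 iff the preactivation is >= 0.
    return 1.0 if x >= 0 else 0.0

def to_bits(k, width):
    # Little-endian binary code of the integer k on `width` bits.
    return np.array([(k >> i) & 1 for i in range(width)], dtype=float)

def build_bit_circuits(states, alphabet, delta):
    """One depth-2 threshold circuit per next-state bit, read off the truth
    table of the Boolean function for that bit of bin(delta(q, sigma))."""
    b = max(1, ceil(log2(len(states))))     # state code width ~ ceil(log2 |Q|)
    s = max(1, ceil(log2(len(alphabet))))   # symbol code width
    q_idx = {q: i for i, q in enumerate(states)}
    s_idx = {a: i for i, a in enumerate(alphabet)}
    circuits = []
    for bit in range(b):
        patterns = []                       # one first-layer gate per satisfying input
        for q in states:
            for a in alphabet:
                x = np.concatenate([to_bits(q_idx[q], b), to_bits(s_idx[a], s)])
                if int(to_bits(q_idx[delta[(q, a)]], b)[bit]) == 1:
                    patterns.append(x)
        circuits.append(patterns)
    return b, s, q_idx, s_idx, circuits

def next_state_bits(state_bits, symbol_bits, circuits):
    """Two threshold layers per output bit: pattern detectors, then an OR gate."""
    x = np.concatenate([state_bits, symbol_bits])
    out = []
    for patterns in circuits:
        fires = [step(np.dot(2 * p - 1, x) - p.sum()) for p in patterns]  # layer 1
        out.append(step(sum(fires) - 1) if fires else 0.0)                # layer 2 (OR)
    return np.array(out)

# Quick check on the "even number of a's" DFA: the state code needs only 1 bit.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
b, s, q_idx, s_idx, circuits = build_bit_circuits([0, 1], ["a", "b"], delta)
h = to_bits(q_idx[0], b)                    # binary code of the initial state
for symbol in "abab":
    h = next_state_bits(h, to_bits(s_idx[symbol], s), circuits)
assert int(h[0]) == 0                       # even number of a's -> back in state 0
```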
4. Myhill–Nerode Equivalence and Latent Embeddings
The Myhill–Nerode relation partitions strings into equivalence classes corresponding to DFA states:
- Embedding Theorem: There exists a feedforward network $\Phi$ such that $\Phi(x) = \Phi(y)$ iff $x \equiv_L y$. If $x \not\equiv_L y$, then $\|\Phi(x) - \Phi(y)\| \geq \epsilon$ for some fixed margin $\epsilon > 0$.
- Construction: Run the DFA simulation to obtain $h_T = e_{q_T}$, then project using a linear map that sends each $e_q$ to a distinct vector $z_q \in \mathbb{R}^d$.
- Johnson–Lindenstrauss Compression: The set $\{z_q\}_{q \in Q}$ can be further reduced to dimension $O(\log |Q|)$ while preserving linear separability, thus embedding equivalence classes in a low-dimensional latent space (Dhayalkar, 16 May 2025); a short sketch follows below.
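The embedding and its random-projection compression can be illustrated with the following sketch, assuming a minimal DFA for $L$ (so that reached states coincide with Nerode classes). The names `make_embedding` and `embed`, the dimensions `d` and `k`, and the Gaussian projection are illustrative choices in the spirit of Johnson–Lindenstrauss, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def dfa_state(delta, q0, w):
    # Run the DFA; for a minimal DFA the reached state indexes the Nerode class of w.
    q = q0
    for symbol in w:
        q = delta[(q, symbol)]
    return q

def make_embedding(states, d, k):
    """Distinct latent code z_q in R^d per state, plus a random projection to R^k."""
    Z = rng.normal(size=(len(states), d))     # distinct codes with probability 1
    P = rng.normal(size=(k, d)) / np.sqrt(k)  # Johnson-Lindenstrauss-style projection
    return Z, P

def embed(w, delta, q0, state_index, Z, P):
    # Phi(w): equal for Nerode-equivalent strings, separated otherwise.
    return P @ Z[state_index[dfa_state(delta, q0, w)]]

# "Even number of a's" DFA (minimal): equivalent strings collide, others separate.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
Z, P = make_embedding([0, 1], d=8, k=3)
idx = {0: 0, 1: 1}
x = embed("abab", delta, 0, idx, Z, P)
y = embed("bb", delta, 0, idx, Z, P)
z = embed("a", delta, 0, idx, Z, P)
assert np.allclose(x, y)         # same Nerode class (even number of a's)
assert not np.allclose(x, z)     # different class, separated with probability 1
```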
5. Expressivity Limitations: Boundary for Regular Languages
Feedforward networks of fixed depth and width possess a finite partitioning capacity:
- Linear Region Bound: For depth $d$ and width $n$, any such network partitions its input space into a finite number of linear regions, bounded by a function of $d$ and $n$ alone.
- Non-Regular Language Limitation: Languages such as $\{a^n b^n : n \geq 0\}$ have infinitely many Myhill–Nerode equivalence classes and thus require infinitely many regions, which exceeds what is possible for fixed $d$ and $n$. Thus, such networks cannot recognize non-regular languages.
- Formal Lower Bound: For every such network there exists an input length $n_0$ such that, for all $n \geq n_0$, correct classification of all strings of length at most $n$ in a non-regular language is impossible. Only regular languages are exactly recognizable (Dhayalkar, 16 May 2025); a small illustration of the finite-partition capacity follows below.
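As a purely illustrative numpy experiment (not drawn from the paper), one can count the distinct ReLU activation patterns a fixed-size network realizes: inputs sharing a pattern lie in the same linear region and are processed by the same affine map, so the pattern count, bounded by the architecture alone, caps how many input classes the network can distinguish. This is exactly the capacity that a language with infinitely many Myhill–Nerode classes exhausts.

```python
import numpy as np

rng = np.random.default_rng(1)

def activation_patterns(W1, b1, W2, b2, X):
    """Collect the ReLU on/off patterns a depth-2 network realizes on inputs X.

    Inputs with identical patterns lie in the same linear region, so the size of
    the returned set bounds how many input classes this fixed network can
    possibly tell apart."""
    patterns = set()
    for x in X:
        a1 = W1 @ x + b1
        a2 = W2 @ np.maximum(a1, 0.0) + b2
        patterns.add((tuple(a1 > 0), tuple(a2 > 0)))
    return patterns

# A fixed architecture: input dimension 4, two hidden layers of width 6.
d_in, width = 4, 6
W1, b1 = rng.normal(size=(width, d_in)), rng.normal(size=width)
W2, b2 = rng.normal(size=(width, width)), rng.normal(size=width)

for n in (100, 1_000, 10_000):
    X = rng.normal(size=(n, d_in))
    count = len(activation_patterns(W1, b1, W2, b2, X))
    print(f"{n:6d} inputs -> {count} activation patterns")
# The count saturates far below the number of inputs: the partition is finite, so
# infinitely many equivalence classes (e.g. those of a^n b^n) cannot all be separated.
```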
6. Synthesis of Results and Architectural Trade-Offs
| Emulation Mode | Depth | Width | State Width |
|---|---|---|---|
| One-hot encoding + ReLU | $2T+1$ | $O(\lvert Q\rvert \cdot \lvert\Sigma\rvert)$ | $\lvert Q\rvert$ |
| Binary encoding + threshold | $2T+1$ | $O(\lvert Q\rvert \cdot \lvert\Sigma\rvert)$ | $\lceil \log_2 \lvert Q\rvert \rceil$ |
| Myhill–Nerode embedding | $2T+1$ | $O(\lvert Q\rvert \cdot \lvert\Sigma\rvert)$ | $O(\log \lvert Q\rvert)$ |
- State Compression: Exponential compression from the $|Q|$-dimensional one-hot encoding to $\lceil \log_2 |Q| \rceil$-bit binary codes without sacrificing expressivity for finite-state computations.
- Latent Embeddings: Faithful, linearly-separable vectorial mapping of equivalence classes, with further dimension reduction via random projection possible.
- Expressivity Boundary: The constructive simulation and the impossibility result together delineate regular languages as exactly the class N-FSMs can recognize with a fixed architecture.
- Bridging Symbolic and Neural Computation: These results rigorously instantiate a blueprint for realizing symbolic algorithms within neural architectures (Dhayalkar, 16 May 2025).
7. Context and Significance
The established equivalence between neural finite-state machines and DFAs provides a mathematically precise characterization of neural network capacity in symbolic sequence processing, automata simulation, and neural-symbolic integration. The constructive nature of the methods contrasts with prior heuristic or probing-based analyses, supplying explicit network weights, architectures, and representations. This formalization enables principled design of neural models for tasks where regular language structure is fundamental, while also identifying exact limits for problems involving unbounded memory or non-regular languages. As such, universal FSA emulation serves as a foundational result, bridging disciplines and informing further research on the correspondence between discrete automata and continuous neural computation (Dhayalkar, 16 May 2025).