Finite-Memory Controllers

Updated 11 May 2026

Finite-memory controllers are control policies implemented via finite automata that update a limited internal state to select actions.
They are pivotal in reactive synthesis, automated planning, and robotics, offering correct-by-design solutions under uncertainty and partial observability.
Synthesis approaches utilize exact, symbolic, learning-based, and robust optimization methods to guarantee performance with proven memory bounds in safety-critical systems.

A finite-memory controller is a control policy representable by a finite automaton or a data structure of bounded size that, by storing and updating a limited vector of memory states, maps system observations or histories to control actions. Such controllers form a central paradigm across reactive synthesis, automated planning, robotics, and formal methods for discrete and hybrid systems, with broad applications in synthesizing correct-by-design controllers, handling partial observability, managing uncertainty, and ensuring computational tractability.

1. Formal Models and Structural Definitions

The mathematical foundation of finite-memory controllers (FMCs) is the Mealy/Moore machine abstraction. For fully observable, discrete-state settings (e.g., finite games or planning), an FMC is a tuple

$\mathcal{M} = (M, m_0, \sigma_u, \sigma_o)$

where:

$M$ is a finite set of memory states.
$m_0 \in M$ is the initial memory state.
$\sigma_u: M \times V \to M$ is the memory update function (with $V$ the system states/observations).
$\sigma_o: M \times V \to A$ is the output (next-action) function, mapping the current memory and system state to a control action in the action set $A$.

A controller's operation unfolds as a sequence of steps: at each transition, the memory state is updated via $\sigma_u$ (depending on the current state and/or observation), and the control action is chosen via $\sigma_o$.
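The tuple above can be sketched as a small Python class. This is a minimal illustration rather than a reference implementation: the class name, the "seen goal" example, and the convention of emitting the action before updating the memory are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, List


@dataclass
class FiniteMemoryController:
    """Mealy-style controller (M, m0, sigma_u, sigma_o); M is implicit in the functions."""
    m0: Hashable
    sigma_u: Callable[[Hashable, Hashable], Hashable]  # memory update
    sigma_o: Callable[[Hashable, Hashable], Hashable]  # action output

    def run(self, states: Iterable[Hashable]) -> List[Hashable]:
        m = self.m0
        actions = []
        for v in states:
            # Assumed convention: act on (m, v) first, then update the memory.
            actions.append(self.sigma_o(m, v))
            m = self.sigma_u(m, v)
        return actions


# Hypothetical example: one bit of memory recording whether "goal" was seen.
def seen_goal_update(m, v):
    return m or v == "goal"


def seen_goal_output(m, v):
    return "return" if (m or v == "goal") else "explore"


controller = FiniteMemoryController(False, seen_goal_update, seen_goal_output)
trace = controller.run(["a", "b", "goal", "a"])
print(trace)  # ['explore', 'explore', 'return', 'return']
```

The last action differs from what a memoryless policy would choose in state "a", which is exactly what the extra memory bit buys.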

For partially observable contexts such as POMDPs, controllers are typically represented as stochastic finite state controllers (FSCs). An FSC is a tuple $(N, n^0, \psi, \eta)$ where:

$N$ is a finite set of controller nodes (internal memory states).
$n^0 \in N$ is the initial node.
$\psi$ is the action-selection mapping (possibly probabilistic, and possibly depending on the current observation).
$\eta$ is the (possibly stochastic) memory-update function.

This structure admits both deterministic and stochastic (randomized) policies, essential for handling uncertainty or improving worst-case guarantees (Amato et al., 2012, Simão et al., 2023, Cubuktepe et al., 2020).
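A stochastic FSC of this shape can be sketched in Python as follows. The sketch assumes one common parameterization that the text does not fix: $\psi$ is a distribution over actions per node, and $\eta$ is a distribution over next nodes per (node, action, observation) triple. The node, action, and observation names are hypothetical.

```python
import random
from typing import Dict, Hashable, Tuple

Dist = Dict[Hashable, float]  # outcome -> probability


def sample(dist: Dist, rng: random.Random) -> Hashable:
    """Draw one outcome from a finite distribution."""
    outcomes = list(dist.keys())
    weights = list(dist.values())
    return rng.choices(outcomes, weights=weights, k=1)[0]


class StochasticFSC:
    """FSC (N, n0, psi, eta) with psi: node -> Dist(action) and
    eta: (node, action, observation) -> Dist(node). Both shapes are assumptions."""

    def __init__(self, n0, psi: Dict[Hashable, Dist],
                 eta: Dict[Tuple, Dist], seed: int = 0):
        self.node = n0
        self.psi = psi
        self.eta = eta
        self.rng = random.Random(seed)
        self.last_action = None

    def act(self):
        self.last_action = sample(self.psi[self.node], self.rng)
        return self.last_action

    def observe(self, obs):
        key = (self.node, self.last_action, obs)
        self.node = sample(self.eta[key], self.rng)


# Hypothetical two-node controller: listen first, then randomize.
psi = {"n0": {"listen": 1.0}, "n1": {"open": 0.5, "wait": 0.5}}
eta = {
    ("n0", "listen", "quiet"): {"n0": 1.0},
    ("n0", "listen", "growl"): {"n1": 1.0},
    ("n1", "open", "quiet"): {"n0": 1.0},
    ("n1", "open", "growl"): {"n0": 1.0},
    ("n1", "wait", "quiet"): {"n0": 1.0},
    ("n1", "wait", "growl"): {"n1": 1.0},
}
fsc = StochasticFSC("n0", psi, eta)
first_action = fsc.act()
fsc.observe("growl")
node_after = fsc.node
second_action = fsc.act()
```

A deterministic FSC is the special case in which every distribution puts probability 1 on a single outcome.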

For systems modeled as timed automata or hybrid models, the memory structure may include Boolean predicates, counters, and system clocks, as in the memory-efficient real-time controllers for safety objectives (Chatterjee et al., 2011), or the parameter vector maintaining a window of past state estimation errors in FMCs for soft robots (Wu et al., 2023).

2. Synthesis and Optimization Algorithms

Finite-memory controllers are central objects in controller synthesis across both game-theoretic and data-driven contexts. The synthesis methodologies span exact, symbolic, learning-based, and robust optimization approaches.

Synthesis in Games and Reactive Systems

In reactive synthesis, constructing an FMC reduces to identifying a memory structure $\mathcal{M}$ sufficient to realize the winning objective and then building the product of the system model (game/arena) with $\mathcal{M}$, which reduces synthesis to finding a memoryless (positional) strategy on the product arena (Randour, 4 Sep 2025, Roux et al., 2018). Key metatheorems delineate conditions under which finite memory suffices for a given specification: typically, winning conditions that are $\omega$-regular, or that can be built by Boolean combinations of regularly-predictable objectives, admit finite-memory determinacy, and sharp (sometimes exponential or pseudo-polynomial) bounds are established for memory size (Randour, 4 Sep 2025, Roux et al., 2018).
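The product construction can be illustrated with a short Python sketch. It assumes a one-player arena given as an adjacency dictionary and a memory that is updated on entering a vertex; the arena, the "reach b, then c" objective, and all function names are hypothetical and chosen only to show how a positional strategy on the product induces a finite-memory strategy on the original arena.

```python
from itertools import product as cartesian


def product_arena(edges, memory, update):
    """Build the product of an arena with a memory structure.

    edges: dict vertex -> list of successor vertices.
    memory: list of memory states; update(m, w) -> memory after entering w.
    """
    prod = {}
    for v, m in cartesian(edges, memory):
        prod[(v, m)] = [(w, update(m, w)) for w in edges[v]]
    return prod


def induced_strategy(positional):
    """Turn a positional strategy on the product into a map (m, v) -> next vertex."""
    def next_move(m, v):
        w, _ = positional[(v, m)]
        return w
    return next_move


# Hypothetical objective: reach "b" first, then reach "c".
edges = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
memory = [0, 1]  # 0: "b" not yet seen, 1: "b" already seen


def update(m, w):
    return 1 if w == "b" else m


prod = product_arena(edges, memory, update)

# A positional choice on the product (picked by hand here, not synthesized).
positional = {("a", 0): ("b", 1), ("a", 1): ("c", 1),
              ("b", 1): ("a", 1), ("c", 1): ("a", 1)}
move = induced_strategy(positional)
```

In vertex "a" the induced strategy moves to "b" with memory 0 and to "c" with memory 1, a choice that no memoryless strategy on the original arena can express.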

Abstraction and Correctness Formalism

In complex hybrid or continuous-state systems, finite-state abstractions are constructed (by quantization, region-building, or neural surrogates), and control is synthesized on the abstract system; the resulting abstract controller is then refined for implementation on the original system (Tarraf, 2011, Majumdar et al., 2023). This approach guarantees correct-by-construction behavior, provided the abstraction preserves a gain condition or feedback refinement relation.
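As a rough illustration of the quantize-then-refine pattern, the Python sketch below maps a one-dimensional continuous state to a grid cell and looks up an abstract action. The uniform grid, the bounds, and the lookup table are all hypothetical, and the sketch does not check the soundness condition (gain condition or feedback refinement relation) on which the correctness guarantee depends.

```python
def quantize(x, lo, hi, cells):
    """Map a continuous value to a cell index in 0..cells-1 (uniform grid)."""
    x = min(max(x, lo), hi)            # clamp into the abstraction domain
    width = (hi - lo) / cells
    return min(int((x - lo) / width), cells - 1)


def refine(abstract_controller, lo, hi, cells):
    """Compose the quantizer with an abstract lookup-table controller."""
    def concrete_controller(x):
        return abstract_controller[quantize(x, lo, hi, cells)]
    return concrete_controller


# Hypothetical abstract controller over 4 cells of [-1, 1]: push toward 0.
table = {0: 1.0, 1: 0.5, 2: -0.5, 3: -1.0}
controller = refine(table, lo=-1.0, hi=1.0, cells=4)
```

Here `controller(-0.9)` falls in cell 0 and `controller(0.3)` in cell 2; the concrete controller stores only the table, not the continuous state space.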

Learning-based and Data-driven FMCs

For high-dimensional or partially known dynamics, learning-based approaches seek to optimize the finite-memory controller via reinforcement learning, stochastic gradient approaches (VAPS), or direct RL weight optimization for controllers with limited history windows (Wu et al., 2023, Meuleau et al., 2013). In soft robot control, an FMC storing a fixed window of previous tracking errors and outputting an actuation via a weighted sum can be trained efficiently by off-the-shelf RL algorithms (DDPG, DQN, SAC), outperforming both LSTM controllers and full-scale RL policies in convergence speed and data efficiency for tasks with limited long-term temporal dependencies (Wu et al., 2023).
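The windowed-error controller described above can be sketched as follows. This is an assumed minimal form (a plain weighted sum over the most recent tracking errors); the class name and the fixed weights are hypothetical, and in the cited setting the weights would be learned by an RL algorithm rather than set by hand.

```python
from collections import deque


class WindowedErrorController:
    """Outputs a weighted sum of the last len(weights) tracking errors."""

    def __init__(self, weights):
        self.weights = list(weights)
        # Memory: fixed-length window of past errors, newest first.
        self.errors = deque([0.0] * len(self.weights),
                            maxlen=len(self.weights))

    def step(self, reference, measurement):
        # appendleft on a full deque drops the oldest error from the right.
        self.errors.appendleft(reference - measurement)
        return sum(w * e for w, e in zip(self.weights, self.errors))


# Hypothetical weights; a trained controller would supply these.
ctrl = WindowedErrorController([0.5, 0.25])
u1 = ctrl.step(reference=1.0, measurement=0.0)  # errors: [1.0, 0.0]
u2 = ctrl.step(reference=1.0, measurement=0.5)  # errors: [0.5, 1.0]
```

The controller's entire memory is the error window, so its parameter count equals the window length.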

Robust and Uncertain Environments

In POMDPs (and their robust variants), finite-memory controllers are synthesized by solving exact or approximate optimization programs:

Synthesis may proceed by casting controller optimization as a nonlinear or linear program (LP/NLP) with variables representing controller parameters and subject to Bellman-like constraints over the belief or state-memorized space. Efficient dualization and linearization reduce robust FSC synthesis in uncertain POMDPs to a tractable finite LP (Cubuktepe et al., 2020).
In decentralized settings (DEC-POMDPs), joint stochastic FSCs for all agents can be optimized by nonlinear programming with explicit independence (factorization) constraints (Amato et al., 2012).
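As an illustration of what such Bellman-like constraints can look like, consider a discounted POMDP with transition kernel $P$, observation kernel $O$, reward $R$, and discount factor $\gamma$ (these symbols are assumptions introduced here, not notation from the text), and an FSC $(N, n^0, \psi, \eta)$ with $\psi(a \mid n)$ and $\eta(n' \mid n, a, o)$. The value of running the controller from node $n$ in state $s$ then satisfies

$V(n, s) = \sum_{a} \psi(a \mid n) \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \sum_{o} O(o \mid s', a) \sum_{n'} \eta(n' \mid n, a, o) \, V(n', s') \Big]$

For fixed $\psi$ and $\eta$ this is a linear system in $V$. When $\psi$ and $\eta$ are themselves decision variables, the products of controller parameters with $V$ make the constraints nonlinear, which is why the general problem is typically posed as an NLP.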

Estimator-based approaches may leverage finite-history quantization: for a given memory depth (the length of the retained history window), the value loss of the best controller restricted to that depth decays exponentially in the depth under filter stability assumptions (Kara et al., 2020). Explicit quantization followed by a DP or LP solution on the quantized state space guarantees near-optimality with a rigorously stated rate of convergence.

3. Memory Bounds, Representational Complexity, and Trade-offs

Sharp worst-case memory bounds for controllers have been derived for broad classes of specifications:

For safety objectives in timed-automaton games, the memory required is linear, in bits, in the number of system clocks (Chatterjee et al., 2011). This linear bound is tight: memoryless region strategies do not suffice.
In games with Boolean combinations of objectives (e.g., conjunctions of reachability, parity, or bounded-energy), memory can scale from linear, to exponential, to non-elementary in the number of objectives (Randour, 4 Sep 2025, Roux et al., 2018). For example, generalized Büchi (a conjunction of $k$ Büchi conditions) requires a number of states linear in $k$; bounded-energy specifications scale with the product of energy bounds; mean-payoff conjunctions may require infinite memory.
In practical deployments, pseudo-polynomial memory implemented as counters or lookup tables (possibly compressed via neural surrogates) suffices for many industrial cases (Majumdar et al., 2023).

These bounds delineate the feasibility frontiers of real-world synthesis and motivate the use of amply expressive, yet memory-efficient, representations, e.g., data-driven linear FMCs for soft robots (with a small number of parameters), lookup tables compressed by neural classifiers, or hierarchical controllers with recursion for generalized planning (Wu et al., 2023, Segovia-Aguas et al., 2019, Majumdar et al., 2023).
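The linear end of this spectrum can be made concrete with the standard round-robin memory for a conjunction of $k$ Büchi conditions: a counter with $k$ values records which target set is awaited next. The Python sketch below is an illustration under that assumption; the function name and the example sets are hypothetical.

```python
def make_round_robin_memory(targets):
    """Memory structure for a conjunction of k Büchi conditions.

    targets: list of k sets of vertices. The memory state m in 0..k-1 is
    the index of the target set the play is currently waiting to visit.
    Returns (memory states, initial memory, update function).
    """
    k = len(targets)

    def update(m, v):
        # Advance the counter only when the awaited set is visited.
        return (m + 1) % k if v in targets[m] else m

    return list(range(k)), 0, update


# Hypothetical example with k = 3 target sets.
states, m0, update = make_round_robin_memory([{"b"}, {"c"}, {"d"}])

m = m0
history = []
for v in ["a", "b", "d", "c", "d"]:
    m = update(m, v)
    history.append(m)
print(history)  # [0, 1, 1, 2, 0]
```

Each time the counter wraps back to 0, every target set has been visited at least once since the previous wrap, so infinitely many wraps witness all $k$ Büchi conditions.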

4. Applications and Empirical Performance

Finite-memory controllers are pervasive in the engineering of correct, robust, and efficient control systems:

In safety-critical real-time systems, such as those modeled by timed automata, linear-size memory controllers have replaced prior exponential-memory solutions (Chatterjee et al., 2011).
In robotics (soft manipulators), FMCs trained with RL attain sub-milliradian tracking error using as few as four memory parameters, while converging an order of magnitude faster than LSTM or deep RL baselines (Wu et al., 2023).
Hierarchical FSCs in planning drastically reduce the controller size for families of combinatorial tasks (e.g., binary trees, visitation problems) compared to flat controllers; for example, recursive depth-first search in planning can be realized with a small number of controller states, versus the exponentially many states required by non-hierarchical models (Segovia-Aguas et al., 2019).
For POMDPs and uncertain POMDPs, robust FSC construction yields policies provably satisfying performance and constraint specifications across all admissible models; linear programming methods enable the synthesis of robust FSCs for large-scale air-traffic and spacecraft motion planning problems with very limited controller memory (2-3 nodes) in seconds (Cubuktepe et al., 2020).

Compression techniques, including neural abstraction-based synthesis, enable a reduction by five or more orders of magnitude in online controller memory, replacing explicit lookup tables with compact neural representations without loss of soundness (Majumdar et al., 2023).

5. Limitations, Expressiveness, and Open Problems

While finite-memory controllers offer tractability and implementability, their expressiveness is limited by the chosen memory architecture:

Window-based FMCs cannot encode dependencies that fall outside their observation window, and fixed-memory controllers in general cannot capture unbounded temporal patterns unless hierarchies or recursion are exploited (Segovia-Aguas et al., 2019).
There exist control objectives (e.g., conjunctions of unbounded mean-payoff or certain quantitative objective combinations) for which no finite-memory solution exists (Randour, 4 Sep 2025, Roux et al., 2018).
The practical “simplicity” of a controller often depends not only on state count, but on the data structure chosen (e.g., explicit automata, parameterized counters, neural nets, or imperative code) (Randour, 4 Sep 2025). There is ongoing work to develop a representation-agnostic theory of strategy complexity and classify objectives by their minimal representational requirements.

Robustness to modeling uncertainty often necessitates either randomized strategies or conservatism in policy update; whether richer memory can reduce the need for randomness, or vice versa, remains a topic of theoretical investigation (Cubuktepe et al., 2020, Randour, 4 Sep 2025).

6. Synthesis in Practice: Methodologies and Workflows

Synthesis of finite-memory controllers unfolds in multiple frameworks:

| Application | Controller Model | Synthesis Method |
| --- | --- | --- |
| Timed/safety | State + booleans + clocks | Symbolic game solving, Zielonka tree |
| POMDP | Finite-state automaton (FSC) | Constrained DP/LP, SPI algorithms |
| Abstraction/hybrid | Moore/Mealy automata | Product construction, abstraction |
| Soft robot | Linear windowed error FSC | RL, DDPG, DQN, SAC optimization |
| Planning | (Hierarchical) FSC, recursion | Compilation to classical planning |
| Uncertain POMDP | Stochastic FSC | Dualization, LP, robust optimization |

Across these domains, certified-by-design and correct-by-construction approaches prevail, with sound abstraction or theory-backed approximate synthesis conferring rigorous guarantees on the closed-loop system (Tarraf, 2011, Majumdar et al., 2023). Learning-based approaches are justified when full models are unavailable or too complex for explicit solution.

7. Future Directions and Research Challenges

Principal directions for ongoing research include:

Representation Agnosticism: Classification of control tasks by true minimal complexity under flexible representations (beyond automata-based models).
Robustness and Learning: Tighter integration of learning-based controller synthesis (particularly with stochastic or robust FSCs) while preserving explicit correctness guarantees.
Open-memory/Hierarchy: Systematic exploitation of hierarchical and recursive memory, especially for generalized planning and combinatorial tasks (Segovia-Aguas et al., 2019).
Hybrid and Deep-Abstraction Synthesis: Scaling up abstraction-based and neural-symbolic controller synthesis pipelines to higher-dimensional, safety-critical domains, with formal soundness maintained throughout (Majumdar et al., 2023).
Trade-offs between Randomness and Memory: Elucidation of the interplay between randomized vs. finite-memory strategies, both from complexity-theoretic and performance perspectives (Randour, 4 Sep 2025).

The theoretical constraints illuminated by recent metatheorems provide a roadmap identifying both the power and the boundaries of finite-memory controllers across modern control and artificial intelligence.