Bisimulation and Causal States

Updated 5 February 2026

Bisimulation and causal states are foundational concepts that formalize behavioral equivalence and minimal information sufficiency for accurate prediction and control.
They underpin methodologies in reinforcement learning, process calculi, and Petri nets, facilitating model abstraction and system verification.
Recent advances employ quantitative bisimulation metrics to enhance sample efficiency in RL and improve robustness in partially observable settings.

Bisimulation and causal states are central concepts in the formal theory of dynamical systems, concurrency, automata, and reinforcement learning. Bisimulation provides a mathematical framework for behavioral equivalence in both discrete and continuous settings, while causal states formalize what information is necessary and sufficient for prediction and control. The two notions are deeply connected: causal states typically form the coarsest bisimulation partition of histories (the one with the fewest classes that still preserves predictive behavior), and bisimulation metrics quantify proximity between representations that differ in causally irrelevant details. These concepts underpin algorithms for abstraction, verification, and learning across settings including partially observable Markov decision processes (POMDPs), process calculi, Petri nets, and effectful automata.

1. Formal Definitions: Causal States and Bisimulation

In partially observable environments, such as POMDPs or perturbed POMDPs (P²OMDPs), the minimal sufficient statistic for future prediction is given by the causal state. Formally, let $h_t$ denote the observation–action history up to time $t$. Causal equivalence identifies two histories, $h_t \sim_\varepsilon h_t'$, iff their conditional distributions over all future observations and actions coincide: $\mathbb{P}\bigl(o_{t+1:\infty},\,a_{t:\infty}~|~h_t\bigr) = \mathbb{P}\bigl(o_{t+1:\infty},\,a_{t:\infty}~|~h_t'\bigr)$. The corresponding causal state is the equivalence class $S_t = \varepsilon(h_t)$ (Zhang et al., 2019).
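As a rough illustration of causal equivalence, the sketch below groups observation histories of a small hidden Markov process by their predictive distributions. Everything in it is an assumption made for illustration: the two-state "golden mean" process, the names `T`, `INIT`, `future_signature`, and `causal_states`, the omission of actions, and the truncation of the infinite future to a finite `horizon`.

```python
from itertools import product

# Hypothetical two-state hidden Markov process (no two consecutive 1s).
# T[o][i][j] = P(emit o and move to hidden state j | hidden state i).
T = {
    0: [[0.5, 0.0], [1.0, 0.0]],
    1: [[0.0, 0.5], [0.0, 0.0]],
}
INIT = [1.0, 0.0]  # assumed: the process starts in hidden state 0


def push(vec, word):
    """Propagate an unnormalized hidden-state vector through a word of observations."""
    for o in word:
        vec = [sum(vec[i] * T[o][i][j] for i in range(len(vec)))
               for j in range(len(vec))]
    return vec


def future_signature(history, horizon):
    """Distribution over the next `horizon` observations given `history`.

    Returns None for histories that have probability zero.
    """
    weights = push(INIT, history)
    total = sum(weights)
    if total == 0.0:
        return None
    belief = [w / total for w in weights]
    return tuple(round(sum(push(belief, word)), 9)
                 for word in product((0, 1), repeat=horizon))


def causal_states(max_len, horizon):
    """Group all possible histories up to max_len by their future signature."""
    classes = {}
    for n in range(max_len + 1):
        for history in product((0, 1), repeat=n):
            sig = future_signature(history, horizon)
            if sig is not None:
                classes.setdefault(sig, []).append(history)
    return list(classes.values())


classes = causal_states(max_len=3, horizon=3)
```

In this toy process the histories fall into two classes, depending on whether the last observation was a 1. A finite horizon can in general merge histories that only differ further into the future, so this is an approximation of the definition above rather than an exact construction.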

Bisimulation, in the reinforcement learning tradition, is an equivalence relation $E$ on states such that $s_1 E s_2$ implies:

$R(s_1,a) = R(s_2,a)$ for all actions $a$;
$\sum_{s'\in C}P(s'\mid s_1,a) = \sum_{s'\in C}P(s'\mid s_2,a)$ for every action $a$ and every equivalence class $C$.
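For a finite MDP, the two conditions above can be checked by partition refinement: start with all states in one block and keep splitting until states in the same block agree on rewards and on the probability mass sent to every block. The following is a minimal sketch under assumptions of my own (tabular rewards, transitions given as dictionaries, and the hypothetical name `bisimulation_partition`); it is not taken from the cited papers.

```python
def bisimulation_partition(rewards, transitions, decimals=9):
    """Coarsest partition satisfying the reward and transition conditions.

    rewards[s][a]      -- immediate reward of action a in state s
    transitions[s][a]  -- dict mapping next state -> probability
    Returns a list of block labels, one per state.
    """
    n_states = len(rewards)
    labels = [0] * n_states  # start with a single block
    while True:
        n_blocks = max(labels) + 1
        signatures = {}
        new_labels = []
        for s in range(n_states):
            # Signature: rewards per action, then block masses per action.
            sig = [tuple(round(r, decimals) for r in rewards[s])]
            for dist in transitions[s]:
                mass = [0.0] * n_blocks
                for nxt, p in dist.items():
                    mass[labels[nxt]] += p
                sig.append(tuple(round(m, decimals) for m in mass))
            new_labels.append(signatures.setdefault(tuple(sig), len(signatures)))
        if len(signatures) == n_blocks:  # no block was split: fixed point
            return new_labels
        labels = new_labels


# Hypothetical 3-state, 2-action MDP: states 0 and 1 behave identically.
rewards = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
transitions = [
    [{2: 1.0}, {0: 1.0}],
    [{2: 1.0}, {1: 1.0}],
    [{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}],
]
labels = bisimulation_partition(rewards, transitions)
```

Each round yields a refinement of the previous partition, so the loop can stop as soon as the number of blocks stops growing. Rounding to `decimals` is a pragmatic choice to avoid spurious splits from floating-point noise.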

For continuous state spaces, the bisimulation metric $d$ (the least fixed point of the Ferns–Panangaden–Precup operator) satisfies: $d(s_1, s_2) = \max_a\left\{ (1-\gamma)|r_{s_1}^a - r_{s_2}^a| + \gamma W_{d}(P_{s_1}^a, P_{s_2}^a)\right\}$ where $W_{d}$ is the Wasserstein distance induced by $d$ (Zhang et al., 2019, Li et al., 29 Nov 2025).
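The fixed-point equation above can be iterated directly when the state space is finite. The sketch below makes a strong simplifying assumption: transitions are deterministic, so each $P_s^a$ is a point mass and the Wasserstein term reduces to the distance between the two successor states. With stochastic transitions an optimal-transport solver would be needed for $W_d$. The function name `bisimulation_metric` and the example MDP are hypothetical.

```python
def bisimulation_metric(rewards, next_state, gamma, tol=1e-10, max_iter=10_000):
    """Iterate d <- max_a { (1-gamma)|r1 - r2| + gamma * d(next1, next2) }.

    rewards[s][a]     -- immediate reward
    next_state[s][a]  -- deterministic successor (assumption of this sketch)
    """
    n = len(rewards)
    n_actions = len(rewards[0])
    d = [[0.0] * n for _ in range(n)]  # start from the zero pseudometric
    for _ in range(max_iter):
        new_d = [
            [
                max(
                    (1 - gamma) * abs(rewards[i][a] - rewards[j][a])
                    + gamma * d[next_state[i][a]][next_state[j][a]]
                    for a in range(n_actions)
                )
                for j in range(n)
            ]
            for i in range(n)
        ]
        gap = max(abs(new_d[i][j] - d[i][j]) for i in range(n) for j in range(n))
        d = new_d
        if gap < tol:
            break
    return d


# Hypothetical example: three absorbing states with one action each.
rewards = [[1.0], [1.0], [0.0]]
next_state = [[0], [1], [2]]
d = bisimulation_metric(rewards, next_state, gamma=0.9)
```

States 0 and 1 receive distance 0 because they are bisimilar, while the distance between states 0 and 2 converges to 1, the solution of $d = (1-\gamma) + \gamma d$.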

In event-structure and concurrency theory, bisimulation is defined coinductively over labeled transition systems or operational models (Petri nets, π-calculus, etc.), with explicit preservation of the causal order of events and concurrency structure (Gorrieri, 2022, Aubert et al., 2022).

2. Theoretical Connection: Causal States as Bisimulation Classes

In both RL and concurrency, the set of causal states coincides with the coarsest bisimulation partition that remains sufficient for future prediction: no two classes can be merged without losing predictive information. In history-based MDPs, the causal-state partition is itself a bisimulation (Zhang et al., 2019). In the theory of Petri nets, causal bisimulation over "causal case graphs" yields equivalence classes that encode the entire causal history needed for further behavior; these classes are the so-called "causal states" (Bruni et al., 2015). In process calculi, causal-state representations are obtained as configurations (downward-closed, conflict-free sets) of events in an event structure, with bisimulation preserving the causal partial order of events (Aubert et al., 2022).
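To make "configuration" concrete, the sketch below enumerates the downward-closed, conflict-free subsets of a tiny event structure by brute force. The representation (direct causes as a dictionary, minimal conflicts as a set of pairs), the names `is_configuration` and `configurations`, and the three-event example are assumptions chosen for illustration.

```python
from itertools import combinations


def is_configuration(subset, causes, conflicts):
    """True if subset is downward-closed and conflict-free.

    causes[e]  -- set of direct causes of event e (missing key = no causes)
    conflicts  -- set of frozenset pairs of events in minimal conflict
    """
    downward_closed = all(causes.get(e, set()) <= subset for e in subset)
    conflict_free = not any(pair <= subset for pair in conflicts)
    return downward_closed and conflict_free


def configurations(events, causes, conflicts):
    """Enumerate all configurations (exponential; for small examples only)."""
    result = []
    for size in range(len(events) + 1):
        for combo in combinations(sorted(events), size):
            subset = frozenset(combo)
            if is_configuration(subset, causes, conflicts):
                result.append(subset)
    return result


# Hypothetical event structure: a causes c, and b conflicts with c.
events = {"a", "b", "c"}
causes = {"c": {"a"}}
conflicts = {frozenset({"b", "c"})}
configs = configurations(events, causes, conflicts)
```

Checking only direct causes is enough for downward closure, since closure under direct causes implies closure under their transitive closure. Likewise, inherited conflicts are caught because a downward-closed set containing a conflicting descendant must also contain the ancestor that is in minimal conflict.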

In “True Concurrency Can Be Easy,” the equivalence class of a marking under step-net bisimulation is a causal state: all markings with the same partial-order history of events, up to concurrency and conflict, are identified. This extends to structure-preserving and causal-net bisimilarity (Gorrieri, 2022).

3. Quantitative Bisimulation Metrics and Value Approximation

Several works extend the qualitative notion of bisimulation to a quantitative metric, particularly in RL and learning theory, to reason about continuous representations and approximation. The bisimulation metric is the canonical behavioral pseudometric, and the optimal value function is Lipschitz-continuous with respect to it: $|V^*(s_1) - V^*(s_2)| \leq \frac{1}{1-\gamma} d(s_1,s_2)$, where $V^*$ is the optimal value function (Zhang et al., 2019). In recent work such as CaDiff (Li et al., 29 Nov 2025), a new bisimulation distance is introduced, leveraging the $p$-Wasserstein metric between observable and denoised (causal) states: $d(\hat{s}, o) = \max_{a\in A}\left\{C_r W_p(P(r_{+1}\mid\hat{s},a), P(r_{+1}\mid o,a)) + C_s W_p(P(\hat{s}_{+1}\mid\hat{s},a), P(\hat{s}_{+1}\mid o,a))\right\}$. This framework provides explicit error bounds on value-function approximation, decomposing contributions from clustering, reward and transition modeling, and bisimulation-metric learning (Li et al., 29 Nov 2025).
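The value-difference bound stated at the start of this section follows from a short induction on value iteration. The sketch below assumes a discrete state space, bounded rewards, the metric $d$ with reward weight $(1-\gamma)$ and transition weight $\gamma$ from Section 1, and $W_d$ read as the 1-Wasserstein (Kantorovich) distance; it is a standard argument written out for convenience, not a reproduction of the cited proofs.

```latex
% Value iteration: V_0 = 0,
%   V_{n+1}(s) = \max_a \{ r_s^a + \gamma \sum_{s'} P_s^a(s') V_n(s') \}.
% Inductive hypothesis: (1-\gamma) V_n is 1-Lipschitz with respect to d.
\begin{align*}
|V_{n+1}(s_1) - V_{n+1}(s_2)|
  &\le \max_a \Bigl\{ |r_{s_1}^a - r_{s_2}^a|
       + \gamma \Bigl| \textstyle\sum_{s'}
         \bigl(P_{s_1}^a(s') - P_{s_2}^a(s')\bigr) V_n(s') \Bigr| \Bigr\} \\
  &\le \max_a \Bigl\{ |r_{s_1}^a - r_{s_2}^a|
       + \tfrac{\gamma}{1-\gamma}\, W_d\bigl(P_{s_1}^a, P_{s_2}^a\bigr) \Bigr\} \\
  &= \tfrac{1}{1-\gamma} \max_a \Bigl\{ (1-\gamma)\,|r_{s_1}^a - r_{s_2}^a|
       + \gamma\, W_d\bigl(P_{s_1}^a, P_{s_2}^a\bigr) \Bigr\}
   = \tfrac{1}{1-\gamma}\, d(s_1, s_2).
\end{align*}
```

The first step uses $|\max_a f(a) - \max_a g(a)| \le \max_a |f(a) - g(a)|$; the second applies Kantorovich–Rubinstein duality to the 1-Lipschitz function $(1-\gamma)V_n$. Letting $n \to \infty$ gives the bound for $V^*$.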

4. Methodologies: Coalgebraic, Categorical, and Diffusion-Based Approaches

Coalgebraic approaches, as in the work of Bruni, Montanari, and Sammartino (Bruni et al., 2015), recast causal bisimulation in the context of presheaf categories over posets, making causal dependencies explicit and enabling abstract minimization using symmetries. In category theory, effectful Mealy machines are given by a state object, an initial state, and a transition morphism, and bisimulation is characterized both syntactically (via uniform feedback extension) and coalgebraically (via spans of homomorphisms) (Bonchi et al., 2024).
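For intuition only, the sketch below treats the simplest special case: ordinary deterministic Mealy machines with no effects, where two states are bisimilar exactly when they produce the same outputs on every input word. It explores reachable state pairs breadth-first. The encoding of a machine as a dictionary and the name `mealy_bisimilar` are assumptions; the categorical construction in Bonchi et al. (2024) is far more general than this.

```python
from collections import deque


def mealy_bisimilar(step1, start1, step2, start2, inputs):
    """Check bisimilarity of two deterministic Mealy machines.

    step[(state, input)] -> (output, next_state)
    Explores reachable pairs of states; fails on the first output mismatch.
    """
    seen = {(start1, start2)}
    queue = deque(seen)
    while queue:
        s1, s2 = queue.popleft()
        for x in inputs:
            out1, n1 = step1[(s1, x)]
            out2, n2 = step2[(s2, x)]
            if out1 != out2:
                return False
            if (n1, n2) not in seen:
                seen.add((n1, n2))
                queue.append((n1, n2))
    return True


# Hypothetical machines: a 2-state parity toggler, a redundant 4-state
# version of the same behavior, and a machine that always outputs 1.
toggler = {("even", "t"): (1, "odd"), ("odd", "t"): (0, "even")}
unrolled = {(k, "t"): ((k + 1) % 2, (k + 1) % 4) for k in range(4)}
constant = {("s", "t"): (1, "s")}

same = mealy_bisimilar(toggler, "even", unrolled, 0, ["t"])
different = mealy_bisimilar(toggler, "even", constant, "s", ["t"])
```

The set `seen` collected on a successful run is itself a bisimulation relating the two start states, which is the relational counterpart of the span-of-homomorphisms view mentioned above.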

In contemporary learning paradigms, CaDiff introduces asynchronous diffusion models (ADM) to denoise perturbed observation sequences: forward Ornstein–Uhlenbeck (OU) noise is reversed by score-based networks, and the denoised representations are regularized by a bisimulation metric. The authors present this methodology as the first to provide both theoretical guarantees and practical gains for extracting causal states in P²OMDPs (Li et al., 29 Nov 2025).
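The forward half of such a pipeline can be illustrated with the textbook OU process $dX = -\theta X\,dt + \sigma\,dW$, whose marginal at time $t$ given $X_0 = x_0$ is Gaussian with mean $x_0 e^{-\theta t}$ and variance $\frac{\sigma^2}{2\theta}(1 - e^{-2\theta t})$. The sketch below is generic and is not CaDiff's implementation: the parameter names, the scalar observations, and the use of a separate noise time per sequence element are illustrative assumptions, and the learned reverse (denoising) step is omitted entirely.

```python
import math
import random


def ou_marginal(x0, t, theta=1.0, sigma=1.0):
    """Mean and standard deviation of X_t given X_0 = x0 under OU dynamics."""
    mean = x0 * math.exp(-theta * t)
    var = sigma ** 2 / (2 * theta) * (1 - math.exp(-2 * theta * t))
    return mean, math.sqrt(var)


def noise_sequence(observations, times, theta=1.0, sigma=1.0, rng=None):
    """Sample a noised copy of each scalar observation at its own noise time."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    noisy = []
    for x0, t in zip(observations, times):
        mean, std = ou_marginal(x0, t, theta, sigma)
        noisy.append(rng.gauss(mean, std))
    return noisy


clean = [2.0, -1.0, 0.5]
noisy = noise_sequence(clean, times=[0.0, 0.1, 5.0])
```

At noise time 0 the observation is returned unchanged, and for large times the sample forgets $x_0$ and approaches the stationary distribution with standard deviation $\sigma/\sqrt{2\theta}$.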

5. Causal States and Bisimulation in Concurrency and Process Theory

Petri nets, event structures, and process calculi formalize non-interleaving or "true concurrency" systems. Step-net bisimulation, causal-net bisimulation, and history-preserving bisimulation progressively refine the granularity of causal-state tracking:

Step-net bisimilarity: matches sets of concurrently enabled transitions; two markings are equivalent when they exhibit identical concurrency/conflict structure (Gorrieri, 2022).
Causal-net and structure-preserving bisimulation: maintain bijections on events and tokens, ensuring preservation of causal dependencies and identification of causal states.
History-preserving (HP) bisimulation: tracks not only event duration (ST-bisimilarity) but also the exact causal order among events in asynchronous transition systems (Aubert et al., 2022).

These frameworks reveal that bisimulation equivalence classes encode precisely the causal state relevant for predicting or simulating system behavior.
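The difference between interleaving and step-based views can be seen on two small nets: one that runs `a` and `b` in parallel, and one that offers a choice between "a then b" and "b then a". Both allow the same single-transition behavior initially, but only the first allows `a` and `b` to fire together as one step. The sketch below is a simplified illustration under stated assumptions: nets are given only by their presets, steps are sets of distinct transitions rather than multisets, and the names `step_enabled` and `enabled_step_labels` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def step_enabled(marking, pre, step):
    """True if all transitions in `step` can fire together from `marking`.

    marking -- Counter mapping place -> number of tokens
    pre[t]  -- dict mapping place -> tokens consumed by transition t
    """
    needed = Counter()
    for t in step:
        needed.update(pre[t])
    return all(marking[p] >= k for p, k in needed.items())


def enabled_step_labels(marking, pre, label, size):
    """Label multisets (as sorted tuples) of all enabled steps of a given size."""
    found = set()
    for step in combinations(sorted(pre), size):
        if step_enabled(marking, pre, step):
            found.add(tuple(sorted(label[t] for t in step)))
    return found


# Net 1: a and b in parallel, each with its own input place.
par_pre = {"t_a": {"p1": 1}, "t_b": {"p2": 1}}
par_label = {"t_a": "a", "t_b": "b"}
par_marking = Counter(p1=1, p2=1)

# Net 2: choice between a-then-b and b-then-a from a single token.
seq_pre = {
    "a_first": {"p": 1}, "b_second": {"q1": 1},
    "b_first": {"p": 1}, "a_second": {"q2": 1},
}
seq_label = {"a_first": "a", "b_second": "b", "b_first": "b", "a_second": "a"}
seq_marking = Counter(p=1)

par_steps = enabled_step_labels(par_marking, par_pre, par_label, size=2)
seq_steps = enabled_step_labels(seq_marking, seq_pre, seq_label, size=2)
```

Both nets enable the single-transition steps `a` and `b`, but only the parallel net enables the two-element step, which is the kind of distinction that step-based and causal equivalences are designed to observe.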

6. Applications, Empirical Results, and Hierarchies

Empirical evaluation of causal-state representations in RL demonstrates substantial gains in sample efficiency and robustness to partial observability and perturbations. In CaDiff, combining ADM denoising with bisimulation regularization outperforms both diffusion-only and bisimulation-only algorithms by at least 14.18% across diverse Roboschool tasks (Li et al., 29 Nov 2025). The theoretical analysis of Zhang et al. (2019) provides provable upper and lower bounds on suboptimality in planning with learned causal-state embeddings.

In concurrency theory, the hierarchy of bisimulations $\text{i-bisimilarity} \subset \text{ST-bisimilarity} \subset \text{HP-bisimilarity} \subset \text{HP-failure bisimilarity}$ maps out the levels of causal-structural precision needed for system analysis, specification, and attacker modeling.

7. Summary Table: Bisimulation and Causal State Approaches

Framework	Causal State Construction	Bisimulation Criterion
RL/POMDPs (Zhang et al., 2019, Li et al., 29 Nov 2025)	Coarsest history partition that predicts future	Reward + transition consistency; possibly metric-based
Petri nets (Gorrieri, 2022, Bruni et al., 2015)	Markings/events up to concurrent causality	Step/causal/structure bisimilarity
π-calculus (Aubert et al., 2022)	Pending event-sets in LATS/event structures	ST-bisimilarity, HP-bisimilarity
Categorical (Bonchi et al., 2024)	State/transition as categorical coalgebras	Uniform feedback span bisimulation

The unifying theme is that causal states provide the minimal granularity at which bisimulation must be enforced to fully capture the system's behavioral information; bisimulation metrics and coalgebraic/categorical constructions enable computation, minimization, and quantitative analysis in both classical and learning-theoretic settings.