Persistent Permutability in Theory and Practice
- Persistent permutability describes how systems lacking explicit permutation rules can emulate full permutation behavior through meta-structural encoding.
- In type systems and Petri nets, relabelling and adjacent swaps simulate resource permutations to maintain execution consistency under rigid, constraint-driven frameworks.
- In deep neural networks, permutation matrices facilitate layerwise feature matching, preserving semantic alignment and enabling effective layer pruning.
Persistent permutability is a concept that captures when permutations—mechanisms for systematically exchanging the order or position of entities—remain representable and operationally valid under strict syntactic, algebraic, or mechanistic constraints, even in settings where explicit rules for permutation are absent from the formalism. The notion has arisen independently in type theory, deep learning, and discrete event models such as Petri nets, each time illuminating the covert or emergent capacity for systems to simulate or recover “permutation” effects within a rigid, permutation-free structure. The following article provides a comprehensive account of persistent permutability across these domains, organizing definitions, technical results, and field-specific methodologies.
1. Formal Definitions and Mathematical Foundations
Persistent permutability is defined relative to the underlying algebra or system:
- Type Systems: In rigid (syntax-directed) systems, e.g., sequence types in the coinductive system S, permutations of resources are not expressible via syntactic rules; however, the system can simulate all behaviors of systems where multiset-based, permutation-inclusive rules are present, by encoding permutation information in labelling or meta-structure (Vial, 2016).
- Petri Nets: A marked Petri net is persistently permutable if every finite (or in some variants, fair infinite) firing sequence can be permuted, via adjacent swaps of independent transitions, into a persistent sequence—i.e., one in which no enabled transition disables another. The property is formalized using permutation equivalence and “Short Persistent Equivalent” (SPE) and “Fair Persistent Equivalent” (FPE) conditions (Best et al., 25 Jan 2026).
- Neural Networks: In mechanistic interpretability, persistent permutability denotes the existence, across consecutive neural network layers, of feature correspondences up to permutation—specifically, that interpretable features from one layer can be matched to features in deeper layers by permutation matrices, retaining meaningful semantic alignment over multiple transitions (Balagansky et al., 2024).
This concept is always anchored in the presence of a permutation-free or syntactically rigid substrate and refers to the ability to recover the expressive or operational equivalence associated with permutation-inclusive environments.
2. Persistent Permutability in Type Systems
System S, the “sequence types” system, exemplifies persistent permutability by being both fully syntax-directed and rigid—sequence types are labelled forests lacking any syntactic permutation rule. Nonetheless, by a representation theorem, every derivation in the multiset-based (permutation-inclusive) intersection system R can be obtained by “collapsing” a derivation in S, i.e., by erasing its track annotations.
Key mechanisms include:
- Track labelling: Resource occurrences are labelled with distinct numeric tracks.
- Relabelling: Rather than employing permutation rules, swaps of argument occurrences are simulated by consistently relabelling tracks throughout a derivation.
- Trivialization of interfaces: Non-trivial isomorphism-induced re-orderings can be reduced, through thread and track analysis, to identity labelling, provided certain obstructions (e.g., “brother chains”) are absent.
- Deterministic reduction: Subject reduction in S is deterministic since argument matching is governed by track equality, yet all non-deterministic reduction paths of R can be represented by suitable relabellings.
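The relabelling mechanism can be illustrated with a minimal sketch. The dictionary encoding and the `relabel`/`collapse` helpers below are illustrative assumptions, not System S syntax; the point is that a consistent bijection on tracks stands in for a permutation rule, and that erasing tracks recovers the multiset view of system R:

```python
# Toy sketch of track relabelling in a rigid, permutation-free setting.
# Tracks (integer keys) and type strings are illustrative placeholders.

def relabel(sequence, track_map):
    """Apply a bijection on track labels; simulates a permutation rule."""
    return {track_map.get(t, t): ty for t, ty in sequence.items()}

def collapse(sequence):
    """Erase track annotations, yielding the multiset seen by system R."""
    return sorted(sequence.values())

# A sequence type: two resource occurrences on distinct tracks.
seq = {2: "A -> B", 3: "A"}

# "Swapping" the two occurrences is just a consistent relabelling...
swapped = relabel(seq, {2: 3, 3: 2})

# ...and both collapse to the same multiset in R.
assert collapse(seq) == collapse(swapped)
```

Because relabelling is a bijection applied uniformly throughout a derivation, it never changes the collapsed (multiset) reading—which is exactly why no explicit permutation rule is needed.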
A summary table clarifies key distinctions:
| System | Resource Handling | Permutation Rule | Expressivity |
|---|---|---|---|
| R (multiset) | Bags (multisets) | Present | Permutative, non-deterministic |
| S (sequence) | Labelled sequences | Absent | Fully expressive via relabelling |
Hence, persistent permutability in S allows simulation of any resource permutation in R (Vial, 2016).
3. Mechanistic Feature Permutability in Neural Networks
In deep neural networks, persistent permutability refers to the empirical and algorithmic discovery that interpretable features (e.g., those extracted via Sparse Autoencoders, SAEs) from one hidden layer can, after appropriate permutation, be matched to features in subsequent layers such that semantic identity is preserved.
This is operationalized via:
- SAE Match algorithm: Features from one layer are matched to features of a subsequent layer by minimizing a reconstruction-weighted mean squared error between folded decoder weights, with the correspondence expressed as a permutation matrix.
- Parameter folding: Encoder and decoder weights are rescaled to incorporate learned thresholds, ensuring feature scales are comparable across layers.
- Layerwise feature matching: Matching is quantified via MSE, external LLM semantic evaluation (“SAME”, “MAYBE”, “DIFFERENT”), and matching score (joint activation probability).
- Persistence quantification: Monosemantic features (such as “dates” or “code tokens”) maintain identity under composed permutations for up to 4–6 layers before semantic drift degrades alignment.
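The core of the matching step can be sketched as an optimal assignment between decoder directions. This is a simplified stand-in for the SAE Match objective—it uses plain squared error over decoder columns and omits parameter folding and reconstruction weighting, and the function name `match_features` is an assumption of this sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_features(D_a, D_b):
    """Match decoder directions of two SAEs (columns = features) via an
    optimal one-to-one assignment minimizing pairwise squared error.
    Simplified: no parameter folding or reconstruction weighting."""
    diff = D_a[:, :, None] - D_b[:, None, :]     # shape (dim, n_a, n_b)
    cost = np.einsum("dij,dij->ij", diff, diff)  # pairwise squared distances
    rows, cols = linear_sum_assignment(cost)     # optimal permutation
    return cols, cost[rows, cols].sum()

# Sanity check: a column-permuted copy of a dictionary matches itself exactly.
rng = np.random.default_rng(0)
D = rng.normal(size=(16, 8))             # 8 features in a 16-dim space
p = rng.permutation(8)
cols, total = match_features(D, D[:, p])
assert np.allclose(D[:, p][:, cols], D)  # recovered matching realigns features
```

Composing such assignments across consecutive layer pairs gives the multi-layer feature correspondences whose persistence is quantified above.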
This phenomenon not only demonstrates that the latent structure of neural computation permits “hidden” feature permutations across depth, but also provides an effective data-free method for analyzing and pruning layers while maintaining high-fidelity hidden state approximation (Balagansky et al., 2024).
4. Persistent Permutability in Discrete Event Systems: Petri Nets
In Petri net theory, persistence is a global behavioral property (no enabled transition disables another), while persistent permutability is a weaker property stating that every firing sequence can be permuted into a persistent one via adjacent independent swaps. Key definitions include:
- Permutation equivalence: two firing sequences are permutation-equivalent if one can be obtained from the other by repeated adjacent swaps of concurrently enabled, independent transitions.
- SPE and FPE: A net satisfies SPE if every (finite) sequence has a persistent permutation-equivalent; FPE is the analogous property for fair (possibly infinite) sequences.
Significant results identify classes of nets (equal-conflict, free-choice, and pure dissymmetric-choice) for which SPE implies net persistence. Theorems and corollaries demonstrate that in such structured nets, local, sequence-level permutability suffices to ensure the absence of unwanted global disabling conflicts. Counterexamples show that, outside these classes, SPE does not imply persistence (Best et al., 25 Jan 2026).
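The basic ingredients—firing, enabledness, independence, and the marking-preservation of an adjacent swap—can be sketched in a few lines. The net below and its structural independence test (disjoint presets) are illustrative assumptions, not an example from the cited paper:

```python
from collections import Counter

# A Petri net as transition -> (preset, postset), both multisets of places.
net = {
    "t1": (Counter({"p1": 1}), Counter({"p3": 1})),
    "t2": (Counter({"p2": 1}), Counter({"p4": 1})),
}

def enabled(marking, t):
    pre, _ = net[t]
    return all(marking[p] >= n for p, n in pre.items())

def fire(marking, t):
    """Consume the preset, produce the postset."""
    pre, post = net[t]
    m = Counter(marking)
    m.subtract(pre)
    m.update(post)
    return +m  # drop places with zero tokens

def independent(s, t):
    """Structural independence here: no shared input place."""
    return not (set(net[s][0]) & set(net[t][0]))

m0 = Counter({"p1": 1, "p2": 1})
# An adjacent swap of independent transitions reaches the same marking,
# which is what makes permutation equivalence well-defined.
assert independent("t1", "t2")
assert fire(fire(m0, "t1"), "t2") == fire(fire(m0, "t2"), "t1")
```

SPE then asks whether, by iterating such swaps, every finite firing sequence can be rearranged into one in which no enabled transition ever disables another.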
5. Illustrative Schematics and Examples
Cross-domain examples highlight persistent permutability:
- Type systems:
  - In S, swapping argument occurrences is encoded by relabelling tracks rather than by a dedicated “perm” rule.
- Neural networks:
- Stable feature alignment is observed beyond layer 10 in transformer models, with monosemantic features traceable via permutations across several layers.
- Petri nets:
  - In a dissymmetric-choice (DC) net, a firing sequence may disable a choice, but swapping adjacent independent transitions yields a permutation-equivalent sequence that is persistent.
A comparison of instantiations:
| Domain | Object Permuted | Mechanism | Persistence Span |
|---|---|---|---|
| Type system S | Type tracks | Relabelling | Unbounded |
| Neural nets (SAE Match) | SAE features | Permutation matrices | 4–6 layers |
| Petri nets (DC, EC) | Transition orderings | Adjacent swaps | Finite sequences (SPE) |
6. Theoretical and Practical Implications
Persistent permutability substantiates several core insights:
- Full expressive power in rigid systems: Even permutation-free systems can encode all behaviors present in permutation-inclusive systems through sufficiently rich meta-structural apparatus (e.g., track labels, root-interfaces).
- Feature evolution in neural computation: The phenomenon explains how superposed latent features “unmix” and persist across depth, with later layers supporting one-to-one correspondences, enabling robust layerwise mechanistic interpretability and efficient layer pruning (Balagansky et al., 2024).
- Algorithmic design for concurrency models: In Petri nets satisfying the structural conditions (EC, pure DC), finite inspection of permutations suffices for global property verification, potentially simplifying model-checking and scheduling algorithms.
7. Domain-Specific Open Problems and Limits
Open questions and limitations articulated in the recent literature include:
- In Petri nets, whether the "pure" condition in Theorem 4.3 can be omitted, and whether SPE implies FPE for all safe nets, remain unresolved questions (Best et al., 25 Jan 2026).
- For neural models, the extension of persistent permutability to non-adjacent layers (long-range parametric flows) and to non-transformer architectures is an active area of investigation (Balagansky et al., 2024).
- Within rigid type systems, a plausible implication is that further generalizations of the representation theorem may be possible for broader classes of resource management languages.
Persistent permutability thus bridges theoretical expressivity, practical interpretability, and concurrency control by capturing how permutation effects survive, covertly or emergently, in fundamentally rigid or non-permutative systems.