Self-Modifying Pushdown Systems

Updated 10 November 2025

Self-Modifying Pushdown Systems (SM-PDS) are formal models that extend traditional pushdown systems by allowing dynamic modification of transition rules to capture self-modifying behaviors in code.
SM-PDS enable automated analysis techniques such as backward/forward reachability and LTL model checking, which are crucial for analyzing obfuscated and malicious software.
In the reported experiments, the SM-PDS model checker detected all 892 self-modifying malware samples and ran in seconds to minutes per sample, whereas approaches that first translate to a standard PDS timed out or took hours to days.

Self-Modifying Pushdown Systems (SM-PDS) are a formal extension of classical pushdown systems (PDS) designed to model and enable automated analysis of self-modifying code, particularly as encountered in obfuscated or malicious software. In SM-PDS, the set of transition rules can change dynamically during execution, reflecting the key ability of self-modifying programs to alter their instruction set at runtime. This modeling paradigm enables the application of reachability analysis, Linear Temporal Logic (LTL) model checking, and malware detection in the presence of code that evades traditional static verification techniques.

1. Formal Definition and Semantics

An SM-PDS is defined as a tuple $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$ , where $P$ is a finite set of control points (states), $\Gamma$ is a finite stack alphabet, $\Delta \subseteq (P \times \Gamma) \times (P \times \Gamma^*)$ is the set of standard PDS transition rules, and $\Delta_c \subseteq P \times (\Delta\cup\Delta_c)\times(\Delta\cup\Delta_c)\times P$ encodes self-modifying rules. A configuration is of the form $(\langle p, w\rangle, \theta)$ where $p \in P$ , $w \in \Gamma^*$ (stack content), and $\theta \subseteq \Delta \cup \Delta_c$ is the current set of enabled rules (referred to as the “phase”).

There are two key operational semantics:

Ordinary rule application (Std): If $\langle p, \gamma\rangle \hookrightarrow \langle p', w'\rangle \in \theta \cap \Delta$ , then

$(\langle p, \gamma u\rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w'u\rangle, \theta).$

Self-modification (Mod): If $r = p\#(r_1, r_2)p' \in \theta \cap \Delta_c$ and $r_1 \in \theta$ , then

$(\langle p, w\rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w\rangle, \theta'),$

where $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$ .

If $\Delta_c = \varnothing$ , $\mathcal{P}$ reduces to an ordinary PDS. Otherwise, each application of a rule in $\Delta_c$ replaces one enabled rule with another, so the phase $\theta$ records exactly which rules are currently in effect. This gives a precise encoding of program phases for code that changes its control flow or stack behavior at runtime (Touili et al., 2019, Touili et al., 2019).
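The two semantic rules can be sketched as a one-step successor function. This is a minimal illustration, not the authors' implementation: the names `Rule`, `ModRule`, and `step`, the string encoding of control points and stack symbols, and the small example system are all assumptions made here.

```python
from typing import NamedTuple, Tuple


class Rule(NamedTuple):
    """Ordinary rule <p, gamma> -> <p2, w> (an element of Delta)."""
    p: str
    gamma: str
    p2: str
    w: Tuple[str, ...]


class ModRule(NamedTuple):
    """Self-modifying rule p #(r1, r2) p2 (an element of Delta_c)."""
    p: str
    r1: tuple
    r2: tuple
    p2: str


def step(p, stack, theta):
    """Yield every successor of the configuration (<p, stack>, theta).

    `stack` is a tuple with the top symbol first; `theta` is a frozenset
    holding the currently enabled rules (the phase).
    """
    for r in theta:
        if isinstance(r, Rule):
            # (Std): rewrite the top stack symbol, phase unchanged.
            if stack and r.p == p and r.gamma == stack[0]:
                yield r.p2, r.w + stack[1:], theta
        elif isinstance(r, ModRule):
            # (Mod): requires r1 to be enabled; swaps r1 for r2,
            # stack unchanged.
            if r.p == p and r.r1 in theta:
                yield r.p2, stack, (theta - {r.r1}) | {r.r2}


# Hypothetical three-rule system used only for illustration.
r1 = Rule("p1", "a", "p2", ("b", "a"))
r2 = Rule("p2", "b", "p3", ())
rm = ModRule("p3", r1, r2, "p4")
theta0 = frozenset({r1, r2, rm})

for successor in step("p3", ("a",), theta0):
    print(successor)
```

Representing the phase as an immutable `frozenset` keeps configurations hashable, which is convenient when they are later stored in visited sets or used as automaton states.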

2. Reachability Analysis

The central analysis tasks for SM-PDS are backward reachability ( $pre^*$ ) and forward reachability ( $post^*$ ), computed with automata-theoretic saturation algorithms.

Backward Reachability ( $pre^*$ )

Given a regular target set $C$ of configurations, the aim is to compute all configurations from which $C$ is reachable. $C$ is represented by a $P$ -automaton $\mathcal{A} = (Q, \Gamma, T, P, F)$ over configurations, accepting $(\langle p, w\rangle, \theta)$ iff $(p, \theta) \xrightarrow{w}_T q$ for $q \in F$ . Saturation uses two rules:

$\alpha_1$ (ordinary rules): If $r = \langle p, \gamma\rangle \hookrightarrow \langle p', w\rangle \in \theta \cap \Delta$ and $(p', \theta) \xrightarrow{w}_{T'} q$ , then $(p, \theta) \xrightarrow{\gamma}_{T'} q$ is added.
$\alpha_2$ (self-modifying rules): If $r = p\#(r_1, r_2)p_1 \in \theta \cap \Delta_c$ with $r_1 \in \theta$ , and $(p_1, \theta') \xrightarrow{\gamma}_{T'} q$ for $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$ , then $(p, \theta) \xrightarrow{\gamma}_{T'} q$ is added.

The fixpoint $\mathcal{A}_{pre^*}$ thus accepts exactly $pre^*(C)$ (Touili et al., 2019).
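A naive version of this saturation can be sketched as follows. It is an illustration under stated assumptions rather than the published algorithm: rules are encoded as plain tuples, every subset of the rule set is enumerated eagerly as a phase (the actual tool explores phases sparsely), each rule is applied only in phases where it is enabled, and the function names and the example system are hypothetical.

```python
from itertools import combinations


def powerset(rules):
    """All candidate phases, i.e. all subsets of the rule set."""
    rules = list(rules)
    return [frozenset(c) for k in range(len(rules) + 1)
            for c in combinations(rules, k)]


def reach(trans, state, word):
    """Automaton states reachable from `state` by reading `word`."""
    current = {state}
    for sym in word:
        current = {q2 for (q, a, q2) in trans if q in current and a == sym}
    return current


def pre_star(rules, trans):
    """Saturate `trans` with the two backward rules until a fixpoint."""
    trans = set(trans)
    phases = powerset(rules)
    changed = True
    while changed:
        changed = False
        for theta in phases:
            for r in theta:  # only rules enabled in this phase
                new = set()
                if r[0] == "std":  # alpha_1
                    _, p, gamma, p2, w = r
                    for q in reach(trans, (p2, theta), w):
                        new.add(((p, theta), gamma, q))
                elif r[2] in theta:  # alpha_2, needs r1 enabled
                    _, p, ra, rb, p2 = r
                    theta2 = (theta - {ra}) | {rb}
                    for (q, a, q2) in trans:
                        if q == (p2, theta2):
                            new.add(((p, theta), a, q2))
                if not new <= trans:
                    trans |= new
                    changed = True
    return trans


def accepts(trans, finals, p, stack, theta):
    """True if the automaton accepts the configuration (<p, stack>, theta)."""
    return bool(reach(trans, (p, theta), stack) & finals)


# Hypothetical system: ("std", p, gamma, p', w) and ("mod", p, r1, r2, p').
r1 = ("std", "p1", "a", "p2", ("b", "a"))
r2 = ("std", "p2", "b", "p3", ())
rm = ("mod", "p3", r1, r2, "p4")
rules = [r1, r2, rm]
theta0 = frozenset(rules)
theta1 = frozenset({r2, rm})

# Target set C = { (<p4, a>, theta1) }, accepted via final state "f".
target = {(("p4", theta1), "a", "f")}
saturated = pre_star(rules, target)
print(accepts(saturated, {"f"}, "p1", ("a",), theta0))
```

Because the automaton states are pairs of a control point and a phase, the phase change performed by a self-modifying rule shows up simply as a jump between initial states of the automaton.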

Forward Reachability ( $post^*$ )

Forward reachability is constructed similarly, with four rules $\beta_1, \beta_2, \beta_3, \beta_4$ manipulating transitions depending on the operation (pop, single push, double push, and self-modification). This procedure ensures that $\mathcal{A}_{post^*}$ accepts $post^*(C)$ .

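Both analyses are exponential in the total number of rules $m = |\Delta| + |\Delta_c|$ : a phase is a subset of $\Delta \cup \Delta_c$ , so up to $2^m$ phases exist, giving worst-case time and space $O(2^m)$ . In practice the algorithms generate only the reachable phases, which keeps the analysis feasible even for real-world code.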
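The gap between the $2^m$ possible phases and the phases that actually occur can be seen with a small explicit-state exploration. The sketch below is only an illustration: it enumerates configurations one by one (with a cutoff) instead of using automata, and the tuple encoding, function names, and example system are assumptions made here.

```python
from collections import deque


def successors(p, stack, theta):
    """One-step successors of (<p, stack>, theta); top of stack first."""
    for r in theta:
        if r[0] == "std":
            _, q, gamma, q2, w = r
            if stack and q == p and stack[0] == gamma:
                yield q2, w + stack[1:], theta
        else:
            _, q, ra, rb, q2 = r
            if q == p and ra in theta:
                yield q2, stack, (theta - {ra}) | {rb}


def reachable_phases(start, limit=10_000):
    """Phases seen in a breadth-first search, stopped after `limit` configs."""
    seen, queue = {start}, deque([start])
    while queue and len(seen) < limit:
        for nxt in successors(*queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {theta for (_, _, theta) in seen}


# Hypothetical system with m = 3 rules, hence 2**3 = 8 possible phases.
r1 = ("std", "p1", "a", "p2", ("b", "a"))
r2 = ("std", "p2", "b", "p3", ())
rm = ("mod", "p3", r1, r2, "p4")
theta0 = frozenset({r1, r2, rm})

phases = reachable_phases(("p1", ("a",), theta0))
print(len(phases), "reachable phases out of", 2 ** 3)
```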

3. LTL Model Checking and SM-BPDS

LTL model checking for self-modifying code is achieved by extending SM-PDS with a Büchi acceptance condition, forming a Self-Modifying Büchi Pushdown System (SM-BPDS) $\mathfrak{B} = (P, \Gamma, \Delta, \Delta_c, G)$ , where $G \subseteq P$ is the set of accepting control points. A run $\pi = c_0 c_1 \ldots$ is accepting if infinitely many $c_i$ have a control point in $G$ .

Given an SM-PDS $\mathcal{P}$ with labeling $\nu: P \to 2^{At}$ and an LTL formula $\varphi$ over $At$ , the algorithm computes the synchronous product $\mathfrak{B}_\varphi = (P \times Q, \Gamma, \Delta', \Delta'_c, G = P \times F)$ , where $(Q, 2^{At}, \eta, q_0, F)$ is a Büchi automaton for $\varphi$ . Correctness is established: $(\mathcal{P}, \nu) \models \varphi$ iff $\mathfrak{B}_{\varphi}$ has an accepting run from $(\langle (p_0, q_0), w_0 \rangle, \operatorname{prod}(\theta_0))$ (Touili et al., 2019).

LTL model checking thus reduces to the emptiness problem for SM-BPDS: Does there exist an accepting run? Deciding emptiness is performed by constructing the head-reachability graph $\mathcal{G}$ over all possible heads $((p, \gamma), \theta)$ . A head is repeating if $\mathcal{G}$ contains a cycle through it labeled with a $1$ (visiting $G$ ). SAT-based saturation computes labeled pre* automata to add all relevant edges efficiently, with total complexity singly-exponential in $|\Delta| + |\Delta_c|$ and polynomial in $|P|, |\Gamma|$ . This is in contrast to PDS-translation approaches yielding full state-space blowup and doubly-exponential cost.
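The repeating-head test amounts to a labeled cycle search. The sketch below shows one simple way to do it on an explicit edge list; it is not the tool's algorithm, the heads are opaque strings rather than triples $((p, \gamma), \theta)$ , and the function names and example graph are hypothetical.

```python
def reachable_from(edges, src):
    """Nodes reachable from `src` in one or more steps."""
    succ = {}
    for u, v, _ in edges:
        succ.setdefault(u, set()).add(v)
    seen, stack = set(), [src]
    while stack:
        for v in succ.get(stack.pop(), ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def repeating_heads(edges):
    """Heads lying on a cycle that contains at least one 1-labeled edge."""
    nodes = {n for u, v, _ in edges for n in (u, v)}
    reach = {n: reachable_from(edges, n) for n in nodes}
    result = set()
    for u, v, label in edges:
        # The 1-labeled edge u -> v closes a cycle iff v leads back to u.
        if label == 1 and (v == u or u in reach[v]):
            # Every head sharing a cycle with u can loop through this edge.
            result |= {h for h in nodes
                       if h == u or (h in reach[u] and u in reach[h])}
    return result


# Hypothetical graph: edges are (source head, target head, label).
edges = [
    ("a", "b", 0),
    ("b", "c", 1),  # passes through an accepting control point
    ("c", "b", 0),
    ("c", "d", 0),
    ("d", "d", 0),  # a cycle, but never labeled 1
]
print(sorted(repeating_heads(edges)))
```

Here `b` and `c` are repeating, `a` merely leads to them, and `d` loops without ever visiting an accepting control point.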

4. Implementation Approaches and Toolchain

A direct LTL model checker for SM-PDS has been implemented in OCaml (Touili et al., 2019). The analysis pipeline comprises:

Disassembly: Binaries are disassembled with Jakstab, which recovers the control-flow graph and a coarse over-approximation of memory and register contents.
Translation: Non-self-modifying instructions are translated to rules in $\Delta$ and self-modifying mov instructions to rules in $\Delta_c$ , yielding the SM-PDS $\mathcal{P}$ .
Formula Compilation: Given an LTL property $\varphi$ or a library of malware behavior formulas, the product SM-BPDS $\mathfrak{B}_\varphi$ is constructed.
Graph Construction: The head-reachability graph $\mathcal{G}$ is built using repeated labeled $pre^*$ computations.
Emptiness/Model Checking: The checker searches $\mathcal{G}$ for a $1$-labeled cycle through a head reachable from the initial configuration; if one is found, $\mathfrak{B}_\varphi$ has an accepting run and $\varphi$ holds.

Reachability algorithms for $pre^*$ and $post^*$ are also implemented directly, manipulating configurations and phases via P-automata and sparse exploration, thus avoiding construction of the exponentially larger “phase-encoded” classical PDS.

5. Experimental Evaluation and Applications

Benchmarks are reported for both LTL model checking and reachability analysis:

Malware Detection: The SM-PDS LTL checker was applied to 892 self-modifying malware samples from VirusShare, MalShare, VX-Heavens, and NGVCK, as well as to 19 benign programs with injected self-modifying unpackers and 205 generated NGVCK malware samples. Detection properties were encoded as LTL formulas, e.g., for registry injection, data-stealing, or keylogging behaviors.
Performance: Direct SM-PDS LTL model checking took seconds to minutes per sample, compared to PDS-translation plus Moped, which timed out (20-minute cutoff) or required hours to days.
Detection Rates: The SM-PDS approach detected all 892 self-modifying malware samples, whereas the best commercial static antivirus tool reached about 68%. On the 205 NGVCK samples, its detection rate was 100%, versus 31.2% (BitDefender), 53.1% (Kaspersky), and 82.4% (Symantec).

Tool / Antivirus	Detection rate on the 205 NGVCK samples (%)
SM-PDS LTL Checker	100
BitDefender	31.2
Kaspersky	53.1
Symantec	82.4

Synthetic benchmarks on randomly generated SM-PDSs with up to several thousand rules confirm that direct saturation is 10–1000 times faster and uses far less memory than phase-encoded PDS translation.

6. Illustrative Example

A representative SM-PDS instance is provided (Touili et al., 2019):

$P = \{p_1, p_2, p_3, p_4\}$
$\Gamma = \{\gamma_1, \gamma_2\}$
$\Delta = \{ r_1 : \langle p_1, \gamma_1 \rangle \hookrightarrow \langle p_2, \gamma_2 \gamma_1 \rangle, \ \ r_2 : \langle p_2, \gamma_2 \rangle \hookrightarrow \langle p_3, \epsilon \rangle \}$
$\Delta_c = \{r' : p_3\#(r_1, r_2)p_4\}$
Initial phase $\theta_0 = \{ r_1, r_2, r' \}$

A forward trace applies $r_1$ , then $r_2$ , then $r'$ , modifying the phase, thus modeling a changing transition set. This captures dynamic unpacking behavior as found in malicious binaries, which is not representable within standard PDS frameworks.
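Assuming the run starts in $\langle p_1, \gamma_1\rangle$ under phase $\theta_0$ (the initial configuration is not stated above, so this is an assumption), the trace unfolds as follows under the Std and Mod rules of Section 1:

$(\langle p_1, \gamma_1\rangle, \theta_0) \Rightarrow_{\mathcal{P}} (\langle p_2, \gamma_2\gamma_1\rangle, \theta_0)$ by $r_1$ ,

$(\langle p_2, \gamma_2\gamma_1\rangle, \theta_0) \Rightarrow_{\mathcal{P}} (\langle p_3, \gamma_1\rangle, \theta_0)$ by $r_2$ ,

$(\langle p_3, \gamma_1\rangle, \theta_0) \Rightarrow_{\mathcal{P}} (\langle p_4, \gamma_1\rangle, \theta_1)$ by $r'$ , where $\theta_1 = (\theta_0 \setminus \{r_1\}) \cup \{r_2\} = \{r_2, r'\}$ .

After the last step $r_1$ is no longer enabled. Since $r_2$ was already in $\theta_0$ , replacing $r_1$ with $r_2$ shrinks the phase from three rules to two.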

7. Limitations, Complexity, and Future Directions

The head-reachability graphs and automata manipulations introduce an exponential cost only in the number of distinct rules ( $|\Delta| + |\Delta_c|$ ), not the overall program size. Direct saturation avoids the doubly-exponential complexity suffered by PDS translations with phase encoding. In practice, the number of self-modifying rules is small, making these analyses tractable for real-world malware and synthetic cases.

Possible future directions and open problems include:

Incorporation of richer modification patterns, such as multi-rule swap operations.
Branching-time logics (e.g., CTL) and higher-order stack extensions.
Symbolic on-the-fly representations of phases to further reduce the $2^{|\Delta|+|\Delta_c|}$ state explosion.
Theoretical lower bounds for LTL model checking on SM-PDS.
Hybridization with dynamic analysis for improved static-dynamic verification capabilities (Touili et al., 2019, Touili et al., 2019).

Self-Modifying Pushdown Systems thus provide a mathematically rigorous, scalable, and practically effective foundation for the static analysis and model checking of self-modifying code, with demonstrated utility for advanced malware detection in automated toolchains.


References

Reachability Analysis of Self Modifying Code (2019)

LTL Model Checking of Self Modifying Code (2019)
