Self-Modifying Pushdown Systems

Updated 10 November 2025
  • Self-Modifying Pushdown Systems (SM-PDS) are formal models that extend traditional pushdown systems by allowing dynamic modification of transition rules to capture self-modifying behaviors in code.
  • SM-PDS enable automated analysis techniques such as backward/forward reachability and LTL model checking, which are crucial for analyzing obfuscated and malicious software.
  • Experimental evaluations show that SM-PDS achieve 100% detection of self-modifying malware with improved performance and resource efficiency compared to traditional static analysis methods.

Self-Modifying Pushdown Systems (SM-PDS) are a formal extension of classical pushdown systems (PDS) designed to model and enable automated analysis of self-modifying code, particularly as encountered in obfuscated or malicious software. In SM-PDS, the set of transition rules can change dynamically during execution, reflecting the key ability of self-modifying programs to alter their instruction set at runtime. This modeling paradigm enables the application of reachability analysis, Linear Temporal Logic (LTL) model checking, and malware detection in the presence of code that evades traditional static verification techniques.

1. Formal Definition and Semantics

An SM-PDS is defined as a tuple $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$, where $P$ is a finite set of control points (states), $\Gamma$ is a finite stack alphabet, $\Delta \subseteq (P \times \Gamma) \times (P \times \Gamma^*)$ is the set of standard PDS transition rules, and $\Delta_c \subseteq P \times (\Delta\cup\Delta_c)\times(\Delta\cup\Delta_c)\times P$ encodes self-modifying rules. A configuration is of the form $(\langle p, w\rangle, \theta)$ where $p \in P$, $w \in \Gamma^*$ (stack content), and $\theta \subseteq \Delta \cup \Delta_c$ is the current set of enabled rules (referred to as the “phase”).

There are two key operational semantics:

  • Ordinary rule application (Std): If $\langle p, \gamma\rangle \hookrightarrow \langle p', w'\rangle \in \theta \cap \Delta$, then

$$(\langle p, \gamma u\rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w'u\rangle, \theta).$$

  • Self-modification (Mod): If $r = p\#(r_1, r_2)p' \in \theta \cap \Delta_c$ and $r_1 \in \theta$, then

$$(\langle p, w\rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w\rangle, \theta'),$$

where $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$.

If $\Delta_c = \varnothing$, $\mathcal{P}$ reduces to an ordinary PDS. The system thus permits arbitrary switching of its currently enabled transition rules, allowing a precise encoding of program phases for code that changes its control-flow or stack behavior at runtime (Touili et al., 2019, Touili et al., 2019).
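The two semantic rules can be read as a one-step successor function. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the class names `Rule` and `ModRule`, the function `successors`, and the encoding of the stack as a tuple with its top at index 0 are all illustrative choices. The sample rules mirror the illustrative example in Section 6, with strings such as `"g1"` standing in for $\gamma_1$.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterator, Tuple, Union


@dataclass(frozen=True)
class Rule:
    """Ordinary rule <p, gamma> -> <p2, w> from Delta (hypothetical encoding)."""
    p: str
    gamma: str
    p2: str
    w: Tuple[str, ...]


@dataclass(frozen=True)
class ModRule:
    """Self-modifying rule p #(r1, r2) p2 from Delta_c (hypothetical encoding)."""
    p: str
    r1: "AnyRule"
    r2: "AnyRule"
    p2: str


AnyRule = Union[Rule, ModRule]
Phase = FrozenSet[AnyRule]
Config = Tuple[str, Tuple[str, ...], Phase]  # (control point, stack, phase)


def successors(config: Config) -> Iterator[Config]:
    """Yield every configuration reachable in one step via (Std) or (Mod)."""
    p, stack, theta = config
    for r in theta:  # only rules enabled in the current phase may fire
        if isinstance(r, Rule):
            # (Std): replace the top stack symbol gamma by w, keep the phase
            if r.p == p and stack and stack[0] == r.gamma:
                yield (r.p2, r.w + stack[1:], theta)
        elif r.p == p and r.r1 in theta:
            # (Mod): keep the stack, swap r1 for r2 in the phase
            yield (r.p2, stack, (theta - {r.r1}) | {r.r2})


# Sample rules, assumed to mirror the example of Section 6.
r1 = Rule("p1", "g1", "p2", ("g2", "g1"))
r2 = Rule("p2", "g2", "p3", ())
rm = ModRule("p3", r1, r2, "p4")
theta0 = frozenset({r1, r2, rm})

std_step = list(successors(("p1", ("g1",), theta0)))
mod_step = list(successors(("p3", ("g1",), theta0)))
```

Frozen dataclasses are used so that rules are hashable and a phase can be an ordinary `frozenset`; this also lets a self-modifying rule refer to other rules by value.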

2. Reachability Analysis

The central analysis tasks for SM-PDS are backward reachability ($pre^*$) and forward reachability ($post^*$), computed with automata-theoretic saturation algorithms.

Backward Reachability ($pre^*$)

Given a regular target set $C$ of configurations, the aim is to compute all configurations from which $C$ is reachable. $C$ is represented by a $P$-automaton $\mathcal{A} = (Q, \Gamma, T, P, F)$ over configurations, accepting $(\langle p, w\rangle, \theta)$ iff $(p, \theta) \xrightarrow{w}_T q$ for $q \in F$. Saturation uses two rules:

  • $\alpha_1$ (ordinary rules): If $\langle p, \gamma\rangle \to \langle p', w\rangle \in \Delta$ and $(p', \theta) \xrightarrow{w}_{T'} q$, then $(p, \theta) \xrightarrow{\gamma}_{T'} q$ is added.
  • $\alpha_2$ (self-modifying rules): If $r = p\#(r_1, r_2)p_1 \in \Delta_c$ and $(p_1, \theta') \xrightarrow{\gamma}_{T'} q$, with $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$, then $(p, \theta) \xrightarrow{\gamma}_{T'} q$ is added.

The fixpoint $\mathcal{A}_{pre^*}$ thus accepts exactly $pre^*(C)$ (Touili et al., 2019).
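The saturation rules $\alpha_1$ and $\alpha_2$ can be sketched as a naive fixpoint loop over a set of automaton transitions. This is an illustrative Python sketch, not the published algorithm: the names `read`, `pre_star`, and `accepts` are hypothetical, the caller supplies the list of phases to consider, and a rule is applied only when it is enabled in the phase at hand (an assumption carried over from the (Std) and (Mod) semantics). Initial automaton states are pairs `(p, theta)`; the final state `"qf"` and the symbols `"g1"`, `"g2"` are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:       # <p, gamma> -> <p2, w>, an element of Delta
    p: str
    gamma: str
    p2: str
    w: tuple


@dataclass(frozen=True)
class ModRule:    # p #(r1, r2) p2, an element of Delta_c
    p: str
    r1: object
    r2: object
    p2: str


def read(trans, state, word):
    """Return the states reachable from `state` by reading `word`."""
    current = {state}
    for sym in word:
        current = {dst for (src, a, dst) in trans if src in current and a == sym}
    return current


def pre_star(trans, phases):
    """Saturate `trans` with rules alpha_1 and alpha_2 until nothing changes."""
    trans = set(trans)
    while True:
        new = set()
        for theta in phases:
            for r in theta:  # assumption: only rules enabled in theta apply
                if isinstance(r, Rule):
                    # alpha_1: (p2, theta) --w--> q  gives  (p, theta) --gamma--> q
                    for q in read(trans, (r.p2, theta), r.w):
                        new.add(((r.p, theta), r.gamma, q))
                elif r.r1 in theta:
                    # alpha_2: (p2, theta') --a--> q  gives  (p, theta) --a--> q
                    theta2 = (theta - {r.r1}) | {r.r2}
                    for (src, a, q) in trans:
                        if src == (r.p2, theta2):
                            new.add(((r.p, theta), a, q))
        if new <= trans:
            return trans
        trans |= new


def accepts(trans, finals, p, word, theta):
    """Check whether the automaton accepts the configuration (<p, word>, theta)."""
    return bool(read(trans, (p, theta), word) & finals)


# Sample system, assumed to mirror the example of Section 6.
r1 = Rule("p1", "g1", "p2", ("g2", "g1"))
r2 = Rule("p2", "g2", "p3", ())
rm = ModRule("p3", r1, r2, "p4")
theta0 = frozenset({r1, r2, rm})
theta1 = frozenset({r2, rm})

# Target set C = { (<p4, g1>, theta1) }.
target = {(("p4", theta1), "g1", "qf")}
saturated = pre_star(target, [theta0, theta1])
print(accepts(saturated, {"qf"}, "p1", ("g1",), theta0))  # True
print(accepts(saturated, {"qf"}, "p1", ("g1",), theta1))  # False
```

The loop recomputes all candidate transitions on every round, which is simple but far from the worklist-based efficiency a real tool would need. The second query fails because $r_1$ is not enabled in `theta1`, so no move is possible from `p1`.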

Forward Reachability ($post^*$)

Forward reachability is constructed similarly, with four rules $\beta_1, \beta_2, \beta_3, \beta_4$ manipulating transitions depending on the operation (pop, single push, double push, and self-modification). This procedure ensures that $\mathcal{A}_{post^*}$ accepts $post^*(C)$.

Both analyses scale with the total number of rules $m = |\Delta| + |\Delta_c|$: worst-case time and space are $O(2^m)$, since a phase can be any subset of the rules. In practice, only reachable phases are generated, which keeps the analysis feasible even for real-world code.
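The gap between the $2^m$ worst case and the phases that actually arise can be illustrated with a small exploration. The Python sketch below is an assumption-laden illustration rather than part of the published algorithms: the function `reachable_phases` is hypothetical, rules are identified by name only, and control points and stack contents are ignored, so the result over-approximates the phases of reachable configurations.

```python
def reachable_phases(theta0, mods):
    """Collect phases reachable from theta0 by enabled self-modifying rules.

    `mods` maps the name of each self-modifying rule to the pair (r1, r2)
    it swaps. Control points and stacks are ignored (over-approximation).
    """
    seen = {theta0}
    todo = [theta0]
    while todo:
        theta = todo.pop()
        for name, (r1, r2) in mods.items():
            # The rule must itself be enabled, and r1 must be present.
            if name in theta and r1 in theta:
                nxt = (theta - {r1}) | {r2}
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return seen


# Rule names assumed to mirror the example of Section 6.
theta0 = frozenset({"r1", "r2", "r'"})
mods = {"r'": ("r1", "r2")}

phases = reachable_phases(theta0, mods)
print(len(phases), "reachable phases out of", 2 ** len(theta0), "subsets")
```

For this three-rule system the exploration finds 2 phases, against 8 possible subsets.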

3. LTL Model Checking and SM-BPDS

LTL model checking for self-modifying code is achieved by extending SM-PDS with a Büchi acceptance condition, forming a Self-Modifying Büchi Pushdown System (SM-BPDS) $\mathfrak{B} = (P, \Gamma, \Delta, \Delta_c, G)$, where $G \subseteq P$ is the set of accepting control points. A run $\pi = c_0 c_1 \ldots$ is accepting if infinitely many $c_i$ have a control point in $G$.

Given an SM-PDS $\mathcal{P}$ with labeling $\nu: P \to 2^{At}$ and an LTL formula $\varphi$ over $At$, the algorithm computes the synchronous product $\mathfrak{B}_\varphi = (P \times Q, \Gamma, \Delta', \Delta'_c, G = P \times F)$, where $(Q, 2^{At}, \eta, q_0, F)$ is a Büchi automaton for $\varphi$. Correctness is established: $(\mathcal{P}, \nu) \models \varphi$ iff $\mathfrak{B}_{\varphi}$ has an accepting run from $(\langle (p_0, q_0), w_0 \rangle, \operatorname{prod}(\theta_0))$ (Touili et al., 2019).

LTL model checking thus reduces to the emptiness problem for SM-BPDS: does there exist an accepting run? Emptiness is decided by constructing the head-reachability graph $\mathcal{G}$ over all possible heads $((p, \gamma), \theta)$. A head is repeating if $\mathcal{G}$ contains a cycle through it that includes an edge labeled $1$ (that is, the cycle visits $G$). SAT-based saturation computes labeled $pre^*$ automata to add all relevant edges efficiently, with total complexity singly exponential in $|\Delta| + |\Delta_c|$ and polynomial in $|P|$ and $|\Gamma|$. This contrasts with PDS-translation approaches, which incur full state-space blowup and doubly exponential cost.
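Once the head-reachability graph is available, finding repeating heads is a plain graph problem. The following Python sketch assumes the graph is given as a list of labeled edges `(u, label, v)` with `label` in `{0, 1}`; the function names and the string placeholders for heads are hypothetical, and the quadratic reachability check is chosen for clarity, not speed.

```python
from collections import defaultdict


def reachable_from(start, succ):
    """Return all nodes reachable from `start` via one or more edges."""
    seen, todo = set(), [start]
    while todo:
        u = todo.pop()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen


def repeating_heads(edges):
    """Return heads lying on a cycle that contains a 1-labeled edge."""
    succ = defaultdict(set)
    nodes = set()
    for u, _, v in edges:
        succ[u].add(v)
        nodes |= {u, v}
    reach = {n: reachable_from(n, succ) for n in nodes}
    result = set()
    for h in nodes:
        for u, label, v in edges:
            # Cycle shape: h ->* u --1--> v ->* h
            if (label == 1
                    and (u == h or u in reach[h])
                    and (v == h or h in reach[v])):
                result.add(h)
                break
    return result


# Placeholder heads; in an SM-BPDS each would be a triple ((p, gamma), theta).
edges = [
    ("h1", 0, "h2"),
    ("h2", 1, "h1"),  # closes a cycle h1 -> h2 -> h1 through a 1-labeled edge
    ("h2", 0, "h3"),
    ("h3", 0, "h3"),  # self-loop without any 1-labeled edge
]
print(sorted(repeating_heads(edges)))  # ['h1', 'h2']
```

A production implementation would more likely compute strongly connected components once and mark those containing an internal 1-labeled edge, which gives the same answer in linear time.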

4. Implementation Approaches and Toolchain

A direct LTL model checker for SM-PDS has been implemented in OCaml (Touili et al., 2019). The analysis pipeline comprises:

  1. Disassembly of binaries with Jakstab, recovering the control-flow graph and coarse memory/register over-approximation.
  2. Translation of non-self-modifying instructions to $\Delta$ and self-modifying mov instructions to $\Delta_c$, yielding the SM-PDS $\mathcal{P}$.
  3. Formula Compilation: Given an LTL property $\varphi$ or a library of malware behavior formulas, the product SM-BPDS $\mathfrak{B}_\varphi$ is constructed.
  4. Graph Construction: The head-reachability graph $\mathcal{G}$ is built using repeated labeled $pre^*$ computations.
  5. Emptiness/Model Checking: Search for a $1$-labeled cycle in $\mathcal{G}$; if found, $\varphi$ holds.

Reachability algorithms for $pre^*$ and $post^*$ are also implemented directly, manipulating configurations and phases via P-automata and sparse exploration, thus avoiding construction of the exponentially larger “phase-encoded” classical PDS.

5. Experimental Evaluation and Applications

Benchmarks are reported for both LTL model checking and reachability analysis:

  • Malware Detection: The SM-PDS LTL checker was applied to 892 self-modifying malware samples from VirusShare, MalShare, VX-Heavens, and NGVCK, as well as 19 benign programs with injected self-modifying unpackers and 205 generated NGVCK malware samples. Detection properties were encoded as LTL formulas, e.g., for registry injection, data-stealing, or keylogging behaviors.
  • Performance: Direct SM-PDS LTL model checking took seconds to minutes per sample, compared to PDS-translation plus Moped, which timed out (20-minute cutoff) or required hours to days.
  • Detection Rates: The SM-PDS approach detected all 892 self-modifying malware samples (100%), outperforming commercial static antivirus tools (max. ~68%). For the 205 NGVCK samples, the detection rate was 100% versus 31.2% (BitDefender), 53.1% (Kaspersky), and 82.4% (Symantec).

| Tool / Antivirus | Detection Rate (%) |
|---|---|
| SM-PDS LTL Checker | 100 |
| BitDefender | 31.2 |
| Kaspersky | 53.1 |
| Symantec | 82.4 |

Synthetic benchmarks on randomly generated SM-PDSs with up to several thousand rules confirm that direct saturation is 10–1000 times faster and uses far less memory than phase-encoded PDS translation.

6. Illustrative Example

A representative SM-PDS instance is provided (Touili et al., 2019):

  • $P = \{p_1, p_2, p_3, p_4\}$
  • $\Gamma = \{\gamma_1, \gamma_2\}$
  • $\Delta = \{ r_1 : \langle p_1, \gamma_1 \rangle \hookrightarrow \langle p_2, \gamma_2 \gamma_1 \rangle,\ r_2 : \langle p_2, \gamma_2 \rangle \hookrightarrow \langle p_3, \epsilon \rangle \}$
  • $\Delta_c = \{r' : p_3\#(r_1, r_2)p_4\}$
  • Initial phase $\theta_0 = \{ r_1, r_2, r' \}$

A forward trace applies $r_1$, then $r_2$, then $r'$. The last step replaces $r_1$ with $r_2$ in the phase, thus modeling a changing transition set. This captures dynamic unpacking behavior as found in malicious binaries, which is not representable within standard PDS frameworks.
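Written out step by step, and assuming the run starts from the configuration $(\langle p_1, \gamma_1\rangle, \theta_0)$ (the source does not fix an initial configuration, so this starting point is an illustrative choice), the trace under the (Std) and (Mod) rules of Section 1 is:

$$
\begin{aligned}
(\langle p_1, \gamma_1\rangle, \theta_0)
  &\Rightarrow_{\mathcal{P}} (\langle p_2, \gamma_2\gamma_1\rangle, \theta_0) && \text{by } r_1 \text{ (Std)} \\
  &\Rightarrow_{\mathcal{P}} (\langle p_3, \gamma_1\rangle, \theta_0) && \text{by } r_2 \text{ (Std)} \\
  &\Rightarrow_{\mathcal{P}} (\langle p_4, \gamma_1\rangle, \theta_1) && \text{by } r' \text{ (Mod)}
\end{aligned}
$$

where $\theta_1 = (\theta_0 \setminus \{r_1\}) \cup \{r_2\} = \{r_2, r'\}$. Since $r_2$ was already enabled, the net effect of $r'$ is to disable $r_1$; in $\theta_1$ the rule $r'$ can no longer fire because its precondition $r_1 \in \theta$ fails.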

7. Limitations, Complexity, and Future Directions

The head-reachability graphs and automata manipulations introduce an exponential cost only in the number of distinct rules ($|\Delta| + |\Delta_c|$), not the overall program size. Direct saturation avoids the doubly-exponential complexity suffered by PDS translations with phase encoding. In practice, the number of self-modifying rules is small, making these analyses tractable for real-world malware and synthetic cases.

Possible future directions and open problems include:

  • Incorporation of richer modification patterns, such as multi-rule swap operations.
  • Branching-time logics (e.g., CTL) and higher-order stack extensions.
  • Symbolic on-the-fly representations of phases to further reduce the $2^{|\Delta|+|\Delta_c|}$ state explosion.
  • Theoretical lower bounds for LTL model checking on SM-PDS.
  • Hybridization with dynamic analysis for improved static-dynamic verification capabilities (Touili et al., 2019, Touili et al., 2019).

Self-Modifying Pushdown Systems thus provide a mathematically rigorous, scalable, and practically effective foundation for the static analysis and model checking of self-modifying code, with demonstrated utility for advanced malware detection in automated toolchains.
