Self-Modifying Pushdown Systems
- Self-Modifying Pushdown Systems (SM-PDS) are formal models that extend traditional pushdown systems by allowing dynamic modification of transition rules to capture self-modifying behaviors in code.
- SM-PDS enable automated analysis techniques such as backward/forward reachability and LTL model checking, which are crucial for analyzing obfuscated and malicious software.
- Experimental evaluations report 100% detection on the tested self-modifying malware samples, and direct SM-PDS analysis runs faster and uses less memory than translation to an ordinary pushdown system.
Self-Modifying Pushdown Systems (SM-PDS) are a formal extension of classical pushdown systems (PDS) designed to model and enable automated analysis of self-modifying code, particularly as encountered in obfuscated or malicious software. In SM-PDS, the set of transition rules can change dynamically during execution, reflecting the key ability of self-modifying programs to alter their own instructions at runtime. This modeling paradigm enables the application of reachability analysis, Linear Temporal Logic (LTL) model checking, and malware detection in the presence of code that evades traditional static verification techniques.
1. Formal Definition and Semantics
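An SM-PDS is defined as a tuple $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$, where $P$ is a finite set of control points (states), $\Gamma$ is a finite stack alphabet, $\Delta$ is the set of standard PDS transition rules, and $\Delta_c$ encodes self-modifying rules. A configuration is of the form $\langle p, w, \theta \rangle$ where $p \in P$, $w \in \Gamma^*$ (stack content), and $\theta \subseteq \Delta \cup \Delta_c$ is the current set of enabled rules (referred to as the “phase”).
The transition relation $\Rightarrow$ is defined by two rules:
- Ordinary rule application (Std): If $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle \in \theta \cap \Delta$, then $\langle p, \gamma w', \theta \rangle \Rightarrow \langle p', w w', \theta \rangle$ for every $w' \in \Gamma^*$.
- Self-modification (Mod): If $p \xrightarrow{(r_1, r_2)} p' \in \theta \cap \Delta_c$ and $r_1 \in \theta$, then $\langle p, w, \theta \rangle \Rightarrow \langle p', w, \theta' \rangle$,
where $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$.
If $\Delta_c = \emptyset$, $\mathcal{P}$ reduces to an ordinary PDS. The system thus permits switching of its currently enabled transition rules, allowing a precise encoding of program phases for code that changes its control-flow or stack behavior at runtime (Touili et al., 2019, Touili et al., 2019).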
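The two semantic rules can be sketched as a one-step successor function. This is a minimal illustration rather than the authors' implementation: the classes `Rule` and `ModRule`, the function `successors`, and the tiny example system are all hypothetical names chosen here, and the sketch assumes that a self-modifying rule only swaps ordinary rules.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple, Union

@dataclass(frozen=True)
class Rule:
    """Ordinary rule <p, gamma> -> <p2, w> (hypothetical encoding)."""
    p: str
    gamma: str
    p2: str
    w: Tuple[str, ...]

@dataclass(frozen=True)
class ModRule:
    """Self-modifying rule p -(r1, r2)-> p2: replace r1 by r2 in the phase."""
    p: str
    r1: Rule   # assumption: only ordinary rules are swapped in this sketch
    r2: Rule
    p2: str

Phase = FrozenSet[Union[Rule, ModRule]]
Config = Tuple[str, Tuple[str, ...], Phase]  # (control point, stack, phase)

def successors(config: Config) -> List[Config]:
    """All configurations reachable from `config` in exactly one step."""
    p, stack, theta = config
    result = []
    for r in theta:
        if isinstance(r, ModRule):
            # (Mod): the stack is untouched, the phase swaps r1 for r2.
            if r.p == p and r.r1 in theta:
                result.append((r.p2, stack, (theta - {r.r1}) | {r.r2}))
        elif stack and r.p == p and r.gamma == stack[0]:
            # (Std): rewrite the top stack symbol, the phase is unchanged.
            result.append((r.p2, r.w + stack[1:], theta))
    return result

# Hypothetical system: push, then swap the push rule for a pop rule.
push = Rule("p", "a", "p", ("b", "a"))
pop = Rule("p", "b", "q", ())
swap = ModRule("p", push, pop, "p")
theta0 = frozenset({push, swap})
theta1 = frozenset({pop, swap})
start = ("p", ("a",), theta0)
```

From `start`, both the push rule and the swap are enabled; once the swap has fired, the push rule is gone and only the pop rule can act on a stack with `b` on top.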
2. Reachability Analysis
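The central analysis tasks for SM-PDS are backward reachability ($pre^*$) and forward reachability ($post^*$), computed with automata-theoretic saturation algorithms.
Backward Reachability ($pre^*$)
Given a regular target set $C$ of configurations, the aim is to compute $pre^*(C)$, the set of all configurations from which $C$ is reachable. $C$ is represented by a $\mathcal{P}$-automaton $\mathcal{A}$ over configurations whose initial states are pairs $(p, \theta)$, accepting $\langle p, w, \theta \rangle$ iff $(p, \theta) \xrightarrow{w} q$ for some final state $q$. Saturation uses two rules:
- Ordinary rules: If $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle \in \theta \cap \Delta$ and $(p', \theta) \xrightarrow{w} q$, then the transition $((p, \theta), \gamma, q)$ is added.
- Self-modifying rules: If $p \xrightarrow{(r_1, r_2)} p' \in \theta \cap \Delta_c$ with $r_1 \in \theta$ and $(p', \theta') \xrightarrow{\gamma} q$, where $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$, then the transition $((p, \theta), \gamma, q)$ is added.
The fixpoint thus accepts exactly $pre^*(C)$ (Touili et al., 2019).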
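The saturation procedure for backward reachability can be sketched as a naive fixpoint loop. The code below is an illustration under stated assumptions, not the published algorithm's implementation: `Rule`, `ModRule`, `reach`, `pre_star`, and `accepts` are hypothetical names, the caller supplies the finite set of phases to consider, the target automaton has no transitions entering a (control point, phase) state, and only non-empty stacks are handled for self-modifying steps.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Rule:                 # ordinary rule <p, gamma> -> <p2, w>
    p: str
    gamma: str
    p2: str
    w: Tuple[str, ...]

@dataclass(frozen=True)
class ModRule:              # self-modifying rule p -(r1, r2)-> p2
    p: str
    r1: Rule
    r2: Rule
    p2: str

def reach(trans, state, word):
    """States reachable from `state` by reading `word` in the automaton."""
    current = {state}
    for sym in word:
        current = {t for (s, a, t) in trans if s in current and a == sym}
    return current

def pre_star(phases, trans):
    """Saturate `trans` (triples (state, symbol, state)) over the given phases."""
    trans = set(trans)
    changed = True
    while changed:
        changed = False
        for theta in phases:
            for r in theta:
                if isinstance(r, ModRule):
                    if r.r1 not in theta:
                        continue
                    theta2 = (theta - {r.r1}) | {r.r2}
                    # Copy every transition leaving (p2, theta2) to (p, theta).
                    new = {((r.p, theta), a, t)
                           for (s, a, t) in trans if s == (r.p2, theta2)}
                else:
                    # If (p2, theta) reads w and ends in t, (p, theta) reads gamma.
                    new = {((r.p, theta), r.gamma, t)
                           for t in reach(trans, (r.p2, theta), r.w)}
                if not new <= trans:
                    trans |= new
                    changed = True
    return trans

def accepts(trans, finals, config):
    p, stack, theta = config
    return bool(reach(trans, (p, theta), stack) & set(finals))

# Hypothetical system and target set C = { <q, a, theta1> }.
push = Rule("p", "a", "p", ("b", "a"))
pop = Rule("p", "b", "q", ())
swap = ModRule("p", push, pop, "p")
theta0 = frozenset({push, swap})
theta1 = frozenset({pop, swap})
target = {(("q", theta1), "a", "f")}
saturated = pre_star([theta0, theta1], target)
```

In this example, `accepts(saturated, {"f"}, ("p", ("a",), theta0))` holds because the configuration can push, swap rules, and pop to reach the target, whereas `("p", ("a",), theta1)` is stuck and is rejected.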
Forward Reachability ($post^*$)
Forward reachability is constructed similarly, with four rules manipulating transitions depending on the operation (pop, single push, double push, and self-modification). This procedure ensures that the resulting automaton accepts $post^*(C)$.
The time and space cost of both analyses grows with the number of rules that can change, since each self-modification may create a new phase. In practice, however, the algorithms generate only the reachable phases, which allows effective analysis even for real-world code.
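To illustrate why exploring only reachable phases matters, the sketch below enumerates the phases that self-modifying rules can produce from an initial phase. It is an assumption-laden illustration, not part of the published algorithms: `Rule`, `ModRule`, and `reachable_phases` are hypothetical names, and the function over-approximates because it ignores whether the control point of each self-modifying rule is actually reachable.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Rule:                 # ordinary rule <p, gamma> -> <p2, w>
    p: str
    gamma: str
    p2: str
    w: Tuple[str, ...]

@dataclass(frozen=True)
class ModRule:              # self-modifying rule p -(r1, r2)-> p2
    p: str
    r1: Rule
    r2: Rule
    p2: str

def reachable_phases(theta0):
    """Phases obtainable from theta0 by firing self-modifying rules."""
    seen, todo = {theta0}, [theta0]
    while todo:
        theta = todo.pop()
        for r in theta:
            if isinstance(r, ModRule) and r.r1 in theta:
                nxt = (theta - {r.r1}) | {r.r2}
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return seen

# Hypothetical system: rule a can be replaced by b, and b by c.
a = Rule("p", "x", "p", ("x", "x"))
b = Rule("p", "x", "q", ())
c = Rule("q", "x", "q", ())
m1 = ModRule("p", a, b, "p")
m2 = ModRule("q", b, c, "q")
phases = reachable_phases(frozenset({a, m1, m2}))
```

Here the five rules admit 32 subsets as candidate phases, but only three of them arise from the initial phase.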
3. LTL Model Checking and SM-BPDS
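LTL model checking for self-modifying code is achieved by extending SM-PDS with a Büchi acceptance condition, forming a Self-Modifying Büchi Pushdown System (SM-BPDS) $\mathcal{BP} = (P, \Gamma, \Delta, \Delta_c, G)$, where $G \subseteq P$ is the set of accepting control points. A run is accepting if infinitely many of its configurations have a control point in $G$.
Given an SM-PDS $\mathcal{P}$ with labeling $\nu : P \to 2^{At}$ and an LTL formula $\varphi$ over the atomic propositions $At$, the algorithm computes the synchronous product $\mathcal{BP}_\varphi = \mathcal{P} \times \mathcal{B}_\varphi$, where $\mathcal{B}_\varphi$ is a Büchi automaton for $\varphi$. Correctness is established: $\langle p, w, \theta \rangle \models \varphi$ iff $\mathcal{BP}_\varphi$ has an accepting run from the product configuration corresponding to $\langle p, w, \theta \rangle$ (Touili et al., 2019).
LTL model checking thus reduces to the emptiness problem for SM-BPDS: Does there exist an accepting run? Deciding emptiness is performed by constructing the head-reachability graph $\mathcal{G}$ over all possible heads $((p, \theta), \gamma)$. A head is repeating if $\mathcal{G}$ contains a cycle through it that carries the label $1$ (meaning that the cycle visits $G$). SAT-based saturation computes labeled $pre^*$ automata to add all relevant edges efficiently, with total complexity singly exponential in the number of self-modifying rules and polynomial in the remaining size of the system. This is in contrast to PDS-translation approaches, which incur full state-space blowup and doubly-exponential cost.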
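The repeating-head test amounts to finding cycles that contain at least one edge labeled $1$. The following sketch shows that check on an already constructed graph. It is a simple illustration under assumptions: `reachable` and `repeating_heads` are hypothetical names, heads are opaque hashable values, and edges are given as triples `(source, label, target)` with label `0` or `1`. It uses plain reachability rather than an optimized strongly-connected-component algorithm.

```python
def reachable(adj, start):
    """Nodes reachable from `start` (including `start` itself)."""
    seen, todo = {start}, [start]
    while todo:
        u = todo.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen

def repeating_heads(edges):
    """Heads lying on some cycle that contains a 1-labeled edge."""
    adj = {}
    nodes = set()
    for u, _, v in edges:
        adj.setdefault(u, set()).add(v)
        nodes |= {u, v}
    reach = {n: reachable(adj, n) for n in nodes}
    result = set()
    for u, label, v in edges:
        # The 1-edge u -> v lies on a cycle iff v can get back to u.
        if label == 1 and u in reach[v]:
            # Every head between v and u is on a cycle through that edge.
            result |= {h for h in reach[v] if u in reach[h]}
    return result

# Hypothetical head-reachability graph.
edges = [
    ("h0", 0, "h1"),
    ("h1", 1, "h2"),
    ("h2", 0, "h1"),
    ("h2", 0, "h3"),
    ("h3", 0, "h3"),
]
heads = repeating_heads(edges)
```

In this graph `h1` and `h2` are repeating, `h3` only has a cycle labeled `0`, and `h0` lies on no cycle at all.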
4. Implementation Approaches and Toolchain
A direct LTL model checker for SM-PDS has been implemented in OCaml (Touili et al., 2019). The analysis pipeline comprises:
- Disassembly of binaries with Jakstab, recovering the control-flow graph and coarse memory/register over-approximation.
- Translation of non-self-modifying instructions to rules in $\Delta$ and self-modifying `mov` instructions to rules in $\Delta_c$, yielding the SM-PDS $\mathcal{P}$.
- Formula Compilation: Given an LTL property $\varphi$ or a library of malware behavior formulas, the product SM-BPDS is constructed.
- Graph Construction: The head-reachability graph $\mathcal{G}$ is built using repeated labeled $pre^*$ computations.
- Emptiness/Model Checking: Search for a $1$-labeled cycle in $\mathcal{G}$; if found, $\varphi$ holds.
Reachability algorithms for and are also implemented directly, manipulating configurations and phases via P-automata and sparse exploration, thus avoiding construction of the exponentially larger “phase-encoded” classical PDS.
5. Experimental Evaluation and Applications
Benchmarks are reported for both LTL model checking and reachability analysis:
- Malware Detection: The SM-PDS LTL checker was applied to 892 self-modifying malware samples from VirusShare, MalShare, VX-Heavens, and NGVCK, as well as 19 benign programs with injected self-modifying unpackers and 205 generated NGVCK malware samples. Detection properties were encoded as LTL formulas, e.g., for registry injection, data-stealing, or keylogging behaviors.
- Performance: Direct SM-PDS LTL model checking took seconds to minutes per sample, compared to PDS-translation plus Moped, which timed out (20-minute cutoff) or required hours to days.
- Detection Rates: The SM-PDS approach detected all 892 self-modifying malware samples (100%), outperforming commercial static antivirus tools (max. ~68%). On the 205 NGVCK samples, it detected 100% versus 31.2% (BitDefender), 53.1% (Kaspersky), and 82.4% (Symantec).
| Tool / Antivirus | Detection Rate on 205 NGVCK Samples (%) |
|---|---|
| SM-PDS LTL Checker | 100 |
| BitDefender | 31.2 |
| Kaspersky | 53.1 |
| Symantec | 82.4 |
Synthetic benchmarks on randomly generated SM-PDSs with up to several thousand rules confirm that direct saturation is 10–1000 times faster and uses far less memory than phase-encoded PDS translation.
6. Illustrative Example
A representative SM-PDS instance is provided (Touili et al., 2019):
- Initial phase
A forward trace applies a sequence of three rules from the current phase, during which the phase is modified, thus modeling a changing transition set. This captures dynamic unpacking behavior as found in malicious binaries, which is not representable within standard PDS frameworks.
7. Limitations, Complexity, and Future Directions
The head-reachability graphs and automata manipulations introduce an exponential cost only in the number of self-modifying rules, not in the overall program size. Direct saturation avoids the doubly-exponential complexity suffered by PDS translations with phase encoding. In practice, the number of self-modifying rules is small, making these analyses tractable for real-world malware and synthetic cases.
Possible future directions and open problems include:
- Incorporation of richer modification patterns, such as multi-rule swap operations.
- Branching-time logics (e.g., CTL) and higher-order stack extensions.
- Symbolic on-the-fly representations of phases to further reduce the state explosion.
- Theoretical lower bounds for LTL model checking on SM-PDS.
- Hybridization with dynamic analysis for improved static-dynamic verification capabilities (Touili et al., 2019, Touili et al., 2019).
Self-Modifying Pushdown Systems thus provide a mathematically rigorous, scalable, and practically effective foundation for the static analysis and model checking of self-modifying code, with demonstrated utility for advanced malware detection in automated toolchains.