Process Hollowing Analysis

Updated 21 January 2026
  • Process hollowing is a binary analysis technique that replaces a benign process’s code with injected code, enabling stealthy dynamic instrumentation.
  • The HALF framework uses a decoupled architecture with an instrumentation module, a container process, and a kernel monitor to maintain low overhead and precise analysis.
  • Empirical benchmarks reveal that HALF significantly reduces memory footprint and runtime slowdown compared to traditional dynamic taint analysis methods.

Process hollowing is a binary program analysis and evasion technique where a benign process’s original code is replaced or “hollowed out” in memory and malicious or instrumented code is injected in its place. In modern security research, process hollowing has been adapted as a primitive for constructing isolated, containerized analysis environments that facilitate fine-grained, low-overhead dynamic binary instrumentation and taint-tracking, notably circumventing the memory fragmentation and address-space exhaustion endemic to legacy approaches. A prominent example of this paradigm is the HALF (Hollowing-Assisted Lightweight Framework) architecture, which leverages customized process hollowing to decouple program execution from heavyweight analysis operations on the Windows OS (Long et al., 26 Dec 2025).

1. Component Architecture of Process Hollowing for Analysis

Process hollowing as instrumented in the HALF framework consists of three core cooperating entities:

  • Instrumentation Module (User-Mode, Target Process): Employs dynamic binary instrumentation (DBI) via engines such as DynamoRIO, inserting just-in-time (JIT) stubs before each memory-access instruction. Each stub records (Addr, Regs, InsnID) to a per-thread ring buffer, thereby collecting execution trace data without substantial in-process shadow memory.
  • Container Process (User-Mode, Hollowed Analysis Sandbox): Initiated as a minimal stub with only essential runtime components (ntdll and CRT). Non-essential sections (e.g., PEB, TEB, default stacks, loaded DLLs) are unmapped from memory. The container shares select virtual-address regions (instrumentation DLL image, analysis stubs, record buffers) with the target and binds analysis threads to each target thread for asynchronous analysis.
  • Kernel Monitor (Kernel-Mode Driver): Hooks kernel APIs such as ZwAllocateVirtualMemory and ZwFreeVirtualMemory to maintain synchronized memory mappings between target and container. Handles __guard_page faults for buffer swaps, and intercepts synchronization syscalls (e.g., NtWaitForSingleObject) to trigger buffer flushing and ensure event ordering.

This architecture relocates costly taint or DFT operations out of the target process, minimizing memory pollution while maintaining fidelity of the original execution environment (Long et al., 26 Dec 2025).
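The recording side of this split can be pictured with a small model. The sketch below is only an illustration in Python, not HALF's implementation: the `Record` layout, the `RingBuffer` class, and the `on_full` callback (standing in for the kernel-assisted buffer swap) are hypothetical names chosen to mirror the (Addr, Regs, InsnID) tuple described above.

```python
from collections import namedtuple

# Hypothetical record layout mirroring the (Addr, Regs, InsnID) tuple in the text.
Record = namedtuple("Record", ["addr", "regs", "insn_id"])


class RingBuffer:
    """Fixed-capacity per-thread buffer; a full buffer is handed off, never grown."""

    def __init__(self, capacity, on_full):
        self.capacity = capacity
        self.on_full = on_full  # assumed stand-in for the buffer swap to the container
        self.slots = []

    def record(self, addr, regs, insn_id):
        # Called by the (modelled) stub before each memory-access instruction.
        self.slots.append(Record(addr, regs, insn_id))
        if len(self.slots) == self.capacity:
            self.flush()

    def flush(self):
        # Hand the filled slots to the analysis side and start a fresh batch.
        if self.slots:
            full, self.slots = self.slots, []
            self.on_full(full)


flushed = []
buf = RingBuffer(capacity=2, on_full=flushed.append)
for i in range(5):
    buf.record(addr=0x1000 + 8 * i, regs=(i,), insn_id=i)
buf.flush()  # drain the partially filled batch
```

The target only appends and hands off batches; everything expensive happens on whoever consumes `flushed`, which is the point of the decoupling.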

2. Detailed Process Hollowing Procedure

HALF’s process hollowing mechanism is executed in discrete steps:

  1. Stub Creation: A new suspended process (“hollow.exe”) is created using CreateProcessW with the CREATE_SUSPENDED flag.
  2. Kernel-Unmap and Section Remapping: The kernel monitor enumerates all memory regions (e.g., via ZwQueryVirtualMemory) and invokes ZwUnmapViewOfSection on every non-code section, excluding the pre-mapped instrumentation DLL. This step removes the PEB, TEB, stacks, and extraneous DLLs, leaving only the minimal execution environment.
  3. Analysis Region Reservation: In the target, the DBI reserves three contiguous virtual address (VA) ranges:
    • $R_{code}$: instrumentation DLL image
    • $R_{sharedCode}$: analysis stubs
    • $R_{buf}$: per-thread record buffers
     The kernel then reserves the same ranges in the container with MEM_RESERVE and PAGE_NOACCESS.
  4. On-Demand Commitment and VA Mirroring: When ZwAllocateVirtualMemory is called in the target for a region, the kernel mirrors this commit in the container, maintaining equivalent permissions. Upon a page fault in the container (indicative of first access), pages are (re-)committed accordingly.
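Steps 2 and 3 amount to a planning pass over the container's region list: decide what to unmap, then decide which of the target's ranges still need a no-access reservation. The following Python sketch models only that bookkeeping. The `Region` type, the `KEEP` set, the region labels, and `plan_hollowing` are hypothetical, and the assumption that a range already backed in the container (such as the pre-mapped DLL) is skipped is mine rather than something the source states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    label: str  # hypothetical tag, e.g. "ntdll", "stack", "instr_dll"


# Assumption: the surviving set is the minimal runtime plus the instrumentation DLL.
KEEP = {"ntdll", "crt", "instr_dll"}


def overlaps(a_base, a_size, b_base, b_size):
    """True if the half-open ranges [a, a+size) and [b, b+size) intersect."""
    return a_base < b_base + b_size and b_base < a_base + a_size


def plan_hollowing(container_regions, target_ranges):
    """Return (regions to unmap, reservations to place in the container)."""
    to_unmap = [r for r in container_regions if r.label not in KEEP]
    kept = [r for r in container_regions if r.label in KEEP]
    reserve = []
    for name, (base, size) in target_ranges.items():
        if any(overlaps(base, size, k.base, k.size) for k in kept):
            continue  # already backed in the container (assumed: the pre-mapped DLL)
        reserve.append((name, base, size, "MEM_RESERVE", "PAGE_NOACCESS"))
    return to_unmap, reserve


container = [
    Region(0x7FF00000, 0x10000, "ntdll"),
    Region(0x10000000, 0x20000, "instr_dll"),
    Region(0x20000000, 0x4000, "stack"),
    Region(0x30000000, 0x8000, "kernel32"),
]
target_ranges = {
    "R_code": (0x10000000, 0x20000),
    "R_sharedCode": (0x10020000, 0x1000),
    "R_buf": (0x10030000, 0x80000),
}
to_unmap, reserve = plan_hollowing(container, target_ranges)
```

In a real driver the two returned lists would drive ZwUnmapViewOfSection and reservation calls; here they are plain data so the selection logic can be inspected.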

Formally, for the commit mapping:

  • Let $V_{\text{target}}$ denote the set of committed VA pages in the target.
  • Maintain a bijection $f: V_{\text{target}} \to V_{\text{container}}$ such that for all $p \in V_{\text{target}}$, $\text{Prot}_{\text{cont}}(p) = \text{Prot}_{\text{tgt}}(p)$.

This shared address-space schema ensures both import rebasing and code instrumentation can operate without address-space contention (Long et al., 26 Dec 2025).
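The commit-mapping invariant can be stated as executable bookkeeping. The sketch below is a simplified Python model, assuming that $f$ is the identity on page addresses (both processes use the same VA ranges) and that sizes are page-aligned. `MirrorMap` and its method names are hypothetical; the actual mirroring is performed by the kernel monitor.

```python
class MirrorMap:
    """Tracks committed pages in target and container with equal protections."""

    PAGE = 0x1000  # assumed 4 KiB pages

    def __init__(self):
        self.target = {}     # page base -> protection name
        self.container = {}  # mirrored view, same keys expected

    def commit(self, base, size, prot):
        # Models a target-side commit that the monitor replays in the container.
        for page in range(base, base + size, self.PAGE):
            self.target[page] = prot
            self.container[page] = prot

    def free(self, base, size):
        # Models a target-side free, mirrored the same way.
        for page in range(base, base + size, self.PAGE):
            self.target.pop(page, None)
            self.container.pop(page, None)

    def invariant_holds(self):
        # Same page set on both sides and Prot_cont(p) == Prot_tgt(p) for every p.
        return self.target == self.container


m = MirrorMap()
m.commit(0x10000, 0x3000, "PAGE_READWRITE")
m.free(0x11000, 0x1000)
```

Checking `invariant_holds()` after every mirrored operation is a cheap way to catch a missed hook in a model like this one.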

3. Kernel Module Orchestration and Event Brokering

The kernel driver underpinning the analysis coordinates three principal domains:

  • DBI Extension: All virtual memory allocations (commits and frees) in the target are mirrored in the container. Per-thread buffer allocations are tracked, mapping a buffer in the analysis container for each thread in the target.
  • Fault Management: A kernel callback is registered for __guard_page faults, enabling buffer wrap-around logic and buffer assignment without user-mode code intervention.
  • Synchronization Management: Pre- and Post- callbacks are installed on synchronization syscalls (NtWaitX, NtSetEvent) to enforce early buffer flushes, reducing trace lag and enforcing partial FIFO ordering in event analysis.

A device IOCTL (IOCTL_HALF_REGISTER_THREAD) ties analysis threads in the container to those in the target, recording shared buffer bases and sizes. This permits precise event correlation and race mitigation during multi-threaded execution (Long et al., 26 Dec 2025).
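A toy model helps show why flushing before synchronization syscalls matters for cross-thread ordering. In the Python sketch below, `Monitor`, `ThreadBinding`, `register_thread`, and `pre_sync_syscall` are hypothetical names: `register_thread` stands in for what IOCTL_HALF_REGISTER_THREAD records, and `pre_sync_syscall` stands in for the pre-callback on a call such as NtSetEvent. None of this is the driver's real interface.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadBinding:
    target_tid: int
    analysis_tid: int
    buf_base: int
    buf_size: int
    pending: list = field(default_factory=list)  # records not yet flushed


class Monitor:
    def __init__(self):
        self.bindings = {}
        self.queue = []  # flushed batches, in flush order

    def register_thread(self, target_tid, analysis_tid, buf_base, buf_size):
        # Ties an analysis thread to a target thread and remembers its buffer.
        self.bindings[target_tid] = ThreadBinding(
            target_tid, analysis_tid, buf_base, buf_size
        )

    def record(self, target_tid, event):
        self.bindings[target_tid].pending.append(event)

    def pre_sync_syscall(self, target_tid):
        # Early flush: publish this thread's records before it synchronizes.
        binding = self.bindings[target_tid]
        if binding.pending:
            self.queue.append((binding.analysis_tid, binding.pending))
            binding.pending = []


mon = Monitor()
mon.register_thread(1, 101, 0x50000000, 0x80000)
mon.register_thread(2, 102, 0x50080000, 0x80000)
mon.record(1, "write x")
mon.pre_sync_syscall(1)  # thread 1 is about to signal an event
mon.record(2, "read x")  # thread 2 runs after being woken
mon.pre_sync_syscall(2)
```

Because thread 1's batch is published before it signals, the analysis side sees the write to `x` before the dependent read, even though the two records live in different buffers.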

4. Formalization of Decoupled Execution and Analysis

Execution decoupling in HALF is formalized as follows:

  • Let $T$ denote the target thread and $A$ its paired analysis thread.
  • The shared buffer $\mathcal{B} = \{ B_1, B_2, \ldots, B_N \}$ is a sequence of record pages.
  • Events $e_i \in \mathcal{E}$ are written to pages $B_j$ as the target executes; upon buffer wrap/flush, $B_j$ is enqueued onto a work queue $Q$.
  • The analysis thread $A$ dequeues from $Q$ and processes events in strict FIFO order.

The analysis trace $\sigma_A$ is a causally delayed but order-preserving function of the execution trace $\sigma_T$, constrained by the buffer swap latency $\delta$:

$$\forall e : e \in \sigma_T \implies e \in \sigma_A,$$

and ordering is preserved: if $e_i$ precedes $e_j$ in $\sigma_T$, $A$ observes $e_i$ before $e_j$ modulo $\delta$.

Partial-order preservation is further enforced by flushing $Q$ at every synchronization syscall, ensuring accurate taint propagation and data-flow integrity. This structure allows expensive dynamic taint analysis to be performed out-of-band with respect to the primary program execution (Long et al., 26 Dec 2025).
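The single-thread case of this formalization is an ordinary producer/consumer pipeline, which the following Python sketch illustrates. It uses two in-process threads and `queue.Queue` as a stand-in for the shared pages and the work queue $Q$; `run_decoupled`, `page_size`, and the `None` sentinel are illustrative choices, not part of HALF.

```python
import queue
import threading


def run_decoupled(events, page_size):
    """Model thread T filling pages and thread A draining them in FIFO order."""
    work_queue = queue.Queue()  # plays the role of Q
    analyzed = []               # plays the role of the analysis trace sigma_A

    def target_thread():
        page = []
        for event in events:
            page.append(event)
            if len(page) == page_size:  # page full: enqueue it and start a new one
                work_queue.put(page)
                page = []
        if page:
            work_queue.put(page)        # final partial page
        work_queue.put(None)            # sentinel: no more pages

    def analysis_thread():
        while True:
            page = work_queue.get()
            if page is None:
                break
            analyzed.extend(page)       # "analysis" here is just replaying events

    t = threading.Thread(target=target_thread)
    a = threading.Thread(target=analysis_thread)
    a.start()
    t.start()
    t.join()
    a.join()
    return analyzed


sigma_t = list(range(10))
sigma_a = run_decoupled(sigma_t, page_size=4)
```

With one producer and one consumer on a FIFO queue, every event in `sigma_t` appears in `sigma_a` in the same order, only later, which is the property the formal statement captures.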

5. Performance, Resource Overhead, and Empirical Benchmarks

Quantitative analysis demonstrates the efficiency of process hollowing-based frameworks:

  • Memory Usage:
    • Traditional DFT via libdft64 reserves/commits extensive shadow memory: $M_{\text{shadow}} = M_{\text{VA}} + M_{\text{physical}} \approx 512$ MB (virtual).
    • HALF requires only $M_{\text{HALF}} = M_{\text{code}} + N_{\text{threads}} \times B_{\text{buf}}$, with $B_{\text{buf}} = 512$ KB typically.
  • Instrumentation Overhead: The total runtime cost is

$$T_{\text{HALF}} \approx T_0 + N_{\text{bb}} \cdot C_{\text{rec}} + f_{\text{fault}} \cdot C_{\text{fault}} + C_{\text{ctx}}(Q),$$

where:
    • $N_{\text{bb}}$: number of basic blocks
    • $C_{\text{rec}}$: per-stub recording cost
    • $f_{\text{fault}}$, $C_{\text{fault}}$: fault frequency and cost
    • $C_{\text{ctx}}(Q)$: context switching (buffer swap) overhead

  • Empirical Slowdown and Footprint:
| Program | libdft64 Slowdown | HALF Slowdown | libdft64 VA (MB) | HALF_512KB (MB) |
|---|---|---|---|---|
| 505.mcf_r | 950× | 11× | 950 | 2.5 |
| 525.x264_r | 203× | 25× | | |
| 531.deepsjeng_r | 719× | 15× | 719 | 3.0 |

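On these benchmarks, HALF's slowdown ranges from 11× to 25×, against 203× to 950× for libdft64, and where a footprint is reported HALF uses 2.5 to 3.0 MB of virtual address space against 719 to 950 MB. A plausible implication is that process hollowing-based decoupling substantially reduces both the virtual address footprint and the performance overhead compared to legacy DFT implementations (Long et al., 26 Dec 2025).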
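The memory and runtime expressions above are simple enough to evaluate directly. The Python sketch below does so under stated assumptions: `half_memory_bytes` and `half_runtime` are hypothetical helper names, $C_{\text{ctx}}(Q)$ is passed in as a single precomputed number, and the inputs in the example calls are made-up values for illustration, not measurements. Only the `SLOWDOWN` figures are taken from the table.

```python
def half_memory_bytes(code_bytes, n_threads, buf_bytes=512 * 1024):
    """M_HALF = M_code + N_threads * B_buf (default B_buf = 512 KB, as in the text)."""
    return code_bytes + n_threads * buf_bytes


def half_runtime(t0, n_bb, c_rec, f_fault, c_fault, c_ctx):
    """T_HALF ~= T_0 + N_bb * C_rec + f_fault * C_fault + C_ctx(Q)."""
    return t0 + n_bb * c_rec + f_fault * c_fault + c_ctx


# Illustrative inputs only: 1 MiB of instrumentation code, 4 target threads.
example_memory = half_memory_bytes(code_bytes=1 << 20, n_threads=4)

# Illustrative inputs only, in arbitrary time units.
example_runtime = half_runtime(
    t0=1.0, n_bb=1000, c_rec=0.001, f_fault=10, c_fault=0.01, c_ctx=0.5
)

# Slowdown factors copied from the table: (libdft64, HALF).
SLOWDOWN = {
    "505.mcf_r": (950, 11),
    "525.x264_r": (203, 25),
    "531.deepsjeng_r": (719, 15),
}
relative_gain = {name: lib / half for name, (lib, half) in SLOWDOWN.items()}
```

The memory term grows linearly with thread count, so a model like this mostly helps when choosing `buf_bytes` for heavily threaded targets.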

6. Practical Case Studies: Exploit and Malware Analysis

Process hollowing-based instrumentation facilitates successful tracing of memory corruption exploits and evasive malware:

  • Memory-Corruption Exploit Tracing:

Five CVEs across Office, VLC, Firefox, Chrome, Acrobat were tested. HALF successfully analyzed payload execution in scenarios where libdft64 and ASan-alloc failed due to their aggressive shadow memory reservation colliding with heap-sprayed buffers. HALF’s hollowed container preserves the original heap layout, enabling intact exploit reproduction.

| CVE | libdft | ASan-alloc | HALF |
|---|---|---|---|
| 2017-11882 | | | |
| 2018-11529 | | | |
| 2023-21608 | | | |
  • Sandbox-Evasion Malware (Cobalt Strike):

The instrumentation taints network-receive buffers by hooking NtDeviceIoControlFile, and indirect branch analysis verifies code provenance. The system detects execution of foreign code within 150 ms of the first network packet, with no false positives, preserving stealth and native workflow (Long et al., 26 Dec 2025).
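The detection logic described here combines byte-level taint with a check at indirect branches. The Python sketch below is a deliberately small model of that idea. `TaintTracker` and its method names are hypothetical, taint is tracked per byte address in a set, and only a plain copy rule is modelled, which is far simpler than a real data-flow tracker.

```python
class TaintTracker:
    """Byte-granular taint: network input is a source, indirect branches are sinks."""

    def __init__(self):
        self.tainted = set()  # tainted byte addresses

    def on_network_receive(self, buf_addr, length):
        # Source: bytes delivered by a (modelled) hooked receive path.
        self.tainted.update(range(buf_addr, buf_addr + length))

    def on_copy(self, dst, src, length):
        # Propagation: destination bytes inherit the taint of source bytes.
        for i in range(length):
            if src + i in self.tainted:
                self.tainted.add(dst + i)
            else:
                self.tainted.discard(dst + i)

    def on_indirect_branch(self, target_addr):
        # Sink: a branch into tainted bytes means network-derived code is executing.
        return target_addr in self.tainted


tracker = TaintTracker()
tracker.on_network_receive(0x1000, 16)   # payload arrives
tracker.on_copy(0x9000, 0x1000, 16)      # payload is staged elsewhere
foreign = tracker.on_indirect_branch(0x9004)
benign = tracker.on_indirect_branch(0x400000)
```

A production tracker would also have to handle arithmetic, partial overwrites, and decoding loops; the copy rule alone is enough to show how provenance reaches the branch check.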

7. Security Considerations, Stealth, and Countermeasures

Process hollowing-based frameworks have specific operational strengths and inherent limitations:

  • Stealth and Deployability:
    • No dependency on virtualization; operates natively under Windows.
    • Minimal kernel API (Zw*) hooks; DBI and analysis reside in a disjoint process.
    • Components are injected via standard CreateRemoteThread and a kernel-mode signed driver.
  • Counter-detection and Defensive Monitoring:
    • Hollowed containers may be detected by malware by probing for absent standard DLLs in PEB_LDR_DATA.
    • Host-based defenses can monitor ZwUnmapViewOfSection calls and alert on large-scale unmaps.
    • Integrity monitoring via module list comparison can detect deviations from standard process layouts.
  • Implementation Issues:
    • Kernel driver signing (Win10/11) requires EV certificates or test mode.
    • Address space layout randomization (ASLR) must be managed to avoid import-table misalignments.
    • In heavily synchronized, multi-threaded targets, buffer flush heuristics may not fully guarantee event ordering (observed GSR ≈ 80% in worst case), which may result in rare race anomalies (Long et al., 26 Dec 2025).

A plausible implication is that while process hollowing-based frameworks offer substantial gains in stealth and deployability, sophisticated adversaries or system-level rootkit monitors may develop specialized detection strategies.
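The module-list comparison mentioned above can be sketched as a set difference against a baseline. In the Python example below, the `EXPECTED` baseline, `missing_modules`, and `looks_hollowed` are hypothetical. A real check would read the loader data of the inspected process, and an appropriate baseline depends on the Windows version and the kind of process being examined.

```python
# Assumed baseline for illustration; real baselines vary by OS build and process type.
EXPECTED = frozenset({"ntdll.dll", "kernel32.dll", "kernelbase.dll"})


def missing_modules(loaded, expected=EXPECTED):
    """Return expected module names absent from the loaded list (case-insensitive)."""
    loaded_norm = {name.lower() for name in loaded}
    return sorted(expected - loaded_norm)


def looks_hollowed(loaded, expected=EXPECTED):
    """Flag a process whose module list lacks any baseline module."""
    return bool(missing_modules(loaded, expected))


stub_modules = ["ntdll.dll"]
normal_modules = ["NTDLL.DLL", "KERNEL32.DLL", "KernelBase.dll", "user32.dll"]
```

The same comparison serves both sides: malware can use it to spot an analysis container, and a defender can use it to spot a hollowed process.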

8. Summary and Perspectives

By leveraging process hollowing to construct isolated analysis containers and orchestrating synchronization via a kernel monitor, modern frameworks such as HALF achieve fine-grained dynamic data-flow tracing with dramatically reduced address-space and memory overhead relative to user-mode DFT tools. Empirical evidence demonstrates the ability to analyze memory-sensitive exploits and stage-less, covert malware with low performance impact and high detection coverage. The architecture's native deployment, minimal system footprint, and synchronization-aware event ordering make process hollowing a compelling primitive for real-time, low-impact binary analysis in adversarial and production settings (Long et al., 26 Dec 2025).
