NEUTAINT: Neural Saliency-Based Taint Analysis

Updated 5 February 2026

NEUTAINT is an end-to-end dynamic taint analysis framework that leverages neural embeddings and gradient-based saliency to precisely trace data flows.
It employs a neural model trained on execution traces to infer source-to-sink influences, achieving higher hot-byte accuracy and faster runtime than traditional methods.
The framework enhances vulnerability triage and fuzzing guidance by accurately identifying critical input bytes, streamlining exploit tracking and software debugging.

Neural Saliency-Based Taint (NEUTAINT) is an end-to-end dynamic taint analysis framework that leverages neural networks and saliency mapping to efficiently and accurately track information flow within software systems. Unlike traditional rule-based dynamic taint analysis (DTA) tools, which propagate taint through hand-crafted rules for each program operation, NEUTAINT employs neural program embeddings trained on black-box execution traces. This neural approach infers which input bytes (“taint sources”) influence particular program values (“taint sinks”) by directly modeling their relationship, and then uses gradient-based saliency techniques to recover fine- and coarse-grained source-to-sink influence mappings. The methodology enables precise identification of critical (“hot”) bytes for applications such as vulnerability triage, exploit analysis, and fuzzer guidance, while achieving significant improvements in performance and scalability compared to conventional DTA systems (She et al., 2019).

1. Architecture and Workflow

NEUTAINT’s workflow consists of the following sequential phases:

Trace Collection: The target program is treated as a black box, with taint sources (e.g., specific input bytes) and taint sinks (e.g., branch condition variables or pointer values) identified in advance. An LLVM-based instrumentation pass logs, for each test input vector $x$, the resulting concrete values of all sink variables $y(x)$. Input diversity is increased by generating approximately 2,000 distinct test cases per program using simple or coverage-guided fuzzing, which broadens source-to-sink path coverage.
Neural Program Embedding: Pairs of input bytes and sink variables $\{(x, y)\}$ form the empirical dataset used to train a feed-forward neural network $f_\theta$, with the objective that $f_\theta(x) \approx y$. After training, this network serves as a differentiable proxy for the program’s input–output influence structure, enabling scalable influence queries without further binary instrumentation.
Saliency-Based Taint Inference: To localize which input bytes affect specific sinks, NEUTAINT computes input gradients (the Jacobian $\partial f_\theta / \partial x$) of the trained model. From these, it constructs:
- Coarse-grained saliency maps ($S_{\rm coarse}$): Aggregate importance of each input byte across all sinks.
- Fine-grained saliency maps ($S_{\rm fine}^{(i)}$): Per-sink importance, isolating the most influential input bytes for individual sinks.
Use Cases: Fine-grained saliency enables identification of offsets driving specific behaviors (e.g., bug triggers), while coarse-grained maps prioritize influential bytes for input mutation (e.g., taint-guided fuzzing).
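
The trace collection phase can be illustrated with a minimal sketch. It assumes a hypothetical `toy_program` standing in for the instrumented target (the real system logs sink values through an LLVM pass), purely random input generation, and division by 255 as the byte normalization; none of these names or choices come from the original work.

```python
import random

def toy_program(data: bytes) -> list:
    """Hypothetical stand-in for an instrumented target; returns sink values."""
    # Sink 0: a branch that checks for a magic byte at offset 2.
    sink0 = int(data[2] == 0x7F)
    # Sink 1: a branch that depends on the sum of the bytes at offsets 5 and 6.
    sink1 = int(data[5] + data[6] > 255)
    return [sink0, sink1]

def collect_traces(program, n_inputs, input_len, seed=0):
    """Run the black-box program on random inputs and log (x, y) pairs."""
    rng = random.Random(seed)
    traces = []
    for _ in range(n_inputs):
        raw = bytes(rng.randrange(256) for _ in range(input_len))
        y = program(raw)
        # Assumed normalization: scale each byte into [0, 1].
        x = [b / 255.0 for b in raw]
        traces.append((x, y))
    return traces

# Roughly 2,000 inputs per program, matching the figure quoted above.
traces = collect_traces(toy_program, n_inputs=2000, input_len=8)
print(len(traces), traces[0][1])
```

Purely random inputs rarely hit the magic-byte branch, which is one reason the workflow also allows coverage-guided fuzzing as the input generator.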

2. Neural Embedding Model

The neural program embedding $f_\theta$ models the relation between normalized input bytes $x \in \mathbb{R}^m$ and sink variables $y \in \mathbb{R}^n$ (which may be binary or real-valued). The architecture is a feed-forward network with one hidden layer:

$h = \phi(W^{(1)} x + b^{(1)}) \in \mathbb{R}^d, \qquad \hat{y} = \sigma(W^{(2)} h + b^{(2)}) \in \mathbb{R}^n,$

where $d = 4096$ is the hidden layer size, $\phi = \text{ReLU}$, and $\sigma$ is an element-wise sigmoid (for binary outputs) or the identity (for real-valued sink variables).

The entire mapping $f_\theta: x \mapsto \hat{y}$ constitutes the neural embedding, which can be interpreted as a learned approximation of the program’s information flow between sources and sinks.
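
As a rough illustration of this architecture, the forward pass can be written in a few lines of plain Python. The helper names (`init_layer`, `affine`, `forward`), the uniform weight initialization, and the tiny hidden size `d = 16` are assumptions made for readability; the text above specifies `d = 4096`, and a practical implementation would use a tensor library.

```python
import math
import random

def init_layer(n_out, n_in, rng, scale=0.1):
    """Small random weights and zero biases (initialization is an assumption)."""
    W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]

def forward(params, x, binary_sinks=True):
    """One hidden ReLU layer, then sigmoid (binary sinks) or identity (real-valued)."""
    (W1, b1), (W2, b2) = params
    h = [max(0.0, z) for z in affine(W1, b1, x)]   # phi = ReLU
    z = affine(W2, b2, h)
    if binary_sinks:
        return [1.0 / (1.0 + math.exp(-zi)) for zi in z]   # element-wise sigmoid
    return z

rng = random.Random(0)
m, d, n = 8, 16, 2          # toy sizes; the text uses d = 4096
params = (init_layer(d, m, rng), init_layer(n, d, rng))
x = [rng.random() for _ in range(m)]
y_hat = forward(params, x)
print(y_hat)
```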

3. Saliency Map Construction and Inference

Once trained, $f_\theta$ permits analysis of source-sink dependencies via gradient-based saliency. The Jacobian of the outputs with respect to the input bytes is defined as:

$J_{i, j}(x) = \frac{\partial f_\theta(x)_i}{\partial x_j}, \quad i = 1 \dots n,\ j = 1 \dots m.$

Coarse-Grained Saliency: For input byte $j$,

$S_{\rm coarse}(x)[j] = \sum_{i=1}^n |J_{i, j}(x)|.$

This summarizes the aggregated influence of a byte across all sink variables.

Fine-Grained Saliency: For sink $i$ and input byte $j$,

$S_{\rm fine}^{(i)}(x)[j] = |J_{i, j}(x)|.$

The $k$ input bytes with the highest $S_{\rm fine}^{(i)}(x)[j]$ values form the hot-byte set $H_i(k)$ and are considered the most causally responsible for the value of sink $i$.

This dual granularity allows tailored analysis: fine-grained saliency for debugging and exploit tracing at a sink level; coarse-grained for global influence and fuzzer guidance.
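
The two saliency maps can be sketched for the one-hidden-layer network of Section 2. The example below derives the Jacobian analytically by the chain rule instead of calling an autodiff framework, and uses hand-picked toy weights so the result is easy to check by hand; the function names and weights are illustrative assumptions, not taken from the original work.

```python
import math

def jacobian(params, x, binary_sinks=True):
    """Analytic Jacobian J[i][j] = d y_hat_i / d x_j for the one-hidden-layer net."""
    (W1, b1), (W2, b2) = params
    pre = [sum(w * a for w, a in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, p) for p in pre]
    gate = [1.0 if p > 0 else 0.0 for p in pre]          # ReLU derivative
    z = [sum(w * a for w, a in zip(row, h)) + b for row, b in zip(W2, b2)]
    if binary_sinks:
        s = [1.0 / (1.0 + math.exp(-zi)) for zi in z]
        dout = [si * (1.0 - si) for si in s]             # sigmoid derivative
    else:
        dout = [1.0] * len(z)                            # identity output
    return [[dout[i] * sum(W2[i][k] * gate[k] * W1[k][j] for k in range(len(h)))
             for j in range(len(x))]
            for i in range(len(z))]

def coarse_saliency(J):
    """S_coarse[j]: sum over sinks of |J[i][j]|."""
    return [sum(abs(row[j]) for row in J) for j in range(len(J[0]))]

def fine_saliency(J, i):
    """S_fine^(i)[j]: |J[i][j]| for a single sink i."""
    return [abs(v) for v in J[i]]

def top_k(saliency, k):
    """Offsets of the k most salient bytes, i.e. the set H_i(k)."""
    return sorted(range(len(saliency)), key=lambda j: -saliency[j])[:k]

# Hand-picked toy weights: sink 0 depends only on byte 0, sink 1 only on byte 2.
W1, b1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]], [0.0, 0.0]
W2, b2 = [[3.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
params = ((W1, b1), (W2, b2))
x = [0.5, 0.5, 0.5, 0.5]

J = jacobian(params, x, binary_sinks=False)
print(coarse_saliency(J))             # [3.0, 0.0, 2.0, 0.0]
print(top_k(fine_saliency(J, 0), 1))  # [0]
print(top_k(fine_saliency(J, 1), 1))  # [2]
```

With these weights the fine-grained map attributes each sink to a single byte, while the coarse-grained map ranks byte 0 above byte 2 because its summed gradient magnitude is larger.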

4. Model Training and Optimization

NEUTAINT trains the neural program embedding with objectives appropriate to sink variable types:

Binary Sinks ($y_i \in \{0,1\}$):

$L(\theta) = \sum_{(x, y)} \sum_{i=1}^n \left[ -y_i \log \hat{y}_i - (1 - y_i) \log (1 - \hat{y}_i) \right]$

Real-Valued Sinks (e.g., counters, pointer offsets):

$L(\theta) = \sum_{(x, y)} \|f_\theta(x) - y\|_2^2$
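
For a single training pair, the two objectives reduce to the per-example terms below. This is a small sketch with assumed function names; the clipping constant `eps` is an addition for numerical safety and is not part of the formulas above.

```python
import math

def bce_loss(y_hat, y, eps=1e-12):
    """Binary cross-entropy summed over sinks for one (x, y) pair."""
    total = 0.0
    for p, t in zip(y_hat, y):
        p = min(max(p, eps), 1.0 - eps)   # assumed clipping to avoid log(0)
        total += -t * math.log(p) - (1 - t) * math.log(1.0 - p)
    return total

def squared_error_loss(y_hat, y):
    """Squared L2 distance between prediction and target for one (x, y) pair."""
    return sum((p - t) ** 2 for p, t in zip(y_hat, y))

print(bce_loss([0.5, 0.5], [1, 0]))                # 2 * ln 2, about 1.386
print(squared_error_loss([1.0, 2.0], [0.0, 0.0]))  # 5.0
```

The full training loss is the sum of these terms over all collected pairs.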

Optimization uses Adam with an initial learning rate of 0.01, multiplied by 0.7 every 10 epochs, for 100 epochs at batch size 16. Training typically converges within one minute per program on a GPU given roughly 2,000 input–output pairs. No explicit regularization (dropout, weight decay) was found necessary.
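
The learning-rate schedule and the resulting number of parameter updates can be worked out directly. The sketch assumes the decay is applied as a step function at epochs 10, 20, and so on; the text does not state exactly how the decay is implemented, and the function name is hypothetical.

```python
import math

def learning_rate(epoch, base_lr=0.01, decay=0.7, step=10):
    """Step decay: multiply the base rate by `decay` once every `step` epochs."""
    return base_lr * decay ** (epoch // step)

n_pairs, batch_size, n_epochs = 2000, 16, 100
batches_per_epoch = math.ceil(n_pairs / batch_size)   # 125
total_updates = batches_per_epoch * n_epochs           # 12500

for epoch in (0, 10, 50, 99):
    print(epoch, learning_rate(epoch))
print(batches_per_epoch, total_updates)
```

Under these assumptions the rate falls from 0.01 to about 0.0004 by the final epoch, over 12,500 mini-batch updates in total.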

5. Empirical Evaluation

NEUTAINT was benchmarked against Libdft, Triton, and DFSan (all state-of-the-art rule-based DTA tools) across six real-world parsers (readelf, harfbuzz, mupdf, libxml, libjpeg, zlib). Key performance metrics are summarized below:

Metric	NEUTAINT	Libdft (second-best)	Triton / DFSan
Hot-byte accuracy	68%	58%	<58%
False-positive rate (bytes outside ground truth)	2.1%	2.5–3.9%	2.5–3.9%
End-to-end runtime for 2,000 inputs	4 min (GPU) / 15 min (CPU)	~161 min	>24 h
Fuzzing edge coverage (relative to NEUTAINT = 100%)	100%	61%	often <50%

Hot-Byte Accuracy: The fraction of top 5% “hot” bytes detected by each tool falling within ground-truth regions was 68% for NEUTAINT, a 10-point improvement over Libdft, with lower false positives (2.1% vs. 2.5–3.9%).
Runtime Overhead: Total end-to-end overhead (trace collection, training, saliency computation) was 4 minutes (GPU) or 15 minutes (CPU) for NEUTAINT, compared with about 161 minutes for Libdft, a speedup of roughly 40× on GPU and 10× on CPU; Triton and DFSan required over 24 hours for some benchmarks.
Fuzzing Guidance: On 24-hour fuzzing runs, with edge coverage normalized to NEUTAINT-guided mutation (100%), Libdft-guided mutation reached 61% and the other tools often reached less than 50%.
Exploit Tracking: NEUTAINT recovered true source–sink influences for five real CVEs (buffer/heap overflows, divide-by-zero, out-of-bounds read) by analysis of per-sink saliency.
End-to-End Flow Recovery: For four small programs with roughly 18,000 known flows, NEUTAINT uncovered 98.7%, compared to 78% for Triton.
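
The hot-byte accuracy metric described above can be sketched as follows: take the top 5% of bytes by saliency and measure the fraction that fall inside the ground-truth offsets. The function name, the rounding used to pick the number of hot bytes, and the toy data are assumptions for illustration only.

```python
def hot_byte_accuracy(saliency, ground_truth, fraction=0.05):
    """Fraction of the top-`fraction` most salient bytes that lie in ground_truth."""
    k = max(1, round(fraction * len(saliency)))
    hot = sorted(range(len(saliency)), key=lambda j: -saliency[j])[:k]
    hits = sum(1 for j in hot if j in ground_truth)
    return hits / k

# Toy 40-byte input: the tool flags offsets 3 and 17 as its two hottest bytes.
saliency = [0.0] * 40
saliency[3], saliency[17] = 0.9, 0.8
ground_truth = {3, 4}          # offsets that truly influence the sink

print(hot_byte_accuracy(saliency, ground_truth))   # 0.5
```

Here one of the two reported hot bytes is correct, so the accuracy is 0.5; the remaining byte (offset 17) counts as a false positive.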

6. Applications and Impact

NEUTAINT eliminates the manual rule-writing and per-instruction inspection characteristic of rule-based DTA, replacing it with an end-to-end, differentiable approach capable of precise and efficient source-to-sink information flow mapping. Notable applications include:

Bug and Vulnerability Analysis: NEUTAINT’s fine-grained saliency enables identification of input bytes responsible for critical or vulnerable sinks, assisting in root-cause analysis and exploit development triage.
Fuzzing: Coarse-grained saliency maps prioritize high-impact input bytes for targeted mutation, achieving substantially greater edge coverage in taint-guided fuzzing compared to existing approaches.
Program Understanding: Global saliency characterizations can guide software testing, maintenance, and formal verification efforts by highlighting the most influential inputs.
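
To make the fuzzing use case concrete, the sketch below restricts mutation to the bytes ranked highest by a coarse-grained saliency map. The mutation strategy (overwriting each hot byte with a random value), the function name, and the saliency values are assumptions chosen for brevity; the text above does not specify the mutation operators used.

```python
import random

def mutate_hot_bytes(data, saliency, k, rng):
    """Overwrite the k most salient offsets with random bytes; leave the rest intact."""
    hot = sorted(range(len(data)), key=lambda j: -saliency[j])[:k]
    mutated = bytearray(data)
    for j in hot:
        mutated[j] = rng.randrange(256)
    return bytes(mutated), hot

seed_input = bytes(range(8))
coarse = [0.1, 0.0, 3.0, 0.2, 0.0, 1.5, 1.4, 0.0]   # hypothetical S_coarse values

child, hot = mutate_hot_bytes(seed_input, coarse, k=3, rng=random.Random(1))
print(hot)            # [2, 5, 6]
print(child.hex())
```

Concentrating mutations on these offsets spends the fuzzing budget on bytes the model considers influential, rather than spreading it uniformly over the input.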

Observed outcomes are a roughly 10 percentage-point gain in hot-byte accuracy, a 40-fold speedup in analysis time, and 61% greater fuzzing coverage versus the next-best tools, all while avoiding the complexity and performance cost of tracking taint rules for every instruction (She et al., 2019). A plausible implication is that neural saliency-based taint approaches generalize more readily to unseen program states or novel instruction sequences than fixed rule systems.

7. Related Work and Context

Prior to NEUTAINT, dynamic taint analysis predominantly relied on rule-based propagation systems (e.g., Libdft, Triton, DFSan), which struggle with accuracy, coverage of corner cases, and severe performance overhead due to the need for per-instruction rule enforcement. Accumulating overtaint and undertaint errors undermined the tractability of fine-grained, large-scale analyses.

NEUTAINT’s central innovation is the use of neural networks to implicitly encode the entire input–output influence structure of a program, and the exploitation of standard machine learning saliency techniques to recover precise and actionable taint maps in both coarse and fine granularity. This approach obviates manual taint rule engineering and delivers higher throughput for real-world scale binaries.

Potential areas for future exploration include extension to dynamic taint tracking in concurrent or distributed systems, integration with symbolic execution and richer program synthesis methods for input generation, and investigation of neural model transferability between related codebases. This suggests fertile ground for systematizing neural-based program analysis beyond the domain of taint tracking (She et al., 2019).

References (1)

She et al. (2019). Neutaint: Efficient Dynamic Taint Analysis with Neural Networks.
