Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks (2010.12621v1)

Published 23 Oct 2020 in cs.LG

Abstract: Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks including code completion, bug finding, and program repair. They benefit from leveraging program structure like control flow graphs, but they are not well-suited to tasks like program execution that require far more sequential reasoning steps than number of GNN propagation steps. Recurrent neural networks (RNNs), on the other hand, are well-suited to long sequential chains of reasoning, but they do not naturally incorporate program structure and generally perform worse on the above tasks. Our aim is to achieve the best of both worlds, and we do so by introducing a novel GNN architecture, the Instruction Pointer Attention Graph Neural Networks (IPA-GNN), which achieves improved systematic generalization on the task of learning to execute programs using control flow graphs. The model arises by considering RNNs operating on program traces with branch decisions as latent variables. The IPA-GNN can be seen either as a continuous relaxation of the RNN model or as a GNN variant more tailored to execution. To test the models, we propose evaluating systematic generalization on learning to execute using control flow graphs, which tests sequential reasoning and use of program structure. More practically, we evaluate these models on the task of learning to execute partial programs, as might arise if using the model as a heuristic function in program synthesis. Results show that the IPA-GNN outperforms a variety of RNN and GNN baselines on both tasks.

This paper (Bieber et al., 2020) addresses the challenge of training neural networks to reason about program execution directly from static program representations like source code and control flow graphs (CFGs), without actually running the code. Existing models like Recurrent Neural Networks (RNNs) are good at sequential processing but don't naturally handle program structure like branches and loops encoded in a CFG. Graph Neural Networks (GNNs) are good at leveraging structure but struggle with the long, sequential dependencies typical of program execution traces. The authors propose the Instruction Pointer Attention Graph Neural Network (IPA-GNN) to combine the strengths of both, designed to mimic the step-by-step execution logic of a classical interpreter.

The core idea of the IPA-GNN is to model execution state and control flow explicitly. Instead of a single state for the entire program or states per node without temporal sequence, the IPA-GNN maintains a hidden state $h_{t, n}$ for each statement (node) $n$ at each time step $t$. It also uses a "soft instruction pointer" $p_{t, n}$, which is a distribution over all statements, indicating the probability that the program is executing statement $n$ at time $t$.

The update mechanism for one step of the IPA-GNN (from $t-1$ to $t$) closely follows the interpreter analogy:

  1. Execution/State Proposal: For each node $n$, an RNN (specifically, an LSTM or GRU in practice) processes the statement's representation $x_n$ using the previous hidden state $h_{t-1, n}$ for that node. This produces a state proposal $a^{(1)}_{t, n}$. This step is analogous to an interpreter executing a single instruction and updating its state.

    $a^{(1)}_{t, n} = \text{RNN}(h_{t-1, n}, x_n)$

  2. Branch Decision: For nodes $n$ with multiple outgoing edges (branches, e.g., from an if or while condition), a dense layer predicts a soft distribution $b_{t, n, n'}$ over the possible next statements $n' \in \text{out}(n)$, based on the state proposal $a^{(1)}_{t, n}$. This mimics the interpreter evaluating a condition and deciding which path to take.

    $b_{t, n, n'} = \text{Softmax}(\text{Dense}(a^{(1)}_{t, n}))$ for $n' \in \text{out}(n)$

  3. Aggregation/State Update: The new hidden state $h_{t, n}$ for a node $n$ is computed by aggregating the state proposals $a^{(1)}_{t, n'}$ from all its incoming neighbors $n' \in \text{in}(n)$ in the CFG. This aggregation is weighted by how likely the previous step was executing $n'$ (given by $p_{t-1, n'}$) and how likely the branch from $n'$ led to $n$ (given by $b_{t, n', n}$).

    $h_{t, n} = \sum_{n' \in \text{in}(n)} p_{t-1, n'} \cdot b_{t, n', n} \cdot a^{(1)}_{t, n'}$

  4. Instruction Pointer Update: The soft instruction pointer $p_{t, n}$ for node $n$ is updated similarly, summing the probabilities of arriving at $n$ from each incoming neighbor $n'$ based on the previous instruction pointer $p_{t-1, n'}$ and the branch decisions $b_{t, n', n}$.

    $p_{t, n} = \sum_{n' \in \text{in}(n)} p_{t-1, n'} \cdot b_{t, n', n}$

The initial state $h_{0, n}$ and instruction pointer $p_{0, n}$ are set up such that $p_{0, 0} = 1$ (starting at the first statement) and $p_{0, n} = 0$ for $n > 0$, with $h_{0, n}$ typically initialized to a zero vector or an embedding of the statement $x_n$. This temporal process is repeated for a fixed number of steps $T$. The final prediction (e.g., the program's output) is derived from the final hidden state of the exit node, $h_{T, \text{exit}}$.
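
To make the four update equations concrete, here is a minimal NumPy sketch of a single IPA-GNN step. The simple tanh cell, the weight shapes, and the dense branch layer are illustrative assumptions standing in for the paper's learned GRU/LSTM and dense components; the CFG is given as adjacency lists of successors.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ipa_gnn_step(h, p, x, out_edges, W_h, W_x, W_branch):
    """One IPA-GNN step from t-1 to t.

    h: (N, D) hidden states h_{t-1, n}
    p: (N,)   soft instruction pointer p_{t-1, n}
    x: (N, D) embedded statement representations x_n
    out_edges: out_edges[n] lists the CFG successors of node n
               (one for straight-line code, two for a branch)
    """
    N, D = h.shape
    # 1. Execution: per-node state proposal a^{(1)}_{t, n}
    #    (a tanh cell stands in for the paper's GRU/LSTM).
    a1 = np.tanh(h @ W_h + x @ W_x)                        # (N, D)
    # 2. Branch decision: soft distribution over each node's outgoing edges.
    logits = a1 @ W_branch                                 # (N, max_branches)
    b = np.zeros((N, N))                                   # b[n', n] = b_{t, n', n}
    for n, succs in enumerate(out_edges):
        if len(succs) == 1:
            b[n, succs[0]] = 1.0
        elif len(succs) > 1:
            probs = softmax(logits[n, :len(succs)])
            for k, nxt in enumerate(succs):
                b[n, nxt] = probs[k]
    # 3. + 4. Aggregate over incoming edges, weighted by p_{t-1, n'} and b_{t, n', n}.
    w = p[:, None] * b                                     # w[n', n]
    h_new = w.T @ a1                                       # h_{t, n} = sum_{n'} w[n', n] * a1[n']
    p_new = w.sum(axis=0)                                  # p_{t, n} = sum_{n'} w[n', n]
    return h_new, p_new
```

Iterating this step $T$ times from a one-hot $p_0$ on the entry node and reading a classification head off $h_{T, \text{exit}}$ yields the model's prediction, as described above.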

This formulation shows that IPA-GNN is a type of message-passing GNN, where messages are passed along CFG edges, but the message computation and aggregation are guided by the "instruction pointer attention" mechanism ($p_{t, n'}$ and $b_{t, n', n}$) that simulates control flow. This makes it distinct from standard GNNs like GGNN or R-GAT, which aggregate messages based on fixed edge types or learned attention weights that don't explicitly model a sequential execution flow probability.

For practical implementation, the authors represent each statement $x_n$ as a tokenized 4-tuple (indentation level, operation, variable, operand). This representation is embedded into a vector space. The RNN cell (LSTM/GRU) and dense layers are standard neural network components. The number of steps $T$ is a hyperparameter; the paper uses a bounded execution setting where $T$ is set to be less than the length of the ground-truth trace, forcing the model to learn shortcuts.
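
As an illustration of this statement representation (the vocabularies and embedding scheme below are assumptions for the sketch, not the paper's exact tokenizer), a statement such as `v0 += 3` nested one indentation level deep could be encoded and embedded roughly as follows:

```python
import numpy as np

# Hypothetical vocabularies for the Python subset; condition statements
# (if/while tests) would need their own operation tokens.
OPS = {"=": 0, "+=": 1, "-=": 2, "*=": 3}
VARS = {"v0": 0, "v1": 1, "v2": 2}

def embed_statement(tup, E_indent, E_op, E_var, E_operand):
    """Embed an (indentation, operation, variable, operand) tuple by summing
    the four field embeddings into a single statement vector x_n."""
    indent, op, var, operand = tup
    return E_indent[indent] + E_op[OPS[op]] + E_var[VARS[var]] + E_operand[operand]

D = 64
rng = np.random.default_rng(0)
tables = (rng.normal(size=(16, D)), rng.normal(size=(len(OPS), D)),
          rng.normal(size=(len(VARS), D)), rng.normal(size=(10, D)))

x_n = embed_statement((1, "+=", "v0", 3), *tables)   # "    v0 += 3" at indent level 1
```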

The paper evaluates the IPA-GNN on two tasks: learning to execute full programs and learning to execute partial programs (with a masked statement), using a dataset of synthetically generated Python-subset programs. The programs are generated according to a grammar allowing variables, arithmetic, if-else, and while loops. A key aspect of the evaluation is systematic generalization, where models are trained on simple programs (length $\le 10$) and tested on more complex ones (length $> 10$). The target output is the final value of a specific variable modulo 1000.
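
For intuition, a program in this subset looks roughly like the following (this concrete snippet and its target are illustrative, not taken from the paper's dataset):

```python
v0 = 7
v1 = 2
while v1 > 0:
    v1 -= 1
    v0 *= 3
if v0 > 50:
    v0 += 19
# v0 ends at 7 * 3 * 3 + 19 = 82, so the prediction target is 82 % 1000 = 82
```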

Results show that the IPA-GNN significantly outperforms baseline models, including standard RNNs, GNNs (GGNN, R-GAT), and ablation models (NoControl, NoExecute) lacking the full IPA-GNN structure, especially on the unseen, longer programs. This suggests that explicitly modeling the interpreter's causal structure, including probabilistic control flow, helps in generalizing to programs with more complex execution traces. The attention visualizations show that the learned soft instruction pointer often becomes nearly one-hot, effectively making discrete branch decisions, and that the model learns to skip steps compared to a literal execution trace.

Implementation Considerations and Applications:

  • Data Representation: Programs are converted to CFGs and statements are tokenized and embedded. This requires parsing and control flow analysis of the source code, which are standard static analysis techniques (a minimal CFG-construction sketch follows this list).
  • Model Size: The model requires hidden states $h_{t, n}$ and soft instruction pointer values $p_{t, n}$ for all nodes $n$ at each time step $t$. This can lead to significant memory usage, especially for large programs or many time steps. Bounded execution (limiting $T$) is crucial for practical training and inference on longer programs.
  • Scalability: While the current experiments are on synthetic programs, applying this to real-world codebases requires handling more complex language features, larger CFGs, and potentially millions of execution steps. The paper suggests that the systematic generalization found on simpler examples might help scale to longer traces, but explicit techniques for handling very long sequences (e.g., attention mechanisms over traces, hierarchical models) might be necessary.
  • Bounded Execution: Training with a limited number of steps forces the model to learn shortcut execution strategies. This might be beneficial for tasks where predicting the exact final state quickly is desired, but might not perfectly capture the full semantics for all possible execution paths.
  • Partial Programs: The model's ability to handle masked statements makes it suitable for tasks like program synthesis or repair, where a model might need to reason about incomplete code or predict properties of candidate programs without full execution. The IPA-GNN could serve as a learned heuristic function in search-based program synthesis, evaluating the potential utility of partial programs.
  • Interpretable Attention: The soft instruction pointer $p_{t, n}$ provides some insight into which parts of the program the model is "attending" to during its simulated execution, which could aid in debugging the model or understanding its learned strategies.
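
As referenced in the Data Representation point above, the sketch below shows the kind of statement-level control flow analysis involved, using Python's ast module on the straight-line / if / while subset. It is an illustrative stand-in for the paper's pipeline (not its actual code); it returns per-statement successor lists in the form consumed as out_edges by the step function sketched earlier.

```python
import ast

def build_cfg(source):
    """Build a statement-level CFG for a subset with assignments, if, and while.
    Returns (nodes, edges): nodes[i] is an ast statement (the last one is a
    synthetic exit node) and edges[i] lists the successor node ids."""
    nodes, edges = [], {}

    def add(stmt):
        nodes.append(stmt)
        edges[len(nodes) - 1] = []
        return len(nodes) - 1

    def build_block(stmts):
        """Wire a (non-empty) block; return (entry_id, dangling) where dangling
        holds node ids whose next edge should point at whatever follows."""
        entry, dangling = None, []
        for stmt in stmts:
            if isinstance(stmt, (ast.Assign, ast.AugAssign)):
                nid = add(stmt)
                for d in dangling:
                    edges[d].append(nid)
                entry = nid if entry is None else entry
                dangling = [nid]
            elif isinstance(stmt, ast.If):
                cond = add(stmt)                       # the condition is its own node
                for d in dangling:
                    edges[d].append(cond)
                entry = cond if entry is None else entry
                body_entry, body_dangling = build_block(stmt.body)
                edges[cond].append(body_entry)         # "true" edge into the body
                if stmt.orelse:
                    else_entry, else_dangling = build_block(stmt.orelse)
                    edges[cond].append(else_entry)
                    dangling = body_dangling + else_dangling
                else:
                    dangling = body_dangling + [cond]  # "false" edge falls through
            elif isinstance(stmt, ast.While):
                cond = add(stmt)
                for d in dangling:
                    edges[d].append(cond)
                entry = cond if entry is None else entry
                body_entry, body_dangling = build_block(stmt.body)
                edges[cond].append(body_entry)         # "true" edge into the loop body
                for d in body_dangling:
                    edges[d].append(cond)              # loop back to the condition
                dangling = [cond]                      # "false" edge exits the loop
        return entry, dangling

    _, dangling = build_block(ast.parse(source).body)
    exit_id = add(ast.Pass())                          # synthetic exit node
    for d in dangling:
        edges[d].append(exit_id)
    return nodes, edges
```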

In summary, the IPA-GNN provides a principled way to incorporate the sequential nature of program execution into a GNN framework by explicitly modeling the instruction pointer and branch decisions. This architectural choice leads to improved systematic generalization on learning-to-execute tasks, suggesting its potential for practical applications in static analysis and program synthesis tools.

Authors (4)
  1. David Bieber (11 papers)
  2. Charles Sutton (74 papers)
  3. Hugo Larochelle (87 papers)
  4. Daniel Tarlow (41 papers)
Citations (42)