Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks (2010.12621v1)

Published 23 Oct 2020 in cs.LG

Abstract: Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks including code completion, bug finding, and program repair. They benefit from leveraging program structure like control flow graphs, but they are not well-suited to tasks like program execution that require far more sequential reasoning steps than number of GNN propagation steps. Recurrent neural networks (RNNs), on the other hand, are well-suited to long sequential chains of reasoning, but they do not naturally incorporate program structure and generally perform worse on the above tasks. Our aim is to achieve the best of both worlds, and we do so by introducing a novel GNN architecture, the Instruction Pointer Attention Graph Neural Networks (IPA-GNN), which achieves improved systematic generalization on the task of learning to execute programs using control flow graphs. The model arises by considering RNNs operating on program traces with branch decisions as latent variables. The IPA-GNN can be seen either as a continuous relaxation of the RNN model or as a GNN variant more tailored to execution. To test the models, we propose evaluating systematic generalization on learning to execute using control flow graphs, which tests sequential reasoning and use of program structure. More practically, we evaluate these models on the task of learning to execute partial programs, as might arise if using the model as a heuristic function in program synthesis. Results show that the IPA-GNN outperforms a variety of RNN and GNN baselines on both tasks.

This paper (Bieber et al., 2020) addresses the challenge of training neural networks to reason about program execution directly from static program representations like source code and control flow graphs (CFGs), without actually running the code. Existing models like Recurrent Neural Networks (RNNs) are good at sequential processing but don't naturally handle program structure like branches and loops encoded in a CFG. Graph Neural Networks (GNNs) are good at leveraging structure but struggle with the long, sequential dependencies typical of program execution traces. The authors propose the Instruction Pointer Attention Graph Neural Network (IPA-GNN) to combine the strengths of both, designed to mimic the step-by-step execution logic of a classical interpreter.

The core idea of the IPA-GNN is to model execution state and control flow explicitly. Instead of a single state for the entire program or states per node without temporal sequence, the IPA-GNN maintains a hidden state $h_{t, n}$ for each statement (node) $n$ at each time step $t$. It also uses a "soft instruction pointer" $p_{t, n}$, which is a distribution over all statements, indicating the probability that the program is executing statement $n$ at time $t$.

The update mechanism for one step of the IPA-GNN (from $t-1$ to $t$) closely follows the interpreter analogy:

  1. Execution/State Proposal: For each node $n$, an RNN (specifically, an LSTM or GRU in practice) processes the statement's representation $x_n$ using the previous hidden state $h_{t-1, n}$ for that node. This produces a state proposal $a^{(1)}_{t, n}$. This step is analogous to an interpreter executing a single instruction and updating its state.

    $a^{(1)}_{t, n} = \text{RNN}(h_{t-1, n}, x_n)$

  2. Branch Decision: For nodes $n$ with multiple outgoing edges (branches, e.g., from an if or while condition), a dense layer predicts a soft distribution $b_{t, n, n'}$ over the possible next statements $n' \in \text{out}(n)$, based on the state proposal $a^{(1)}_{t, n}$. This mimics the interpreter evaluating a condition and deciding which path to take.

    $b_{t, n, n'} = \text{Softmax}(\text{Dense}(a^{(1)}_{t, n}))$ for $n' \in \text{out}(n)$

  3. Aggregation/State Update: The new hidden state $h_{t, n}$ for a node $n$ is computed by aggregating the state proposals $a^{(1)}_{t, n'}$ from all its incoming neighbors $n' \in \text{in}(n)$ in the CFG. This aggregation is weighted by how likely the previous step was executing $n'$ (given by $p_{t-1, n'}$) and how likely the branch from $n'$ led to $n$ (given by $b_{t, n', n}$).

    $h_{t, n} = \sum_{n' \in \text{in}(n)} p_{t-1, n'} \cdot b_{t, n', n} \cdot a^{(1)}_{t, n'}$

  4. Instruction Pointer Update: The soft instruction pointer $p_{t, n}$ for node $n$ is updated similarly, summing the probabilities of arriving at $n$ from each incoming neighbor $n'$ based on the previous instruction pointer $p_{t-1, n'}$ and the branch decisions $b_{t, n', n}$.

    $p_{t, n} = \sum_{n' \in \text{in}(n)} p_{t-1, n'} \cdot b_{t, n', n}$

The initial state $h_{0, n}$ and instruction pointer $p_{0, n}$ are set up such that $p_{0, 0} = 1$ (starting at the first statement) and $p_{0, n} = 0$ for $n > 0$, with $h_{0, n}$ typically initialized to a zero vector or an embedding of the statement $x_n$. This temporal process is repeated for a fixed number of steps $T$. The final prediction (e.g., the program's output) is derived from the final hidden state of the exit node, $h_{T, \text{exit}}$.
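
To make the four update equations concrete, here is a minimal NumPy sketch of a single IPA-GNN step. The simple tanh cell, the weight shapes, and the dense branch layer are illustrative assumptions standing in for the paper's learned GRU/LSTM and dense components; the CFG is given as adjacency lists of successors.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ipa_gnn_step(h, p, x, out_edges, W_h, W_x, W_branch):
    """One IPA-GNN step from t-1 to t.

    h: (N, D) hidden states h_{t-1, n}
    p: (N,)   soft instruction pointer p_{t-1, n}
    x: (N, D) embedded statement representations x_n
    out_edges: out_edges[n] lists the CFG successors of node n
               (one for straight-line code, two for a branch)
    """
    N, D = h.shape
    # 1. Execution: per-node state proposal a^{(1)}_{t, n}
    #    (a tanh cell stands in for the paper's GRU/LSTM).
    a1 = np.tanh(h @ W_h + x @ W_x)                        # (N, D)
    # 2. Branch decision: soft distribution over each node's outgoing edges.
    logits = a1 @ W_branch                                 # (N, max_branches)
    b = np.zeros((N, N))                                   # b[n', n] = b_{t, n', n}
    for n, succs in enumerate(out_edges):
        if len(succs) == 1:
            b[n, succs[0]] = 1.0
        elif len(succs) > 1:
            probs = softmax(logits[n, :len(succs)])
            for k, nxt in enumerate(succs):
                b[n, nxt] = probs[k]
    # 3. + 4. Aggregate over incoming edges, weighted by p_{t-1, n'} and b_{t, n', n}.
    w = p[:, None] * b                                     # w[n', n]
    h_new = w.T @ a1                                       # h_{t, n} = sum_{n'} w[n', n] * a1[n']
    p_new = w.sum(axis=0)                                  # p_{t, n} = sum_{n'} w[n', n]
    return h_new, p_new
```

Iterating this step $T$ times from a one-hot $p_0$ on the entry node and reading a classification head off $h_{T, \text{exit}}$ yields the model's prediction, as described above.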

This formulation shows that IPA-GNN is a type of message-passing GNN, where messages are passed along CFG edges, but the message computation and aggregation are guided by the "instruction pointer attention" mechanism ($p_{t, n'}$ and $b_{t, n', n}$) that simulates control flow. This makes it distinct from standard GNNs like GGNN or R-GAT, which aggregate messages based on fixed edge types or learned attention weights that don't explicitly model a sequential execution flow probability.

For practical implementation, the authors represent each statement $x_n$ as a tokenized 4-tuple (indentation level, operation, variable, operand). This representation is embedded into a vector space. The RNN cell (LSTM/GRU) and dense layers are standard neural network components. The number of steps $T$ is a hyperparameter; the paper uses a bounded execution setting where $T$ is set to be less than the length of the ground-truth trace, forcing the model to learn shortcuts.
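
As an illustration of this statement representation (the vocabularies and embedding scheme below are assumptions for the sketch, not the paper's exact tokenizer), a statement such as `v0 += 3` nested one indentation level deep could be encoded and embedded roughly as follows:

```python
import numpy as np

# Hypothetical vocabularies for the Python subset; condition statements
# (if/while tests) would need their own operation tokens.
OPS = {"=": 0, "+=": 1, "-=": 2, "*=": 3}
VARS = {"v0": 0, "v1": 1, "v2": 2}

def embed_statement(tup, E_indent, E_op, E_var, E_operand):
    """Embed an (indentation, operation, variable, operand) tuple by summing
    the four field embeddings into a single statement vector x_n."""
    indent, op, var, operand = tup
    return E_indent[indent] + E_op[OPS[op]] + E_var[VARS[var]] + E_operand[operand]

D = 64
rng = np.random.default_rng(0)
tables = (rng.normal(size=(16, D)), rng.normal(size=(len(OPS), D)),
          rng.normal(size=(len(VARS), D)), rng.normal(size=(10, D)))

x_n = embed_statement((1, "+=", "v0", 3), *tables)   # "    v0 += 3" at indent level 1
```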

The paper evaluates the IPA-GNN on two tasks: learning to execute full programs and learning to execute partial programs (with a masked statement), using a dataset of synthetically generated Python-subset programs. The programs are generated according to a grammar allowing variables, arithmetic, if-else, and while loops. A key aspect of the evaluation is systematic generalization, where models are trained on simple programs (length $\le 10$) and tested on more complex ones (length $> 10$). The target output is the final value of a specific variable modulo 1000.
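
For intuition, a program in this subset looks roughly like the following (this concrete snippet and its target are illustrative, not taken from the paper's dataset):

```python
v0 = 7
v1 = 2
while v1 > 0:
    v1 -= 1
    v0 *= 3
if v0 > 50:
    v0 += 19
# v0 ends at 7 * 3 * 3 + 19 = 82, so the prediction target is 82 % 1000 = 82
```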

Results show that the IPA-GNN significantly outperforms baseline models, including standard RNNs, GNNs (GGNN, R-GAT), and ablation models (NoControl, NoExecute) lacking the full IPA-GNN structure, especially on the unseen, longer programs. This suggests that explicitly modeling the interpreter's causal structure, including probabilistic control flow, helps in generalizing to programs with more complex execution traces. The attention visualizations show that the learned soft instruction pointer often becomes nearly one-hot, effectively making discrete branch decisions, and that the model learns to skip steps compared to a literal execution trace.

Implementation Considerations and Applications:

  • Data Representation: Programs are converted to CFGs and statements are tokenized and embedded. This requires parsing and control flow analysis of the source code, which are standard static analysis techniques (a minimal CFG-construction sketch follows this list).
  • Model Size: The model requires hidden states $h_{t, n}$ and soft instruction pointer values $p_{t, n}$ for all nodes $n$ at each time step $t$. This can lead to significant memory usage, especially for large programs or many time steps. Bounded execution (limiting $T$) is crucial for practical training and inference on longer programs.
  • Scalability: While the current experiments are on synthetic programs, applying this to real-world codebases requires handling more complex language features, larger CFGs, and potentially millions of execution steps. The paper suggests that the systematic generalization found on simpler examples might help scale to longer traces, but explicit techniques for handling very long sequences (e.g., attention mechanisms over traces, hierarchical models) might be necessary.
  • Bounded Execution: Training with a limited number of steps forces the model to learn shortcut execution strategies. This might be beneficial for tasks where predicting the exact final state quickly is desired, but might not perfectly capture the full semantics for all possible execution paths.
  • Partial Programs: The model's ability to handle masked statements makes it suitable for tasks like program synthesis or repair, where a model might need to reason about incomplete code or predict properties of candidate programs without full execution. The IPA-GNN could serve as a learned heuristic function in search-based program synthesis, evaluating the potential utility of partial programs.
  • Interpretable Attention: The soft instruction pointer $p_{t, n}$ provides some insight into which parts of the program the model is "attending" to during its simulated execution, which could aid in debugging the model or understanding its learned strategies.
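
As referenced in the Data Representation point above, the sketch below shows the kind of statement-level control flow analysis involved, using Python's ast module on the straight-line / if / while subset. It is an illustrative stand-in for the paper's pipeline (not its actual code); it returns per-statement successor lists in the form consumed as out_edges by the step function sketched earlier.

```python
import ast

def build_cfg(source):
    """Build a statement-level CFG for a subset with assignments, if, and while.
    Returns (nodes, edges): nodes[i] is an ast statement (the last one is a
    synthetic exit node) and edges[i] lists the successor node ids."""
    nodes, edges = [], {}

    def add(stmt):
        nodes.append(stmt)
        edges[len(nodes) - 1] = []
        return len(nodes) - 1

    def build_block(stmts):
        """Wire a (non-empty) block; return (entry_id, dangling) where dangling
        holds node ids whose next edge should point at whatever follows."""
        entry, dangling = None, []
        for stmt in stmts:
            if isinstance(stmt, (ast.Assign, ast.AugAssign)):
                nid = add(stmt)
                for d in dangling:
                    edges[d].append(nid)
                entry = nid if entry is None else entry
                dangling = [nid]
            elif isinstance(stmt, ast.If):
                cond = add(stmt)                       # the condition is its own node
                for d in dangling:
                    edges[d].append(cond)
                entry = cond if entry is None else entry
                body_entry, body_dangling = build_block(stmt.body)
                edges[cond].append(body_entry)         # "true" edge into the body
                if stmt.orelse:
                    else_entry, else_dangling = build_block(stmt.orelse)
                    edges[cond].append(else_entry)
                    dangling = body_dangling + else_dangling
                else:
                    dangling = body_dangling + [cond]  # "false" edge falls through
            elif isinstance(stmt, ast.While):
                cond = add(stmt)
                for d in dangling:
                    edges[d].append(cond)
                entry = cond if entry is None else entry
                body_entry, body_dangling = build_block(stmt.body)
                edges[cond].append(body_entry)         # "true" edge into the loop body
                for d in body_dangling:
                    edges[d].append(cond)              # loop back to the condition
                dangling = [cond]                      # "false" edge exits the loop
        return entry, dangling

    _, dangling = build_block(ast.parse(source).body)
    exit_id = add(ast.Pass())                          # synthetic exit node
    for d in dangling:
        edges[d].append(exit_id)
    return nodes, edges
```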

In summary, the IPA-GNN provides a principled way to incorporate the sequential nature of program execution into a GNN framework by explicitly modeling the instruction pointer and branch decisions. This architectural choice leads to improved systematic generalization on learning-to-execute tasks, suggesting its potential for practical applications in static analysis and program synthesis tools.

Authors (4)
  1. David Bieber (11 papers)
  2. Charles Sutton (74 papers)
  3. Hugo Larochelle (87 papers)
  4. Daniel Tarlow (41 papers)
Citations (42)