[Interactive demo: a Start button, a training-set selector (XOR, OR, NAND, AND), a live truth table with columns Input 1, Input 2, Target, and Prediction, and readouts for average error and epoch.]

Neural Network

This visualization implements a simple neural network with backpropagation that can learn basic logical operators in real time.

What You're Looking At

The canvas displays a three-layer neural network: two input neurons on the left, five hidden neurons in the middle, and one output neuron on the right. The lines connecting them represent the weights between neurons. Light lines indicate positive weights and dark lines indicate negative weights. Thicker lines mean the weight has a larger absolute value.

Click Start to begin training. The neural network will repeatedly adjust its weights to learn the selected logical operator. You can switch between XOR, OR, NAND, and AND at any time and watch the network adapt to the new function.

How It Works

The network uses a process called feedforward to make predictions. Each neuron computes a weighted sum of its inputs, adds a bias term, and passes the result through a sigmoid activation function, which squashes the output to a value between 0 and 1. The input values flow from left to right through the network until the output neuron produces a prediction.
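The demo's own source isn't shown here (it presumably runs in JavaScript on the canvas), but the feedforward step for one layer can be sketched in Python. The weights and bias below are illustrative values, not the demo's actual parameters.

```python
import math

def sigmoid(x):
    # Squash any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def feed_forward(inputs, weights, biases):
    # One layer: each neuron computes a weighted sum of its inputs,
    # adds its bias, and passes the result through the sigmoid.
    return [sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]

# Two inputs feeding a single neuron with made-up weights [0.4, -0.7] and bias 0.1:
out = feed_forward([0, 1], weights=[[0.4, -0.7]], biases=[0.1])
# weighted sum = 0.4*0 + (-0.7)*1 + 0.1 = -0.6, so out[0] = sigmoid(-0.6) ≈ 0.354
```

Stacking this function layer on layer (inputs → hidden → output) gives the full left-to-right pass described above.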

Learning happens through backpropagation. After each prediction, the network compares its output to the correct answer and calculates the error. It then works backwards through the layers, computing how much each weight contributed to that error and adjusting each weight by a small amount (determined by the learning rate, set to 0.5 here) in the direction that reduces the error. This is an application of the chain rule from calculus, decomposing the total error into partial derivatives with respect to each weight.
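For a single weight into the output neuron, that chain-rule update can be written out explicitly. This is a sketch of the standard delta rule for a sigmoid output with squared error, using the demo's learning rate of 0.5; the function name is mine, not the demo's.

```python
LEARNING_RATE = 0.5  # the value used in this visualization

def update_output_weight(w, hidden_activation, prediction, target):
    # Chain rule: the error gradient for w is the error signal (target - prediction),
    # times the sigmoid's derivative prediction * (1 - prediction),
    # times the activation that flowed through this weight.
    delta = (target - prediction) * prediction * (1.0 - prediction)
    # Move the weight a small step in the direction that reduces the error.
    return w + LEARNING_RATE * delta * hidden_activation

# Example: weight 0.5, hidden activation 1.0, prediction 0.7, target 1.0
# delta = 0.3 * 0.7 * 0.3 = 0.063, so the new weight is 0.5 + 0.5 * 0.063 = 0.5315
new_w = update_output_weight(0.5, 1.0, 0.7, 1.0)
```

Hidden-layer weights get the same treatment, except their error signal is the output delta propagated backwards through the output weights.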

The training process:

  1. Pick a random sample from the training set (e.g., inputs [0, 1] with target output 1 for XOR).
  2. Feed forward. Pass the inputs through the network to get a prediction.
  3. Compute error. Find the difference between the prediction and the target.
  4. Backpropagate. Starting from the output layer, calculate each weight's contribution to the error and update it accordingly.
  5. Repeat. Do this thousands of times.
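The five steps above can be assembled into a complete training loop. This is a minimal Python sketch of a 2-5-1 sigmoid network learning XOR with learning rate 0.5 (matching the demo's architecture), not the visualization's actual code; initialization and iteration count are my own choices.

```python
import math
import random

def sigmoid(x):
    # Squash a real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

# The XOR training set: two binary inputs, one binary target.
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

random.seed(1)
N_HIDDEN, LR = 5, 0.5  # same layer size and learning rate as the demo
w_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(N_HIDDEN)]
b_hidden = [random.uniform(-1, 1) for _ in range(N_HIDDEN)]
w_out = [random.uniform(-1, 1) for _ in range(N_HIDDEN)]
b_out = random.uniform(-1, 1)

def predict(x):
    # Feed forward: inputs -> hidden layer -> output neuron.
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b)
         for w, b in zip(w_hidden, b_hidden)]
    out = sigmoid(sum(wo * hi for wo, hi in zip(w_out, h)) + b_out)
    return h, out

for step in range(50000):
    x, target = random.choice(XOR)        # 1. pick a random sample
    h, out = predict(x)                   # 2. feed forward
    err = target - out                    # 3. compute error
    delta_out = err * out * (1 - out)     # 4. backpropagate: output delta first,
    delta_h = [delta_out * w_out[i] * h[i] * (1 - h[i])
               for i in range(N_HIDDEN)]  #    then hidden deltas via the chain rule
    for i in range(N_HIDDEN):
        w_out[i] += LR * delta_out * h[i]
        w_hidden[i][0] += LR * delta_h[i] * x[0]
        w_hidden[i][1] += LR * delta_h[i] * x[1]
        b_hidden[i] += LR * delta_h[i]
    b_out += LR * delta_out               # 5. repeat, thousands of times
```

After training, rounding each prediction recovers the XOR truth table.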

The Training Sets

Each logical operator defines a different mapping from two binary inputs to one binary output:

  • XOR (exclusive or) returns 1 only when exactly one input is 1.
  • OR returns 1 when at least one input is 1.
  • NAND (not and) returns 0 only when both inputs are 1.
  • AND returns 1 only when both inputs are 1.
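The four mappings above are just truth tables; written out as data (in an illustrative Python form, not the demo's internal representation) they look like this:

```python
# Each operator maps two binary inputs to one binary output.
# These are the four training sets the visualization lets you choose between.
TRAINING_SETS = {
    "XOR":  {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
    "OR":   {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1},
    "NAND": {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0},
    "AND":  {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
}
```

Note that NAND is exactly AND with the outputs flipped, while XOR is the only table whose 1-outputs cannot be picked out by a single linear threshold.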

XOR is the most interesting case. Unlike the others, it is not linearly separable, meaning no single straight line can divide the 1-outputs from the 0-outputs in input space. This is exactly why a hidden layer is necessary. A single-layer perceptron (like the one in the perceptron visualization) can learn OR, NAND, and AND, but it cannot learn XOR. The hidden layer gives the network the ability to construct internal representations that make the problem separable.
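Linear inseparability is easy to check empirically. The brute-force search below (my own illustration, not part of the demo) tries a grid of single-layer perceptrons of the form w1*x1 + w2*x2 + b > 0: every one of them misclassifies at least one XOR case, while OR is solved easily.

```python
import itertools

def solvable(table):
    # Try every (w1, w2, b) on a coarse grid from -5 to 5 in steps of 0.5
    # and ask whether any single linear threshold reproduces the table.
    grid = [n / 2 for n in range(-10, 11)]
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(t)
               for (x1, x2), t in table):
            return True
    return False

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
OR  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
# solvable(OR) succeeds (e.g. w1 = w2 = 1, b = -0.5); solvable(XOR) never does,
# and no finer grid would help: XOR is provably not linearly separable.
```

A grid search is not a proof, of course, but the geometric argument is: the 1-outputs of XOR sit at opposite corners of the unit square, so no single line can separate them from the 0-outputs.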

Why It Matters

The inability of single-layer networks to learn XOR was famously demonstrated by Minsky and Papert in 1969, which contributed to reduced interest in neural network research for over a decade. The rediscovery and popularization of backpropagation in the 1980s by Rumelhart, Hinton, and Williams showed that multi-layer networks could overcome this limitation.

The network you see here, while small, uses the same fundamental algorithm that powers modern deep learning. Every layer in a large language model or image classifier is doing the same thing: computing weighted sums, applying activation functions, and adjusting weights through backpropagation. The difference is scale: modern networks have billions of parameters organized into dozens or hundreds of layers, but the core principle of learning by gradient descent is unchanged.

Things to Try

Switch to AND or OR and notice how quickly the network converges, often within a few hundred epochs. Then switch to XOR and observe that it takes significantly longer. This reflects the fundamental difference in complexity: the linearly separable functions have simpler error surfaces, while XOR requires the hidden layer to discover a nonlinear mapping.

If you switch operators while the network is running, you'll see a spike in the average error as the network encounters a function its current weights are poorly suited for, followed by a gradual reduction as it readjusts.

Originally published October 15, 2014