What is a Neural Network?
What it is
A neural network is a stack of tiny calculators that learn by adjusting themselves until their answer matches reality.
Each calculator — called a neuron — takes a few numbers in, multiplies them by its own weights, adds a small offset, and passes the result through a simple "should I fire?" function. Wire thousands or millions of those neurons together in layers, give the network examples of what the right answer looks like, and it slowly tunes its weights until the answers it produces line up with the truth.
The brain analogy is loose. Real biological neurons inspired the original idea in the 1940s, but a modern artificial neural network is just linear algebra plus a non-linear squish, repeated many times. No biology required.
A neural network turns inputs into outputs by passing numbers through layers of weighted sums and non-linear functions, with the weights learned from examples.
The building block — a neuron
Every neuron does the same four-part computation. Once you understand one neuron, you understand the whole network.
- Inputs: the numbers coming in, either raw features (pixel values, sensor readings) or the outputs of an earlier layer.
- Weights: one per input. They say how much each input matters. These are the numbers the network learns.
- Bias: a small offset added at the end. It lets the neuron fire even when every input happens to be zero.
- Activation: a non-linear function applied to the sum. Without it, stacking layers would collapse into a single linear function.
output = f(w₁·x₁ + w₂·x₂ + … + wₙ·xₙ + b)
Common choices for f: ReLU (zero if negative, otherwise pass-through — the workhorse), sigmoid (squashes into 0…1, used for probabilities), and tanh (squashes into −1…1).
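To make the four parts concrete, here is a minimal sketch of a single neuron in Python. It assumes numpy; the inputs, weights, and bias are made-up numbers, not values from any trained model:

```python
import numpy as np

def relu(z):
    # Zero if negative, otherwise pass-through.
    return np.maximum(0.0, z)

def neuron(x, w, b, f=relu):
    # output = f(w1*x1 + w2*x2 + ... + wn*xn + b)
    return f(np.dot(w, x) + b)

x = np.array([0.5, 1.2, 0.3])    # inputs: raw features or earlier-layer outputs
w = np.array([0.8, 0.1, -0.4])   # one weight per input (these get learned)
b = 0.2                          # bias

print(neuron(x, w, b))           # one number: this neuron's output
```

Swapping `f` for `sigmoid` or `np.tanh` changes only the final squash; the weighted sum stays the same.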
The architecture — layers
Neurons are arranged in groups called layers. The output of one layer becomes the input of the next, so information flows from input to output.
- Input layer: one neuron per feature. For an image, that's one per pixel; for tabular data, one per column.
- Hidden layers: where the actual pattern-finding happens. Each layer learns to detect features built from the previous layer's features.
- Output layer: one neuron per possible output. For "cat or dog", two. For a single price prediction, one.
"Deep" just means more than one hidden layer. Modern networks routinely stack dozens or hundreds — each layer building on the last to spot more abstract patterns (edges → shapes → objects).
Watch the network think
As a worked example, take a tiny 3 → 4 → 2 network on a toy task: tell apple from orange using three features (color, weight, sweetness). The inputs flow forward into a prediction, a loss measures the error, and backpropagation nudges the weights; running the same forward pass again then produces a smaller error.
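Here is that round trip as runnable Python, a sketch rather than anything production-grade: one forward pass, the loss, hand-derived gradients for the backward pass, one weight update, and the forward pass again. All numbers are invented, and the gradient formulas assume a ReLU hidden layer, a linear output layer, and a squared-error loss:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny 3 -> 4 -> 2 network. Features: color, weight, sweetness (made-up values).
x = np.array([0.9, 0.4, 0.7])      # one fruit, features scaled to 0..1
t = np.array([1.0, 0.0])           # true label: apple = [1, 0], orange = [0, 1]

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

def forward(x):
    z1 = W1 @ x + b1
    h = np.maximum(0.0, z1)        # ReLU hidden layer
    y = W2 @ h + b2                # linear output layer
    return z1, h, y

# Forward pass: the network's current guess, and how wrong it is.
z1, h, y = forward(x)
print("loss before:", 0.5 * np.sum((y - t) ** 2))

# Backward pass: chain rule, layer by layer, from the loss back to each weight.
dy  = y - t                        # gradient of the loss w.r.t. the output
dW2 = np.outer(dy, h)
db2 = dy
dh  = W2.T @ dy
dz1 = dh * (z1 > 0)                # ReLU passes gradient only where it fired
dW1 = np.outer(dz1, x)
db1 = dz1

# Nudge every weight a small step downhill (gradient descent).
lr = 0.05
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2

# Same forward pass again: with a small enough step, the error shrinks.
_, _, y2 = forward(x)
print("loss after: ", 0.5 * np.sum((y2 - t) ** 2))
```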
How it learns
Training is just a loop of three steps, repeated on every example in the dataset.
1. Forward pass: push the example through the network layer by layer. The output is the network's current guess.
2. Measure the loss: compare the prediction to the true answer with a loss function. A bigger gap means a bigger loss.
3. Backpropagation: push the error backward through the network to compute, for each weight, which direction to move to shrink the loss. Taking a small step in that direction is gradient descent.
Repeat forward → loss → backprop over millions of examples until the loss stops shrinking. The final weights are the trained model.
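In practice nobody derives the gradients by hand; a framework does it. A compact sketch of the same loop, assuming PyTorch is available (the four-fruit dataset is invented for illustration):

```python
import torch
from torch import nn

# Toy dataset: [color, weight, sweetness] -> 0 = apple, 1 = orange (invented).
X = torch.tensor([[0.9, 0.4, 0.7],
                  [0.2, 0.6, 0.3],
                  [0.8, 0.5, 0.9],
                  [0.1, 0.7, 0.2]])
y = torch.tensor([0, 1, 0, 1])

model = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)    # forward pass + loss
    loss.backward()                # backpropagation: gradients for every weight
    opt.step()                     # gradient descent: nudge weights downhill

print(loss.item())                 # the loss has shrunk; the weights are the model
```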
Where it's used
Anywhere there's a lot of data and a pattern that's too messy to write down by hand.
- Vision: face unlock, photo tagging, medical scan analysis, self-driving perception, defect detection on factory lines.
- Language: translation, search, chatbots, code assistants, summarisation. Every modern LLM is a giant neural network.
- Audio: speech-to-text, voice assistants, music generation, noise cancellation, speaker identification.
- Recommendations: what Netflix shows next, what YouTube autoplays, what Spotify slides into your discover queue.
- Robotics: self-driving steering, drone stabilisation, robot arms learning to grasp unfamiliar objects.
- Science: protein structure prediction, weather forecasting, particle-physics event classification, drug discovery.
Common types — a quick map
The neuron is the same everywhere; the wiring changes to fit the data.
- Feedforward (MLP): every layer fully connected to the next, the kind used in the worked example above. Good for tabular data.
- Convolutional (CNN): convolutional layers slide small filters across the image to detect edges, textures, and shapes (sketched in code after this list).
- Recurrent (RNN): loops back on itself to handle one token at a time. Largely replaced by transformers for language.
- Transformer: attention lets every position look at every other position in parallel. Powers modern LLMs, vision, and audio models.
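To see what "sliding a filter" means, here is a from-scratch 2D convolution in numpy; the 4×4 image and the 1×2 edge filter are toy inputs chosen so the result is easy to read:

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over every position and take a weighted sum:
    # the same neuron computation, reused at each location.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[0, 0, 1, 1],     # a tiny 4x4 "image" with a vertical edge
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
edge = np.array([[-1.0, 1.0]])      # 1x2 filter: responds where pixels jump

print(conv2d(image, edge))          # large values exactly along the edge
```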
When it works — and when it doesn't
Works well when:

- Lots of data is available, usually thousands to millions of examples
- The pattern is real but hard to write down as rules
- You have GPU compute to train and serve the model
- Being right most of the time is good enough

Struggles when:

- Data is tiny; a simpler model usually wins
- You need to explain why a decision was made
- The task demands strict guarantees (safety-critical rules, accounting)
- The world shifts away from the data the model was trained on