What is a Neural Network?
What it is
A neural network is a stack of tiny calculators that learn by adjusting themselves until their answer matches reality.
Each calculator — called a neuron — takes a few numbers in, multiplies them by its own weights, adds a small offset, and passes the result through a simple "should I fire?" function. Wire thousands or millions of those neurons together in layers, give the network examples of what the right answer looks like, and it slowly tunes its weights until the answers it produces line up with the truth.
The brain analogy is loose. Real biological neurons inspired the original idea in the 1940s, but a modern artificial neural network is just linear algebra plus a non-linear squish, repeated many times. No biology required.
A neural network turns inputs into outputs by passing numbers through layers of weighted sums and non-linear functions, with the weights learned from examples.
The building block — a neuron
Every neuron does the same four-part computation. Once you understand one neuron, you understand the whole network.
- Inputs: the numbers coming in, either raw features (pixel values, sensor readings) or the outputs of an earlier layer.
- Weights: one per input. They say how much each input matters. These are the numbers the network learns.
- Bias: a small offset added at the end. It lets the neuron fire even when every input happens to be zero.
- Activation: a non-linear function applied to the sum. Without it, stacking layers would collapse into a single linear function.
output = f(w₁·x₁ + w₂·x₂ + … + wₙ·xₙ + b)
Common choices for f: ReLU (zero if negative, otherwise pass-through — the workhorse), sigmoid (squashes into 0…1, used for probabilities), and tanh (squashes into −1…1).
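To make the four parts concrete, here is a minimal sketch of a single neuron in Python. It assumes numpy; the inputs, weights, and bias are made-up numbers, not values from any trained model:

```python
import numpy as np

def relu(z):
    # Zero if negative, otherwise pass-through.
    return np.maximum(0.0, z)

def neuron(x, w, b, f=relu):
    # output = f(w1*x1 + w2*x2 + ... + wn*xn + b)
    return f(np.dot(w, x) + b)

x = np.array([0.5, 1.2, 0.3])    # inputs: raw features or earlier-layer outputs
w = np.array([0.8, 0.1, -0.4])   # one weight per input (these get learned)
b = 0.2                          # bias

print(neuron(x, w, b))           # one number: this neuron's output
```

Swapping `f` for `sigmoid` or `np.tanh` changes only the final squash; the weighted sum stays the same.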
The architecture — layers
Neurons are arranged in groups called layers. The output of one layer becomes the input of the next, so information flows from input to output.
- Input layer: one neuron per feature. For an image, that's one per pixel; for tabular data, one per column.
- Hidden layers: where the actual pattern-finding happens. Each layer learns to detect features built from the previous layer's features.
- Output layer: one neuron per possible output. For "cat or dog", two. For a single price prediction, one.
"Deep" just means more than one hidden layer. Modern networks routinely stack dozens or hundreds — each layer building on the last to spot more abstract patterns (edges → shapes → objects).
Watch the network think
As a worked example, take a tiny 3 → 4 → 2 network on a toy task: tell apple from orange using three features (color, weight, sweetness). The inputs flow forward into a prediction, a loss measures the error, and backpropagation nudges the weights; running the same forward pass again then produces a smaller error.
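Here is that round trip as runnable Python, a sketch rather than anything production-grade: one forward pass, the loss, hand-derived gradients for the backward pass, one weight update, and the forward pass again. All numbers are invented, and the gradient formulas assume a ReLU hidden layer, a linear output layer, and a squared-error loss:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny 3 -> 4 -> 2 network. Features: color, weight, sweetness (made-up values).
x = np.array([0.9, 0.4, 0.7])      # one fruit, features scaled to 0..1
t = np.array([1.0, 0.0])           # true label: apple = [1, 0], orange = [0, 1]

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

def forward(x):
    z1 = W1 @ x + b1
    h = np.maximum(0.0, z1)        # ReLU hidden layer
    y = W2 @ h + b2                # linear output layer
    return z1, h, y

# Forward pass: the network's current guess, and how wrong it is.
z1, h, y = forward(x)
print("loss before:", 0.5 * np.sum((y - t) ** 2))

# Backward pass: chain rule, layer by layer, from the loss back to each weight.
dy  = y - t                        # gradient of the loss w.r.t. the output
dW2 = np.outer(dy, h)
db2 = dy
dh  = W2.T @ dy
dz1 = dh * (z1 > 0)                # ReLU passes gradient only where it fired
dW1 = np.outer(dz1, x)
db1 = dz1

# Nudge every weight a small step downhill (gradient descent).
lr = 0.05
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2

# Same forward pass again: with a small enough step, the error shrinks.
_, _, y2 = forward(x)
print("loss after: ", 0.5 * np.sum((y2 - t) ** 2))
```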
How it learns
Training is just a loop of three steps, repeated on every example in the dataset.
1. Forward pass: push the example through the network layer by layer. The output is the network's current guess.
2. Measure the loss: compare the prediction to the true answer with a loss function. A bigger gap means a bigger loss.
3. Backpropagation: push the error backward through the network to compute, for each weight, which direction to move to shrink the loss. Taking a small step in that direction is gradient descent.
Repeat forward → loss → backprop over millions of examples until the loss stops shrinking. The final weights are the trained model.
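In practice nobody derives the gradients by hand; a framework does it. A compact sketch of the same loop, assuming PyTorch is available (the four-fruit dataset is invented for illustration):

```python
import torch
from torch import nn

# Toy dataset: [color, weight, sweetness] -> 0 = apple, 1 = orange (invented).
X = torch.tensor([[0.9, 0.4, 0.7],
                  [0.2, 0.6, 0.3],
                  [0.8, 0.5, 0.9],
                  [0.1, 0.7, 0.2]])
y = torch.tensor([0, 1, 0, 1])

model = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)    # forward pass + loss
    loss.backward()                # backpropagation: gradients for every weight
    opt.step()                     # gradient descent: nudge weights downhill

print(loss.item())                 # the loss has shrunk; the weights are the model
```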
Where it's used
Anywhere there's a lot of data and a pattern that's too messy to write down by hand.
- Vision: face unlock, photo tagging, medical scan analysis, self-driving perception, defect detection on factory lines.
- Language: translation, search, chatbots, code assistants, summarisation. Every modern LLM is a giant neural network.
- Audio: speech-to-text, voice assistants, music generation, noise cancellation, speaker identification.
- Recommendations: what Netflix shows next, what YouTube autoplays, what Spotify slides into your discover queue.
- Robotics: self-driving steering, drone stabilisation, robot arms learning to grasp unfamiliar objects.
- Science: protein structure prediction, weather forecasting, particle-physics event classification, drug discovery.
Common types — a quick map
The neuron is the same everywhere; the wiring changes to fit the data.
- Feedforward (MLP): every layer fully connected to the next, the kind used in the worked example above. Good for tabular data.
- Convolutional (CNN): convolutional layers slide small filters across the image to detect edges, textures, and shapes (sketched in code after this list).
- Recurrent (RNN): loops back on itself to handle one token at a time. Largely replaced by transformers for language.
- Transformer: attention lets every position look at every other position in parallel. Powers modern LLMs, vision, and audio models.
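To see what "sliding a filter" means, here is a from-scratch 2D convolution in numpy; the 4×4 image and the 1×2 edge filter are toy inputs chosen so the result is easy to read:

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over every position and take a weighted sum:
    # the same neuron computation, reused at each location.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[0, 0, 1, 1],     # a tiny 4x4 "image" with a vertical edge
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
edge = np.array([[-1.0, 1.0]])      # 1x2 filter: responds where pixels jump

print(conv2d(image, edge))          # large values exactly along the edge
```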
When it works — and when it doesn't
Works well when:

- Lots of data is available, usually thousands to millions of examples
- The pattern is real but hard to write down as rules
- You have GPU compute to train and serve the model
- Being right most of the time is good enough

Struggles when:

- Data is tiny; a simpler model usually wins
- You need to explain why a decision was made
- The task demands strict guarantees (safety-critical rules, accounting)
- The world shifts away from the data the model was trained on