Filters & Kernels — What They Learn

A kernel is a pattern detector

The numbers inside a convolution kernel decide what it responds to. One set of numbers detects vertical edges; another blurs; another sharpens. The kernel "lights up" (produces a large output) wherever a patch of the image matches its pattern.
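To make "lights up" concrete, here is a minimal NumPy sketch. The 5x6 test image and the helper name `correlate2d_valid` are illustrative assumptions, not part of any library. Like CNN frameworks, it slides the kernel without flipping it (strictly speaking, a cross-correlation).

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical test image: dark left half (0), bright right half (1).
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Vertical-edge kernel: negative weights on the left, positive on the right.
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

response = correlate2d_valid(image, vertical_edge)
print(response)  # every row is [0. 3. 3. 0.]
```

The response is zero over the flat regions and peaks at 3 in the two positions where the window straddles the dark-to-bright boundary.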

In classic image processing, people hand-designed these kernels. In a CNN, the network learns them from data — discovering whatever detectors best solve the task.
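As a toy illustration of learning a kernel from data, the sketch below starts from an all-zero 3x3 kernel and fits it by gradient descent. The "task" is an assumption chosen for simplicity: reproduce the output of a known vertical-edge kernel on one random image. A real CNN instead backpropagates a task loss, such as classification error, through many layers.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

# Hypothetical "task": reproduce the output of a known vertical-edge kernel.
target_kernel = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

image = rng.standard_normal((32, 32))          # random training image
windows = sliding_window_view(image, (3, 3))   # shape (30, 30, 3, 3)
target = np.einsum("ijkl,kl->ij", windows, target_kernel)

kernel = np.zeros((3, 3))                      # the learned kernel starts blank
learning_rate = 0.1
for step in range(300):
    prediction = np.einsum("ijkl,kl->ij", windows, kernel)
    error = prediction - target
    # Gradient of the mean squared error with respect to each kernel weight.
    grad = 2.0 * np.einsum("ij,ijkl->kl", error, windows) / error.size
    kernel -= learning_rate * grad

print(np.round(kernel, 3))  # should end up close to target_kernel
```

Nothing in the loop mentions edges: the kernel ends up as an edge detector only because those weights minimize the loss on this data.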

Same image, different kernels

Take one small image and transform it with an edge kernel, a blur kernel, and a sharpen kernel: the same pixels give three very different outputs. A deep CNN stacks detectors like these into a feature hierarchy.
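A short NumPy sketch of that comparison follows. The 6x6 image (a bright square on a dark background), the helper name `apply_kernel`, and the specific kernel values are illustrative assumptions; other common edge, blur, and sharpen kernels would behave similarly.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def apply_kernel(image, kernel):
    """Valid (no padding) sliding-window sum of products."""
    windows = sliding_window_view(image, kernel.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Hypothetical test image: a 4x4 bright square on a dark 6x6 background.
image = np.zeros((6, 6))
image[1:5, 1:5] = 1.0

kernels = {
    "edge": np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float),
    "blur": np.full((3, 3), 1 / 9),                      # box blur, weights sum to 1
    "sharpen": np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float),
}

outputs = {name: apply_kernel(image, k) for name, k in kernels.items()}
for name, out in outputs.items():
    print(name)
    print(np.round(out, 2))
```

The edge kernel gives +3 along the square's left boundary, -3 along its right boundary, and 0 in the flat interior. The blur kernel softens the corners to 4/9. The sharpen kernel overshoots to 3 at the corners while leaving the interior at 1.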

The feature hierarchy

Early layers: edges & colors

The first conv layers learn simple detectors: edges at various angles, color blobs, gradients.

Middle layers: textures & patterns

Combinations of edges become corners, curves, textures, repeating motifs.

Deep layers: parts & objects

Combinations of textures become eyes, wheels, and faces: object parts and, eventually, whole objects.

Channels & depth

A conv layer learns many filters (e.g. 64), each producing its own feature map (a channel). Each filter in the next layer spans all of those channels, so its depth equals the number of input channels, and it combines lower-level features into higher-level ones.
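The shapes make this concrete. In the sketch below, the 64 filters come from the example above; the 3-channel 8x8 input, the 3x3 kernel size, and the 128 filters in the second layer are assumptions for illustration. Bias terms and nonlinearities are omitted, and the weights are random rather than learned.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_layer(x, weights):
    """x: (in_channels, H, W); weights: (out_channels, in_channels, kh, kw)."""
    kh, kw = weights.shape[2:]
    # windows: (in_channels, H-kh+1, W-kw+1, kh, kw)
    windows = sliding_window_view(x, (kh, kw), axis=(1, 2))
    # Each output channel sums over ALL input channels and kernel positions.
    return np.einsum("chwkl,ockl->ohw", windows, weights)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))            # e.g. an RGB patch

w1 = rng.standard_normal((64, 3, 3, 3))       # 64 filters, each 3 channels deep
w2 = rng.standard_normal((128, 64, 3, 3))     # 128 filters, each 64 channels deep

maps1 = conv_layer(x, w1)
maps2 = conv_layer(maps1, w2)

print(maps1.shape)        # (64, 6, 6): one feature map per first-layer filter
print(maps2.shape)        # (128, 4, 4)
print(w1.size, w2.size)   # 1728 73728 weights
```

Each second-layer filter has 64 x 3 x 3 = 576 weights, one 3x3 slice per incoming feature map. That is how it can mix many lower-level detectors into a single higher-level response.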

Why learned beats hand-designed

Let the data decide

Hand-crafted kernels capture what humans think matters. Learned kernels capture what actually minimizes the loss — often subtle detectors no one would design. That shift is a big part of why deep learning overtook classical computer vision.