What is a Vector Database


What it is

A vector database is a database built to store and search embedding vectors — fixed-length lists of numbers that capture the meaning of text, images, or audio.

Instead of looking for rows that match a keyword or a number, it answers a different kind of question: "of all the items I've stored, which ones mean the most similar thing to this one?" It does that in milliseconds, even across millions of vectors.

In one sentence

It finds items by meaning, not by matching keywords.

Why a regular database isn't enough

Traditional databases — Postgres, MySQL, MongoDB — are brilliant at the questions they were built for. They start to creak the moment the question becomes "what is similar to this?".

Keyword search: exact words only

A search for "refund" won't find a document that says "money back" — even though they mean the same thing.

Structured filters: great for WHERE

SQL nails price > 100 AND city = 'NYC'. It has no opinion about whether two paragraphs of text feel alike.

Embeddings change the question: vectors as the unit

Once each item is a vector, the question becomes "which of my million vectors is closest to this one?" — and you need an engine that can answer that fast.

A tiny worked example

Imagine you store just four sentences, labelled A through D.

An embedding model turns each sentence into a list of numbers. Texts with similar meaning land at nearby points in that high-dimensional space, so geometric closeness becomes a proxy for semantic similarity. A and B are about returns, so their vectors point in nearly the same direction. C and D are about completely different things, so their vectors land far away — both from each other and from the returns cluster.

Now a customer asks: "How do I send something back?" The vector database embeds that question with the same model and returns the two stored vectors closest to it — A and B — even though the question shares no words with either. Closeness is usually measured with cosine similarity: the cosine of the angle between two vectors, ranging from -1 (opposite) through 0 (unrelated) to 1 (identical direction).
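Cosine similarity is simple enough to compute by hand. A minimal sketch in plain Python — the 4-dimensional vectors are made-up toys standing in for real embeddings, not output from any actual model:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

returns_a = [0.9, 0.1, 0.0, 0.2]  # toy vector for a sentence about returns
returns_b = [0.8, 0.2, 0.1, 0.3]  # toy vector for another returns sentence
weather   = [0.0, 0.9, 0.8, 0.1]  # toy vector for an unrelated sentence

print(cosine_similarity(returns_a, returns_b))  # close to 1: similar meaning
print(cosine_similarity(returns_a, weather))    # close to 0: unrelated
```

The two returns vectors point in nearly the same direction, so their cosine lands near 1; the unrelated vector scores near 0.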

Why we draw 2-D

Real embedding models produce vectors with 384, 768, or 1536 dimensions. We can't picture that, so illustrations squash everything down to two dimensions — the geometry still tells the story.

How it works

Under the hood, every vector database does the same four things — even when the marketing pages disagree about what to call them.

Step 1: Embed

Run each input — a paragraph, image, or audio clip — through an embedding model. Out comes a vector of fixed length.
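In a real system this step calls an embedding model. As a stand-in, a toy "embedding" that hashes character trigrams into buckets shows the key property — every input, long or short, comes out as a vector of the same fixed length:

```python
def toy_embed(text, dim=8):
    # Toy stand-in for an embedding model: hash character trigrams
    # into a fixed number of buckets. A real model learns meaningful
    # dimensions; this only demonstrates the fixed-length output.
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % dim] += 1.0
    return vec

print(len(toy_embed("short")))                # 8
print(len(toy_embed("a much longer input")))  # 8
```

Fixed length is what makes everything downstream work: vectors of the same length live in the same space and can be compared.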

Step 2: Index

Organise the vectors so neighbours can be found fast. Two common ANN structures:
  • HNSW — a layered graph with shortcut links between similar vectors
  • IVF — group vectors into clusters and only search the nearest few clusters at query time
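The IVF idea fits in a few lines. This sketch hard-codes two clusters purely to show the shape of the algorithm — a real IVF index learns its centroids with k-means and adds many optimisations:

```python
import math

def dist(a, b):
    # Plain Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy "index": vectors pre-grouped into clusters, keyed by centroid.
index = {
    (1.0, 1.0): [(0.9, 1.1), (1.1, 0.9), (1.0, 1.2)],  # cluster near (1, 1)
    (5.0, 5.0): [(4.8, 5.1), (5.2, 4.9)],              # cluster near (5, 5)
}

def ivf_search(query, nprobe=1):
    # 1. Find the nprobe centroids nearest to the query...
    centroids = sorted(index, key=lambda c: dist(c, query))[:nprobe]
    # 2. ...then scan only the vectors inside those clusters,
    #    skipping everything else. That skip is the speed win.
    candidates = [v for c in centroids for v in index[c]]
    return min(candidates, key=lambda v: dist(v, query))

print(ivf_search((1.0, 1.1)))  # only the (1, 1) cluster is scanned
```

Raising `nprobe` scans more clusters — better recall, slower queries. That is the recall/speed dial every ANN index exposes in some form.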

Step 3: Query

The query (text, image, whatever) is embedded with the same model so it lives in the same space as the stored vectors.

Step 4: Search

Cosine similarity (or dot product) returns the top-k nearest vectors, along with the original payload — the actual text, image URL, or row id you stored next to each one.
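Stripped of the ANN index, step 4 is just "score everything, keep the best k" — with each payload stored next to its vector. A brute-force sketch with toy 2-D vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    # Cosine similarity for 2-D toy vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Each stored vector keeps its original payload next to it.
store = [
    ((0.9, 0.1), "Returns are accepted within 30 days"),
    ((0.8, 0.3), "We offer a full refund on request"),
    ((0.1, 0.9), "Our office is closed on Sundays"),
]

def top_k(query_vec, k=2):
    # Score every stored vector, sort best-first, return the payloads.
    scored = [(cosine(query_vec, vec), payload) for vec, payload in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [payload for _, payload in scored[:k]]

print(top_k((0.9, 0.2)))  # the two returns-related payloads
```

This linear scan is exact but O(n) per query; the index from step 2 exists to avoid scoring every vector.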

The flow, end to end

Putting the four steps together: sentences get embedded into a vector space, the database connects neighbours into an index, a query arrives, gets embedded into the same space, and the nearest matches come back as results.

Where it's used

Anywhere the question is "show me things like this", a vector database is probably involved.

RAG: grounding LLMs

Fetch the most relevant document chunks for each question and paste them into the prompt — so the model answers from your data, not its training. See How RAG Works.
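The "paste them into the prompt" step is just string assembly. A minimal sketch — the template wording is illustrative, not a prescribed format:

```python
def build_prompt(question, retrieved_chunks):
    # Paste the retrieved chunks ahead of the question so the model
    # answers from this context rather than from its training data.
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

chunks = ["Refunds are issued within 5 business days."]
print(build_prompt("How long do refunds take?", chunks))
```

The vector database's job ends at supplying `retrieved_chunks`; everything after that is ordinary prompt construction.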

Semantic search: intent over keywords

Search bars that understand "how to cancel" means the same as "end subscription" — without anyone hand-curating synonyms.

Recommendations: "more like this"

Users, products, songs, articles — embed each one, and "similar to what you just liked" becomes a nearest-neighbour lookup.

Image & audio similarity: beyond text

Reverse image search, duplicate detection, copyright matching, voice-print lookup — all the same shape: embed, index, query.

Vector DB vs. traditional database

They aren't rivals — they answer different questions. Most production systems run both side by side.

Reach for a vector DB when
  • The question is "what is similar to X?"
  • The data is unstructured — text, images, audio, code
  • You already have (or can compute) embeddings
  • You need top-k semantic results, not exact matches

Stay with a traditional DB when
  • You need exact matches on ids, emails, SKUs
  • The query is a range: price BETWEEN …, created_at > …
  • You rely on joins across many tables
  • You need transactions and strong consistency on structured fields

Common pitfalls
  • Different embedding model at index vs. query time — the vectors land in incompatible spaces and similarity becomes meaningless. Pin the model version.
  • Stale index — when the embedding model changes, every stored vector needs to be recomputed. Skipping that quietly degrades quality.
  • ANN ≠ exact — approximate search trades a small amount of recall for huge speed gains. Recall is tunable, not 100%.
  • Forgetting metadata filters — pure similarity is rarely enough in production. Most systems combine vector search with WHERE tenant_id = … filters so a user only matches their own data.
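The metadata-filter pattern above can be sketched as a pre-filter: restrict to the tenant's rows first, rank by similarity second. Toy vectors again; production engines push the filter into the index rather than filtering in application code:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

records = [
    {"tenant_id": "acme",   "vec": (0.9, 0.1), "text": "Acme refund policy"},
    {"tenant_id": "acme",   "vec": (0.2, 0.9), "text": "Acme holiday hours"},
    {"tenant_id": "globex", "vec": (0.9, 0.2), "text": "Globex refund policy"},
]

def search(query_vec, tenant_id, k=1):
    # WHERE tenant_id = ... first, similarity ranking second,
    # so one tenant's query can never surface another tenant's data.
    mine = [r for r in records if r["tenant_id"] == tenant_id]
    mine.sort(key=lambda r: cosine(query_vec, r["vec"]), reverse=True)
    return [r["text"] for r in mine[:k]]

print(search((1.0, 0.1), "acme"))  # never returns Globex rows
```

Even though the Globex vector is the closest match overall, the filter removes it from consideration before any similarity is computed.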

Popular options

You'll meet these names in the wild — they all implement the same idea behind different APIs and trade-offs.

Pinecone, Weaviate, Qdrant, and Milvus are dedicated vector databases (managed or self-hosted). Chroma is a lightweight embedded option popular in prototypes. pgvector is a Postgres extension — handy when you already run Postgres and want vector search next to your relational data. FAISS is a library, not a server: a fast in-process index from Meta that many systems wrap underneath.