Prompt Engineering
The prompt is the program
An LLM's weights are frozen — you can't retrain it on the fly. The one lever you control is the prompt: the text you feed in. Because the model learns patterns from its context (in-context learning), the way you frame a request can be the difference between a vague answer and exactly what you wanted.
State the task, nothing else. Works for easy, common requests.
Add a handful of input → output examples so the model copies the pattern and format.
"Think step by step" — let the model write its reasoning before the answer, which boosts accuracy on hard tasks.
Build the prompt up, layer by layer
The task stays fixed — classify the sentiment of a tricky, mixed review. Watch each layer get added to the prompt on the left, and the answer quality on the right climb from a coin-flip guess to a reliable, well-formatted reply.
Practical levers that move the needle
Say who the model should act as and the exact output shape you want ("reply with one word", "return JSON").
Few-shot examples pin down formatting and edge cases far better than a long description.
For math, logic and multi-step tasks, ask for the working first. Pairs well with low temperature.
If you need the model to reliably know facts it wasn't trained on, reach for RAG. If you need a permanent change in style or behaviour across thousands of calls, consider fine-tuning. Prompting first — it's the cheapest knob.
The animation walked one path — but the ingredients combine in any order, and they interact. Mix your own prompt below (the question is always included). Every combination shows the answer you'd typically get, its reliability, and what the extra structure costs in tokens.
Try chain-of-thought alone: the model reasons nicely but rambles, because nothing pinned the output format. Examples alone fix the format but skip the reasoning. The ingredients cover each other's weaknesses — that's why real prompts stack them, and why each one costs tokens on every single call.
Zero-shot vs few-shot, at a glance
- No format guidance → inconsistent output
- Ambiguous tasks get guessed
- Fine for simple, everyday asks
- Examples lock in the output format
- Edge cases are demonstrated, not described
- A system prompt keeps tone & rules consistent