Prompt Engineering · Suman Bhadra Notes

The prompt is the program

An LLM's weights are frozen — you can't retrain it on the fly. The one lever you control is the prompt: the text you feed in. Because the model learns patterns from its context (in-context learning), the way you frame a request can be the difference between a vague answer and exactly what you wanted.

Zero-shot just ask

State the task, nothing else. Works for easy, common requests.

Few-shot show examples

Add a handful of input → output examples so the model copies the pattern and format.

Chain-of-thought ask it to reason

"Think step by step" — let the model write its reasoning before the answer, which boosts accuracy on hard tasks.

Build the prompt up, layer by layer

The task stays fixed — classify the sentiment of a tricky, mixed review. Watch each layer get added to the prompt on the left, and the answer quality on the right climb from a coin-flip guess to a reliable, well-formatted reply.

Practical levers that move the needle

Be specific role + format

Say who the model should act as and the exact output shape you want ("reply with one word", "return JSON").

Give examples 2–5 is plenty

Few-shot examples pin down formatting and edge cases far better than a long description.

Let it think reason → answer

For math, logic and multi-step tasks, ask for the working first. Pairs well with low temperature.

When prompting isn't enough

If you need the model to reliably know facts it wasn't trained on, reach for RAG. If you need a permanent change in style or behaviour across thousands of calls, consider fine-tuning. Prompting first — it's the cheapest knob.

The animation walked one path — but the ingredients combine in any order, and they interact. Mix your own prompt below (the question is always included). Every combination shows the answer you'd typically get, its reliability, and what the extra structure costs in tokens.

Try chain-of-thought alone: the model reasons nicely but rambles, because nothing pinned the output format. Examples alone fix the format but skip the reasoning. The ingredients cover each other's weaknesses — that's why real prompts stack them, and why each one costs tokens on every single call.

Zero-shot vs few-shot, at a glance

Zero-shot can wobble

No format guidance → inconsistent output
Ambiguous tasks get guessed
Fine for simple, everyday asks

Few-shot + structure

Examples lock in the output format
Edge cases are demonstrated, not described
A system prompt keeps tone & rules consistent