AI Agents
What it is
An AI agent is an LLM put in a loop and handed some tools. Instead of answering your question once and stopping, it can think, take an action, observe the result, and repeat — over and over — until the goal is actually met.
The difference is the same as asking a person for directions versus hiring them to run the errand. Ask for directions and you get one reply: a list of turns. Hire someone to go fetch your parcel and they will walk out the door, check the address, ask at the desk, wait in the queue, and come back with the parcel in hand — taking real actions and adjusting as they learn what's actually there. A plain LLM gives directions. An agent runs the errand.
The model itself doesn't change. The LLM is the brain — a next-token predictor that decides what to do next. What turns it into an agent is the scaffolding around it: a loop that keeps calling the model, a set of tools it is allowed to invoke, and somewhere to keep track of what has happened so far. Retrieval is just one such tool — the agent can decide to look something up, but it can equally decide to run code, hit an API, or do arithmetic.
In one sentence: an AI agent is an LLM that acts in a loop, reasoning about a goal, calling a tool, reading the result, and repeating until the job is done.
So the recipe is short: take a capable language model, wrap a loop around it, and give it a few tools it's allowed to call. Everything that follows on this page — the loop's four beats, the powers that make it useful, single versus multi-agent designs, and the risks — is just an elaboration of that one idea.
The core loop
Strip away the jargon and every agent runs the same four-beat cycle. The LLM never gets the whole job done in one shot — it nudges the task forward one step, looks at what came back, and decides the next step. That cycle repeats until the goal is reached or the agent gives up.
1. Gather context. Collect what the agent currently knows: the original goal, plus everything it has done and seen so far. On the first turn that's just the user's request; later it also includes the results of earlier tool calls.
2. Reason. The model decides what to do next: does it have enough to answer, or does it need to take an action first? If an action is needed, it picks which tool to call and what arguments to pass.
3. Act. The agent runs the chosen tool: a web search, a code interpreter, an API request, a calculator, a database query. This is the step that reaches outside the model and touches the real world.
4. Observe. The tool returns something (search hits, a number, an error). That result is appended to the running context and fed back into step 1, so the model now knows more than it did a moment ago.
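The four beats can be sketched as an ordinary loop. This is a minimal sketch under stated assumptions, not any framework's API: the LLM is replaced by a hypothetical `decide` function that reads the context and returns either a `ToolCall` or a `FinalAnswer`, each tool takes a single string argument, and a `max_steps` cap stands in for the agent giving up.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    name: str   # which tool to run
    arg: str    # its single string argument (a simplification)


@dataclass
class FinalAnswer:
    text: str


Decision = Union[ToolCall, FinalAnswer]


def run_agent(goal: str,
              decide: Callable[[list], Decision],
              tools: dict,
              max_steps: int = 10) -> str:
    # 1. Gather context: on the first turn it is just the user's request.
    context = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # 2. Reason: the stand-in model picks the next move from the context.
        decision = decide(context)
        if isinstance(decision, FinalAnswer):
            return decision.text
        # 3. Act: run the chosen tool outside the model.
        try:
            result = tools[decision.name](decision.arg)
        except Exception as exc:  # errors (including unknown tools) are observations too
            result = f"error: {exc!r}"
        # 4. Observe: append the action and its result so the next turn can see them.
        context.append(f"Action: {decision.name}[{decision.arg}]")
        context.append(f"Observation: {result}")
    return "gave up: step limit reached"


def toy_decide(context: list) -> Decision:
    # Hypothetical policy: look the value up once, then answer with what came back.
    observations = [line for line in context if line.startswith("Observation: ")]
    if not observations:
        return ToolCall("lookup", "answer")
    return FinalAnswer(observations[-1][len("Observation: "):])


print(run_agent("find the answer", toy_decide, {"lookup": lambda key: {"answer": "42"}[key]}))
```

Feeding a tool error back as an observation, instead of crashing, is what lets the next Reason step notice the problem and try something else.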
This interleaving of reasoning and acting has a name: ReAct (Reason + Act). The model emits a short thought ("I need today's temperature"), then an action (call the search tool), then reads the observation (the result), and the cycle continues. Writing the thought down before acting tends to improve the actions the model chooses, because it has reasoned about the step before committing to it.
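With a text-only model, the agent loop has to parse each turn to find the thought and the action it commits to. The line format below (`Thought:`, `Action: tool[argument]`, `Final Answer:`) is an assumption modelled on common ReAct-style prompts, not a standard; frameworks and native function-calling APIs each use their own structure.

```python
import re

# Assumed line-oriented format (not a standard):
#   Thought: <free text>
#   Action: <tool_name>[<argument>]    or    Final Answer: <text>
THOUGHT_RE = re.compile(r"^Thought:\s*(.+)$", re.MULTILINE)
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE)


def parse_react(output: str) -> dict:
    """Split one model turn into its thought and the move it commits to."""
    thought = THOUGHT_RE.search(output)
    step = {"thought": thought.group(1).strip() if thought else None}
    final = FINAL_RE.search(output)
    action = ACTION_RE.search(output)
    if final:
        step.update(kind="final", answer=final.group(1).strip())
    elif action:
        step.update(kind="action", tool=action.group(1), arg=action.group(2))
    else:
        # Malformed turn: the loop should feed an error back to the model.
        step.update(kind="invalid")
    return step


turn = "Thought: I need today's temperature\nAction: search[Tokyo weather today]"
print(parse_react(turn))
# {'thought': "I need today's temperature", 'kind': 'action', 'tool': 'search', 'arg': 'Tokyo weather today'}
```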
What gives an agent its powers
The loop is the skeleton. Four capabilities layered on top are what make an agent genuinely useful rather than a chatbot that talks to itself.
Tools. The agent is told, in its prompt, about a menu of functions it may invoke, each with a name, a description, and an argument schema. Tools are how it reaches beyond its frozen training data: searching the live web, running code, querying a database, sending an email, controlling a browser.
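A tool menu can be sketched as a list of name/description/schema records, plus a dispatcher that refuses calls to unknown tools or calls with mismatched arguments before anything runs. As a simplification, the schema here maps each argument name to a Python type; function-calling APIs commonly describe arguments with JSON Schema instead. The `add` and `word_count` tools are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str            # what the model reads when choosing a tool
    params: dict                # simplified argument schema: arg name -> Python type
    fn: Callable[..., Any]


def describe(tools: list) -> str:
    """Render the tool menu as text that could be placed in the model's prompt."""
    lines = []
    for t in tools:
        args = ", ".join(f"{k}: {v.__name__}" for k, v in t.params.items())
        lines.append(f"- {t.name}({args}): {t.description}")
    return "\n".join(lines)


def dispatch(tools: list, name: str, args: dict) -> Any:
    """Run a requested call only if the tool exists and the arguments fit its schema."""
    registry = {t.name: t for t in tools}
    if name not in registry:
        raise ValueError(f"unknown tool: {name}")  # e.g. a hallucinated tool
    tool = registry[name]
    if set(args) != set(tool.params):
        raise ValueError(f"{name} expects {sorted(tool.params)}, got {sorted(args)}")
    for key, expected in tool.params.items():
        # Strict check: an int passed for a float parameter is rejected.
        if not isinstance(args[key], expected):
            raise TypeError(f"{name}: argument {key!r} should be {expected.__name__}")
    return tool.fn(**args)


# Two hypothetical tools for illustration.
TOOLS = [
    Tool("add", "Add two numbers.", {"a": float, "b": float}, lambda a, b: a + b),
    Tool("word_count", "Count the words in a text.", {"text": str},
         lambda text: len(text.split())),
]

print(describe(TOOLS))
print(dispatch(TOOLS, "add", {"a": 2.0, "b": 3.5}))  # 5.5
```

Checking calls against the menu before executing them is one cheap defence against hallucinated tool calls.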
Memory. Short-term memory is the running context of the current task: the scratchpad of thoughts, actions, and observations. Long-term memory is an external store (often a vector database) that the agent can write facts to and recall from across sessions, so it isn't starting from zero every time.
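The two kinds of memory can be sketched as two data structures: a list holding the current task's scratchpad, and an external store the agent writes facts to and recalls from later. A real long-term store would usually rank entries by embedding similarity in a vector database; as a clearly labelled stand-in, the toy `LongTermMemory` below ranks by cosine similarity over bag-of-words counts so the example runs with the standard library alone.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Toy stand-in for an embedding model: lower-cased word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LongTermMemory:
    """Facts that outlive a single task (a real system would persist these)."""

    def __init__(self):
        self.facts = []  # list of stored fact strings

    def write(self, fact: str) -> None:
        self.facts.append(fact)

    def recall(self, query: str, k: int = 1) -> list:
        # Return the k stored facts most similar to the query.
        q = bag_of_words(query)
        ranked = sorted(self.facts, key=lambda f: cosine(q, bag_of_words(f)), reverse=True)
        return ranked[:k]


# Short-term memory: the scratchpad for the current task only.
scratchpad = ["Thought: I should check what I know about this user."]

memory = LongTermMemory()
memory.write("The user prefers Fahrenheit for temperatures.")
memory.write("The user's home city is Osaka.")
hit = memory.recall("does the user prefer Fahrenheit or Celsius temperatures")[0]
scratchpad.append("Recalled: " + hit)
print(scratchpad[-1])  # Recalled: The user prefers Fahrenheit for temperatures.
```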
Planning. For anything non-trivial, the agent first decomposes a big, vague goal into an ordered list of smaller, concrete sub-tasks, then works through them. Good planning keeps a long task on the rails instead of letting it wander.
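A plan can be represented as a set of sub-tasks worked through in order. The sketch below goes one assumption beyond the text: it records which sub-tasks depend on which, then uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to derive a valid working order. In a real agent the decomposition itself would come from the model, so the hard-coded plan here is hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical decomposition of "Write a short report on Tokyo's weather this week".
# Each sub-task maps to the set of sub-tasks that must finish before it.
plan = {
    "search for the weekly forecast": set(),
    "extract daily highs and lows": {"search for the weekly forecast"},
    "convert temperatures to Fahrenheit": {"extract daily highs and lows"},
    "summarise the trend": {"extract daily highs and lows"},
    "write the report": {"convert temperatures to Fahrenheit", "summarise the trend"},
}

# static_order() yields every sub-task after all of its prerequisites.
order = list(TopologicalSorter(plan).static_order())
for i, task in enumerate(order, 1):
    print(f"{i}. {task}")
```

Sub-tasks whose prerequisites are all done (here, the conversion and the summary) are also natural candidates to run in parallel.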
Reflection. After an action, or a whole attempt, the agent looks back and asks: did that work, and is this right? If a tool errored or the result looks wrong, it can revise its plan and try again rather than blindly marching on.
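Reflection can be sketched as a check after every attempt: if the tool errors or the result fails a check, the agent writes down what went wrong, revises, and retries up to a fixed limit. The city-code lookup table and the `revise` rule below are hypothetical; in a real agent the critique and the revision would usually come from another model call rather than a hand-written function.

```python
from typing import Callable, Optional


def attempt_with_reflection(
    task: str,
    act: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    max_attempts: int = 3,
) -> tuple:
    """Act, check the result, and revise on failure instead of pressing on."""
    notes = []  # what went wrong on each failed attempt
    plan = task
    for attempt in range(1, max_attempts + 1):
        try:
            result = act(plan)
        except Exception as exc:
            problem = f"tool error: {exc}"
        else:
            problem = check(result)  # None means "looks right"
            if problem is None:
                return result, notes
        notes.append(f"attempt {attempt} with {plan!r} failed: {problem}")
        plan = revise(plan, problem)
    return None, notes


CITY_CODES = {"Tokyo": "JP-13"}  # hypothetical lookup table


def act(query: str) -> str:
    return CITY_CODES[query]  # raises KeyError for names it doesn't know


def check(result: str) -> Optional[str]:
    return None if result.startswith("JP-") else "unexpected code"


def revise(plan: str, problem: str) -> str:
    # Hypothetical fix-up rule standing in for a model's self-critique.
    return plan.split(",")[0].strip().title()


result, notes = attempt_with_reflection("tokyo, japan", act, check, revise)
print(result)  # JP-13
print(notes)
```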
An agent solves a task, step by step
Follow a single concrete request, "What's the weather in Tokyo in Fahrenheit?", through the agent loop. The LLM sits at the centre as the brain, with two tools available: Search and Calculator. The request needs both, a live temperature from Search and a unit conversion from Calculator, and a running scratchpad records every thought, action, and observation along the way. That scratchpad is exactly the ReAct trace the model is building.
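The walkthrough's trace can be reproduced in a few lines if the model's side is scripted by hand. Everything here is a stand-in: `search` returns a fixed placeholder reading of 20 °C instead of real weather data, and the calculator is a tiny evaluator that accepts only numbers and `+ - * /`. The one real piece is the conversion formula, °F = °C × 9/5 + 32.

```python
import ast
import operator

# Calculator tool: a tiny evaluator that accepts numbers and + - * / only.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expr: str) -> float:
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))


def search(query: str) -> str:
    # Placeholder: a real Search tool would call a web search API.
    # The 20 °C reading is made up for the example, not real weather data.
    return "Tokyo: 20 °C, partly cloudy"


TOOLS = {"Search": search, "Calculator": calculator}

# The model's side of the trace, scripted by hand as (thought, tool, argument).
SCRIPT = [
    ("I need the current temperature in Tokyo.", "Search", "current weather Tokyo"),
    ("The reading is 20 °C; convert it with F = C * 9/5 + 32.", "Calculator", "20 * 9 / 5 + 32"),
]

scratchpad = []
observation = None
for thought, tool, arg in SCRIPT:
    observation = TOOLS[tool](arg)  # Act, then record the Observation
    scratchpad += [f"Thought: {thought}",
                   f"Action: {tool}[{arg}]",
                   f"Observation: {observation}"]
scratchpad.append(f"Final Answer: about {observation:.0f} °F in Tokyo")
print("\n".join(scratchpad))
```

Parsing the expression with `ast` rather than calling `eval` keeps the calculator from executing arbitrary code the model might emit.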
Plain LLM vs agent
Why bother with the loop and tools at all? Give the same task to a plain LLM and to an agent, and the gap is obvious whenever the task needs a fresh, computed answer. A plain LLM answers in a single pass from its frozen training data, so it can only guess at a live value and do the arithmetic unaided; an agent can fetch the value with a tool and hand the arithmetic to a calculator.
Single-agent vs multi-agent
Once you have one agent that loops with tools, the natural next move is to wire several of them together. More agents can mean more capability — but also more ways for things to go sideways.
Single agent:
- One LLM, one loop: a single reason–act–observe cycle drives the whole task
- Simpler to build and debug: there is one trace to read and one place where decisions are made
- Cheaper and more predictable: fewer model calls, fewer moving parts
- Best when the task is well-scoped and a handful of tools cover it
Multi-agent:
- Specialised agents collaborate, e.g. a planner, a researcher, a coder, and a critic
- Often coordinated by an orchestrator that routes sub-tasks and merges the results
- More powerful on big, multi-part problems: divide and conquer with focused prompts
- Harder to control: agents can talk past each other or loop, and the cost multiplies
How a multi-agent team works
A single big goal is handed to an orchestrator, which splits it into focused sub-tasks, routes each to a specialist agent (these can run in parallel), and then merges what comes back into one answer.
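The orchestrator pattern can be sketched with a thread pool: split the goal, submit each sub-task to its specialist so they run concurrently, then merge the results. The trip-planning goal and the three specialists are hypothetical placeholders; in a real system each specialist would be a full agent loop, and the split and merge steps would usually be model calls as well.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical specialists. In a real system each would be its own agent loop
# with a focused prompt and its own tools; here they return placeholder text.
def flight_agent(task: str) -> str:
    return f"[flights] options for {task}"


def hotel_agent(task: str) -> str:
    return f"[hotels] shortlist for {task}"


def weather_agent(task: str) -> str:
    return f"[weather] outlook for {task}"


SPECIALISTS = {"flights": flight_agent, "hotels": hotel_agent, "weather": weather_agent}


def split(goal: str) -> list:
    # Stand-in for a planner call: one focused, independent sub-task per specialist.
    return [(role, goal) for role in SPECIALISTS]


def merge(goal: str, results: list) -> str:
    # Stand-in for a final model call that writes one coherent answer.
    return f"Plan for {goal}:\n" + "\n".join(results)


def orchestrate(goal: str) -> str:
    subtasks = split(goal)
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        # Route each sub-task to its specialist; the calls run concurrently.
        futures = [pool.submit(SPECIALISTS[role], task) for role, task in subtasks]
        # Collect in submission order so the merged answer is deterministic.
        results = [f.result() for f in futures]
    return merge(goal, results)


print(orchestrate("a weekend in Tokyo"))
```

The sub-tasks here are deliberately independent; when one specialist needs another's output, the orchestrator has to sequence those calls instead of running them in parallel.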
Where they're used
Agents shine wherever a goal takes several steps, the steps aren't known in advance, and live tools or data are needed along the way.
Coding. Read a codebase, write or change files, run the tests, see what failed, and iterate: a loop of acting on the repo and observing the results until the task builds and passes.
Research. Issue many searches, follow links, read pages, cross-check claims, and stitch the findings into a sourced answer, far beyond what one prompt could retrieve.
Customer support. Look up the customer's order, check the policy, issue a refund through an API, and send the confirmation, taking real actions on back-end systems rather than only answering questions.
Browser and desktop automation. Drive a real browser or desktop: fill in forms, click buttons, scrape a result, book the appointment. This automates multi-step workflows across apps that have no convenient API.
Risks & limits
Agents are powerful precisely because they act on their own, and that same autonomy is what makes them risky. The loop that gets work done can also get stuck, run up a bill, or confidently take a wrong action.
Strengths:
- Automating multi-step work that would otherwise need a human to babysit each step
- Using live tools and data (search, code, APIs), so they aren't trapped behind the knowledge cutoff
- Adapting on the fly: reacting to errors and unexpected results instead of following a fixed script
- Handling open-ended goals where the exact sequence of steps isn't known up front
Risks:
- Loops and dead ends: an agent can get stuck repeating the same failed action without making progress
- Hallucinated tool calls: inventing a tool that doesn't exist, passing a bad argument, or asserting a made-up fact, then acting on it
- Cost and latency: every step is another LLM call, so long tasks are slow and expensive
- Compounding errors: a wrong early step poisons every step that builds on it
Because an agent takes real actions (spending money, sending messages, changing files), guardrails matter as much as capability. Limit which tools it can reach, cap how many steps it may run, and require human approval before anything costly or irreversible. The more autonomy you grant, the more closely a person needs to watch the loop.