AI Agents · Suman Bhadra Notes

What it is

An AI agent is an LLM put in a loop and handed some tools. Instead of answering your question once and stopping, it can think, take an action, observe the result, and repeat — over and over — until the goal is actually met.

The difference is the same as asking a person for directions versus hiring them to run the errand. Ask for directions and you get one reply: a list of turns. Hire someone to go fetch your parcel and they will walk out the door, check the address, ask at the desk, wait in the queue, and come back with the parcel in hand — taking real actions and adjusting as they learn what's actually there. A plain LLM gives directions. An agent runs the errand.

The model itself doesn't change. The LLM is the brain — a next-token predictor that decides what to do next. What turns it into an agent is the scaffolding around it: a loop that keeps calling the model, a set of tools it is allowed to invoke, and somewhere to keep track of what has happened so far. Retrieval is just one such tool — the agent can decide to look something up, but it can equally decide to run code, hit an API, or do arithmetic.

In one sentence

An AI agent is an LLM that can act in a loop — reason about a goal, call a tool, read the result, and keep going until it's done.

So the recipe is short: take a capable language model, wrap a loop around it, and give it a few tools it's allowed to call. Everything that follows on this page — the loop's four beats, the powers that make it useful, single versus multi-agent designs, and the risks — is just an elaboration of that one idea.

The core loop

Strip away the jargon and every agent runs the same four-beat cycle. Rather than finishing the whole job in one shot, the LLM nudges the task forward one step, looks at what came back, and decides the next step. That cycle repeats until the goal is reached or the agent gives up.

1. Perceive: read the goal + latest observation

Gather what the agent currently knows: the original goal, plus everything it has done and seen so far. On the first turn that's just the user's request; later it includes the results of earlier tool calls.

2. Think / Reason: the LLM decides the next step

The model reasons about what to do next: do I have enough to answer, or do I need to take an action first? If an action is needed, it picks which tool to call and what arguments to pass.

3. Act: call a tool

The agent runs the chosen tool — a web search, a code interpreter, an API request, a calculator, a database query. This is the step that reaches outside the model and touches the real world.

4. Observe: feed the result back

The tool returns something — search hits, a number, an error. That result is appended to the running context and fed back into step 1. Now the model knows more than it did a moment ago.
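The four beats above map onto a short loop. What follows is a minimal sketch rather than a production design: `fake_llm`, `TOOLS`, and `run_agent` are hypothetical names, and the model is replaced by a hard-coded stub so the example runs without any API.

```python
from typing import Callable

# Hypothetical tool registry: plain Python functions keyed by name.
TOOLS: dict[str, Callable[[str], str]] = {
    # Toy calculator that only understands "a+b+c" style sums.
    "calculator": lambda expr: str(sum(float(x) for x in expr.split("+"))),
}


def fake_llm(context: list[str]) -> dict:
    """Stand-in for a real model call (an assumption for this sketch).

    It asks for the calculator once, then answers with the observation
    it finds in the context.
    """
    observations = [c for c in context if c.startswith("observation:")]
    if not observations:
        return {"type": "act", "tool": "calculator", "args": "2+3"}
    return {"type": "finish", "answer": observations[-1].split(": ", 1)[1]}


def run_agent(goal: str, llm=fake_llm, max_steps: int = 5) -> str:
    context = [f"goal: {goal}"]               # 1. perceive: goal plus history
    for _ in range(max_steps):
        decision = llm(context)               # 2. think: pick the next step
        if decision["type"] == "finish":
            return decision["answer"]
        tool = TOOLS[decision["tool"]]
        result = tool(decision["args"])       # 3. act: run the chosen tool
        context.append(f"observation: {result}")  # 4. observe: feed it back
    return "gave up: step limit reached"


print(run_agent("What is 2 + 3?"))
```

The `max_steps` bound is what turns "until the agent gives up" into something concrete: without it, a model that never decides to finish would loop forever.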

The ReAct pattern

This interleaving of reasoning and acting has a name: ReAct (Reason + Act). The model emits a short thought ("I need today's temperature"), then an action (call the search tool), then reads the observation (the result), and the cycle continues. Writing the thought down before acting tends to improve the quality of the actions the model chooses, because it has reasoned about the step before committing to it.
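To make the thought/action split concrete, here is a small sketch of how scaffolding might pull a thought and an action out of the model's text. The `Thought:` / `Action: tool[input]` layout is an assumed convention for this example, not a format the notes prescribe, and `parse_step` is a hypothetical helper.

```python
import re

# Assumed text layout: one "Thought: ..." line and one "Action: tool[input]" line.
THOUGHT_RE = re.compile(r"^Thought:\s*(.*)$", re.MULTILINE)
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def parse_step(model_output: str):
    """Return (thought, (tool_name, tool_input)); either part may be None."""
    thought = THOUGHT_RE.search(model_output)
    action = ACTION_RE.search(model_output)
    return (
        thought.group(1).strip() if thought else None,
        (action.group(1), action.group(2)) if action else None,
    )


step = (
    "Thought: I need today's temperature\n"
    "Action: search[temperature in Tokyo today]"
)
thought, action = parse_step(step)
print(thought)
print(action)
```

The scaffolding would then run the named tool, append an `Observation:` line with the result, and call the model again.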

What gives an agent its powers

The loop is the skeleton. Four capabilities layered on top are what make an agent genuinely useful rather than a chatbot that talks to itself.

Tools: functions it can call

The agent is told, in its prompt, about a menu of functions it may invoke — each with a name, a description, and an argument schema. Tools are how it reaches beyond its frozen training data: search the live web, run code, query a database, send an email, control a browser.
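A tool entry with a name, a description, and an argument schema could look roughly like this. The sketch assumes a JSON-Schema-style `parameters` dict; `Tool`, `tool_menu`, and `call_tool` are illustrative names, not any particular framework's API.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict[str, Any]   # JSON-Schema-style description of arguments
    func: Callable[..., Any]     # the Python function that does the work


def add(a: float, b: float) -> float:
    return a + b


TOOLS = {
    "add": Tool(
        name="add",
        description="Add two numbers.",
        parameters={
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
        func=add,
    )
}


def tool_menu() -> str:
    """Text shown to the model: everything except the function itself."""
    menu = [
        {"name": t.name, "description": t.description, "parameters": t.parameters}
        for t in TOOLS.values()
    ]
    return json.dumps(menu, indent=2)


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """Dispatch a model-requested call, rejecting unknown tools or missing args."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"
    missing = [k for k in tool.parameters.get("required", []) if k not in arguments]
    if missing:
        return f"error: missing arguments {missing}"
    return tool.func(**arguments)


print(tool_menu())
print(call_tool("add", {"a": 2, "b": 3}))
```

Returning an error string instead of raising lets the failure flow back to the model as an ordinary observation, so it has a chance to correct the call.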

Memory: short-term + long-term

Short-term memory is the running context of the current task — the scratchpad of thoughts, actions, and observations. Long-term memory is an external store (often a vector database) the agent can write facts to and recall across sessions, so it isn't starting from zero every time.
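The two kinds of memory can be sketched side by side. This is a toy under stated assumptions: `LongTermMemory` stands in for a vector database by ranking facts on shared words instead of embeddings, and the stored facts are invented for illustration.

```python
class LongTermMemory:
    """Toy stand-in for a vector store: recall ranks stored facts by the
    number of words they share with the query (no embeddings involved)."""

    def __init__(self) -> None:
        self.facts: list[str] = []

    def write(self, fact: str) -> None:
        self.facts.append(fact)

    def recall(self, query: str, k: int = 1) -> list[str]:
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(f.lower().split())), f) for f in self.facts]
        scored = [pair for pair in scored if pair[0] > 0]   # drop non-matches
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:k]]


# Long-term: survives across sessions (here, just across the script).
memory = LongTermMemory()
memory.write("user prefers temperatures in Fahrenheit")
memory.write("user lives in Tokyo")

# Short-term: the scratchpad for the current task only.
scratchpad: list[str] = ["goal: report the temperature"]
scratchpad.extend(f"recalled: {fact}" for fact in memory.recall("which temperatures unit"))
print(scratchpad)
```

A real long-term store would compare embedding vectors rather than raw words, but the write-then-recall shape is the same.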

Planning: break a goal into steps

For anything non-trivial the agent first decomposes a big, vague goal into an ordered list of smaller, concrete sub-tasks — then works through them. Good planning keeps a long task on the rails instead of wandering.
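As a rough illustration of decompose-then-execute, the sketch below keeps a plan as an ordered list and works through it. `Plan` and `fake_planner` are hypothetical; in a real agent the sub-tasks would come from an LLM call rather than a hard-coded list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Plan:
    goal: str
    steps: list[str]                                 # ordered sub-tasks
    done: list[str] = field(default_factory=list)    # one result per finished step

    def next_step(self) -> Optional[str]:
        """Return the first unfinished sub-task, or None when the plan is complete."""
        if len(self.done) < len(self.steps):
            return self.steps[len(self.done)]
        return None

    def mark_done(self, result: str) -> None:
        self.done.append(result)


def fake_planner(goal: str) -> Plan:
    # Stand-in for an LLM call that decomposes the goal (assumed for this sketch).
    return Plan(goal, [
        "look up the current temperature in Tokyo",
        "convert the reading to Fahrenheit",
        "write the answer",
    ])


plan = fake_planner("What's the weather in Tokyo in Fahrenheit?")
executed = []
while (step := plan.next_step()) is not None:
    executed.append(step)                 # a real agent would run the loop here
    plan.mark_done(f"finished: {step}")
print(executed)
```

Keeping the plan as explicit data also makes it easy to show to a human or to revise mid-task.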

Reflection: critique and retry

After an action — or a whole attempt — the agent looks back and asks "did that work? is this right?" If a tool errored or the result looks wrong, it can revise its plan and try again rather than blindly marching on.
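One way to picture reflection is a check-and-revise wrapper around a tool call. Everything here is a stand-in: `flaky_search` is a made-up tool with a made-up reading, and `looks_right` and `revise` replace what would normally be LLM critique and re-planning.

```python
def flaky_search(query: str) -> str:
    # Hypothetical tool that fails on vague queries; the reading is invented.
    if "Tokyo" not in query:
        raise ValueError("no results: query too vague")
    return "Tokyo: 20 C"


def looks_right(result: str) -> bool:
    # Reflection check; a real agent would ask the model to critique the result.
    return result.endswith(" C")


def revise(query: str, problem: str) -> str:
    # Stand-in for the model revising its plan after reading the problem.
    return query + " Tokyo"


def act_with_reflection(query: str, max_attempts: int = 3):
    log = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = flaky_search(query)
            if looks_right(result):
                log.append(f"attempt {attempt}: ok")
                return result, log
            problem = f"suspicious result {result!r}"
        except ValueError as err:
            problem = str(err)
        log.append(f"attempt {attempt}: {problem}")
        query = revise(query, problem)    # try again with a revised action
    return None, log


result, log = act_with_reflection("current temperature")
print(result)
print(log)
```

The attempt cap matters: reflection without a limit is one of the ways an agent ends up repeating a failed action indefinitely.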

Watch an agent solve a task

Follow a single concrete request, "What's the weather in Tokyo in Fahrenheit?", through the agent loop. The LLM sits in the centre as the brain, with two tools, Search and Calculator, on either side. At each step one of them is active, and a running scratchpad records every thought, action, and observation: exactly the ReAct trace the model is building.
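The same request can be traced in a few lines of code. This sketch assumes the search tool reports the temperature in Celsius, which the notes do not state, and the 20.0 reading is made up; both tools are stubs.

```python
def search(query: str) -> float:
    # Stub: a real tool would call a search or weather API.
    return 20.0   # made-up reading, assumed to be in Celsius


def calculator(celsius: float) -> float:
    # Standard conversion: F = C * 9/5 + 32
    return celsius * 9 / 5 + 32


scratchpad = []
scratchpad.append("thought: I need the current temperature in Tokyo")
celsius = search("current temperature Tokyo")
scratchpad.append(f"action: search -> observation: {celsius} C")
scratchpad.append("thought: the reading is in Celsius, so convert it")
fahrenheit = calculator(celsius)
scratchpad.append(f"action: calculator -> observation: {fahrenheit} F")

answer = f"It is {fahrenheit:.0f} F in Tokyo"
print("\n".join(scratchpad))
print(answer)
```

Two tool calls, two observations, then a final answer: the scratchpad is the whole ReAct trace for this task.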

Plain LLM vs agent — see the difference

Why bother with the loop and tools at all? Give the same task to a plain LLM and to an agent, and the gap is obvious. The comparison is clearest on a task that needs a fresh, computed answer, which a frozen model can't reliably produce on its own.

Single-agent vs multi-agent

Once you have one agent that loops with tools, the natural next move is to wire several of them together. More agents can mean more capability — but also more ways for things to go sideways.

Single agent with tools

One LLM, one loop — a single reason–act–observe cycle drives the whole task
Simpler to build and debug — there is one trace to read, one place decisions are made
Cheaper and more predictable — fewer model calls, fewer moving parts
Best when the task is well-scoped and a handful of tools cover it

Multi-agent systems

Specialised agents collaborate — e.g. a planner, a researcher, a coder, a critic
Often coordinated by an orchestrator that routes sub-tasks and merges results
More powerful on big, multi-part problems — divide and conquer with focused prompts
Harder to control: agents can talk past each other or get stuck in loops, and the cost multiplies

Watch a multi-agent team work

Here a single big goal is handed to an orchestrator, which splits it into focused subtasks, routes each to a specialist agent (they can run in parallel), then merges what comes back into one answer.
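The split, route, and merge flow might be sketched as follows. The specialists are plain functions standing in for full agents, `split` stands in for an LLM-based orchestrator, and a thread pool is just one assumed way to run the sub-tasks in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def researcher(task: str) -> str:
    # Stand-in for a specialist agent with its own loop and tools.
    return f"[research] notes on {task}"


def coder(task: str) -> str:
    return f"[code] script for {task}"


SPECIALISTS = {"research": researcher, "code": coder}


def split(goal: str) -> list[tuple[str, str]]:
    # Stand-in for the orchestrator's LLM call that breaks the goal apart.
    return [
        ("research", f"background for {goal}"),
        ("code", f"prototype of {goal}"),
    ]


def orchestrate(goal: str) -> str:
    subtasks = split(goal)
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        # Route each sub-task to its specialist; they run concurrently.
        futures = [pool.submit(SPECIALISTS[role], task) for role, task in subtasks]
        # Collect in submission order so the merged answer is deterministic.
        results = [f.result() for f in futures]
    return "\n".join(results)


print(orchestrate("a price tracker"))
```

Each specialist call here would itself be several model calls in a real system, which is where the multiplied cost comes from.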

Where they're used

Agents shine wherever a goal takes several steps, the steps aren't known in advance, and live tools or data are needed along the way.

Coding assistants: read, edit, run, fix

Read a codebase, write or change files, run the tests, see what failed, and iterate — a loop of acting on the repo and observing the results until the task builds and passes.

Research / deep search: gather and synthesise

Issue many searches, follow links, read pages, cross-check claims, and stitch the findings into a sourced answer — far beyond what one prompt could retrieve.

Customer support: resolve, not just reply

Look up the customer's order, check policy, issue a refund through an API, and send the confirmation — taking real actions on back-end systems, not only answering questions.

Computer / browser use: click, type, navigate

Drive a real browser or desktop: fill forms, click buttons, scrape a result, book the appointment — automating multi-step workflows across apps that have no convenient API.

Risks & limits

Agents are powerful precisely because they act on their own — which is also exactly what makes them risky. The same loop that gets work done can also get stuck, run up a bill, or take a wrong action confidently.

What they're good at

Automating multi-step work that would otherwise need a human to babysit each step
Using live tools and data — search, code, APIs — so they aren't trapped behind the knowledge cutoff
Adapting on the fly — reacting to errors and unexpected results instead of following a fixed script
Handling open-ended goals where the exact sequence of steps isn't known up front

Where they struggle

Loops and dead ends — an agent can get stuck repeating the same failed action without progress
Hallucinated tool calls — inventing a tool, a bad argument, or a fact, then acting on it
Cost and latency — every step is another LLM call, so long tasks are slow and expensive
Compounding errors — a wrong early step poisons every step that builds on it
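A back-of-the-envelope calculation shows why long chains are fragile. It assumes every step succeeds independently with the same probability, which is a simplification (the 95% figure is illustrative, not measured), and `chain_success` is a hypothetical helper.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Chance that every step in a chain succeeds, assuming independent steps."""
    return per_step ** steps


for n in (1, 5, 10, 20):
    print(f"{n:>2} steps at 95% each -> {chain_success(0.95, n):.0%} overall")
```

Under these assumptions a step that works 95% of the time yields a ten-step task that finishes cleanly only about 60% of the time, and the same multiplication applies to cost, since each extra step is another model call.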

Keep a human in the loop

Because an agent takes real actions (spending money, sending messages, changing files), guardrails matter as much as capability. Limit which tools it can reach, cap how many steps it may run, and require human approval before anything costly or irreversible. The more autonomy you grant, the more closely a person needs to watch the loop.
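The three guardrails named above (a tool allowlist, a step cap, and human approval) can be combined into a single check that runs before every action. The tool names, the limit of 3, and the `approve` callback are all assumptions chosen for this sketch.

```python
from typing import Callable

ALLOWED_TOOLS = {"search", "send_email"}   # tools the agent may reach
NEEDS_APPROVAL = {"send_email"}            # costly or irreversible actions
MAX_STEPS = 3                              # cap on how long the loop may run


def guarded_call(tool: str, step: int, approve: Callable[[str], bool]) -> str:
    """Decide whether the agent may run `tool` on loop iteration `step`."""
    if step > MAX_STEPS:
        return "blocked: step limit reached"
    if tool not in ALLOWED_TOOLS:
        return f"blocked: {tool} is not allowed"
    if tool in NEEDS_APPROVAL and not approve(tool):
        return f"blocked: {tool} was not approved"
    return f"ok: {tool} may run"


def always_deny(tool: str) -> bool:
    # Stand-in for a human reviewer; a real system would prompt a person.
    return False


print(guarded_call("search", 1, always_deny))
print(guarded_call("send_email", 2, always_deny))
print(guarded_call("delete_files", 2, always_deny))
print(guarded_call("search", 4, always_deny))
```

Passing approval in as a callback keeps the policy separate from the mechanism: the same check works with a command-line prompt, a chat message, or a ticket queue.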