A 34-minute working session on AI agents — what an agent really is, how tool calling works, the reason → act → observe loop, planning and memory, when an agent beats a plain prompt, and the frameworks that wire it all together.
A plain prompt is a single round trip: you ask, the model answers, and it is done. An agent wraps that same model in a loop and hands it tools — so it can search, read a file, call an API, check the result, and decide what to do next, all on its own, until the task is actually finished.
Good when the model already knows enough to answer in one go.
Good when the task needs tools, fresh data, or several steps.
Like the difference between asking a colleague a quick question and getting one reply, versus handing them a task and letting them open files, run searches, and check results until the job is done.
Agents are not a new kind of model — Claude, GPT, Gemini, and open-weight models like Llama all act as the "brain". The agent is the pattern you build around them.
This is the one mechanic that makes agents possible. You describe a set of tools to the model. When it wants one, it does not run anything — it asks for it by name, with arguments, and pauses. Your code runs the real function and hands the result back. Then the model keeps going.
get_weather({ city: "Berlin" }) — but it cannot execute code. Your program runs the function and returns the output as the next message.The model requests getWeather; your code executes it and feeds the result back as a message.
The model reads each tool's description like documentation. Clear names and one-line "use this when…" descriptions are the difference between the right tool and a wrong guess.
Arguments come back as structured JSON validated against your schema. Reject bad input early — a model can hallucinate an argument just as easily as a value.
The tool's output is appended to the conversation as a new message. The model now "sees" it and reasons about what to do next — answer, or call another tool.
Like a chef calling out an order to the kitchen. The chef decides what is needed and names it; the line cooks actually do the work and hand back the plate. The chef never touches the stove.
Once a model can call tools, an agent is just a small loop around it. The model reasons about the goal, acts by calling a tool, observes the result, and goes around again — adding each new observation to the conversation — until it decides it is done.
Each lap adds an observation; the loop exits the moment the model answers without asking for a tool.
finish()tool it must call to declare success, so "done" is explicit.Each turn the model sees the latest result, so it can recover from a failed call, refine a search, or change plan — something a single prompt can never do. That feedback is the whole point.
Like a mechanic: look, try something, check whether it worked, try the next thing — not a single guess with the hood closed.
A two-step task survives on the raw loop. A ten-step task wanders unless the agent plans — breaks the goal into ordered steps — and unless it has memory to carry useful facts forward instead of re-deriving them every turn.
Planning turns a vague goal into steps the loop can execute and you can audit.
The running conversation — every message, tool call, and result so far. It is the agent's working memory, but it is finite: too many steps and early details fall out or get summarized away.
Facts, past decisions, or documents kept in a database the agent can search — often a vector database. It pulls back only the relevant pieces when it needs them, instead of holding everything in the window.
An agent trades determinism for autonomy. That autonomy is exactly what makes it useful on open-ended work — and exactly what makes it unpredictable, slow, and expensive when the task did not need it. The skill is knowing which jobs deserve a loop.
Without a cap a stuck agent loops forever, burning tokens. A step limit bounds the damage.
You can hand-write the loop — and for a simple agent you often should. But frameworks handle the fiddly parts: tool wiring, state, memory, retries, streaming, and tracing. Four lead the field; the right one depends on your stack and how much control you want.
LangChain is a broad toolkit (Python and JS) of chains, tools, and integrations. LangGraph is its graph-based engine for stateful agents — explicit nodes, branches, loops, and human-in-the-loop checkpoints.
Reach for it when you need complex, long-running, stateful agents and want explicit control over the graph of steps.
A TypeScript SDK for integrating LLMs into products: generateText / streamText with tool calling and a built-in agent loop, one API across providers, and first-class streaming and MCP support.
Reach for it when you are shipping agent or chat features inside a TypeScript / Next.js product UI.
A Python framework for orchestrating several agents with roles and tasks — a "researcher", a "writer", a "reviewer" — that collaborate on a goal.
Reach for it when a problem maps naturally onto a few specialized roles working together.
A Python framework from Microsoft for multi-agent systems built around agents that talk to each other (and to tools) to solve a task, with strong research-grade conversation patterns.
Reach for it when you are exploring multi-agent conversation patterns or research-style prototypes.
Put it together: a small support-triage agent. The goal comes in; the model reasons, calls tools, observes results, and loops until it can answer — capped and logged the whole way.
One lap of reason → act → observe, then a final answer — the whole pattern in miniature.
Five quick questions on agents, tool calling, the loop, memory, and the tooling — instant feedback, no sign-in.
Navigate with ← → or scroll · back to library