GUIDEDECK · for engineers wiring up LLMs

LLMs that take
actions,
not just answers.

A 34-minute working session on AI agents — what an agent really is, how tool calling works, the reason → act → observe loop, planning and memory, when an agent beats a plain prompt, and the frameworks that wire it all together.

~34 MIN · BEGINNER → INTERMEDIATE · MODEL-AGNOSTIC
01 · What an agent is 4 min

An agent is an LLM
that uses tools in a loop.

A plain prompt is a single round trip: you ask, the model answers, and it is done. An agent wraps that same model in a loop and hands it tools — so it can search, read a file, call an API, check the result, and decide what to do next, all on its own, until the task is actually finished.

Agent — an LLM running inside a loop that can call tools (functions you give it) and read the results, repeating until it reaches a goal. The model supplies the decision-making; your code supplies the tools and runs the loop. Three words to keep straight: LLM (the model that predicts text), tool (a function the model may ask you to run), and loop (calling the model again and again until it is done).
A single prompt — one shot
prompt → LLM → answer (ONE PASS · NO ACTIONS)

Good when the model already knows enough to answer in one go.

An agent — a loop with tools
goal → LLM ⇄ tools → done? (REASON · ACT · OBSERVE · REPEAT)

Good when the task needs tools, fresh data, or several steps.

The same model, two postures

  • Prompt: you do the thinking about which steps to run; the model just fills in one blank.
  • Agent: you hand over the goal and the tools, and the model decides the steps — including how many.
  • Nothing about the model changes. What changes is the harness around it — the loop and the tools.

Like the difference between asking a colleague a quick question and getting one reply, versus handing them a task and letting them open files, run searches, and check results until the job is done.

Agents are not a new kind of model — Claude, GPT, Gemini, and open-weight models like Llama all act as the "brain". The agent is the pattern you build around them.

02 · Tool calling 5 min

The model picks the tool.
You run it.

This is the one mechanic that makes agents possible. You describe a set of tools to the model. When it wants one, it does not run anything — it asks for it by name, with arguments, and pauses. Your code runs the real function and hands the result back. Then the model keeps going.

Tool (a.k.a. function call) — a function you expose to the model: a name, a plain-language description of when to use it, and a schema for its arguments. The model can read the description and emit a request like get_weather({ city: "Berlin" }) — but it cannot execute code. Your program runs the function and returns the output as the next message.
const getWeather = tool({
  description: "Look up the current weather for a city",
  parameters: z.object({ city: z.string() }), // the arg schema
  execute: async ({ city }) => {
    return await weatherApi(city) // YOUR code runs here
  },
})
// the model only ever asks for getWeather — it never runs it
user: weather? → LLM asks for a tool: getWeather({ city: "Berlin" }) → your code runs it → 18°C, clear → LLM uses the result to answer

The model requests getWeather; your code executes it and feeds the result back as a message.
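The request/execute split can be sketched without any SDK. This is a minimal sketch, not any library's API: the names `ToolDef`, `ToolRequest`, and `dispatch` are assumptions made for illustration, the weather value is canned, and the "model output" is hard-coded.

```typescript
// Hypothetical shapes for illustration; real SDKs define their own.
type ToolDef = {
  description: string; // what the model reads to pick a tool
  execute: (args: Record<string, unknown>) => string; // YOUR code
};

type ToolRequest = { name: string; args: Record<string, unknown> };

const tools: Record<string, ToolDef> = {
  getWeather: {
    description: "Look up the current weather for a city",
    // Canned value standing in for a real weather API call.
    execute: (args) => `18°C, clear in ${String(args.city)}`,
  },
};

// The model only produces a request; this function is where code actually runs.
function dispatch(request: ToolRequest): string {
  const tool = tools[request.name];
  if (!tool) return `error: unknown tool "${request.name}"`;
  return tool.execute(request.args);
}

// Pretend the model emitted this request.
const observation = dispatch({ name: "getWeather", args: { city: "Berlin" } });
console.log(observation);
```

Returning an error string for an unknown tool, rather than throwing, keeps the failure visible to the model on its next turn.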

name + description

How the model chooses

The model reads each tool's description like documentation. Clear names and one-line "use this when…" descriptions are the difference between the right tool and a wrong guess.

schema

Typed arguments

Arguments come back as structured JSON, which you should validate against your schema before running anything. Reject bad input early: a model can hallucinate an argument name just as easily as a value.

result

Back into context

The tool's output is appended to the conversation as a new message. The model now "sees" it and reasons about what to do next — answer, or call another tool.
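The schema and result steps can be sketched together: validate the arguments, then append the outcome as a message the model will see next turn. This is a sketch under stated assumptions: the `Message` shape, `parseWeatherArgs`, and `observe` are hypothetical names, the check is hand-rolled where a schema library would normally sit, and the weather result is canned.

```typescript
// Hypothetical message and argument shapes, for illustration only.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type WeatherArgs = { city: string };

// Hand-rolled check standing in for a schema library.
function parseWeatherArgs(raw: unknown): WeatherArgs | null {
  if (typeof raw !== "object" || raw === null) return null;
  const city = (raw as Record<string, unknown>).city;
  if (typeof city !== "string" || city.trim() === "") return null;
  return { city };
}

// Validate first, then append the outcome so the model can see it next turn.
function observe(messages: Message[], rawArgs: unknown): Message[] {
  const args = parseWeatherArgs(rawArgs);
  const content = args
    ? `weather for ${args.city}: 18°C, clear` // canned result, not a real lookup
    : "error: getWeather needs a non-empty string 'city'";
  return [...messages, { role: "tool", content }];
}

const start: Message[] = [{ role: "user", content: "weather?" }];
const ok = observe(start, { city: "Berlin" });
const bad = observe(start, { town: 42 }); // a hallucinated argument name
```

In both cases a message is appended, so a rejected call becomes an observation the model can correct rather than a silent failure.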

Like a chef calling out an order to the kitchen. The chef decides what is needed and names it; the line cooks actually do the work and hand back the plate. The chef never touches the stove.

03 · The agent loop 5 min

Reason → act → observe,
then repeat.

Once a model can call tools, an agent is just a small loop around it. The model reasons about the goal, acts by calling a tool, observes the result, and goes around again — adding each new observation to the conversation — until it decides it is done.

The agent loop — sometimes called ReAct (reason + act) — is the repeating cycle: think about what is needed, call a tool, read the result, repeat. Each turn the model sees everything that happened so far, so it can course-correct. The loop ends when the model answers with no tool call — or when you stop it.
const messages = [{ role: "user", content: goal }]
for (let step = 0; step < 10; step++) { // always cap the steps
  const reply = await model({ messages, tools })
  if (!reply.toolCall) return reply.text // no tool → done
  const result = await run(reply.toolCall) // act
  messages.push(reply, toolResult(result)) // observe
}
throw new Error("step cap reached without a final answer") // cap hit: stop explicitly
reason (LLM) → act (tool) → observe → back to reason · no tool → answer

Each lap adds an observation; the loop exits the moment the model answers without asking for a tool.

When does it stop?

  • Natural stop — the model replies with a final answer and no tool call. That is the happy path.
  • Step cap — a hard limit (say 10 turns) so a confused agent cannot loop forever. Non-negotiable.
  • Budget / timeout — stop after N tokens, N dollars, or N seconds, whichever comes first.
  • Done-signal — give the agent a finish() tool it must call to declare success, so "done" is explicit.
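The stop conditions above can be combined in one loop. A minimal sketch, assuming a hypothetical `Turn` shape and made-up limits (`MAX_STEPS`, `MAX_TOKENS`); `nextTurn` stands in for a model call, and a wall-clock timeout is left out to keep the example deterministic.

```typescript
// Hypothetical shapes; field names and limits are assumptions for this sketch.
type Turn = { toolCall?: { name: string }; text?: string; tokens: number };
type StopReason = "answer" | "finish-tool" | "step-cap" | "token-budget";

const MAX_STEPS = 10;
const MAX_TOKENS = 1_000;

// `nextTurn` stands in for one model call.
function runUntilStop(nextTurn: (step: number) => Turn): { reason: StopReason; steps: number } {
  let tokensUsed = 0;
  for (let step = 0; step < MAX_STEPS; step++) {
    const turn = nextTurn(step);
    tokensUsed += turn.tokens;
    if (turn.toolCall?.name === "finish") return { reason: "finish-tool", steps: step + 1 };
    if (!turn.toolCall) return { reason: "answer", steps: step + 1 };
    if (tokensUsed >= MAX_TOKENS) return { reason: "token-budget", steps: step + 1 };
  }
  return { reason: "step-cap", steps: MAX_STEPS };
}

// A confused agent that always asks for another search.
const stuck = runUntilStop(() => ({ toolCall: { name: "search" }, tokens: 50 }));
// An agent that answers on its third turn.
const happy = runUntilStop((step) =>
  step < 2 ? { toolCall: { name: "search" }, tokens: 50 } : { text: "done", tokens: 50 }
);
// An expensive agent that exhausts the token budget first.
const pricey = runUntilStop(() => ({ toolCall: { name: "search" }, tokens: 400 }));
```

Returning the stop reason alongside the step count makes it easy to tell a happy-path answer from a cap or budget cut-off in logs.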

Why the loop is powerful

Each turn the model sees the latest result, so it can recover from a failed call, refine a search, or change plan — something a single prompt can never do. That feedback is the whole point.
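Recovery only works if failures reach the model as observations. One way to sketch that, assuming a hypothetical `safeRun` wrapper and a made-up `searchDocs` tool, is to catch tool errors and return them as readable results instead of crashing the loop.

```typescript
type Observation = { ok: boolean; content: string };

// Run a tool and turn any thrown error into something the model can read.
function safeRun(tool: (query: string) => string, query: string): Observation {
  try {
    return { ok: true, content: tool(query) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, content: `tool failed: ${message}` };
  }
}

// Hypothetical search tool that rejects empty queries.
function searchDocs(query: string): string {
  if (query.trim() === "") throw new Error("query must not be empty");
  return `3 results for "${query}"`;
}

const firstTry = safeRun(searchDocs, ""); // the model's bad first attempt
const secondTry = safeRun(searchDocs, "refunds"); // the refined retry
```

Appending `firstTry.content` to the conversation is what gives the model the chance to produce the second, corrected call.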

Like a mechanic: look, try something, check whether it worked, try the next thing — not a single guess with the hood closed.

04 · Multi-step · planning · memory 5 min

Bigger tasks need a plan
and a memory.

A two-step task survives on the raw loop. A ten-step task wanders unless the agent plans — breaks the goal into ordered steps — and unless it has memory to carry useful facts forward instead of re-deriving them every turn.

Planning — having the model lay out the steps (and revise them as it learns), rather than improvising one tool call at a time. Memory — what the agent can recall: short-term is the conversation in the context window; long-term is an external store it can search when the window is too small to hold everything.

Two ways to plan

  • Plan-then-execute: the model writes the whole list of steps up front, then works through it. Predictable, easy to inspect.
  • Interleaved (ReAct): it plans one step, acts, observes, and decides the next step from what it learned. More adaptive, harder to predict.
  • Many real agents mix both: a rough plan, revised as results come in.
goal → 1 · search the docs → 2 · read top result → 3 · write the answer (ONE GOAL → ORDERED STEPS)

Planning turns a vague goal into steps the loop can execute and you can audit.
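Plan-then-execute can be sketched as a list of steps plus a loop that works through them and leaves an audit log. The `Step` shape, `makePlan`, and `executePlan` are hypothetical names; in a real agent the model would write the plan, while here it is hard-coded and the `run` callback is a stub.

```typescript
type Step = { id: number; action: string; status: "pending" | "done" };

// In a real agent the model would write this list; here it is hard-coded.
function makePlan(actions: string[]): Step[] {
  return actions.map((action, i) => ({ id: i + 1, action, status: "pending" as const }));
}

// Work through the plan in order, recording an auditable log line per step.
function executePlan(plan: Step[], run: (action: string) => string): string[] {
  const log: string[] = [];
  for (const step of plan) {
    const result = run(step.action);
    step.status = "done";
    log.push(`${step.id} · ${step.action} -> ${result}`);
  }
  return log;
}

const plan = makePlan(["search the docs", "read top result", "write the answer"]);
const log = executePlan(plan, (action) => `ok (${action})`);
console.log(log.join("\n"));
```

An interleaved agent would instead regenerate or edit the remaining steps after each result; the data structure stays the same.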

short-term memory

The context window

The running conversation — every message, tool call, and result so far. It is the agent's working memory, but it is finite: too many steps and early details fall out or get summarized away.
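One simple way to cope with a finite window is to keep the goal and drop the oldest messages once a budget is exceeded. This is a sketch, not a recommendation of one strategy: `trimToBudget` is a hypothetical helper, and size is measured in characters where a real agent would count tokens.

```typescript
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Crude size estimate (characters); a real agent would count tokens.
const size = (m: Message): number => m.content.length;

// Keep the first message (the goal) plus as many recent messages as fit.
function trimToBudget(messages: Message[], budget: number): Message[] {
  const goal = messages[0];
  if (goal === undefined) return [];
  const kept: Message[] = [];
  let used = size(goal);
  // Walk backwards from the newest message so recent context survives.
  for (const m of messages.slice(1).reverse()) {
    used += size(m);
    if (used > budget) break;
    kept.unshift(m);
  }
  return [goal, ...kept];
}

const history: Message[] = [
  { role: "user", content: "Find the refund policy" },
  { role: "tool", content: "old search result ".repeat(10) },
  { role: "assistant", content: "Narrowing the search" },
  { role: "tool", content: "Refunds within 30 days" },
];
const trimmed = trimToBudget(history, 100);
```

Summarizing the dropped messages instead of discarding them is a common refinement, at the cost of an extra model call.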

long-term memory

An external store

Facts, past decisions, or documents kept in a database the agent can search — often a vector database. It pulls back only the relevant pieces when it needs them, instead of holding everything in the window.

Long-term memory is usually retrieval: embed your data, store it, and search it on demand. That is exactly the RAG pattern — see the RAG & Vector Search deck for embeddings, chunking, and vector databases.
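The store-then-search idea can be sketched with a toy in-memory store. Everything here is an assumption for illustration: `Memory`, `score`, and `recall` are hypothetical names, and shared-word counting stands in for the embedding similarity a vector database would compute.

```typescript
type Memory = { id: number; text: string };

// Lowercased word set; a stand-in for a real embedding.
function words(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((w) => w.length > 0));
}

// Score = number of shared words. Real systems compare embedding vectors instead.
function score(query: string, memory: Memory): number {
  const q = words(query);
  return Array.from(words(memory.text)).filter((w) => q.has(w)).length;
}

// Return the top-k memories with a non-zero score.
function recall(store: Memory[], query: string, k: number): Memory[] {
  return store
    .map((m) => ({ m, s: score(query, m) }))
    .filter((x) => x.s > 0)
    .sort((a, b) => b.s - a.s)
    .slice(0, k)
    .map((x) => x.m);
}

const store: Memory[] = [
  { id: 1, text: "Customer prefers email over phone" },
  { id: 2, text: "Order 4471 shipped on Monday" },
  { id: 3, text: "Refund policy allows returns within 30 days" },
];
const hits = recall(store, "when did order 4471 ship", 1);
```

Only the recalled entries would be added to the context window, which is what keeps the window small.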

05 · When agents help vs hurt 5 min

A loop is power and risk at once.

An agent trades determinism for autonomy. That autonomy is exactly what makes it useful on open-ended work — and exactly what makes it unpredictable, slow, and expensive when the task did not need it. The skill is knowing which jobs deserve a loop.

Rule of thumb: reach for the simplest thing that works. If the task is a fixed sequence of known steps, a plain prompt or a hard-coded workflow is cheaper, faster, and far more reliable. Add an agent only when the steps are unknown ahead of time and depend on what the tools return.
An agent earns its keep
  • The path is open-ended — the next step depends on the last result.
  • The task genuinely needs tools: live data, files, APIs, code.
  • The number of steps varies from run to run.
  • Mistakes are recoverable and a human can review the output.
Skip the agent
  • The steps are fixed and known — just script them.
  • One prompt with the right context already answers it.
  • You need a guaranteed, repeatable result every time.
  • Latency or cost per request must be tight and predictable.
NO CAP: ∞ loops, cost ↑, time ↑ · CAPPED: stop at ≤ N steps, bounded

Without a cap a stuck agent loops forever, burning tokens. A step limit bounds the damage.

Failure modes to plan for

  • Runaway loops — the agent never decides it is done; cost and latency spiral. Cap steps and set a budget.
  • Compounding errors — one wrong observation poisons every later step. Validate tool inputs and outputs.
  • Prompt injection — a web page or document the agent reads can contain instructions that hijack it. Treat tool results as untrusted data, not commands.
  • Dangerous actions — sending email, spending money, deleting data. Gate these behind human approval.
  • No visibility — you cannot debug what you cannot see. Log every step (tracing) from day one.
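Two of the mitigations above, human approval and tracing, fit in one small wrapper around tool execution. A minimal sketch: the `DANGEROUS` list, `guardedRun`, and the `approve` callback are hypothetical, and `approve` stands in for whatever review step a real system would use.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };
type TraceEntry = { step: number; tool: string; outcome: "ran" | "blocked" };

// Tools that change the outside world; this list is an assumption for the sketch.
const DANGEROUS = new Set(["sendEmail", "deleteRecord", "chargeCard"]);

const trace: TraceEntry[] = [];

// `approve` stands in for a human decision (a UI prompt, a review queue, ...).
function guardedRun(
  step: number,
  call: ToolCall,
  run: (call: ToolCall) => string,
  approve: (call: ToolCall) => boolean
): string {
  if (DANGEROUS.has(call.name) && !approve(call)) {
    trace.push({ step, tool: call.name, outcome: "blocked" });
    return "blocked: a human declined this action";
  }
  trace.push({ step, tool: call.name, outcome: "ran" });
  return run(call);
}

const run = (call: ToolCall): string => `${call.name} finished`;
const denyAll = (): boolean => false;

const a = guardedRun(1, { name: "search", args: { q: "refunds" } }, run, denyAll);
const b = guardedRun(2, { name: "sendEmail", args: { to: "x@example.com" } }, run, denyAll);
console.log(trace);
```

Because every call passes through one function, the trace is complete by construction, including the actions that were blocked.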

06 · The tooling — agent frameworks 6 min

Libraries that run the
loop so you don't.

You can hand-write the loop — and for a simple agent you often should. But frameworks handle the fiddly parts: tool wiring, state, memory, retries, streaming, and tracing. Four lead the field; the right one depends on your stack and how much control you want.

Agent framework — a library that provides the loop, tool plumbing, memory, and state as building blocks, so you describe what the agent should do instead of re-implementing the machinery. Start simple: a hand-rolled loop is often enough before you reach for one of these.

LangChain / LangGraph — the big ecosystem

LangChain is a broad toolkit (Python and JS) of chains, tools, and integrations. LangGraph is its graph-based engine for stateful agents — explicit nodes, branches, loops, and human-in-the-loop checkpoints.

Pro
Huge integration ecosystem; LangGraph gives precise control over agent state and loops.
Con
Large API surface; the abstractions can feel heavy for a small agent.

Reach for it when you need complex, long-running, stateful agents and want explicit control over the graph of steps.

Vercel AI SDK — TypeScript-first for apps

A TypeScript SDK for integrating LLMs into products: generateText / streamText with tool calling and a built-in agent loop, one API across providers, and first-class streaming and MCP support.

Pro
Clean TypeScript DX, great streaming, provider-agnostic — ideal for web and Next.js apps.
Con
Lighter on heavy orchestration than LangGraph; its agent abstractions are newer.

Reach for it when you are shipping agent or chat features inside a TypeScript / Next.js product UI.

CrewAI — role-based multi-agent crews

A Python framework for orchestrating several agents with roles and tasks — a "researcher", a "writer", a "reviewer" — that collaborate on a goal.

Pro
Fast to stand up role-based multi-agent collaboration with little boilerplate.
Con
Opinionated; multi-agent adds cost and coordination complexity, with less low-level control.

Reach for it when a problem maps naturally onto a few specialized roles working together.

Microsoft AutoGen — agents that converse

A Python framework from Microsoft for multi-agent systems built around agents that talk to each other (and to tools) to solve a task, with strong research-grade conversation patterns.

Pro
Powerful patterns for agent-to-agent conversation and experimentation.
Con
Conversational multi-agent setups can be hard to make cheap and reliable; steeper to tame.

Reach for it when you are exploring multi-agent conversation patterns or research-style prototypes.

How to choose

  • TypeScript / web app? Vercel AI SDK.
  • Need explicit stateful control? LangGraph.
  • Several cooperating roles? CrewAI or AutoGen.
  • Simple single agent? A hand-written loop — no framework yet.
However an agent reaches your tools, the emerging standard for the connection is MCP — the Model Context Protocol, an open standard from Anthropic. Instead of hand-wiring every integration, an agent speaks one protocol to many tool servers. See the MCP deck for hosts, servers, and the primitives.

07 · A worked agent & recap 4 min

One agent, end to end.

Put it together: a small support-triage agent. The goal comes in; the model reasons, calls tools, observes results, and loops until it can answer — capped and logged the whole way.

goal: "Where is my order #4471?"
→ reason   need the order status
→ act      lookupOrder({ id: "4471" })
← observe  { status: "shipped", eta: "Fri" }
→ reason   have what I need
→ answer   "Order #4471 shipped, arriving Friday."
// 1 tool call, 2 model turns, then stop
goal → agent → lookupOrder → answer

One lap of reason → act → observe, then a final answer — the whole pattern in miniature.
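The trace above can be reproduced with a runnable sketch in which a scripted function plays the model. This is an illustration under stated assumptions, not a real integration: `fakeModel`, `lookupOrder`, and `runAgent` are hypothetical, the order data is canned, and the fake model echoes the raw `eta` value rather than expanding it to a full day name.

```typescript
type ToolCall = { name: "lookupOrder"; args: { id: string } };
type Reply = { toolCall?: ToolCall; text?: string };
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Scripted stand-in for the model: ask for the tool once, then answer from the result.
function fakeModel(messages: Message[]): Reply {
  const observed = messages.find((m) => m.role === "tool");
  if (!observed) return { toolCall: { name: "lookupOrder", args: { id: "4471" } } };
  const order = JSON.parse(observed.content) as { status: string; eta: string };
  return { text: `Order #4471 ${order.status}, arriving ${order.eta}.` };
}

// Hypothetical order lookup with canned data.
function lookupOrder(id: string): { status: string; eta: string } {
  return id === "4471" ? { status: "shipped", eta: "Fri" } : { status: "unknown", eta: "n/a" };
}

function runAgent(goal: string, maxSteps = 10): { answer: string; toolCalls: number; turns: number } {
  const messages: Message[] = [{ role: "user", content: goal }];
  let toolCalls = 0;
  for (let step = 0; step < maxSteps; step++) {
    const reply = fakeModel(messages); // reason
    if (!reply.toolCall) return { answer: reply.text ?? "", toolCalls, turns: step + 1 };
    const result = lookupOrder(reply.toolCall.args.id); // act
    toolCalls++;
    messages.push({ role: "assistant", content: `call ${reply.toolCall.name}` });
    messages.push({ role: "tool", content: JSON.stringify(result) }); // observe
    console.log(`step ${step + 1}: ${reply.toolCall.name} ->`, result); // log every step
  }
  throw new Error("step cap reached without an answer");
}

const outcome = runAgent("Where is my order #4471?");
console.log(outcome.answer);
```

Swapping `fakeModel` for a real model call is the only change needed to turn the sketch into a working agent; the cap, the log line, and the loop stay as they are.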

1. An agent is a loop, not a model. Same LLM — the power is the tools and the loop you wrap around it.
2. The model asks; you act. Tool calling means the model names a function and arguments — your code runs it and returns the result.
3. Reason → act → observe → repeat. Always cap the steps and set a budget so a confused agent cannot run away.
4. Plan and remember for big tasks. Short-term memory is the context window; long-term memory is retrieval from an external store.
5. Use an agent only when you need one. A fixed task wants a prompt or a script; an open-ended, tool-using task wants the loop.
