A 30-minute working session on writing prompts that behave the same way twice — from how a model actually reads your text, through clear instructions, examples, structured output and tool calling, reasoning techniques, and guarding against prompt injection.
Before you can steer a model, it helps to know what it's actually doing: chopping your text into tokens, fitting everything into a fixed context window, and spreading its attention unevenly across your instructions. Good prompting is mostly working with these three facts instead of against them.
Summarize might split into Sum + mar + ize.Your text is split into tokens; the model reads those, then generates the most likely next token, one at a time.
Everything competes for one fixed budget. Fill it with history and there's no room left for the model to answer.
The model is one piece of a larger system — retries, streaming, and memory live in the app around it. That layer is its own topic: Building LLM Apps.
The single biggest lever in prompting is also the most boring: say exactly what you want. State the role, the audience, the format, the length, and what to do when the model is unsure. A model fills ambiguity with its average guess — your job is to leave less to guess.
Outer layers set the rules; inner layers fill in the request. Trust decreases as you move inward — important for Part 6.
<doc>…</doc>) so the model can tell your instructions from the data.Some patterns are easier to demonstrate than to describe — a tricky output format, a labelling convention, a particular tone. Drop a few worked examples into the prompt and the model copies the pattern. That's few-shot prompting.
input → output examples so the model infers the pattern by analogy. "Shot" just means example: one example is one-shot, a few is few-shot.Examples pin down the exact shape of the answer — far more reliable than describing the format in prose.
The moment a model's output feeds another program, free text is a liability. Two features fix that: structured output constrains the reply to a schema you define, and tool calling lets the model ask your code to do something and use the result.
The model never runs anything itself — it requests a call, your app executes it, and the result goes back into the prompt.
For multi-step problems — math, logic, planning, careful extraction — a model that blurts the first token often stumbles. Techniques like chain-of-thought, decomposition, and self-check trade a few extra tokens for noticeably better answers by letting the model reason out loud.
Reasoning out the steps gives later tokens something correct to build on — instead of one impulsive guess.
"It feels smarter" is not evidence. Whether CoT, more shots, or a bigger model actually helps is an empirical question — run it against a fixed test set and compare. That discipline is its own topic: LLM Evals & LLMOps.
A prompt that works in the playground can quietly regress when you tweak a word, swap a model, or hit a new input. Two practices keep you honest: measure changes with evals, and assume any text from outside is potentially hostile.
Hostile text hidden in data can override your instructions and abuse whatever tools the model holds.
You don't have to build the harness yourself. A one-line read on where each tool fits — and its trade-off:
Declarative test cases that run a prompt across inputs and models, with assertions and side-by-side diffs.
Treats prompts as parameters and tunes them — including example selection — against a metric you define.
Check structured output against a schema or rules and retry or fix when it doesn't conform.
Capture real traffic, build datasets, and score with rules or LLM-as-judge over time (Langfuse, LangSmith, Braintrust).
Picking a provider, SDK, or model — streaming, retries, cost — is a separate decision covered in Building LLM Apps; the measurement discipline goes deeper in LLM Evals & LLMOps.
Almost every reliable prompt comes back to a few habits — and almost every flaky one repeats a few mistakes. Pick the lightest technique that does the job; reach for the heavier ones only when the simpler option provably falls short.
Reach for the simplest row that works; move down only when it doesn't.
"Leave the model less to guess — then prove it worked."
Five quick questions on tokens, instructions, few-shot, tool calling, and prompt injection — instant feedback, no sign-in.
Navigate with ← → or scroll · back to library