A 34-minute working session on the services that host models and the plumbing around them — AWS Bedrock, Google Vertex AI, Azure AI Foundry, model gateways, agent runtimes — and the honest call between a hosted API and a model you run yourself.
Five years ago, "use AI" meant buying GPUs, wrangling CUDA, and hiring people to keep a model server alive at 3 a.m. Today most teams call an API and ship. A managed AI platform is the service that makes that possible — and the first real decision is how much of the stack you actually want to own.
Managed platforms collapse five layers of ops into one bill — you keep only the app and the prompts.
The headline product from each hyperscaler is the same shape: one API in front of a catalog of foundation models, with governance, logging, and billing wired into the cloud you already use. They differ most in which models they front and which cloud they marry you to.
modelId and send text. The platform routes, scales, and bills.The platform is a switchboard: your app speaks one API; the model behind modelId can change without a rewrite.
modelId is easy; un-wiring a knowledge base or guardrail is the cost that keeps you.Bedrock's Converse API normalizes the request — change MODEL and the rest of your code is untouched.
Serverless access to models from Anthropic (Claude), Meta, Mistral, Cohere, AI21, Stability, and Amazon's own Nova and Titan — plus Knowledge Bases (RAG), Guardrails, model evaluation, and Bedrock Agents, all behind IAM.
Google Cloud's unified AI surface. Model Garden fronts Gemini alongside third-party models (Claude, Llama, Mistral), and the same platform also does custom training, pipelines, and the Agent Builder stack — more breadth than a pure model gateway.
Microsoft's platform (formerly Azure AI Studio) gives enterprise access to OpenAI's GPT models via Azure OpenAI, plus a model catalog (Llama, Mistral, and more), an agent service, prompt flow, evaluations, and Content Safety — all under Azure identity and compliance.
Sometimes the model you need does not exist yet — a fraud detector on your transactions, a demand forecaster, a classifier on your domain's jargon. For that you want a full ML platform: SageMaker on AWS, Vertex AI on Google. These manage the whole lifecycle, not just inference.
The platform manages every stage and the loop back — when live accuracy drifts, you retrain and redeploy.
Calling a model is "simple", not "lesser." Reach for the full platform only when no off-the-shelf model fits your data — for classic tabular problems, a small trained model often beats a giant LLM on cost and latency. To adapt an existing model instead of training from scratch, see Fine-tuning.
Cloud platforms tie you to one cloud. A model gateway is a thinner, provider-neutral layer: a single endpoint that routes to OpenAI, Anthropic, Google, and open models alike — with failover, spend tracking, and one API key across all of them.
One key, one endpoint. The gateway sends to your primary model and fails over to a backup if it errors or rate-limits.
A creator/model slug is all that changes between providers — the gateway handles the rest.
Prefer to self-host the gateway? LiteLLM is a popular open-source proxy that speaks the same OpenAI-compatible API across providers — same idea, you run it.
The sharpest trade-off in this whole space: call a proprietary model (Claude, GPT, Gemini) over the network, or run an open-weight model (Llama, Mistral) on hardware you control. Both are legitimate. The honest answer depends on cost shape, data rules, and how much ops you can stomach.
Hosted: data leaves, you pay per token, you run nothing. Local: data stays, you pay for the box whether it is busy or idle.
Running open models in production is its own discipline — serving, autoscaling, and monitoring are covered in MLOps. For most teams, start hosted and only self-host once cost or compliance forces the move.
An agent wraps a model in a loop: it reasons, calls tools, reads results, and repeats until the task is done. The clouds now offer managed runtimes so you do not hand-roll the loop, memory, and tool plumbing. The deep dive lives in AI Agents & Tool Use — here we map the platforms.
The runtime owns the loop — calling tools, threading memory, and re-prompting the model until the goal is met.
Vertex AI Agent Builder is Google's managed agent stack. You build with the open-source Agent Development Kit (ADK), then deploy onto a managed runtime (Agent Engine) that handles sessions, scaling, and tracing — agents that lean on Gemini and your tools.
A2A (Agent2Agent) is an open protocol — now under the Linux Foundation — for agents built by different teams or vendors to discover each other and collaborate. It is the interop layer: where MCP standardizes how an agent reaches tools, A2A standardizes how agents reach other agents.
Bedrock Agents connect a foundation model to action groups (your functions, typically AWS Lambda) and Knowledge Bases(managed RAG), so the model can take real steps and ground answers in your data — all inside AWS's IAM and logging. AWS also offers AgentCore, a newer managed runtime for deploying agents (including ones built with other frameworks) with memory, identity, and observability.
For a single agent with a few tools, a plain loop in your own code (or a light framework) is often enough — see AI Agents & Tool Use. Reach for a managed platform when you need durable sessions, multi-agent coordination, sandboxed tool execution, or enterprise-grade tracing and identity. Do not adopt a heavy runtime for a one-shot prompt.
Strip away the brand names and almost every choice comes down to four questions: what it costs, where the data may live, how fast it must answer, and how hard it is to leave.
Spiky or low volume → pay-per-token hosted. High, steady volume → self-host can win on unit cost. Model the real traffic, not the demo.
Regulated or air-gapped data → keep inference in your boundary (cloud platform in-region, or local). Check retention and region terms first.
Add up the hops: gateways and agent loops cost milliseconds. Co-locate the model with the app for tight, interactive paths.
Swapping a model is cheap; un-wiring knowledge bases, guardrails, and agent runtimes is not. A gateway hedges; deep platform features bind.
A rough path, not a law: data rules and volume push you off the hosted default; multi-provider needs add a gateway.
modelId or slug should be all that changes between models.Five quick questions on managed platforms, gateways, hosted vs local, and agents — instant feedback, no sign-in.
Navigate with ← → or scroll · back to library