Building Resilient AI Agents on Serverless
Giselle van Dongen, Igal Shilman
Agentic application workloads are naturally bursty, with highly variable task complexity. Serverless platforms seem like a perfect fit: autoscaling, pay-per-use, and zero infrastructure to manage. Until you hit their limitations.
Your agent needs human approval before sending an email. What do you do?
- Option 1: Keep your Lambda function running while waiting.
  Problem: Long-running functions rack up costs, and Lambda times out after 15 minutes. Functions are stateless, so you lose context between invocations. Your approval might take 3 hours. This simply doesn't work.
- Option 2: Save state to a database, shut down, and resume when the approval comes in.
  Complexity: Now you need a queue, a state store, coordination logic, retry handling, and version management. Your "simple agent" just became a distributed system.
- Option 3: Don't use serverless; use a workflow orchestrator instead.
  Result: Your agent now runs on workflow workers, not serverless functions. You're back to managing infrastructure and worker pools, and you lose the auto-scaling benefits of serverless.
None of these are good answers. And this isn't just about human approvals: it's what happens whenever long-running, stateful workloads meet serverless.
APIs break, timeouts happen, networks fail. If your agent can't survive interruptions, you either lose work or build complex infrastructure to prevent it.
The challenge isn't building agents anymore. AI SDKs made that easier than ever. The challenge is making them production-grade: durable across failures, efficient on serverless, observable in production, and safe to evolve.
Never lose progress with Durable Execution
The solution isn't about choosing a different serverless platform or adding more queues. It's about changing how your code executes.
What if your agent could remember every step it takes?
With Durable Execution, when your agent calls an LLM, queries a database, or invokes a tool, that step gets recorded in a durable journal. If your process crashes, it doesn't re-execute all steps, but instead, recovers the recorded results from the journal and resumes exactly where it left off.

Restate is an open-source system that provides Durable Execution for your applications, workflows, and agents. The Restate Server sits in front of your serverless function like a proxy or stateful orchestrator and manages durability, retries, and recovery. Your functions use the Restate SDK to mark which steps should be recorded, transforming your stateless serverless functions into stateful, durable functions that survive failures and restarts.
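As a minimal sketch (using the TypeScript SDK, with hypothetical callLlm and sendEmail helpers standing in for real side effects), marking durable steps looks roughly like this:

import * as restate from "@restatedev/restate-sdk";

// Hypothetical side effects standing in for a real LLM call and email API.
declare function callLlm(query: string): Promise<string>;
declare function sendEmail(text: string): Promise<void>;

const agent = restate.service({
  name: "Agent",
  handlers: {
    run: async (ctx: restate.Context, query: string) => {
      // Each ctx.run block is recorded in the journal; on retry,
      // the recorded result is replayed instead of re-executing the step.
      const answer = await ctx.run("llm call", () => callLlm(query));
      await ctx.run("send email", () => sendEmail(answer));
      return answer;
    },
  },
});

// Serve over HTTP so the Restate Server can push requests to this function.
restate.endpoint().bind(agent).listen(9080);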
What sets Restate apart from other workflow orchestrators for serverless deployments is that Restate pushes requests to your functions rather than requiring workflow workers to pull tasks. This means your agents can run as normal serverless functions, on Vercel, Cloudflare Workers, AWS Lambda, Modal, or anywhere else, without the need for special worker infrastructure.

You can either host the Restate Server yourself, on VMs or Kubernetes, or use Restate Cloud, a managed Restate deployment with production-grade availability, observability, and durability (announced last week).
Restate Cloud lets you run serverless, durable agents in production without managing any infrastructure.
Enable Durable Execution for your favorite AI SDK
Restate can be used for any application, not just agents. It works independently of any agent SDK or specific AI stack, but integrates easily with them: a few lines of code turn your agent into a durable agent.
Let’s have a look at a few integration examples.
import * as restate from "@restatedev/restate-sdk";
import { openai } from "@ai-sdk/openai";
import { generateText, stepCountIs, tool, wrapLanguageModel } from "ai";
import { durableCalls } from "@restatedev/vercel-ai-middleware";
import { z } from "zod";

// Assumed claim shape for this example; adapt it to your domain.
const InsuranceClaimSchema = z.object({ amount: z.number(), reason: z.string() });
type InsuranceClaim = z.infer<typeof InsuranceClaimSchema>;

// Notifies a human reviewer out-of-band (implementation lives elsewhere).
declare function notifyHumanReviewer(claim: InsuranceClaim, key: string): Promise<void>;

export const claimApprovalAgentWithHumanApproval = restate.workflow({
  name: "ClaimApprovalAgent",
  handlers: {
    run: async (ctx: restate.WorkflowContext, { amount }: { amount: number }) => {
      const model = wrapLanguageModel({
        model: openai("gpt-4o"),
        middleware: durableCalls(ctx), // makes LLM calls durable
      });
      const { text } = await generateText({
        model,
        system:
          "You are an insurance claim evaluation agent. Use these rules: " +
          "* if the amount is more than 1000, ask for human approval, " +
          "* if the amount is less than 1000, decide by yourself",
        prompt: `Please evaluate the following insurance claim: ${amount}.`,
        tools: {
          humanApproval: tool({
            description: "Ask for human approval for high-value claims.",
            inputSchema: InsuranceClaimSchema,
            execute: async (claim: InsuranceClaim) => {
              // Durable step: runs once and records its completion
              await ctx.run(() => notifyHumanReviewer(claim, ctx.key));
              // Durable promise: the agent suspends until it is resolved
              return await ctx.promise<boolean>("approval");
            },
          }),
        },
        stopWhen: [stepCountIs(5)],
      });
      return { response: text };
    },
    onHumanApprovalReady: async (ctx: restate.WorkflowSharedContext, approval: boolean) => {
      ctx.promise<boolean>("approval").resolve(approval);
    },
  },
});
For each of the integrations, you implement a Restate service with a handler that gets invoked for each agent request. Your handler executes the agent logic using a Restate Context object to mark durable steps.
Two things need to be durable:
- LLM responses: Since LLMs are non-deterministic and decide which actions to take, their responses must be recorded for replay on retries.
- Tool executions: Tools should use Restate Context actions to record their steps. For example, a run block executes a function and persists its result (useful for database queries, API calls, etc.).
How this looks in each framework:
- Vercel AI SDK: Use Restate's durableCalls(ctx) middleware for LLM calls; use Restate Context actions for tool side effects.
- OpenAI Agents: Use Restate's DurableModelCalls(restate_ctx) as your model provider; use Restate Context actions for tool side effects; propagate the context through your agent with OpenAI's context management.
- Custom workflows (LiteLLM, etc.): If you want to own your agent control flow and implement the agent loop from scratch, wrap LLM calls in a run block and use Restate Context actions throughout your workflow to record steps (see the sketch below).
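For the custom-workflow route, a hand-rolled loop might look like the following sketch, where completeChat and executeTool are hypothetical stand-ins for your LiteLLM call and tool dispatch:

import * as restate from "@restatedev/restate-sdk";

type Message = { role: string; content: string };
type Reply = { content: string; toolCall?: { name: string; args: unknown } };

// Hypothetical helpers: an LLM completion (e.g., via LiteLLM) and a tool dispatcher.
declare function completeChat(messages: Message[]): Promise<Reply>;
declare function executeTool(call: { name: string; args: unknown }): Promise<string>;

// Every LLM call and tool execution is wrapped in ctx.run, so each step
// is journaled and replayed on retries instead of being re-executed.
async function agentLoop(ctx: restate.Context, prompt: string): Promise<string> {
  const messages: Message[] = [{ role: "user", content: prompt }];
  for (let step = 0; step < 5; step++) {
    const reply = await ctx.run(`llm step ${step}`, () => completeChat(messages));
    const toolCall = reply.toolCall;
    if (!toolCall) {
      return reply.content; // the model produced a final answer
    }
    const result = await ctx.run(`tool: ${toolCall.name}`, () => executeTool(toolCall));
    messages.push({ role: "tool", content: result });
  }
  return "Step limit reached";
}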
With just a few lines of integration code, your agents become durable.
Reliability at scale: Handle tens of thousands of concurrent agents
The combination of an Agent SDK, serverless, and Durable Execution makes it possible to build highly scalable, resilient agents.
To support this use case, Restate is implemented as a distributed, event-driven system that handles tens of thousands of concurrent workflows.
If you are interested in what a Durable Execution engine looks like under the hood, we recommend reading our previous blog post.
Restate's durability doesn't only help with surviving crashes. It also enables other features to run efficiently on serverless.
Cost efficiency: Scaling to zero while waiting
Remember the human approval problem from the intro? This is a common pattern where you want to have a human in the loop for high-risk actions or feedback.
But waiting is expensive on serverless platforms that charge for execution time. And some platforms, like AWS Lambda, even limit function execution time to 15 minutes.
Durable Execution comes in handy here: it lets us recover any function at any point in time, so the function can terminate while waiting and be revived later.
When your agent needs to wait for human approval, it creates a durable promise and automatically suspends (via an artificial error thrown by the SDK). The Restate Server persists the state of your agent, including the promise. When the approval comes in (minutes or days later), Restate re-invokes your function, which replays the journal, restores the now-resolved promise, and continues execution.

The result is a fully resilient human-in-the-loop flow that does not make you pay for waiting. No queues to persist approvals and join them with agent state. No complex coordination logic. The durable promise can be resolved via HTTP from any process, and Restate handles all the persistence and coordination.
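For instance, a reviewer UI, or any other process, could resolve the approval by POSTing to the workflow's shared handler on the Restate Server's HTTP ingress. A sketch, assuming the default ingress port and a workflow key of claim-123:

// Resolves the durable promise from anywhere via the Restate ingress;
// the URL pattern is <service>/<workflow key>/<handler>.
await fetch(
  "http://localhost:8080/ClaimApprovalAgent/claim-123/onHumanApprovalReady",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(true), // the reviewer's decision
  },
);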
Debugging: Live execution timeline
Agents do not follow pre-defined code paths but make decisions at runtime based on LLM responses. This makes debugging hard and good observability essential.
Some agent SDKs include basic metrics and traces of LLM calls, but this is not enough. When your agent fails after 20 minutes, you need to know: which step failed, what the error was, what tools executed, which agents were involved, which deployment version was used, and whether other requests are hitting the same issue.
Restate's UI shows a live execution timeline for each request. You see every step in real-time: LLM calls, tool executions, retries, errors, and agent-to-agent communication. This works across all AI SDKs, models, and languages, letting you debug exactly where and why agents failed.
Safe evolution: Version without breaking in-flight work
Over time, you'll constantly improve your agents by fixing bugs, adding new tools, and changing prompts. But when agents suspend for hours or days waiting for approvals, they must resume on the same code version they started with. If a request resumes on a different version, the journal won't match the code and replay breaks.
Having a good story here is key to making your agents evolvable without breaking ongoing work.
Restate solves this with immutable deployment URLs. Each service revision gets a unique URL that is registered with Restate. Restate then routes all new requests to the latest deployment, while in-flight requests continue on their original deployment.
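With the Restate CLI, registering a new revision might look like this (the versioned Vercel deployment URL below is hypothetical):

restate deployments register https://claim-agent-abc123.vercel.app/api/restate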

This works very well with serverless platforms like Vercel, Cloudflare, and AWS Lambda, which provide versioned URLs out of the box. Old versions scale to zero when idle and only spin up when paused requests resume, so they cost nothing between wake-ups. Contrast this with Kubernetes, where you'd have to keep old versions running continuously until all in-flight requests complete.
Suppose you add a database lookup to the claim approval agent running on Vercel:
// Enrich with database data (a durable step recorded in the journal)
const customerPolicy = await ctx.run("fetch customer policy from DB", () =>
  retrieveCustomerPolicy(customerId),
);
// ... the rest of our claim approval agent
Once we have deployed and registered this new version, all new requests will use it. But any requests that started before the deployment will continue using the old version without breaking:

Start building in minutes
Using Restate to build serverless agents and AI workflows gives you a set of amazing properties:
- Resilience to failures without losing progress.
- Zero infrastructure management with Restate Cloud and serverless agents.
- Massive scalability to tens of thousands of concurrent agents.
- Scale to zero while waiting for human approvals or responses.
- Real-time visibility into LLM calls, tool executions, retries, and errors.
- Safe versioning so executions never break on upgrades.
If this resonates with you, here are some ways to get started:
- ☁️ Sign up for the Restate Cloud free tier and run your first durable agent in minutes
- 🚀 Start with our Vercel AI or OpenAI templates
- 🎓 Follow the AI Agent tutorial to learn all the features Restate includes for building production-grade agents
- ✨ Star us on GitHub and join the conversation on Discord or Slack. We’d love to hear what you're building.