Building Durable AI Agents with Restate + Vercel AI SDK

We're excited to announce the integration between Restate and the Vercel AI SDK, bringing enterprise-grade durability to your AI agents with just a few lines of code.

Whether you're building conversational agents, task automation agents, or multi-step workflows, you want your agent to remember the work it has already done. This becomes especially important for agents with human approvals, long-running tasks (e.g. research), or tasks that must roll back cleanly on failure.

The Restate + AI SDK integration lets you build exactly that: stateful, durable agents that recover from crashes and support human-in-the-loop approvals, orchestration, parallelism, resilient rollback, and more. Restate ships as a single binary, so it is easy to deploy. Self-host it or connect to Restate Cloud, and run your services anywhere, including AWS Lambda, Cloudflare, Vercel, and Kubernetes.

Why Restate + AI SDK?

Restate offers durable, observable execution for AI agents. This means:

All steps (LLM calls, tool executions, session state changes) are logged and persisted.
If your service crashes, times out, or restarts during a workflow, Restate resumes where it left off.
Observability: end-to-end invocation tracing across calls, tools, agents, and other workflows.
You can combine that with the AI SDK to build agents that run on serverless platforms (for example Vercel's serverless functions) without manually managing state stores or retry logic.
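To make "resumes where it left off" concrete, here is a toy model of journal-based replay. It is a sketch for intuition only, not Restate's implementation: the names Journal, step, and runAgent are hypothetical, and an in-memory Map stands in for durable storage.

```typescript
// Toy model of journal-based durable execution. NOT Restate's implementation;
// Journal, step and runAgent are hypothetical names used for illustration.
type Journal = Map<string, unknown>;

// Run a named step once; on replay, return the recorded result instead.
function step<T>(journal: Journal, name: string, action: () => T): T {
  if (journal.has(name)) {
    return journal.get(name) as T; // replayed from the journal, no side effect
  }
  const result = action();
  journal.set(name, result); // recorded before the workflow moves on
  return result;
}

let llmCalls = 0;
let toolCalls = 0;

// A two-step "agent": an LLM call followed by a tool call.
// crashAfterLlm simulates the process dying between the two steps.
function runAgent(journal: Journal, crashAfterLlm: boolean): string {
  const plan = step(journal, "llm-call", () => {
    llmCalls++;
    return "call getWeather(Berlin)";
  });
  if (crashAfterLlm) throw new Error("simulated crash");
  const weather = step(journal, "tool-call", () => {
    toolCalls++;
    return "sunny";
  });
  return `${plan} -> ${weather}`;
}

const journal: Journal = new Map(); // stands in for durable storage
let answer = "";
try {
  runAgent(journal, true); // first attempt crashes after the LLM call
} catch {
  answer = runAgent(journal, false); // the retry resumes from the journal
}
console.log(answer, { llmCalls, toolCalls });
```

Even though the agent function runs twice, the LLM step executes only once: the retry reads its result from the journal and continues with the tool call.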

The combination of the AI SDK and Restate gives you an enterprise-grade foundation for agents with all you need to take your project from idea to POC to production with a single lean technology stack.

Get started in minutes

💡 The integration is available now on npm:

📦 npm: @restatedev/restate-sdk + @restatedev/vercel-ai-middleware
🚀 Quickstart
🎓 Tutorial
💻 Next.js examples

Follow the quickstart in the docs to set up your agent project.

You only need to do a few things to turn your agents into durable agents:

Wrap your language model with middleware such as durableCalls(ctx, { maxRetryAttempts: ... }) so that each LLM call is recorded and retried in case of failure.
Wrap tool executions (e.g. fetching data) in context actions (ctx.run(...)) so that they are retried until they succeed or fail permanently.

Here’s a simple example of a WeatherAgent:

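import * as restate from "@restatedev/restate-sdk";
import { durableCalls } from "@restatedev/vercel-ai-middleware";
import { openai } from "@ai-sdk/openai";
import { generateText, stepCountIs, tool, wrapLanguageModel } from "ai";
import { z } from "zod";

// fetchWeather(city) is assumed to be defined elsewhere in your project.

export default restate.service({
  name: "WeatherAgent",
  handlers: {
    run: async (ctx: restate.Context, prompt: string) => {
      // Record every LLM call durably and retry it on failure.
      const model = wrapLanguageModel({
        model: openai("gpt-4o"),
        middleware: durableCalls(ctx, { maxRetryAttempts: 3 }),
      });

      const { text } = await generateText({
        model,
        system: "You are a helpful agent that provides weather updates.",
        prompt,
        tools: {
          getWeather: tool({
            description: "Get the current weather for a given city.",
            inputSchema: z.object({ city: z.string() }),
            execute: async ({ city }) =>
              // Run the tool as a durable step so it is retried on failure.
              ctx.run("get weather", () => fetchWeather(city)),
          }),
        },
        // Stop the agent loop after at most five steps.
        stopWhen: [ stepCountIs(5) ],
        providerOptions: { openai: { parallelToolCalls: false } },
      });

      return text;
    },
  },
});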

Because Restate logs every agent invocation and step, you can view traces (which include LLM calls, tool executions, state changes, and errors) in the built-in Restate UI.

You can also export traces via OpenTelemetry integrations, for example to Langfuse or Jaeger.

Learn how to turn complex patterns into simple code

Restate makes code innately resilient, letting you implement complex patterns with simple code.

Follow the tutorial on implementing agents with Restate and Vercel AI SDK to learn about the most common patterns:

Human approval steps

Unlike typical approval flows that lose state on crashes, Restate's promises persist through any failure. Create an approval request, hand the ID to a human, and the agent suspends durably. If your server crashes while waiting, Restate recovers the waiting context and resumes exactly where it left off. No lost approvals, no duplicate requests, with built-in timeouts for escalation.
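As a rough mental model of what a durable approval gives you, the sketch below persists a pending request, "restarts", and then applies the human decision. It does not use the Restate API: ApprovalStore, its methods, and the "refund-42" ID are hypothetical, and a JSON snapshot stands in for durable storage.

```typescript
// Toy model of a durable approval request. NOT the Restate API; ApprovalStore
// and its methods are hypothetical names used only for illustration.
type Approval =
  | { status: "pending"; deadlineMs: number }
  | { status: "approved" | "rejected" | "timed-out" };

class ApprovalStore {
  private approvals: Record<string, Approval | undefined> = {};

  // Serialize all approvals; stands in for state written to durable storage.
  snapshot(): string {
    return JSON.stringify(this.approvals);
  }

  // Rebuild the store after a restart from a previous snapshot.
  static restore(snapshot: string): ApprovalStore {
    const store = new ApprovalStore();
    store.approvals = JSON.parse(snapshot) as Record<string, Approval | undefined>;
    return store;
  }

  // Register a request; repeating the call for a known ID is a no-op.
  request(id: string, nowMs: number, timeoutMs: number): void {
    if (this.approvals[id] === undefined) {
      this.approvals[id] = { status: "pending", deadlineMs: nowMs + timeoutMs };
    }
  }

  // Called by the human reviewer; only a pending request can be decided.
  decide(id: string, approved: boolean): void {
    if (this.approvals[id]?.status === "pending") {
      this.approvals[id] = { status: approved ? "approved" : "rejected" };
    }
  }

  // Called by the agent when it resumes; expires overdue requests.
  status(id: string, nowMs: number): Approval["status"] | "unknown" {
    const approval = this.approvals[id];
    if (approval === undefined) return "unknown";
    if (approval.status === "pending" && nowMs >= approval.deadlineMs) {
      this.approvals[id] = { status: "timed-out" };
      return "timed-out";
    }
    return approval.status;
  }
}

const timeoutMs = 3 * 60 * 60 * 1000; // escalate after three hours

// Before the crash: the agent asks for approval and the state is persisted.
const before = new ApprovalStore();
before.request("refund-42", 0, timeoutMs);
const persisted = before.snapshot();

// After the restart: the request is still pending, the retry does not create
// a duplicate, and the human decision is picked up by the agent.
const after = ApprovalStore.restore(persisted);
after.request("refund-42", 1000, timeoutMs);
after.decide("refund-42", true);
console.log(after.status("refund-42", 2000)); // prints "approved"
```

The same store also covers the escalation path: a request that is still pending when its deadline passes reports "timed-out", and a late decision no longer changes it.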

Example

Chat sessions

Implement stateful conversations using Virtual Objects. These are not just a database wrapper but persistent entities with automatic concurrency control. Your chat state survives crashes, handles concurrent messages correctly, and requires zero database management.
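The concurrency part can be pictured as one queue per session key. The sketch below is a toy in-memory model of that idea, not the Restate API: SessionRegistry and its methods are hypothetical, and it shows only the per-key ordering, not the persistence.

```typescript
// Toy model of a keyed, single-writer chat session. NOT the Restate API;
// SessionRegistry is a hypothetical name used only for illustration.
type Message = { role: "user" | "assistant"; content: string };

class SessionRegistry {
  private histories: Record<string, Message[] | undefined> = {};
  private queues: Record<string, Promise<unknown> | undefined> = {};

  // Queue a message behind earlier messages for the same session key, so two
  // messages to one session never interleave. Other keys are not blocked.
  send(
    key: string,
    content: string,
    reply: (history: Message[]) => Promise<string>,
  ): Promise<string> {
    const previous: Promise<unknown> = this.queues[key] ?? Promise.resolve();
    const next = previous.then(async () => {
      const history: Message[] = this.histories[key] ?? [];
      this.histories[key] = history;
      history.push({ role: "user", content });
      const answer = await reply(history); // e.g. an LLM call
      history.push({ role: "assistant", content: answer });
      return answer;
    });
    // Keep the queue usable even if this handler fails.
    this.queues[key] = next.catch(() => undefined);
    return next;
  }

  history(key: string): Message[] {
    return this.histories[key] ?? [];
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));
const sessions = new SessionRegistry();

// The first reply is slow and the second is fast, yet order is preserved.
const done = Promise.all([
  sessions.send("alice", "hi", async () => {
    await sleep(20);
    return "hello";
  }),
  sessions.send("alice", "weather?", async (h) => `seen ${h.length} messages`),
]);
done.then((answers) => console.log(answers));
```

Because the second message waits for the first, its handler sees the complete history (three messages at that point) instead of racing the slow reply.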

Example

Orchestration / Multi-Agent Patterns

Learn how to scale with resilience built in:

Resilient workflows as tools: Tools can implement workflows that get retried and resume where they failed.
Multi-agent systems: An agent can call another agent. That call is durable, retryable, and observable.
Parallel execution: RestatePromise.all/allSettled/race maintain deterministic replay after crashes.
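One way to picture why journaled fan-out replays cleanly: each parallel branch records its result under its own name, so a retry only re-runs the branches that never finished. The sketch below is a toy model under that assumption, not how RestatePromise is implemented; durableAll, Journal, and the task names are hypothetical.

```typescript
// Toy model of journaled parallel execution. NOT Restate's implementation;
// durableAll and Journal are hypothetical names used only for illustration.
type Journal = Record<string, string | undefined>;

async function durableAll(
  journal: Journal,
  tasks: Record<string, () => Promise<string>>,
): Promise<string[]> {
  const errors: unknown[] = [];
  const results = await Promise.all(
    Object.keys(tasks).map(async (name) => {
      const recorded = journal[name];
      if (recorded !== undefined) return recorded; // replay: skip finished work
      try {
        const result = await tasks[name]();
        journal[name] = result; // record each branch as soon as it completes
        return result;
      } catch (error) {
        errors.push(error);
        return "";
      }
    }),
  );
  // Let every branch settle first, then surface the first failure.
  if (errors.length > 0) throw errors[0];
  return results; // always in task order, regardless of completion order
}

const runs: string[] = [];
let flaky = true; // the "hotels" branch fails on the first attempt only

const tasks = {
  flights: async () => {
    runs.push("flights");
    return "LH123";
  },
  hotels: async () => {
    runs.push("hotels");
    if (flaky) {
      flaky = false;
      throw new Error("simulated crash");
    }
    return "Hotel Adlon";
  },
};

const journal: Journal = {};
// The retry replays "flights" from the journal and only re-runs "hotels".
const trip = durableAll(journal, tasks).catch(() => durableAll(journal, tasks));
trip.then((result) => console.log(result, runs));
```

After the retry, the flights branch has executed once and the hotels branch twice, while the combined result is the same as in a run without failures.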

Example

Own your control flow

For advanced use cases with guaranteed resilience:

Manual agent loops that remember their iteration state through restarts and let you customize stopping conditions and add custom logic between steps.
Resilient rollback that always completes (e.g. undo completed tasks if no approval arrives within 3 hours).
Long-running research agents that survive infrastructure restarts without losing progress.
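The rollback item follows the well-known compensation (saga) pattern: register an undo action for each step, and on failure run the undo actions in reverse. Below is a minimal plain-TypeScript sketch of that pattern. The bookTrip function, its steps, and the approval flag are hypothetical; in a durable runtime each step and each compensation would additionally be journaled so that the rollback itself survives crashes.

```typescript
// Sketch of the compensation (saga) pattern in plain TypeScript. bookTrip and
// its steps are hypothetical; nothing here is journaled or durable by itself.
type Compensation = () => Promise<void>;

async function bookTrip(log: string[], approved: boolean): Promise<string> {
  const compensations: Compensation[] = [];
  try {
    // Register the undo action before running each step, so a step that
    // fails halfway is still undone. Undo actions should be idempotent.
    compensations.push(async () => {
      log.push("cancel flight");
    });
    log.push("book flight");

    compensations.push(async () => {
      log.push("cancel hotel");
    });
    log.push("book hotel");

    if (!approved) throw new Error("no approval within the deadline");
    return "confirmed";
  } catch (error) {
    // Undo the registered steps in reverse order, then report the failure.
    for (const undo of compensations.reverse()) {
      await undo();
    }
    return `rolled back: ${(error as Error).message}`;
  }
}

const log: string[] = [];
const outcome = bookTrip(log, false);
outcome.then((result) => console.log(result, log));
```

With approval missing, the log reads: book flight, book hotel, cancel hotel, cancel flight.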

Enterprise-grade durability without enterprise-grade complexity.

Example

Start building

The Restate + Vercel AI SDK integration is available now. With this integration, you get:

✅ Recovery from failures - Never lose agent progress again
✅ Built-in state management - No databases to configure
✅ Complete observability - Trace every decision and action
✅ Composable patterns - From simple agents to complex multi-agent systems
✅ Production safety - Approvals, timeouts, and rollback built-in

If this resonates with you, here is how to get started:

🚀 Start with the quickstart
🎓 Follow the hands-on tour

✨ Star us on GitHub and join the conversation on Discord or Slack - we’d love to hear what you’re building.