Blog

Durable Orchestration for AI Agents with Restate and OpenAI SDK

January 15, 2026

Restate Team

Restate now offers native integration with the OpenAI Agents SDK, bringing enterprise-grade durability to your Python agents with just a few lines of code.

Agents interact with many systems: LLMs, MCP servers, other APIs and databases, humans, session stores, and more. Each of these interactions can fail, experience network outages, hit timeouts and rate limits, or take longer than expected to respond. And also the agent itself can be part of an infrastructure failure. The same issues as with any other backend.

When you deploy your agents to production, they need to be able to handle all these scenarios: not just for simple agents with a single tool but also for long-running agents, human approval steps, and multi-agent systems. The Restate + OpenAI Agents SDK integration makes this possible.

What Restate Adds to Your OpenAI Agents

While the OpenAI Agents SDK offers you a concise, Pythonic way for defining your agents, Restate gives them all they need to work well in production:

  • Never lose progress or duplicate work, via retries and recovery
  • Durable sessions, via embedded K/V store and concurrency management
  • Resilient human approvals that might take minutes or months
  • Resilient orchestration across agents and tools, and even deployment targets and time
  • Great support for serverless: the agent execution suspends when the agent is waiting
  • Support for long-running agents (from millis to months)
  • Detailed observability across agents and tools

Getting started

πŸ’‘ Useful links:

Add the Restate SDK and OpenAI Agents SDK to your project:

uv init .
uv add restate_sdk[serde] openai-agents

Create your first durable agent:

import restate
from restate.ext.openai import DurableRunner, durable_function_tool, restate_context
from agents import Agent

@durable_function_tool
async def get_weather(city: str) -> dict:
  """Get the current weather for a city."""
  return await restate_context().run_typed("fetch weather", fetch_weather_api, city=city)

agent = Agent(
  name="WeatherAgent",
  instructions="You are a helpful assistant that provides weather updates.",
  tools=[get_weather],
)

weather_service = restate.Service("WeatherAgent")

@weather_service.handler()
async def run(_ctx: restate.Context, message: str) -> str:
  result = await DurableRunner.run(agent, message)
  return result.final_output

app = restate.app([weather_service])

That's it. Your agent is now resilient to failures, with every step logged and recoverable. For example:

  • Failed LLM calls will be retried
  • Failed calls to the weather API will be retried
  • If the service crashes, Restate remembers all the steps it did in a journal, and will resume the execution on another process by replaying the journal

But it’s much more than only this!

Building Blocks for Reliable Agents

Durable sessions with concurrency control

Build agents that remember. Use Restate's Virtual Objects to create stateful agents keyed by user or session ID. Enable use_restate_session=True in the DurableRunner and Restate automatically persists conversation history, agent events, and context:

chat = VirtualObject("Chat")

@chat.handler()
async def message(_ctx: ObjectContext, req: ChatMessage) -> dict:
  result = await DurableRunner.run(
      Agent(name="Assistant", instructions="You are a helpful assistant."),
      req.message,
      use_restate_session=True,
  )
  return result.final_output

The conversation state lives in Restate, queryable through the UI, with automatic concurrency control to prevent race conditions when users send multiple messages.

See the chat example

Human Approvals as durable promises

Real-world agents need human oversight. Restate's durable promises make this easy: your agent pauses execution, waits for human input, and resumes automatically, even if the service crashes while waiting:

@durable_function_tool
async def require_approval(claim: dict) -> str:
  """Request human approval for high-value claims."""
  # Create a durable promise
  review_id, review_promise = restate_context().awakeable()

  # Notify moderator (persisted)
  await restate_context().run_typed("notify", request_review, claim=claim, id=review_id)

  # Wait for review - survives crashes
  return await review_promise

Add timeouts to prevent workflows from waiting indefinitely. Restate persists both the timeout and the approval promise, maintaining the correct remaining time even through restarts.

Explore human-in-the-loop patterns

See Everything Your Agent Does

Restate traces every action automatically. Open the Restate UI to inspect running agents, see LLM calls and tool executions in real-time, and debug issues without adding instrumentation:

The trace shows you exactly what your agent did, when it did it, and what state it held at each step. When something goes wrong, you have all the information you need to understand why.

Beyond the Basics

Restate lets you take your agents to the next step:

  • Reliable communication for multi-agent systems
  • Parallelization, racing, and other concurrency patterns, deterministic across retries
  • Resilient rollback for actions that should revert on failure
  • Idempotency for crucial actions
  • Each request is an ID-addressable task that you can cancel, kill, roll back, restart, attach,…
  • And much more…

Start Building

AI agents are moving from demos to production, and production means dealing with all sorts of adversities. The OpenAI Agents SDK gives you a clean, Pythonic way to build agents. Restate gives you the reliability primitives to run them in production without rebuilding your entire stack.

The Restate + OpenAI Agents SDK integration is available now. You get:

  • βœ… Automatic failure recovery for every step your agent takes
  • βœ… Persistent conversation memory without external databases
  • βœ… Complete execution visibility for debugging and monitoring
  • βœ… Resilient primitives to model approval workflows with timeouts
  • βœ… Multi-agent orchestration that is consistent across failures
  • βœ… Deploy on any platform: Modal, AWS Lambda, or Kubernetes

You write normal Python code using the OpenAI Agents SDK patterns you already know. Restate handles the hard parts of making it production-ready.

Get started today:

✨ Star us on GitHub and join our community on Discord or Slack. We'd love to see what you build.