Evaluation record · openai-agents-sdk

OpenAI Agents SDK

v0.x (Python 0.18.0)

OpenAI

Agentmulti-agentopen-sourceorchestrationopenai

Strong

About This Agent

Production-ready multi-agent orchestration framework built around agents, handoffs, guardrails, and tracing. Open-source (MIT) successor to Swarm, released March 2025 with a major overhaul in April 2026 that added a model-native harness (filesystem tools, shell execution, apply-patch edits) and native sandboxed execution for long-horizon tasks.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

task completion accuracy

Evaluation of agent task success across single- and multi-agent configurations

Evidence

OpenAI: New tools for building agents — Production-grade upgrade of Swarm with structured outputs, validated handoffs, and Responses API integration

highVerified: 2026-07-09

tool use reliability

Testing of function tools, hosted tools, and schema-validated argument handling

Evidence

OpenAI Agents SDK documentation — Function tools with automatic Pydantic schema validation plus hosted tools (web search, file search, code interpreter, computer use)

highVerified: 2026-07-09

multi step planning

Multi-step task evaluation across the agent loop and handoff chains

Evidence

OpenAI Agents SDK documentation — Agent loop with configurable max turns handles multi-step tasks; planning quality depends on the configured model

mediumVerified: 2026-07-09

memory persistence

Review of session backends and cross-run conversation persistence

Evidence

OpenAI Agents SDK sessions documentation — Built-in session memory (SQLite, Redis, SQLAlchemy backends) automatically maintains conversation history across runs

mediumVerified: 2026-07-09

error recovery

Testing of exception handling, guardrail tripwires, and tool failure paths

Evidence

OpenAI Agents SDK documentation — Typed exceptions, guardrail tripwires, and tool error handlers enable structured failure handling

mediumVerified: 2026-07-09

agent collaboration

Multi-agent coordination testing using handoffs and agents-as-tools patterns

Evidence

OpenAI Agents SDK handoffs documentation — Handoffs are a core primitive for delegating between specialized agents; agents-as-tools pattern supports orchestrator designs

highVerified: 2026-07-09

🛡️Security

tool sandboxing

Security architecture review of custom tool execution versus hosted tools

Evidence

OpenAI Agents SDK documentation — No built-in sandbox for arbitrary custom function tools; hosted tools (code interpreter) run in OpenAI's sandboxed environments

OpenAI: The next evolution of the Agents SDK — April 2026 overhaul added native sandbox execution for harness work (files, shell, code edits), launching first in Python with TypeScript to follow

mediumVerified: 2026-07-09

access control

Assessment of tool scoping, approval flows, and developer-implemented controls

Evidence

OpenAI Agents SDK documentation — Per-agent tool restrictions and human-in-the-loop approval support; broader access control left to the developer

mediumVerified: 2026-07-09

prompt injection defense

Guardrail configuration testing against adversarial and off-policy inputs

Evidence

OpenAI Agents SDK guardrails documentation — First-class input/output guardrails run in parallel with the agent and can trip to halt unsafe or off-policy runs

highVerified: 2026-07-09

data isolation

Review of run context isolation and self-hosted deployment boundaries

Evidence

OpenAI Agents SDK documentation — Typed local context objects are isolated per run and never sent to the LLM; deployment isolation is integrator-managed

mediumVerified: 2026-07-09

open source transparency

Source code and license review

Evidence

openai-agents-python GitHub repository — MIT licensed, fully open source with active public development in Python and TypeScript

highVerified: 2026-07-09

🔒Privacy & Compliance

data retention

Review of OpenAI API retention terms and self-managed session storage

Evidence

OpenAI API data usage policies — API data not used for training by default; session state stored on the integrator's own backend

mediumVerified: 2026-07-09

gdpr compliance

Compliance capabilities assessment for framework plus default model provider

Evidence

OpenAI Trust Portal — OpenAI offers DPA and SOC 2 for API usage; framework itself is self-hosted so compliance is achievable with configuration

mediumVerified: 2026-07-09

third party data sharing

Data flow analysis of default tracing export and model provider routing

Evidence

OpenAI Agents SDK tracing documentation — Tracing uploads run data to OpenAI by default (disableable); prompts go to whichever LLM provider is configured

highVerified: 2026-07-09

local deployment option

Deployment options assessment including non-OpenAI and local model routing

Evidence

OpenAI Agents SDK models documentation — LiteLLM integration supports 100+ LLMs including local models via Ollama-compatible endpoints

highVerified: 2026-07-09

👁️Trust & Transparency

documentation quality

Documentation completeness review across Python and TypeScript SDKs

Evidence

OpenAI Agents SDK documentation — Comprehensive docs with quickstarts, API reference, and examples for agents, handoffs, guardrails, sessions, and tracing

highVerified: 2026-07-09

execution traceability

Review of built-in trace spans, dashboard visualization, and OpenTelemetry export

Evidence

OpenAI Agents SDK tracing documentation — Built-in tracing of every LLM call, tool call, handoff, and guardrail with OpenAI dashboard and OTel/third-party exporters

highVerified: 2026-07-09

decision explainability

Assessment of trace-based explanation of agent routing and tool decisions

Evidence

OpenAI Agents SDK tracing documentation — Trace spans expose handoff reasons, tool arguments, and guardrail outcomes for post-hoc decision analysis

mediumVerified: 2026-07-09

open source code

Open source assessment of license, source availability, and public development

Evidence

openai-agents-python GitHub repository — MIT license with full source available; major 2026-04-15 overhaul developed in the open

highVerified: 2026-07-09

community activity

Community engagement analysis via GitHub stars, contributor activity, and release cadence

Evidence

openai-agents-python GitHub repository — Tens of thousands of stars, frequent releases, and a large contributor and integration ecosystem

openai-agents on PyPI — Rapid release cadence continues: Python 0.18.0 released 2026-07-07 (0.17.x through June); repo at ~27,800 stars

highVerified: 2026-07-09

⚙️Operational Excellence

ease of integration

Integration complexity assessment from install to working multi-agent app

Evidence

OpenAI Agents SDK quickstart — Minimal-primitives design (agents, handoffs, guardrails, sessions); a working agent in a few lines of code

highVerified: 2026-07-09

scalability

Assessment of stateless runner scaling and provider rate limit constraints

Evidence

OpenAI Agents SDK documentation — Lightweight stateless runner scales horizontally; throughput bounded by model provider rate limits

mediumVerified: 2026-07-09

cost predictability

Pricing model analysis of free framework plus pay-per-token model usage

Evidence

openai-agents-python GitHub repository — Free MIT-licensed SDK; only costs are model API rates from the chosen provider

highVerified: 2026-07-09

monitoring capabilities

Monitoring features assessment including built-in and third-party observability

Evidence

OpenAI Agents SDK tracing documentation — Built-in tracing dashboard plus OTel and third-party exporters (Logfire, Langfuse, W&B, and more)

highVerified: 2026-07-09

production readiness

Maturity assessment from release history, API stability, and enterprise adoption

Evidence

OpenAI: New tools for building agents — Explicitly positioned as production-ready successor to experimental Swarm; matured further with the 2026-04-15 overhaul

highVerified: 2026-07-09

Strengths

+Minimal, well-designed primitives: agents, handoffs, guardrails, sessions
+Best-in-class built-in tracing with dashboard and OTel/third-party exporters
+First-class input/output guardrails with tripwire enforcement
+MIT-licensed and fully open source in Python and TypeScript
+Provider-agnostic via LiteLLM (100+ models) despite Responses-API-native design
+Free framework; costs limited to model usage

Limitations

!Custom function tools still run unsandboxed (native sandbox added April 2026 covers harness/shell/file work, Python first)
!Tracing exports run data to OpenAI by default unless explicitly disabled
!Some hosted tools and tracing features work best only with OpenAI models
!April 2026 overhaul introduced breaking changes requiring migration (dropped Python 3.9, requires openai v2.x, refusals now raise ModelRefusalError)
!Python package still versioned 0.x with frequent releases despite production positioning
!Less opinionated about deployment, requiring infrastructure decisions from the team

Metadata

license: MIT

supported models

0: OpenAI GPT series (Responses API native)

1: 100+ LLMs via LiteLLM

2: Local models via OpenAI-compatible endpoints

programming languages

0: Python

1: TypeScript

deployment type: Self-hosted

tool support

0: Function tools with schema validation

1: Hosted tools (web search, file search, code interpreter, computer use)

2: MCP servers

3: Agents as tools

first release: 2025-03-11; major overhaul 2026-04-15

current version: Python 0.18.0 (2026-07-07)

github stars: 27700+

pricing: Free (MIT); pay only model API rates

predecessor: OpenAI Swarm (experimental)

Use Case Ratings

customer support

Handoffs between triage and specialist agents plus guardrails make this a flagship use case

research assistant

Hosted web/file search tools and multi-agent orchestration suit research pipelines

code generation

Capable with code interpreter and function tools, though not specialized for coding workflows

data analysis

Code interpreter, file search, and structured outputs work well for analysis agents

content creation

Multi-agent writer/editor pipelines with output guardrails are straightforward to build