Evaluation record · openai-codex

OpenAI Codex

Version: GPT-5.5 era

OpenAI

Agent · coding-agent · cloud-sandbox · openai · parallel-tasks

Strong

About This Agent

OpenAI's coding agent, comprising a cloud agent that runs tasks in isolated containers and an open-source CLI. It takes on delegated software tasks in parallel (features, fixes, PRs) and is powered by GPT-5.5 (the recommended model as of mid-2026; GPT-5.3-Codex is deprecated). In the cloud, network access is disabled by default.

Last Evaluated: July 9, 2026


Trust Vector Analysis

Dimension Breakdown

🚀 Performance & Reliability

task completion accuracy

Benchmark review and hands-on evaluation of PR-producing cloud tasks

Evidence

OpenAI: Introducing Codex — Cloud agent completes real-world engineering tasks ending in tested, citable PRs

GPT-5-Codex upgrade — GPT-5-Codex and successors substantially improved long-task completion and code quality; GPT-5.5 is the recommended Codex model as of mid-2026

high · Verified: 2026-07-09

tool use reliability

Testing of in-container command execution, editing, and test running across repositories

Evidence

Codex cloud documentation — Reliable shell, editor, and test-runner usage inside containers; setup scripts configure project dependencies

high · Verified: 2026-07-09

multi step planning

Long-horizon task evaluation from issue description to passing tests and PR

Evidence

OpenAI: Introducing Codex — Handles long-horizon tasks autonomously: reading codebases, implementing changes, running tests, and iterating until passing

high · Verified: 2026-07-09

memory persistence

Review of AGENTS.md guidance persistence and per-task container statelessness

Evidence

Codex documentation (AGENTS.md) — AGENTS.md files persist project conventions and instructions across tasks; cloud tasks are otherwise stateless per container

medium · Verified: 2026-07-09

error recovery

Observed recovery behavior from failing tests, build errors, and missing dependencies

Evidence

Codex cloud documentation — Iterates on failing tests and lint errors until passing; surfaces logs and citations when blocked

medium · Verified: 2026-07-09
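The iterate-until-passing behavior described above can be pictured as a bounded retry loop. This is a minimal sketch, not Codex's actual control flow: `iterate_until_passing`, `run_tests`, `apply_fix`, and the attempt cap are all hypothetical names and choices made for illustration.

```python
from typing import Callable


def iterate_until_passing(
    run_tests: Callable[[], tuple[bool, str]],
    apply_fix: Callable[[str], None],
    max_attempts: int = 5,  # assumed cap; the source does not state one
) -> tuple[bool, list[str]]:
    """Run tests, attempt a fix on failure, and stop when passing or out of attempts.

    Returns (passed, logs) so that a blocked run still surfaces every test output.
    """
    logs: list[str] = []
    for _ in range(max_attempts):
        passed, output = run_tests()
        logs.append(output)  # keep each run's output as evidence
        if passed:
            return True, logs
        apply_fix(output)  # hypothetical hook: revise the code based on the failure
    return False, logs  # blocked: the caller reports the collected logs
```

Returning the logs in both outcomes mirrors the record's point that the agent surfaces evidence when it is blocked rather than failing silently.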

agent collaboration

Assessment of parallel task fan-out and coordination model

Evidence

OpenAI: Introducing Codex — Many tasks run in parallel containers simultaneously, though without inter-agent handoff primitives

medium · Verified: 2026-07-09
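The coordination model noted above, parallel fan-out with no handoff between agents, resembles a plain map over independent jobs. The sketch below illustrates that shape with Python's standard thread pool; `run_task` and `fan_out` are hypothetical stand-ins and do not call any Codex API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(description: str) -> dict:
    """Hypothetical stand-in for one isolated task; it sees only its own input."""
    return {"task": description, "status": "done"}


def fan_out(descriptions: list[str], max_parallel: int = 4) -> list[dict]:
    """Run independent tasks concurrently and collect results in input order.

    Tasks never exchange messages, matching the 'no inter-agent handoff' caveat.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_task, descriptions))
```

Because each task receives only its own description, any coordination (ordering, merging overlapping changes) stays with whoever reviews the resulting PRs.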

🛡️ Security

tool sandboxing

Review of container isolation, default-deny network policy, and CLI sandbox mechanisms

Evidence

Codex cloud documentation — Cloud tasks run in isolated containers with internet access disabled by default during execution; configurable domain allowlists

Codex CLI repository — CLI offers OS-level sandboxing (Seatbelt on macOS, Landlock/seccomp on Linux) with approval modes

high · Verified: 2026-07-09
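A default-deny network policy with a configurable domain allowlist can be sketched in a few lines. This is an illustration of the policy idea only, not Codex's implementation: the function name `is_allowed`, the example domains, and the rule that an allowlisted domain also covers its subdomains are all assumptions.

```python
from urllib.parse import urlparse


def is_allowed(url: str, allowlist: frozenset[str] = frozenset()) -> bool:
    """Default-deny: permit a request only if its host is on the allowlist.

    An empty allowlist (the default) blocks everything. Treating an
    allowlisted domain as covering its subdomains is an assumption here.
    """
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False  # unparseable or host-less input is denied
    parts = host.split(".")
    # Check the host and each parent domain: a.b.c -> a.b.c, b.c, c
    candidates = {".".join(parts[i:]) for i in range(len(parts))}
    return not candidates.isdisjoint(allowlist)
```

For example, `is_allowed("https://files.pypi.org/x", frozenset({"pypi.org"}))` returns `True`, while the same URL with the default empty allowlist returns `False`.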

access control

Assessment of repository scoping, environment controls, and approval mode granularity

Evidence

Codex cloud documentation — Scoped GitHub repo access, per-environment configuration, and CLI approval modes (suggest/auto-edit/full-auto)

high · Verified: 2026-07-09
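The three CLI approval modes named above can be read as increasing sets of actions the agent may take without asking. The mapping below is an assumption about what the mode names imply, written for illustration; it should be checked against the Codex CLI documentation before being relied on, and `needs_approval` is a hypothetical helper.

```python
from enum import Enum


class ApprovalMode(Enum):
    SUGGEST = "suggest"
    AUTO_EDIT = "auto-edit"
    FULL_AUTO = "full-auto"


# Actions taken without user approval, per mode (assumed for illustration).
AUTO_APPROVED = {
    ApprovalMode.SUGGEST: {"read"},
    ApprovalMode.AUTO_EDIT: {"read", "edit"},
    ApprovalMode.FULL_AUTO: {"read", "edit", "run_command"},
}


def needs_approval(mode: ApprovalMode, action: str) -> bool:
    """Return True when the user must confirm the action in the given mode."""
    return action not in AUTO_APPROVED[mode]
```

Under this reading, `needs_approval(ApprovalMode.AUTO_EDIT, "run_command")` is `True`: file edits proceed automatically but shell commands still wait for confirmation.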

prompt injection defense

Review of network-isolation mitigations and model-level injection defenses

Evidence

Codex cloud documentation — Default-disabled internet during execution sharply limits exfiltration from injected instructions; agent-specific safety training applied

medium · Verified: 2026-07-09

data isolation

Architecture review of per-task container isolation and environment scoping

Evidence

Codex cloud documentation — Each task gets its own ephemeral container preloaded only with the target repository and configured environment

high · Verified: 2026-07-09

open source transparency

License and source availability review of CLI versus cloud service

Evidence

Codex CLI repository — Codex CLI is open source under Apache-2.0; the cloud agent service and models remain proprietary

high · Verified: 2026-07-09

🔒 Privacy & Compliance

data retention

Review of OpenAI retention and training policies across ChatGPT plan tiers

Evidence

OpenAI enterprise privacy — Business/Enterprise data excluded from training by default; consumer ChatGPT plan settings govern Codex task data

medium · Verified: 2026-07-09

gdpr compliance

Compliance certification and DPA availability review

Evidence

OpenAI Trust Portal — SOC 2 Type II and DPA available for business tiers covering Codex usage

medium · Verified: 2026-07-09

third party data sharing

Data flow analysis of repository access, GitHub integration, and network policy

Evidence

Codex cloud documentation — Repository code is processed by OpenAI; default-off internet prevents task-time data flows to other third parties

medium · Verified: 2026-07-09

local deployment option

Deployment options assessment of local CLI versus cloud-only agent and models

Evidence

Codex CLI repository — Open-source CLI runs locally and supports OpenAI-compatible endpoints, but flagship Codex models require OpenAI's cloud

high · Verified: 2026-07-09

👁️ Trust & Transparency

documentation quality

Documentation completeness review across cloud, CLI, and IDE surfaces

Evidence

Codex developer documentation — Dedicated developer docs covering cloud environments, CLI, IDE integration, AGENTS.md, and pricing

high · Verified: 2026-07-09

execution traceability

Review of task logs, test output citations, and diff provenance

Evidence

OpenAI: Introducing Codex — Tasks produce verifiable evidence: terminal logs, test outputs, and citations for every action taken

high · Verified: 2026-07-09

decision explainability

Assessment of task summaries, cited reasoning, and pre-merge review surfaces

Evidence

Codex cloud documentation — Agent explains its approach in task summaries with linked evidence before users accept changes

medium · Verified: 2026-07-09

open source code

Open source assessment weighting open CLI against proprietary cloud service

Evidence

Codex CLI repository — Apache-2.0 CLI with active public development; cloud agent and models closed

high · Verified: 2026-07-09

community activity

Community engagement analysis via GitHub activity and release cadence

Evidence

Codex CLI repository — Tens of thousands of stars, rapid release cadence, and a large contributor community since April 2025

high · Verified: 2026-07-09

⚙️ Operational Excellence

ease of integration

Setup and integration surface assessment across ChatGPT, CLI, IDE, and GitHub

Evidence

Codex developer documentation — Available in ChatGPT (web/mobile), as a CLI, IDE extensions, and GitHub integration with @codex mentions

high · Verified: 2026-07-09

scalability

Assessment of parallel container execution and plan-tier task throughput

Evidence

OpenAI: Introducing Codex — Cloud architecture runs many tasks in parallel isolated containers, enabling fleet-style delegation

high · Verified: 2026-07-09

cost predictability

Pricing model analysis of plan-based limits, Pro 5x tier, and typical-usage estimates

Evidence

Codex pricing documentation — Included in ChatGPT plans with a $100/mo Pro 5x tier added 2026-04-09; typical usage estimated ~$100-200/dev/month

high · Verified: 2026-07-09
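The per-developer estimate above scales linearly to a team budget. The sketch below does that arithmetic using the record's ~$100-200/dev/month range as defaults; the function name `monthly_cost_range` and the team size in the usage note are hypothetical, and the figures are estimates rather than quoted prices.

```python
def monthly_cost_range(
    developers: int,
    low_per_dev: int = 100,   # lower bound of the record's typical-usage estimate (USD)
    high_per_dev: int = 200,  # upper bound of the same estimate (USD)
) -> tuple[int, int]:
    """Return the (low, high) estimated monthly spend in USD for a team."""
    if developers < 0:
        raise ValueError("developers must be non-negative")
    return developers * low_per_dev, developers * high_per_dev
```

For a hypothetical team of 8, `monthly_cost_range(8)` gives `(800, 1600)`, that is, roughly $800 to $1,600 per month under the record's estimate.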

monitoring capabilities

Review of task logs, usage visibility, and admin monitoring features

Evidence

Codex cloud documentation — Per-task logs, usage dashboards, and admin controls for business plans; deeper APM requires external tooling

medium · Verified: 2026-07-09

production readiness

Maturity assessment from rollout timeline, model upgrades, and enterprise availability

Evidence

Codex rollout history — Research preview 2025-05-16, ChatGPT Plus rollout 2025-06-03, GPT-5-Codex upgrades Sept 2025

Codex changelog — Continuous 2026 investment: GPT-5.5 now the recommended Codex model; GPT-5.3-Codex deprecated (no new API requests after 2026-06-30, endpoint shutdown 2026-12-31); June 2026 added Codex Remote GA, Record & Replay skills, and Sites preview

high · Verified: 2026-07-09
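The GPT-5.3-Codex deprecation dates in the changelog entry above can be turned into a simple date check for planning migrations. This is a sketch under stated assumptions: the function name `gpt_5_3_codex_status` and the status labels are hypothetical, and the boundary handling (requests still accepted on 2026-06-30 itself, endpoint treated as gone on 2026-12-31) is one reading of the record's wording.

```python
from datetime import date

# Dates taken from the changelog entry in this record.
NO_NEW_REQUESTS_AFTER = date(2026, 6, 30)
ENDPOINT_SHUTDOWN = date(2026, 12, 31)


def gpt_5_3_codex_status(today: date) -> str:
    """Classify the model's lifecycle stage on a given date (boundaries assumed)."""
    if today >= ENDPOINT_SHUTDOWN:
        return "shut down"
    if today > NO_NEW_REQUESTS_AFTER:
        return "deprecated: no new API requests"
    return "available"
```

On the record's evaluation date, `gpt_5_3_codex_status(date(2026, 7, 9))` returns `"deprecated: no new API requests"`.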

Strengths

+ Strong isolation: per-task containers with internet disabled by default during execution
+ Runs many tasks in parallel for fleet-style software delegation
+ Verifiable outputs with terminal logs, test results, and action citations
+ Open-source Apache-2.0 CLI with local OS-level sandboxing
+ Deep GitHub integration from task to reviewed pull request
+ Continuously upgraded models, currently GPT-5.5 (with GPT-5.4 / GPT-5.4 mini options)

Limitations

! Default network isolation can block tasks needing external dependencies unless allowlists are configured
! Cloud agent and Codex models are proprietary with no self-hosted option
! Plan-based limits are opaque; heavy users may need the $100/mo Pro 5x tier (~$100-200/dev/month typical)
! Stateless per-task containers limit cross-task memory beyond AGENTS.md
! Environment setup scripts add onboarding friction for complex monorepos

Metadata

license: Cloud agent proprietary; Codex CLI Apache-2.0 (github.com/openai/codex)

supported models

0: GPT-5.5 (recommended, mid-2026)

1: GPT-5.4 / GPT-5.4 mini

2: GPT-5.3-Codex (deprecated; API sunset 2026-12-31)

3: GPT-5-Codex (Sept 2025)

4: codex-1 (launch)

programming languages

0: Language-agnostic (any language in the repository)

deployment type: Managed cloud containers + local open-source CLI

tool support

0: Container shell and editor

1: Test runners

2: GitHub PR integration

3: Configurable network allowlists

4: AGENTS.md project instructions

first release: CLI April 2025; cloud agent research preview 2025-05-16; ChatGPT Plus rollout 2025-06-03

pricing: Included in ChatGPT plans; $100/mo Pro 5x tier (added 2026-04-09); typical usage ~$100-200/dev/month

interfaces

0: ChatGPT web and mobile

1: Codex CLI

2: IDE extensions

3: GitHub (@codex mentions)

Use Case Ratings

code generation

Core use case: parallel feature work, bug fixes, refactors, and PR generation with test evidence

data analysis

Capable of scripted analysis within containers, though network-off defaults limit live data access

research assistant

Strong at codebase Q&A and architecture exploration; not aimed at general web research

education

Cited logs and diffs make its work reviewable for learning, but it targets professional workflows