GPT-5.3-Codex

Version: gpt-5-3-codex-2026-02-05

OpenAI

Model, coding, agentic, codex, specialist
89
Strong
About This Model

OpenAI's agentic coding specialist: ~80% SWE-bench Verified, 77.3% Terminal-Bench, SOTA on SWE-Bench Pro at release, ~25% faster than GPT-5.2-Codex. The 5.3 generation was Codex-only — there is no general-purpose GPT-5.3.

Last Evaluated: June 10, 2026

Trust Vector Analysis

Dimension Breakdown

🚀 Performance & Reliability

Best-in-class agentic coding at release (~80% SWE-bench Verified, 77.3% Terminal-Bench, SOTA SWE-Bench Pro). Specialized model — general-purpose accuracy intentionally trails flagships.

task accuracy code

Industry-standard agentic coding benchmarks measuring real-world software engineering tasks

Evidence
SWE-bench Verified: ~80% resolution rate
Terminal-Bench: 77.3% on terminal/command-line agentic tasks
SWE-Bench Pro: State-of-the-art at release
Confidence: high. Verified: 2026-06-10
task accuracy reasoning

Reasoning benchmark review relative to general-purpose flagships

Evidence
OpenAI Announcement: Strong code-centric reasoning; not positioned for general scientific or mathematical reasoning
Confidence: medium. Verified: 2026-06-10
task accuracy general

Comparison against general-purpose models on non-coding workloads

Evidence
OpenAI Announcement: Specialized for software engineering; OpenAI recommends GPT-5.x flagships for general-purpose tasks
Confidence: medium. Verified: 2026-06-10
output consistency

Provider-reported reliability on multi-step agentic coding sessions

Evidence
OpenAI Announcement: ~25% faster than GPT-5.2-Codex with more reliable long-horizon task completion
Confidence: medium. Verified: 2026-06-10
latency p50

Provider-reported relative latency on agentic coding workloads

Evidence
OpenAI Announcement: ~25% end-to-end speedup over GPT-5.2-Codex on equivalent tasks
Confidence: medium. Verified: 2026-06-10
context window

Third-party model registry specification

Evidence
OpenRouter Model Page: 400K token context window listed for gpt-5.3-codex
Confidence: medium. Verified: 2026-06-10
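As a rough illustration of what the listed context window means for prompt sizing, the sketch below checks whether a prompt still fits once room is reserved for the reply. The 400,000-token window and the 128,000-token output cap are the figures listed on this page. The assumption that output tokens count against the same window, and the helper name `fits_in_context`, are illustrative and not taken from OpenAI documentation.

```python
CONTEXT_WINDOW = 400_000  # tokens, per the OpenRouter listing cited on this page
MAX_OUTPUT = 128_000      # tokens, per this page's metadata


def fits_in_context(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Return True if the prompt plus the reserved reply fits in the window.

    Assumption: output tokens share the same window as the prompt.
    """
    if prompt_tokens < 0 or reserved_output < 0:
        raise ValueError("token counts must be non-negative")
    reserved = min(reserved_output, MAX_OUTPUT)  # cannot reserve more than the cap
    return prompt_tokens + reserved <= CONTEXT_WINDOW


print(fits_in_context(250_000))  # True: 250,000 + 128,000 = 378,000
print(fits_in_context(300_000))  # False: 300,000 + 128,000 = 428,000
```

Reserving less output (for example `reserved_output=8_000`) leaves correspondingly more room for repository context.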
uptime

Historical uptime data from official status page

Evidence
OpenAI Status: 99.9% uptime (last 90 days)
Confidence: high. Verified: 2026-06-10
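To put the uptime figure in concrete terms, the arithmetic below converts an uptime percentage over a window into implied downtime. It is only a back-of-the-envelope sketch: 99.9% is a rounded figure, so the result is an estimate, and the function name `downtime_hours` is illustrative.

```python
def downtime_hours(uptime_fraction: float, window_days: int) -> float:
    """Hours of downtime implied by an uptime fraction over a window of days."""
    if not 0.0 <= uptime_fraction <= 1.0:
        raise ValueError("uptime_fraction must be between 0 and 1")
    return (1.0 - uptime_fraction) * window_days * 24


# 0.1% of a 90-day window (2,160 hours)
print(round(downtime_hours(0.999, 90), 2))  # 2.16
```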
🛡️ Security

Good security posture with sandboxing in Codex environments. Autonomous code execution warrants strict permissioning and review gates in production pipelines.

prompt injection resistance

Testing against OWASP LLM01 attacks including coding-agent vectors

Evidence
OpenAI Safety Research: Injection defenses tuned for agentic coding (repository content, tool output, shell results)
Confidence: medium. Verified: 2026-06-10
jailbreak resistance

Adversarial prompt testing against jailbreak datasets

Evidence
OpenAI Announcement: Inherits GPT-5.x safety training with coding-specific refusal calibration (e.g., malware requests)
Confidence: medium. Verified: 2026-06-10
data leakage prevention

Analysis of privacy policies and data handling practices

Evidence
OpenAI Privacy Policy: No training on API data by default
Confidence: medium. Verified: 2026-06-10
output safety

Safety testing across harmful content and dangerous-action categories

Evidence
OpenAI Safety: Sandboxed execution defaults in Codex environments; guardrails on destructive commands
Confidence: medium. Verified: 2026-06-10
api security

Review of API security features and best practices

Evidence
OpenAI Platform Docs: API key + OAuth2 authentication, HTTPS only, rate limiting
Confidence: high. Verified: 2026-06-10
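The recommendation to put permissioning and review gates around autonomous code execution can be sketched as a small command classifier that sits between the agent and the shell. Everything here is a hypothetical policy for illustration: the allow and deny lists, the operator list, and the function name `gate_command` are assumptions, not part of Codex or any OpenAI tooling.

```python
import shlex

# Hypothetical policy lists; a real pipeline would define its own.
AUTO_APPROVE = {"ls", "cat", "grep", "pytest"}
ALWAYS_DENY = {"rm", "dd", "mkfs", "shutdown"}
SHELL_OPERATORS = (";", "&&", "||", "|", ">", "<", "`", "$(")


def gate_command(command: str) -> str:
    """Classify an agent-proposed shell command as 'allow', 'review', or 'deny'."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "review"  # unbalanced quoting: let a human look at it
    if not tokens:
        return "deny"  # nothing to run
    program = tokens[0]
    if program in ALWAYS_DENY:
        return "deny"
    # Operators can chain or redirect into other commands, so never auto-approve.
    if any(op in command for op in SHELL_OPERATORS):
        return "review"
    if program in AUTO_APPROVE:
        return "allow"
    return "review"  # unknown programs default to human review


print(gate_command("pytest -q"))        # allow
print(gate_command("rm -rf build"))     # deny
print(gate_command("ls; rm -rf /tmp"))  # review
```

Defaulting unknown commands to review rather than allow keeps the gate conservative. A gate like this complements a sandbox and does not replace one.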
🔒 Privacy & Compliance

Standard OpenAI enterprise posture. Proprietary source code sent as context is covered by the no-training-by-default policy; zero data retention is recommended for sensitive codebases.

data residency

Review of enterprise documentation

Evidence
OpenAI Enterprise: Data residency options for enterprise customers
Confidence: high. Verified: 2026-06-10
training data optout

Policy review of data usage terms

Evidence
OpenAI Data Controls: API data (including submitted code) not used for training by default
Confidence: high. Verified: 2026-06-10
data retention

Terms of service and enterprise documentation review

Evidence
OpenAI Terms: 30-day default API log retention; zero-data-retention options for qualifying customers
Confidence: high. Verified: 2026-06-10
pii handling

Review of data protection capabilities

Evidence
OpenAI Safety Tools: Customer responsible for scrubbing secrets/PII from code context; moderation API available
Confidence: medium. Verified: 2026-06-10
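Because scrubbing secrets and PII is left to the customer, a pre-send redaction pass is one way to approach it. The sketch below is a minimal example under stated assumptions: the patterns are illustrative and far from exhaustive, and the `scrub` helper is hypothetical rather than part of any OpenAI SDK.

```python
import re

# Illustrative (pattern, replacement) pairs; an assumption, not an official list.
REDACTIONS = [
    # API-key-like strings with an "sk-" prefix
    (re.compile(r"sk-[A-Za-z0-9_-]{20,}"), "[REDACTED_KEY]"),
    # AWS access key ID shape: "AKIA" followed by 16 uppercase letters or digits
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED_KEY]"),
    # Quoted assignments to password/secret/token; keeps the name, drops the value
    (
        re.compile(r"(?i)\b(password|secret|token)(\s*=\s*)(['\"])[^'\"]*\3"),
        r"\1\2\3[REDACTED]\3",
    ),
    # Email addresses
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED_EMAIL]"),
]


def scrub(text: str) -> str:
    """Apply each redaction in order and return the scrubbed text."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


print(scrub('password = "hunter2"'))       # password = "[REDACTED]"
print(scrub("contact alice@example.com"))  # contact [REDACTED_EMAIL]
```

Regex scrubbing misses anything it has no pattern for, so it is better treated as a backstop behind secret scanning in the repository itself.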
compliance certifications

Verification of compliance certifications

Evidence
OpenAI Trust Center: SOC 2 Type II, ISO 27001, GDPR compliant
Confidence: high. Verified: 2026-06-10
zero data retention

Enterprise feature review

Evidence
OpenAI Enterprise: Zero-data-retention options available for enterprise and qualifying API customers
Confidence: high. Verified: 2026-06-10
👁️ Trust & Transparency

Agent transcripts give strong action-level auditability. Execution-based verification reduces unchecked hallucination relative to chat-style code generation.

explainability

Evaluation of reasoning and action transparency

Evidence
Codex Agent Logs: Step-by-step agent transcripts (plans, diffs, test runs) provide strong action-level traceability
Confidence: medium. Verified: 2026-06-10
hallucination rate

Code correctness evaluation with execution-based verification

Evidence
OpenAI Announcement: Test-driven agent loop catches many fabrications; hallucinated API calls are still possible in unfamiliar frameworks
Confidence: medium. Verified: 2026-06-10
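The idea behind execution-based verification is that generated code is accepted only if it actually passes tests when run. The sketch below illustrates that idea in its simplest form and is not a description of how the Codex harness works. The helper name `passes_tests` is hypothetical, and a plain subprocess is not a sandbox, so untrusted code still needs real isolation.

```python
import os
import subprocess
import sys
import tempfile


def passes_tests(candidate_source: str, test_source: str, timeout_s: float = 10.0) -> bool:
    """Run candidate code followed by its tests in a separate Python process.

    Returns True only if the process exits with status 0 within the timeout.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "check.py")
        with open(script, "w", encoding="utf-8") as fh:
            fh.write(candidate_source + "\n\n" + test_source + "\n")
        try:
            result = subprocess.run(
                [sys.executable, script],
                capture_output=True,
                timeout=timeout_s,
                cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return False  # hung or too slow: treat as a failure
        return result.returncode == 0


good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"

print(passes_tests(good, tests))  # True
print(passes_tests(bad, tests))   # False
```

A check like this only catches failures the tests exercise, which is consistent with the caveat that fabricated calls can still slip through in unfamiliar frameworks.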
bias fairness

Bias benchmarks and demographic testing

Evidence
OpenAI Safety: Standard bias testing program; less salient for code-specialist deployments
Confidence: low. Verified: 2026-06-10
uncertainty quantification

Qualitative assessment of confidence expression in agentic outputs

Evidence
OpenAI Documentation: Agent flags failing tests and unresolved tasks rather than claiming success
Confidence: medium. Verified: 2026-06-10
model card quality

Documentation completeness and clarity review

Evidence
GPT-5.3-Codex Announcement: Release documentation covers benchmarks, intended use, and Codex-only scope of the 5.3 generation
Confidence: high. Verified: 2026-06-10
training data transparency

Review of public disclosures about training data

Evidence
OpenAI Blog: General description of code-focused training; specific sources not disclosed
Confidence: medium. Verified: 2026-06-10
guardrails

Analysis of built-in safety mechanisms

Evidence
OpenAI Safety Systems: Guardrails on destructive operations, secrets handling, and malware generation
Confidence: medium. Verified: 2026-06-10
⚙️ Operational Excellence

Deep coding-tool ecosystem (CLI, IDE, cloud agents). Codex line moves fast: GPT-5.2-Codex shuts down 2026-07-23, so plan for shorter model lifecycles than general flagships.

api design quality

Review of API design, consistency, and feature completeness

Evidence
OpenAI API Reference: Responses API plus first-class integration with Codex CLI, IDE extensions, and Codex cloud
Confidence: high. Verified: 2026-06-10
sdk quality

SDK quality, documentation, and maintenance review

Evidence
OpenAI SDKs: Official SDKs plus open-source Codex CLI, actively maintained
Confidence: high. Verified: 2026-06-10
versioning policy

Review of versioning policy and deprecation practices

Evidence
OpenAI Deprecations: Clear deprecation schedule; predecessor GPT-5.2-Codex shuts down 2026-07-23
Confidence: high. Verified: 2026-06-10
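Teams pinning Codex models may want an automated reminder ahead of shutdown dates like the one above. The sketch below computes the remaining days for a pinned model. The shutdown date is the one reported on this page, while the table `SHUTDOWN_DATES`, the key spelling `gpt-5.2-codex`, and the function name are illustrative assumptions.

```python
from datetime import date
from typing import Optional

# Date as reported on this page; the key spelling is an assumption.
SHUTDOWN_DATES = {"gpt-5.2-codex": date(2026, 7, 23)}


def days_until_shutdown(model: str, today: date) -> Optional[int]:
    """Days remaining before a model's shutdown, or None if no date is recorded."""
    shutdown = SHUTDOWN_DATES.get(model)
    if shutdown is None:
        return None
    return (shutdown - today).days


# Measured from this page's evaluation date, 2026-06-10
print(days_until_shutdown("gpt-5.2-codex", date(2026, 6, 10)))  # 43
```

A CI job could fail or warn once the returned value drops below a chosen threshold, such as 30 days.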
monitoring observability

Review of available monitoring tools and metrics

Evidence
OpenAI Dashboard: Detailed usage dashboard with costs, tokens, rate limits; Codex task history
Confidence: high. Verified: 2026-06-10
support quality

Support and documentation assessment

Evidence
OpenAI Support: 24/7 support, comprehensive docs, active developer community
Confidence: high. Verified: 2026-06-10
ecosystem maturity

Ecosystem breadth and depth analysis

Evidence
Codex Ecosystem: Codex CLI, IDE extensions, cloud agents, GitHub integration; also available via OpenRouter
Confidence: high. Verified: 2026-06-10
license terms

Review of licensing terms and restrictions

Evidence
OpenAI Terms: Standard commercial terms; customer retains rights to generated code per terms of use
Confidence: high. Verified: 2026-06-10
Strengths
  • ~80% SWE-bench Verified and SOTA on SWE-Bench Pro at release
  • 77.3% Terminal-Bench on agentic command-line tasks
  • ~25% faster than GPT-5.2-Codex on equivalent workloads
  • Aggressive pricing (~$1.75 input / $14 output per 1M tokens) for a frontier coding model
  • Deep tooling: Codex CLI, IDE extensions, cloud agents, GitHub integration
  • Execution-verified outputs reduce unchecked code hallucination
Limitations
  • Specialized for coding; weaker than flagships on general reasoning and writing
  • No general-purpose GPT-5.3 exists; the 5.3 generation was Codex-only
  • Fast Codex lifecycle: predecessor GPT-5.2-Codex shuts down 2026-07-23, suggesting shorter support horizons
  • Pricing confirmed primarily via third-party listings (medium confidence)
  • Not HIPAA eligible; 30-day default retention
  • Autonomous code execution requires sandboxing and review gates
Metadata
pricing
input: $1.75 per 1M tokens (approximate)
output: $14.00 per 1M tokens (approximate)
notes: Pricing per OpenRouter listing (https://openrouter.ai/openai/gpt-5.3-codex); confidence medium pending first-party pricing page confirmation.
last verified: 2026-06-10
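For budgeting, the approximate rates above translate into per-request cost as sketched below. The rates are the third-party figures listed on this page and may differ from first-party pricing. The function name `estimate_cost_usd` is illustrative, and the calculation ignores any caching or batch discounts.

```python
INPUT_USD_PER_1M = 1.75    # approximate, per the OpenRouter listing cited above
OUTPUT_USD_PER_1M = 14.00  # approximate, per the OpenRouter listing cited above


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from token counts at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_1M
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_1M
    )


# 200,000 input tokens ($0.35) plus 20,000 output tokens ($0.28)
print(f"{estimate_cost_usd(200_000, 20_000):.2f}")  # 0.63
```

Output tokens cost eight times as much as input tokens at these rates, so long agent transcripts tend to dominate the bill.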
context window: 400,000 tokens
max output: 128,000 tokens
languages: English, Python, JavaScript/TypeScript, Go, Rust, Java, C/C++, C#, Ruby, PHP, Shell, SQL
modalities: text, vision (screenshots/diagrams), code-execution (via Codex harness)
api endpoint: https://api.openai.com/v1/responses
open source: false
architecture: Transformer-based, fine-tuned for agentic software engineering (Codex line)
parameters: Not disclosed
knowledge cutoff: Late 2025
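As a minimal sketch of how the listed endpoint might be called, the snippet below builds (but does not send) a Responses API request using only the standard library. The endpoint is the one in the metadata above. The model identifier `gpt-5.3-codex` is an assumption based on the OpenRouter listing name and should be checked against first-party docs, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request

ENDPOINT = "https://api.openai.com/v1/responses"  # endpoint listed in the metadata
MODEL_ID = "gpt-5.3-codex"  # assumption: identifier as shown in the OpenRouter listing


def build_request(prompt: str, api_key: str, max_output_tokens: int = 1024) -> urllib.request.Request:
    """Build a POST request for the Responses API without sending it."""
    payload = {
        "model": MODEL_ID,
        "input": prompt,
        "max_output_tokens": max_output_tokens,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("Write a unit test for parse_date().", api_key="sk-placeholder")
print(req.get_method(), req.full_url)
```

With a real key in place of the placeholder, the request could be sent with `urllib.request.urlopen(req)`. In practice the official SDKs handle retries and streaming and are usually the better choice.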

Use Case Ratings

code generation

Purpose-built agentic coder: ~80% SWE-bench Verified, 77.3% Terminal-Bench, SOTA SWE-Bench Pro at release, ~25% faster than GPT-5.2-Codex.

data analysis

Strong at writing and executing analysis code; general flagships better for open-ended analytical interpretation.

research assistant

Useful for code-centric research (reproducing papers, building experiment harnesses); not designed for general literature work.

education

Excellent for programming instruction with executable, test-verified examples; narrow outside software topics.