Evaluation record · gpt-5-2

GPT-5.2

vgpt-5-2-2025-12-11

OpenAI

Modelsupersededreasoningmultimodal400k-context

Exceptional

About This Model

SUPERSEDED by GPT-5.4 (2026-03-05), GPT-5.5 (2026-04-23), and the GPT-5.6 family (2026-07-09). API snapshots (gpt-5.2, gpt-5.2-2025-12-11) still served with no announced shutdown, but gpt-5.2-chat-latest shuts down 2026-08-10 (announced 2026-05-08; migrate to gpt-5.5). 400K context window, 100% AIME 2025 score, 52.9% ARC-AGI-2. Three variants: Instant (speed), Thinking (reasoning), Pro (accuracy). New projects should prefer GPT-5.5.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Industry-leading reasoning with 100% AIME and 52.9% ARC-AGI-2. 400K context enables full codebase processing. ~30% fewer hallucinations than GPT-5.1.

task accuracy code

Industry-standard coding benchmarks

Evidence

SWE-bench Verified — 80% on SWE-bench Verified

SWE-bench Pro — 56.4% on SWE-bench Pro (Codex variant)

Terminal-bench 2.0 — 64.0% (industry-leading command-line tasks)

highVerified: 2026-07-09

task accuracy reasoning

PhD-level and Olympiad-level reasoning benchmarks

Evidence

AIME 2025 — 100% (perfect score, no tools)

GPQA Diamond — 93.2% Pro / 92.4% Thinking (near state-of-the-art)

ARC-AGI-2 — 52.9% (massive lead, 3x GPT-5.1)

ARC-AGI-1 — 90.5% (first model above 90%)

FrontierMath Tier 1-3 — 40.3% (10% improvement over GPT-5.1)

highVerified: 2026-07-09

task accuracy general

Crowdsourced and expert-level comparisons

Evidence

LMSYS Chatbot Arena — Top tier ELO rating

GDPval — 70.9% (beats human experts at 11x speed, <1% cost)

highVerified: 2026-07-09

output consistency

Internal testing across model variants

Evidence

OpenAI Documentation — ~30% fewer errors/hallucinations vs GPT-5.1

highVerified: 2026-07-09

latency p50

Platform-wide performance metrics

Evidence

OpenAI Performance Data — Instant variant optimized for low latency

highVerified: 2026-07-09

latency p95

95th percentile response time

Evidence

Community benchmarking — p95 latency varies by variant

mediumVerified: 2026-07-09

context window

Official specification

Evidence

OpenAI Documentation — 400K token context window (industry-leading for non-Google)

highVerified: 2026-07-09

uptime

Historical uptime data

Evidence

OpenAI Status — 99.9% uptime (last 90 days)

highVerified: 2026-07-09

🛡️Security

Strong security with multi-layer safety systems. 30% fewer hallucinations improves output safety.

prompt injection resistance

Testing against OWASP LLM01 attacks

Evidence

OpenAI Safety Research — Enhanced prompt injection defenses

mediumVerified: 2026-07-09

jailbreak resistance

Adversarial prompt testing

Evidence

OpenAI System Card — Improved resistance over GPT-5.1

mediumVerified: 2026-07-09

data leakage prevention

Policy review and data handling practices

Evidence

OpenAI Privacy Policy — No training on API data by default

mediumVerified: 2026-07-09

output safety

Safety testing across harmful content categories

Evidence

OpenAI Safety Evals — Multi-layer safety with improved refusal accuracy

highVerified: 2026-07-09

api security

Review of API security features

Evidence

OpenAI Platform Docs — API key + OAuth2, HTTPS, rate limiting

highVerified: 2026-07-09

🔒Privacy & Compliance

Good privacy with 30-day default retention. Zero retention for enterprise. Not HIPAA eligible.

data residency

Review of enterprise documentation

Evidence

OpenAI Enterprise — Data residency options for enterprise

highVerified: 2026-07-09

training data optout

Policy review

Evidence

OpenAI Data Controls — API data not used for training by default

highVerified: 2026-07-09

data retention

Evidence

OpenAI Terms — API logs retained for 30 days (zero retention for enterprise)

highVerified: 2026-07-09

pii handling

Review of data protection capabilities

Evidence

OpenAI Safety Tools — Customer responsible for PII, moderation API available

mediumVerified: 2026-07-09

compliance certifications

Verification of certifications

Evidence

OpenAI Trust Center — SOC 2 Type II, ISO 27001, GDPR compliant

highVerified: 2026-07-09

zero data retention

Enterprise feature review

Evidence

OpenAI Enterprise — Zero retention available for enterprise tier

highVerified: 2026-07-09

👁️Trust & Transparency

Excellent transparency with 30% fewer hallucinations. Thinking variant provides reasoning insight. Comprehensive system card.

explainability

Evaluation of reasoning transparency

Evidence

GPT-5.2 Thinking Variant — Thinking variant exposes reasoning process

highVerified: 2026-07-09

hallucination rate

Factual accuracy testing

Evidence

OpenAI Testing — ~30% fewer errors/hallucinations than GPT-5.1

highVerified: 2026-07-09

bias fairness

Bias benchmarks and demographic testing

Evidence

OpenAI System Card — Regular bias testing and red-teaming

mediumVerified: 2026-07-09

uncertainty quantification

Qualitative confidence expression

Evidence

GPT-5.2 Capabilities — Better uncertainty expression with lower hallucination rate

mediumVerified: 2026-07-09

model card quality

Documentation completeness review

Evidence

GPT-5.2 System Card — Comprehensive system card with detailed evaluations

highVerified: 2026-07-09

training data transparency

Public disclosure review

Evidence

OpenAI Blog — General description, specific sources not disclosed

mediumVerified: 2026-07-09

guardrails

Safety mechanism analysis

Evidence

OpenAI Safety Systems — Multi-layer safety with improved accuracy

highVerified: 2026-07-09

⚙️Operational Excellence

Industry-leading operational maturity with largest ecosystem. Three model variants for different use cases. Excellent tooling.

api design quality

API design and feature review

Evidence

OpenAI API — RESTful API with streaming, function calling, vision, audio

highVerified: 2026-07-09

sdk quality

SDK quality and maintenance review

Evidence

OpenAI SDKs — Official SDKs for Python, Node.js, Go, .NET, Swift

highVerified: 2026-07-09

versioning policy

Versioning policy review

Evidence

OpenAI Versioning — Clear versioning with deprecation notices

OpenAI: Introducing GPT-5.5 — GPT-5.2 superseded by GPT-5.4 (2026-03-05) and GPT-5.5 (2026-04-23)

OpenAI Deprecations — gpt-5.2-chat-latest API shutdown 2026-08-10 (announced 2026-05-08), replacement gpt-5.5; base gpt-5.2 snapshots remain served ('previous frontier model' per model page) with no announced shutdown

highVerified: 2026-07-09

monitoring observability

Observability tools review

Evidence

OpenAI Dashboard — Detailed usage dashboard with costs, tokens, rate limits

highVerified: 2026-07-09

support quality

Support and documentation assessment

Evidence

OpenAI Support — 24/7 support, comprehensive docs, active community

highVerified: 2026-07-09

ecosystem maturity

Ecosystem breadth and depth analysis

Evidence

OpenAI Ecosystem — Largest ecosystem with Assistants API, plugins, GPTs, Codex

highVerified: 2026-07-09

license terms

License terms review

Evidence

OpenAI Terms — Standard commercial terms with usage policies

highVerified: 2026-07-09

Strengths

+Industry-leading reasoning: 100% AIME, 52.9% ARC-AGI-2, 93.2% GPQA Diamond
+400K context window (largest non-Google model)
+~30% fewer hallucinations than GPT-5.1
+Three variants: Instant (speed), Thinking (reasoning), Pro (accuracy)
+90.5% ARC-AGI-1 (first model above 90%)
+~390x efficiency improvement on ARC-AGI-1 vs o3 (High) from year prior
+Largest AI ecosystem with best tooling

Limitations

!Not HIPAA eligible (unlike Claude models)
!30-day data retention vs Anthropic's 0-day
!1.4x price increase over GPT-5.1 ($1.75/$14)
!Slightly behind Claude Opus 4.5 on SWE-bench (80% vs 80.9%)
!Smaller context than Gemini 3 (400K vs 1M)
!SUPERSEDED: GPT-5.4 (2026-03-05), GPT-5.5 (2026-04-23), and GPT-5.6 (2026-07-09) are newer; gpt-5.2-chat-latest API shutdown 2026-08-10

Metadata

pricing

input: $1.75 per 1M tokens

output: $14.00 per 1M tokens

notes: Confirmed unchanged on official model page 2026-07-09 (snapshots gpt-5.2, gpt-5.2-2025-12-11).

last verified: 2026-07-09

context window: 400000

max output: 128000

languages

0: English

1: Spanish

2: French

3: German

4: Italian

5: Portuguese

6: Japanese

7: Korean

8: Chinese

9: Russian

10: Arabic

11: Hindi

12: 50+ languages

modalities

0: text

1: vision

2: audio (input/output)

api endpoint: https://api.openai.com/v1/chat/completions

open source: false

architecture: Transformer-based with unified thinking system

parameters: Not disclosed

knowledge cutoff: August 31, 2025

Use Case Ratings

code generation

80% SWE-bench with Codex variant reaching 56.4% SWE-bench Pro. 400K context enables full codebase analysis.

customer support

Instant variant provides low latency. 30% fewer hallucinations improves response accuracy.

content creation

Excellent creative capabilities with natural writing style. Multiple variants for different needs.

data analysis

Strong analytical capabilities. 400K context enables massive dataset analysis.

research assistant

Thinking variant excels at deep analysis. 400K context for comprehensive research.

legal compliance

Good capabilities but not HIPAA eligible. 30-day default retention may be concern.

healthcare

Not HIPAA eligible. Good clinical understanding but privacy controls less strict.

financial analysis

Excellent quantitative reasoning with 100% AIME. Strong for financial modeling.

education

Exceptional math (100% AIME). Patient explanations with reduced hallucinations.

creative writing

Strong creative capabilities with good narrative flow and character development.

Similar Models

GPT-5.5

OpenAI

GPT-5.4

OpenAI

GPT-5.1

OpenAI

GPT-5.2 Codex

OpenAI

Claude Opus 4.5

Anthropic

Gemini 3 Pro

Google