GPT-5.4

vgpt-5-4-2026-03-05

OpenAI

Modelcomputer-useprevious-flagshipmillion-token-contextfactuality
91
Exceptional
About This Model

OpenAI's previous-generation flagship (superseded by GPT-5.5 in April 2026). Headline native computer use with 75% OSWorld-Verified, ~33% fewer factual errors than GPT-5.2, ~1.05M context. Thinking, Pro, mini, and nano variants.

Last Evaluated: June 10, 2026
Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability
+

Strong generation defined by native computer use (75% OSWorld-Verified, +27.7 points over GPT-5.2) and a ~33% factual-error reduction. Superseded by GPT-5.5 six weeks after release.

task accuracy code

Industry-standard coding benchmarks and provider-reported comparisons

Evidence
OpenAI AnnouncementIncremental coding gains over GPT-5.2; release emphasis on computer use rather than coding (GPT-5.3-Codex remained the coding lead at launch)
mediumVerified: 2026-06-10
task accuracy reasoning

PhD-level and Olympiad-level reasoning benchmarks

Evidence
OpenAI AnnouncementReasoning improvements over GPT-5.2 across science and math suites; Thinking and Pro variants for deeper reasoning
mediumVerified: 2026-06-10
task accuracy general

Computer-use benchmarks and factual accuracy testing

Evidence
OSWorld-Verified75% on OSWorld-Verified with native computer use (vs GPT-5.2's 47.3%)
OpenAI Announcement~33% fewer factual errors than GPT-5.2
highVerified: 2026-06-10
output consistency

Internal testing across model variants

Evidence
OpenAI Announcement~33% reduction in factual errors vs GPT-5.2 improves run-to-run reliability
highVerified: 2026-06-10
latency p50

Median latency for API requests with standard prompt sizes

Evidence
Community benchmarkingmini and nano variants optimized for low latency
mediumVerified: 2026-06-10
latency p95

95th percentile response time across diverse workloads

Evidence
Community benchmarkingp95 latency varies by variant and reasoning depth
mediumVerified: 2026-06-10
context window

Official specification from provider

Evidence
OpenAI Documentation~1.05M token input context, 128K max output tokens
highVerified: 2026-06-10
uptime

Historical uptime data from official status page

Evidence
OpenAI Status99.9% uptime (last 90 days)
highVerified: 2026-06-10
🛡️Security
+

Solid security posture with computer-use-specific guardrails. Native computer use expands the attack surface (UI-based injection) relative to text-only models.

prompt injection resistance

Testing against OWASP LLM01 prompt injection attacks

Evidence
OpenAI Safety ResearchInjection defenses extended to native computer-use surfaces (screenshots, UI text)
mediumVerified: 2026-06-10
jailbreak resistance

Adversarial prompt testing against jailbreak datasets

Evidence
OpenAI AnnouncementImproved refusal robustness over GPT-5.2 per release documentation
mediumVerified: 2026-06-10
data leakage prevention

Analysis of privacy policies and data handling practices

Evidence
OpenAI Privacy PolicyNo training on API data by default
mediumVerified: 2026-06-10
output safety

Safety testing across harmful content categories

Evidence
OpenAI SafetyMulti-layer safety with additional guardrails for autonomous computer-use actions
highVerified: 2026-06-10
api security

Review of API security features and best practices

Evidence
OpenAI Platform DocsAPI key + OAuth2 authentication, HTTPS only, rate limiting
highVerified: 2026-06-10
🔒Privacy & Compliance
+

Standard OpenAI enterprise posture: SOC 2, no API-data training by default, zero-data-retention options. Not HIPAA eligible.

data residency

Review of enterprise documentation

Evidence
OpenAI EnterpriseData residency options for enterprise customers
highVerified: 2026-06-10
training data optout

Policy review of data usage terms

Evidence
OpenAI Data ControlsAPI data not used for training by default
highVerified: 2026-06-10
data retention

Terms of service and enterprise documentation review

Evidence
OpenAI Terms30-day default API log retention; zero-data-retention options for qualifying customers
highVerified: 2026-06-10
pii handling

Review of data protection capabilities

Evidence
OpenAI Safety ToolsCustomer responsible for PII redaction; moderation API available
mediumVerified: 2026-06-10
compliance certifications

Verification of compliance certifications

Evidence
OpenAI Trust CenterSOC 2 Type II, ISO 27001, GDPR compliant
highVerified: 2026-06-10
zero data retention

Enterprise feature review

Evidence
OpenAI EnterpriseZero-data-retention options available for enterprise and qualifying API customers
highVerified: 2026-06-10
👁️Trust & Transparency
+

Marked transparency improvement via the ~33% factual-error reduction. Computer-use action logging aids auditability of agentic runs.

explainability

Evaluation of reasoning transparency and explanation capabilities

Evidence
GPT-5.4 Thinking VariantThinking variant exposes reasoning summaries; computer-use actions are step-logged
highVerified: 2026-06-10
hallucination rate

Factual accuracy testing on QA datasets

Evidence
OpenAI Announcement~33% fewer factual errors than GPT-5.2
highVerified: 2026-06-10
bias fairness

Bias benchmarks and demographic testing

Evidence
OpenAI SafetyRegular bias testing and red-teaming program
mediumVerified: 2026-06-10
uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence
OpenAI DocumentationBetter uncertainty expression accompanying the factual-error reduction
mediumVerified: 2026-06-10
model card quality

Documentation completeness and clarity review

Evidence
GPT-5.4 Announcement and System CardDetailed release documentation covering variants, computer use, and safety evaluations
highVerified: 2026-06-10
training data transparency

Review of public disclosures about training data

Evidence
OpenAI BlogGeneral description of training approach; specific sources not disclosed
mediumVerified: 2026-06-10
guardrails

Analysis of built-in safety mechanisms

Evidence
OpenAI Safety SystemsMulti-layer safety guardrails including computer-use action confirmation
highVerified: 2026-06-10
⚙️Operational Excellence
+

Excellent operational maturity with the full variant family. Procurement caveat: superseded by GPT-5.5 only six weeks after release; plan migrations accordingly.

api design quality

Review of API design, consistency, and feature completeness

Evidence
OpenAI API ReferenceResponses API with streaming, function calling, vision, and native computer use
highVerified: 2026-06-10
sdk quality

SDK quality, documentation, and maintenance review

Evidence
OpenAI SDKsOfficial SDKs for Python, Node.js, Go, .NET, actively maintained
highVerified: 2026-06-10
versioning policy

Review of versioning policy and historical deprecation practices

Evidence
OpenAI DeprecationsClear deprecation schedule; GPT-5.4 remains supported but OpenAI designates GPT-5.5 as the migration target for the GPT-5.x line
highVerified: 2026-06-10
monitoring observability

Review of available monitoring tools and metrics

Evidence
OpenAI DashboardDetailed usage dashboard with costs, tokens, rate limits
highVerified: 2026-06-10
support quality

Support and documentation assessment

Evidence
OpenAI Support24/7 support, comprehensive docs, active developer community
highVerified: 2026-06-10
ecosystem maturity

Ecosystem breadth and depth analysis

Evidence
OpenAI PlatformLargest AI ecosystem; full variant family (Thinking, Pro, mini, nano) for cost/latency tiers
highVerified: 2026-06-10
license terms

Review of licensing terms and restrictions

Evidence
OpenAI TermsStandard commercial terms with usage policies
highVerified: 2026-06-10
Strengths
  • +Native computer use: 75% OSWorld-Verified (vs GPT-5.2's 47.3%)
  • +~33% fewer factual errors than GPT-5.2
  • +~1.05M token input context with 128K output
  • +Full variant family: Thinking, Pro, mini, nano for cost/latency tiers
  • +Better price-performance than GPT-5.5 ($2.50/$15 vs $5/$30)
  • +Mature OpenAI ecosystem and tooling
Limitations
  • !Superseded by GPT-5.5 six weeks after release — previous-generation flagship
  • !Input pricing doubles above 272K input tokens
  • !Not HIPAA eligible
  • !30-day default API data retention
  • !Behind GPT-5.5 on reasoning and coding benchmarks (e.g., 75% vs 78.7% OSWorld-Verified)
  • !Computer use expands prompt-injection attack surface
Metadata
pricing
input: $2.50 per 1M tokens
output: $15.00 per 1M tokens
notes: Applies up to 272K input tokens; pricing doubles above that threshold. Thinking, Pro, mini, and nano variants priced separately.
last verified: 2026-06-10
context window: 1050000
max output: 128000
languages
0: English
1: Spanish
2: French
3: German
4: Italian
5: Portuguese
6: Japanese
7: Korean
8: Chinese
9: Russian
10: Arabic
11: Hindi
12: 50+ languages
modalities
0: text
1: vision
2: computer-use
api endpoint: https://api.openai.com/v1/responses
open source: false
architecture: Transformer-based with native computer-use capability and variant family
parameters: Not disclosed
knowledge cutoff: Late 2025

Use Case Ratings

code generation

Capable generalist coder with ~1.05M context, but GPT-5.3-Codex and GPT-5.5 lead on agentic coding benchmarks.

customer support

mini and nano variants give strong cost/latency tiers; ~33% fewer factual errors improves answer reliability.

content creation

Strong general writing with long-context coherence. Good value at $2.50/$15 vs GPT-5.5's $5/$30.

data analysis

Solid analytical reasoning with ~1.05M context for whole-dataset work; GPT-5.5 leads on frontier math.

research assistant

Reduced factual errors and very long context suit research; native computer use can drive live source gathering.

legal compliance

Good document analysis but not HIPAA eligible; 30-day default retention may be a concern.

healthcare

Not HIPAA eligible. Improved factuality helps clinical documentation but privacy controls remain a gap.

financial analysis

Strong quantitative reasoning at lower cost than GPT-5.5; computer use enables workflow automation across financial tools.

education

Reliable explanations with fewer factual errors; nano/mini tiers economical for high-volume tutoring.

creative writing

Capable narrative writing and good instruction following; flagship successors edge it on nuance.