Evaluation record · claude-opus-4-5

Claude Opus 4.5

v20251101

Anthropic

Modelsupersededcodingreasoningenterprise

Exceptional

About This Model

SUPERSEDED: no longer Anthropic's most capable model — succeeded by Opus 4.6, 4.7, 4.8 (2026-05-28) and Claude Fable 5 (2026-06-09, new top tier). At launch it scored 80.9% SWE-bench and was the first model to exceed 80% on SWE-bench Verified, with a unique effort parameter for compute control.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Industry-leading coding capabilities with 80.9% SWE-bench. Unique effort parameter allows compute control. Exceptional abstract reasoning (37.6% ARC-AGI-2).

task accuracy code

Industry-standard coding benchmarks measuring real-world software engineering tasks

Evidence

SWE-bench Verified — 80.9% resolution rate (first model to exceed 80%, industry-leading)

Aider Polyglot — 89.4% on polyglot coding tasks

Terminal-bench 2.0 — 59.3% on command-line tasks

highVerified: 2026-07-09

task accuracy reasoning

Graduate and PhD-level reasoning benchmarks requiring multi-step problem solving

Evidence

GPQA Diamond — 87% (PhD-level science questions)

ARC-AGI-2 — 37.6% (2x GPT-5.1's 17.6%, exceptional abstract reasoning)

highVerified: 2026-07-09

task accuracy general

Comprehensive knowledge and multimodal testing

Evidence

MMLU — ~90.8% on graduate-level knowledge

MMMU (Vision) — 80.7% multimodal understanding

highVerified: 2026-07-09

output consistency

Internal testing with effort parameter across quality levels

Evidence

Anthropic Documentation — Effort parameter enables consistent quality control

highVerified: 2026-07-09

latency p50

Median latency for API requests with standard prompt sizes

Evidence

Community benchmarking — Typical response time ~2.5s for standard prompts

mediumVerified: 2026-07-09

latency p95

95th percentile response time across diverse workloads

Evidence

Community benchmarking — p95 latency ~5.0s

mediumVerified: 2026-07-09

context window

Official specification from provider

Evidence

Anthropic API Documentation — 200K token context window

highVerified: 2026-07-09

uptime

Historical uptime data from official status page

Evidence

Anthropic Status Page — Claude API uptime 99.57% (last 90 days)

highVerified: 2026-07-09

🛡️Security

Strongest safety posture in the Claude family. Enhanced Constitutional AI provides industry-leading jailbreak resistance.

prompt injection resistance

Testing against OWASP LLM01 prompt injection attacks

Evidence

Anthropic Safety Research — 92% resistance to prompt injection attacks in testing

highVerified: 2026-07-09

jailbreak resistance

Testing against adversarial prompt datasets

Evidence

Anthropic Constitutional AI — Enhanced Constitutional AI provides strongest jailbreak resistance

highVerified: 2026-07-09

data leakage prevention

Analysis of privacy policies and data handling practices

Evidence

Anthropic Privacy Statement — No training on user data without explicit consent

mediumVerified: 2026-07-09

output safety

Comprehensive safety testing across harmful content categories

Evidence

Anthropic Safety Evaluations — ASL-2+ safety level with enhanced guardrails

highVerified: 2026-07-09

api security

Review of API security features and best practices

Evidence

Anthropic API Documentation — API key authentication, HTTPS only, rate limiting

highVerified: 2026-07-09

🔒Privacy & Compliance

Exceptional privacy posture with ephemeral data handling and strong compliance certifications. HIPAA eligible for healthcare.

data residency

Review of enterprise documentation and privacy policies

Evidence

Anthropic Enterprise Documentation — Data residency options for US and EU customers

highVerified: 2026-07-09

training data optout

Analysis of privacy policy and data usage terms

Evidence

Anthropic Privacy Policy — Opt-out available, no training on API data by default

highVerified: 2026-07-09

data retention

Review of terms of service and data retention policies

Evidence

Anthropic Terms of Service — API prompts and outputs not retained (except for trust & safety)

highVerified: 2026-07-09

pii handling

Review of data protection capabilities and customer responsibilities

Evidence

Anthropic Privacy Documentation — Customer responsible for PII redaction

mediumVerified: 2026-07-09

compliance certifications

Verification of compliance certifications and audit reports

Evidence

Anthropic Trust Center — SOC 2 Type II, GDPR compliant, HIPAA eligible

highVerified: 2026-07-09

zero data retention

Review of data handling practices

Evidence

Anthropic API Documentation — Ephemeral data processing, no storage of prompts/outputs

highVerified: 2026-07-09

👁️Trust & Transparency

Strong explainability with effort parameter control. Enhanced Constitutional AI provides transparency in alignment approach.

explainability

Evaluation of reasoning transparency and explanation capabilities

Evidence

Effort Parameter Feature — Effort parameter provides control and transparency over reasoning depth

highVerified: 2026-07-09

hallucination rate

Testing on factual QA datasets and real-world usage

Evidence

Anthropic Testing — Improved factual accuracy with effort parameter on high

mediumVerified: 2026-07-09

bias fairness

Evaluation on bias benchmarks and diverse demographic testing

Evidence

Anthropic Responsible Scaling Policy — Regular bias testing and mitigation

mediumVerified: 2026-07-09

uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence

Model Behavior — Model expresses uncertainty appropriately

mediumVerified: 2026-07-09

model card quality

Review of documentation completeness and clarity

Evidence

Anthropic Model Documentation — Comprehensive model cards with capabilities, limitations, benchmarks

highVerified: 2026-07-09

training data transparency

Review of public disclosures about training data

Evidence

Anthropic Public Statements — General description provided, detailed sources not disclosed

mediumVerified: 2026-07-09

guardrails

Analysis of built-in safety mechanisms

Evidence

Constitutional AI — Enhanced Constitutional AI safety guardrails

highVerified: 2026-07-09

⚙️Operational Excellence

Excellent operational maturity with multi-cloud availability. Effort parameter adds unique control capability. Enterprise-ready.

api design quality

Review of API design, consistency, and feature completeness

Evidence

Anthropic API Documentation — RESTful API with streaming, function calling, vision, effort parameter

highVerified: 2026-07-09

sdk quality

Review of SDK quality, documentation, and maintenance

Evidence

Anthropic SDKs — Official SDKs for Python, TypeScript, actively maintained

highVerified: 2026-07-09

versioning policy

Review of versioning policy and historical practices

Evidence

Anthropic API Versioning — Clear versioning with 6-month deprecation notice

Anthropic: Claude Opus 4.8 — Opus 4.5 superseded by Opus 4.6 (2026-02-05), 4.7 (2026-04-16), 4.8 (2026-05-28); Claude Fable 5 (2026-06-09) is the new top tier

Anthropic Model Deprecations — claude-opus-4-5-20251101 still Active (not deprecated); tentative retirement not sooner than November 24, 2026

highVerified: 2026-07-09

monitoring observability

Review of available monitoring tools and metrics

Evidence

Anthropic Console — Usage dashboard with metrics

mediumVerified: 2026-07-09

support quality

Assessment of documentation, community, and support responsiveness

Evidence

Anthropic Support — Email support, Discord community, comprehensive docs

highVerified: 2026-07-09

ecosystem maturity

Analysis of third-party integrations and tools

Evidence

Cloud Providers — Available on AWS Bedrock, Google Vertex AI, Azure Foundry

highVerified: 2026-07-09

license terms

Review of licensing terms and restrictions

Evidence

Anthropic Terms of Service — Standard commercial terms, enterprise agreements available

highVerified: 2026-07-09

Strengths

+Industry-leading coding: 80.9% SWE-bench Verified (first model >80%)
+Unique effort parameter for compute/quality control
+Exceptional abstract reasoning: 37.6% ARC-AGI-2 (2x GPT-5.1)
+Best computer-use model: 66.3% OSWorld
+67% price reduction from Opus 4.1 ($5/$25 vs $15/$75)
+HIPAA eligible with ephemeral data handling
+Multi-cloud availability (AWS, GCP, Azure)

Limitations

!Higher latency than Sonnet models (~2.5s p50)
!Smaller context than Gemini 3 (200K vs 1M)
!Premium pricing ($5/$25 per 1M tokens)
!No native audio capabilities
!Training data transparency limited (industry standard)
!SUPERSEDED: Opus 4.6/4.7/4.8 and Claude Fable 5 (2026-06-09) are newer; no longer Anthropic's most capable model
!Listed as a legacy model in Anthropic's docs; still Active on the API with tentative retirement not sooner than 2026-11-24

Metadata

pricing

input: $5.00 per 1M tokens

output: $25.00 per 1M tokens

notes: 67% reduction from Opus 4.1. Batch API 50% discount. Prompt caching up to 90% savings. Confirmed unchanged at $5/$25 as of 2026-07-09.

last verified: 2026-07-09

context window: 200000

max output: 64000

languages

0: English

1: Spanish

2: French

3: German

4: Italian

5: Portuguese

6: Japanese

7: Korean

8: Chinese

9: Arabic

10: Hindi

modalities

0: text

1: image (input)

2: document

3: computer-use

api endpoint: https://api.anthropic.com/v1/messages

open source: false

architecture: Transformer-based with Constitutional AI alignment and effort parameter

parameters: Not disclosed

knowledge cutoff: May 2025

Use Case Ratings

code generation

Industry-leading 80.9% SWE-bench. Best model for complex software engineering. Effort parameter enables quality/speed tradeoffs.

customer support

Strong empathy and natural conversation. Higher latency than Sonnet but superior quality for complex support.

content creation

Excellent for long-form, nuanced content. Effort parameter allows quality optimization for important pieces.

data analysis

Strong analytical capabilities. Effort parameter excellent for complex data interpretation.

research assistant

Exceptional for deep research. 200K context and effort parameter ideal for comprehensive analysis.

legal compliance

Strong privacy posture, HIPAA eligible. Effort parameter useful for thorough contract analysis.

healthcare

HIPAA eligible with strong privacy controls. Good for clinical documentation requiring high accuracy.

financial analysis

Excellent quantitative reasoning. Effort parameter enables thorough financial modeling.

education

Excellent tutoring with patient explanations. Can adjust effort based on question complexity.

creative writing

Strong creative capabilities with nuanced character development and narrative flow.

Similar Models

Claude Opus 4.8

Anthropic

Claude Fable 5

Anthropic

Claude Sonnet 4.5

Anthropic

Claude Opus 4.1

Anthropic

GPT-5.2

OpenAI

Gemini 3 Pro

Google