Evaluation record · claude-sonnet-4-5

Claude Sonnet 4.5

v20250929

Anthropic

Modelcodingreasoningenterprisehipaa-eligible

Exceptional

About This Model

Previous-generation Sonnet released September 2025, since superseded by Sonnet 4.6 (2026) and Claude Sonnet 5 (2026-06-30). Still Active on the API with tentative retirement not sooner than 2026-09-29. Historically the top coding model of its era (77.2% SWE-bench Verified at launch) with extended thinking and strong safety features.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Exceptional performance across coding, reasoning, and general tasks. Extended thinking capability enables more reliable outputs for complex problems.

task accuracy code

Industry-standard coding benchmarks measuring real-world software engineering tasks

Evidence

SWE-bench Verified — 77.2% resolution rate (highest of any model at launch); corrected 2026-07-09 from a mislabeled Claude 3.5 Sonnet figure

Anthropic Internal Benchmarks — Best coding model across HumanEval, MBPP, and CodeContests

highVerified: 2026-07-09

task accuracy reasoning

Graduate and PhD-level reasoning benchmarks requiring multi-step problem solving

Evidence

GPQA Diamond — 65.0% (PhD-level reasoning)

MATH-500 — 92.3% accuracy

highVerified: 2026-07-09

task accuracy general

Crowdsourced blind comparisons and comprehensive knowledge testing

Evidence

OSWorld — 61.4% on real-world computer-use tasks (benchmark leader at launch)

MMLU-Pro — 78.0% on graduate-level knowledge

highVerified: 2026-07-09

output consistency

Internal testing with repeated prompts at various temperature settings

Evidence

Anthropic Documentation — Consistent outputs across temperature settings 0.0-1.0

mediumVerified: 2026-07-09

latency p50

Median latency for API requests with standard prompt sizes

Evidence

Anthropic API Documentation — Typical response time ~1.8s for standard prompts

mediumVerified: 2026-07-09

latency p95

95th percentile response time across diverse workloads

Evidence

Community benchmarking — p95 latency ~3.2s

mediumVerified: 2026-07-09

context window

Official specification from provider

Evidence

Anthropic API Documentation — 200K token context window

highVerified: 2026-07-09

uptime

Historical uptime data from official status page

Evidence

Anthropic Status Page — Claude API uptime 99.57% (last 90 days)

highVerified: 2026-07-09

🛡️Security

Strong security posture with Constitutional AI providing robust guardrails. Best-in-class prompt injection resistance.

prompt injection resistance

Testing against OWASP LLM01 prompt injection attacks

Evidence

Anthropic Safety Research — 90% resistance to prompt injection attacks in testing

Community Testing (Lakera) — Strong resistance compared to competitors

highVerified: 2026-07-09

jailbreak resistance

Testing against adversarial prompt datasets

Evidence

Anthropic Constitutional AI — Constitutional AI provides strong jailbreak resistance

Community Testing — 92% resistance to adversarial prompts

highVerified: 2026-07-09

data leakage prevention

Analysis of privacy policies and data handling practices

Evidence

Anthropic Privacy Statement — No training on user data without explicit consent

mediumVerified: 2026-07-09

output safety

Comprehensive safety testing across harmful content categories

Evidence

Anthropic Safety Evaluations — ASL-2 safety level, lowest refusal rate while maintaining safety

highVerified: 2026-07-09

api security

Review of API security features and best practices

Evidence

Anthropic API Documentation — API key authentication, HTTPS only, rate limiting

highVerified: 2026-07-09

🔒Privacy & Compliance

Exceptional privacy posture with ephemeral data handling and strong compliance certifications. HIPAA eligible.

data residency

Review of enterprise documentation and privacy policies

Evidence

Anthropic Enterprise Documentation — Data residency options for US and EU customers

highVerified: 2026-07-09

training data optout

Analysis of privacy policy and data usage terms

Evidence

Anthropic Privacy Policy — Opt-out available, no training on API data by default

highVerified: 2026-07-09

data retention

Review of terms of service and data retention policies

Evidence

Anthropic Terms of Service — API prompts and outputs not retained (except for trust & safety)

highVerified: 2026-07-09

pii handling

Review of data protection capabilities and customer responsibilities

Evidence

Anthropic Privacy Documentation — Customer responsible for PII redaction, no automatic detection

mediumVerified: 2026-07-09

compliance certifications

Verification of compliance certifications and audit reports

Evidence

Anthropic Trust Center — SOC 2 Type II, GDPR compliant, HIPAA eligible

highVerified: 2026-07-09

zero data retention

Review of data handling practices

Evidence

Anthropic API Documentation — Ephemeral data processing, no storage of prompts/outputs

highVerified: 2026-07-09

👁️Trust & Transparency

Strong explainability with extended thinking feature. Constitutional AI provides transparency in alignment approach. Training data transparency could be improved.

explainability

Evaluation of reasoning transparency and explanation capabilities

Evidence

Extended Thinking Feature — Extended thinking mode exposes reasoning process

Anthropic Research — Constitutional AI provides interpretable alignment

highVerified: 2026-07-09

hallucination rate

Testing on factual QA datasets and real-world usage

Evidence

SimpleQA Benchmark — Claude performs well on factual accuracy tests

Community Testing — Lower hallucination rate with citation requests

mediumVerified: 2026-07-09

bias fairness

Evaluation on bias benchmarks and diverse demographic testing

Evidence

Anthropic Responsible Scaling Policy — Regular bias testing and mitigation

BBQ Benchmark — Moderate performance on bias detection benchmarks

mediumVerified: 2026-07-09

uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence

Model Behavior — Model expresses uncertainty when appropriate

mediumVerified: 2026-07-09

model card quality

Review of documentation completeness and clarity

Evidence

Anthropic Model Documentation — Comprehensive model cards with capabilities, limitations, benchmarks

highVerified: 2026-07-09

training data transparency

Review of public disclosures about training data

Evidence

Anthropic Public Statements — General description provided, detailed sources not disclosed

mediumVerified: 2026-07-09

guardrails

Analysis of built-in safety mechanisms

Evidence

Constitutional AI — Built-in Constitutional AI safety guardrails

highVerified: 2026-07-09

⚙️Operational Excellence

Excellent operational maturity with well-designed APIs, strong SDKs, and good documentation. Enterprise-ready.

api design quality

Review of API design, consistency, and feature completeness

Evidence

Anthropic API Documentation — RESTful API with streaming, function calling, vision support

highVerified: 2026-07-09

sdk quality

Review of SDK quality, documentation, and maintenance

Evidence

Anthropic SDKs — Official SDKs for Python, TypeScript, actively maintained

highVerified: 2026-07-09

versioning policy

Review of versioning policy and historical practices

Evidence

Anthropic API Versioning — Clear versioning with 6-month deprecation notice

Anthropic Model Deprecations — claude-sonnet-4-5-20250929 still Active (not deprecated); tentative retirement not sooner than September 29, 2026; listed as a legacy model in the docs

highVerified: 2026-07-09

monitoring observability

Review of available monitoring tools and metrics

Evidence

Anthropic Console — Usage dashboard with metrics, but limited observability

mediumVerified: 2026-07-09

support quality

Assessment of documentation, community, and support responsiveness

Evidence

Anthropic Support — Email support, Discord community, comprehensive docs

highVerified: 2026-07-09

ecosystem maturity

Analysis of third-party integrations and tools

Evidence

GitHub Ecosystem — Growing ecosystem with LangChain, LlamaIndex integration

highVerified: 2026-07-09

license terms

Review of licensing terms and restrictions

Evidence

Anthropic Terms of Service — Standard commercial terms, enterprise agreements available

highVerified: 2026-07-09

Strengths

+Top coding model of its era: 77.2% SWE-bench Verified at launch (September 2025)
+Extended thinking feature for complex problem-solving
+Exceptional privacy posture with ephemeral data handling
+Strong safety and jailbreak resistance via Constitutional AI
+200K context window enables large-scale document processing
+HIPAA eligible for healthcare applications

Limitations

!Higher latency than some competitors (~1.8s p50)
!Limited vision capabilities compared to multimodal specialists
!Training data transparency could be improved
!No built-in PII detection (customer responsibility)
!Premium pricing ($3/$15 per 1M tokens)
!Superseded by Sonnet 4.6 and Claude Sonnet 5 (2026-06-30) at the same standard price; 200K context vs their 1M
!No adaptive thinking or effort parameter (introduced with Sonnet 4.6)

Metadata

pricing

input: $3.00 per 1M tokens

output: $15.00 per 1M tokens

notes: Premium tier pricing, batch discounts available for enterprise. Confirmed unchanged at $3/$15 as of 2026-07-09.

last verified: 2026-07-09

context window: 200000

max output: 64000

languages

0: English

1: Spanish

2: French

3: German

4: Italian

5: Portuguese

6: Japanese

7: Korean

8: Chinese

9: Arabic

10: Hindi

modalities

0: text

1: image (input)

2: document

api endpoint: https://api.anthropic.com/v1/messages

open source: false

architecture: Transformer-based with Constitutional AI alignment

parameters: Not disclosed

knowledge cutoff: January 2025 (reliable); training data through July 2025

Use Case Ratings

code generation

Top coding model of its era (77.2% SWE-bench Verified at launch). Superseded by Sonnet 4.6 and Sonnet 5 for new builds.

customer support

Strong empathy and natural conversation. Slightly higher latency than specialized models, but excellent quality.

content creation

Excellent for long-form content, maintains consistent voice and structure. Natural writing style.

data analysis

Strong SQL generation and data interpretation. Extended thinking excellent for complex analytical tasks.

research assistant

Excellent summarization and synthesis. Extended thinking mode provides detailed reasoning for complex topics.

legal compliance

Strong privacy posture and careful reasoning. HIPAA eligible. Extended thinking useful for contract analysis.

healthcare

HIPAA eligible with strong privacy controls. Good for clinical documentation but requires human oversight.

financial analysis

Strong analytical capabilities and mathematical reasoning. Good for financial modeling and report generation.

education

Excellent tutoring capabilities with patient explanations. Extended thinking shows work step-by-step.

creative writing

Good for creative tasks but can be slightly verbose. Strong dialogue and character development.

Similar Models

Claude Sonnet 4.6

Anthropic

Claude Opus 4.8

Anthropic

Claude Haiku 4.5

Anthropic

GPT-5.5

OpenAI