Evaluation record · deepseek-r1

DeepSeek-R1

v20251020

DeepSeek

Modelsupersededcodingreasoningopen-source

Strong

About This Model

DeepSeek's standalone reasoning model, now superseded and discontinued. Its reasoning was folded into DeepSeek V3.1's hybrid thinking mode (Aug 2025), then V3.2 (Dec 2025) and V4 (Apr 2026); a successor 'R2' never shipped. The legacy deepseek-reasoner name no longer serves R1 — it routes to V4-Flash's thinking mode and is removed 2026-07-24 (15:59 UTC). R1 remains available only via third-party hosts or self-hosted open weights. Historically 53.6% SWE-bench and 79.8% HumanEval.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Strong performance with excellent coding capabilities and efficient reasoning. Competitive latency despite reasoning optimization.

task accuracy code

Industry-standard coding benchmarks measuring real-world software engineering tasks

Evidence

SWE-bench Verified — 53.6% resolution rate

HumanEval — 79.8% accuracy on code generation

highVerified: 2026-07-09

task accuracy reasoning

Graduate-level reasoning benchmarks requiring multi-step problem solving

Evidence

MATH Benchmark — 88.5% on mathematical reasoning

GPQA — 58.7% on graduate-level questions

highVerified: 2026-07-09

task accuracy general

Comprehensive knowledge testing across domains

Evidence

MMLU — 74.8% on comprehensive knowledge benchmark

DeepSeek Benchmarks — Strong general performance

highVerified: 2026-07-09

output consistency

Internal testing with repeated prompts at various temperature settings

Evidence

DeepSeek Documentation — Good consistency with reasoning optimization

mediumVerified: 2026-07-09

latency p50

Median latency for API requests with standard prompt sizes

Evidence

DeepSeek Performance Metrics — Typical response time ~1.9s

mediumVerified: 2026-07-09

latency p95

95th percentile response time across diverse workloads

Evidence

Community benchmarking — p95 latency ~3.4s

mediumVerified: 2026-07-09

context window

Official specification from provider

Evidence

DeepSeek Documentation — 64K token context window

highVerified: 2026-07-09

uptime

Historical uptime data from official status page

Evidence

DeepSeek Status — 98.7% uptime (last 90 days)

mediumVerified: 2026-07-09

🛡️Security

Good security posture with standard guardrails. Adequate protection for typical use cases.

prompt injection resistance

Testing against OWASP LLM01 prompt injection attacks

Evidence

DeepSeek Safety Documentation — Good resistance to prompt injection

mediumVerified: 2026-07-09

jailbreak resistance

Testing against adversarial prompt datasets

Evidence

DeepSeek Safety Research — Standard safety guardrails

mediumVerified: 2026-07-09

data leakage prevention

Analysis of privacy policies and data handling practices

Evidence

DeepSeek Privacy Policy — Standard data handling practices

mediumVerified: 2026-07-09

output safety

Comprehensive safety testing across harmful content categories

Evidence

DeepSeek Safety Evaluations — Comprehensive safety filtering

mediumVerified: 2026-07-09

api security

Review of API security features and best practices

Evidence

DeepSeek API Documentation — API key authentication, HTTPS, rate limiting

highVerified: 2026-07-09

🔒Privacy & Compliance

Moderate privacy posture. Data residency primarily in Asia. Limited compliance certifications for Western markets.

data residency

Review of documentation and privacy policies

Evidence

DeepSeek Documentation — Primary data centers in China and Singapore

mediumVerified: 2026-07-09

training data optout

Analysis of privacy policy and data usage terms

Evidence

DeepSeek Privacy Policy — No training on API data by default

highVerified: 2026-07-09

data retention

Review of terms of service and data retention policies

Evidence

DeepSeek Terms of Service — 60-day default retention

mediumVerified: 2026-07-09

pii handling

Review of data protection capabilities

Evidence

DeepSeek Privacy Documentation — Customer responsible for PII handling

mediumVerified: 2026-07-09

compliance certifications

Verification of compliance certifications

Evidence

DeepSeek Compliance — ISO 27001, limited Western certifications

mediumVerified: 2026-07-09

zero data retention

Review of data handling practices

Evidence

DeepSeek Documentation — No zero retention option

mediumVerified: 2026-07-09

👁️Trust & Transparency

Moderate transparency with standard safety features. Limited disclosure compared to Western providers.

explainability

Evaluation of reasoning transparency

Evidence

DeepSeek Features — Reasoning mode with explanation capabilities

mediumVerified: 2026-07-09

hallucination rate

Testing on factual QA datasets

Evidence

Community Testing — Moderate hallucination rate

mediumVerified: 2026-07-09

bias fairness

Evaluation on bias benchmarks

Evidence

DeepSeek Research — Basic bias mitigation

mediumVerified: 2026-07-09

uncertainty quantification

Assessment of confidence expression

Evidence

Model Behavior — Basic uncertainty expression

mediumVerified: 2026-07-09

model card quality

Review of documentation completeness

Evidence

DeepSeek Documentation — Good technical documentation

highVerified: 2026-07-09

training data transparency

Review of public disclosures

Evidence

DeepSeek Research Papers — Limited disclosure of training data

mediumVerified: 2026-07-09

guardrails

Analysis of built-in safety mechanisms

Evidence

DeepSeek Safety Features — Standard safety guardrails

mediumVerified: 2026-07-09

⚙️Operational Excellence

Good operational quality with open licensing. Growing ecosystem with room for maturity.

api design quality

Review of API design and consistency

Evidence

DeepSeek API Documentation — Clean RESTful API design

highVerified: 2026-07-09

sdk quality

Review of SDK quality and maintenance

Evidence

DeepSeek SDKs — Python SDK available, actively maintained

highVerified: 2026-07-09

versioning policy

Review of versioning practices

Evidence

DeepSeek API Documentation — Basic versioning policy

DeepSeek API Pricing and Deprecation Notice — Legacy deepseek-reasoner endpoint deprecates 2026-07-24; R1 line discontinued in favor of V3.1/V3.2/V4 hybrid thinking models

DeepSeek API Change Log — deepseek-reasoner alias currently routes to deepseek-v4-flash thinking mode (no longer serves R1); alias removed 2026-07-24 15:59 UTC, after which requests using it fail

highVerified: 2026-07-09

monitoring observability

Review of monitoring tools

Evidence

DeepSeek Platform — Basic usage dashboard

mediumVerified: 2026-07-09

support quality

Assessment of support options

Evidence

DeepSeek Support — Community support and documentation

mediumVerified: 2026-07-09

ecosystem maturity

Analysis of ecosystem maturity

Evidence

GitHub Community — Growing ecosystem, limited third-party integrations

mediumVerified: 2026-07-09

license terms

Review of licensing terms

Evidence

DeepSeek License — Open license with commercial use allowed

highVerified: 2026-07-09

Strengths

+Excellent coding performance (53.6% SWE-bench, 79.8% HumanEval)
+Competitive pricing compared to Western alternatives
+Good reasoning capabilities with efficient implementation
+Open license allowing commercial use
+Fast latency (1.9s p50) despite reasoning features
+Strong mathematical capabilities

Limitations

!Limited data residency options (primarily Asia)
!Fewer compliance certifications for Western markets
!60-day data retention (not ephemeral)
!Limited transparency on training data
!Smaller context window (64K tokens)
!Less mature ecosystem compared to Western providers
!Superseded: line discontinued in favor of DeepSeek V3.1/V3.2/V4 hybrid thinking; legacy deepseek-reasoner alias now routes to V4-Flash (not R1) and is removed 2026-07-24, leaving third-party hosts or self-hosting as the only ways to run R1

Metadata

pricing

input: $0.55 per 1M tokens (historical first-party pricing)

output: $2.19 per 1M tokens (historical first-party pricing)

notes: Historical first-party API pricing no longer applies: the deepseek-reasoner alias now serves V4-Flash at V4 pricing and is removed 2026-07-24. R1 is now served only by third-party hosts (rates vary) or self-hosted from open weights.

last verified: 2026-07-09

context window: 64000

languages

0: English

1: Chinese

2: Japanese

3: Korean

4: Spanish

5: French

6: German

modalities

0: text

api endpoint: https://api.deepseek.com/v1/chat/completions

open source: true

architecture: Transformer-based with efficient reasoning optimization

parameters: Not disclosed

Use Case Ratings

code generation

Excellent coding with 53.6% SWE-bench and 79.8% HumanEval. Strong value proposition with competitive pricing.

customer support

Adequate for customer support but not specialized. Good latency helps.

content creation

Solid content generation capabilities at competitive pricing.

data analysis

Strong analytical capabilities with good reasoning. Excellent value for price.

research assistant

Good research capabilities with reasoning optimization.

legal compliance

Limited compliance certifications for Western markets. Data residency concerns.

healthcare

Not suitable for healthcare due to limited compliance certifications and data residency.

financial analysis

Good analytical capabilities at competitive pricing.

education

Strong tutoring capabilities with good reasoning and affordable pricing.

creative writing

Adequate creative capabilities at good value.

Similar Models

Claude Sonnet 4.5

Anthropic

OpenAI o1

OpenAI

Nemotron Ultra 253B

NVIDIA

DeepSeek-V3.2

DeepSeek

DeepSeek-V4

DeepSeek