Evaluation record · glm-5

GLM-5

v20260211

Z.ai (Zhipu AI)

Modelcodingreasoningopen-sourcemit-license

Strong

About This Model

Z.ai's MIT-licensed 744B-parameter MoE (40B active) with 77.8% SWE-bench Verified, 92.7% AIME 2026, and open-source leadership on BrowseComp and agentic benchmarks. Trained on 28.5T tokens with DeepSeek Sparse Attention.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Among the strongest open-weight models released to date: 77.8% SWE-bench Verified, 92.7% AIME 2026, 86.0% GPQA-Diamond, with independent confirmation of open-source leadership on BrowseComp, Vending Bench 2, and MCP-Atlas.

task accuracy code

Industry-standard coding benchmarks with independent third-party verification

Evidence

SWE-bench Verified — 77.8% resolution rate

Artificial Analysis independent evaluation — Top-tier open-weight coding performance, confirmed by independent harness

highVerified: 2026-07-09

task accuracy reasoning

Graduate and competition-level reasoning benchmarks requiring multi-step problem solving

Evidence

AIME 2026 — 92.7% on competition mathematics

GPQA-Diamond — 86.0% on PhD-level science questions

highVerified: 2026-07-09

task accuracy general

Independent agentic and general-capability benchmarking across domains

Evidence

Artificial Analysis — Open-source leader on BrowseComp, Vending Bench 2, and MCP-Atlas agentic benchmarks

highVerified: 2026-07-09

output consistency

Community testing with repeated prompts and long agent runs

Evidence

GLM-5 GitHub repository — Stable behavior across agentic trajectories; reproducible deployment recipes published

mediumVerified: 2026-07-09

latency p50

Median latency for API requests with standard prompt sizes

Evidence

Artificial Analysis — Typical first-party API response ~2.6s; DeepSeek Sparse Attention keeps long-context latency manageable

mediumVerified: 2026-07-09

latency p95

95th percentile response time across diverse workloads

Evidence

Artificial Analysis — p95 ~6.0s across diverse workloads

mediumVerified: 2026-07-09

context window

Official specification from model card

Evidence

GLM-5 Model Card — 200K token context window

highVerified: 2026-07-09

uptime

Review of platform availability and self-hosting fallback options

Evidence

Z.ai Platform — First-party API generally stable; open weights enable self-hosted redundancy

mediumVerified: 2026-07-09

🛡️Security

Solid for an open model; no published third-party security audit. Self-hosting shifts responsibility to the deployer.

prompt injection resistance

Review of safety documentation and community testing against OWASP LLM01 patterns

Evidence

GLM-5 documentation — Safety tuning applied; resilient in agentic browsing benchmarks but no dedicated third-party injection audit published

mediumVerified: 2026-07-09

jailbreak resistance

Testing against adversarial prompt datasets; deployer-dependent for self-hosted use

Evidence

Community red-teaming — Standard alignment; open weights allow guardrail removal in derivatives

mediumVerified: 2026-07-09

data leakage prevention

Analysis of privacy policies and self-hosting data-control options

Evidence

Z.ai Privacy Policy — Standard data handling on first-party API; full control when self-hosted

mediumVerified: 2026-07-09

output safety

Safety testing across harmful content categories

Evidence

GLM-5 Model Card — Safety post-training; refusal behavior comparable to peer open frontier models

mediumVerified: 2026-07-09

api security

Review of API security features and best practices

Evidence

Z.ai API Documentation — API key authentication, HTTPS only, rate limiting; OpenAI-compatible endpoints

mediumVerified: 2026-07-09

🔒Privacy & Compliance

First-party Z.ai API operates under Chinese jurisdiction — a material caveat for Western regulated industries. The unencumbered MIT license makes self-hosting or Western-host deployment a clean mitigation.

data residency

Review of provider jurisdiction and third-party hosting options

Evidence

Z.ai Platform Documentation — Zhipu/Z.ai is a China-based provider; first-party API data processed under Chinese jurisdiction

OpenRouter availability — MIT weights hosted by Western inference providers, enabling non-China residency

mediumVerified: 2026-07-09

training data optout

Analysis of privacy policy and data usage terms

Evidence

Z.ai Privacy Policy — Standard API data terms; self-hosting removes the concern entirely

mediumVerified: 2026-07-09

data retention

Review of terms of service and deployment-dependent retention

Evidence

Z.ai Terms of Service — First-party retention governed by Chinese data regulations; self-hosted deployments retain nothing externally

mediumVerified: 2026-07-09

pii handling

Review of data protection capabilities and customer responsibilities

Evidence

Z.ai Documentation — Customer responsible for PII redaction; no managed PII tooling

mediumVerified: 2026-07-09

compliance certifications

Verification of compliance certifications and audit reports

Evidence

Z.ai public materials — No published SOC 2 / HIPAA / GDPR attestations for the first-party API; Western hosts may carry their own certifications

mediumVerified: 2026-07-09

zero data retention

Review of self-hosting deployment options enabling zero retention

Evidence

Open weights on Hugging Face — MIT-licensed self-hosting gives complete data control and zero external retention

mediumVerified: 2026-07-09

👁️Trust & Transparency

Above-average transparency for an open frontier model: architecture, training scale, and benchmarks well documented with independent verification. Bias and safety evaluations remain thin.

explainability

Evaluation of reasoning transparency and trajectory inspectability

Evidence

GLM-5 documentation — Reasoning traces and agentic tool-call logs inspectable; strong MCP-Atlas results reflect transparent tool use

mediumVerified: 2026-07-09

hallucination rate

Testing on factual QA and grounded research benchmarks

Evidence

BrowseComp results — Open-source leader on grounded web-research benchmark, indicating disciplined sourcing behavior

mediumVerified: 2026-07-09

bias fairness

Review of published bias benchmarks and community evaluations

Evidence

GLM-5 Model Card — Limited published bias evaluation

lowVerified: 2026-07-09

uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence

Model behavior testing — Expresses uncertainty adequately; no calibrated confidence outputs

mediumVerified: 2026-07-09

model card quality

Review of documentation completeness and clarity

Evidence

Hugging Face model card and GitHub repo — Detailed disclosure: 744B/40B MoE, 256 experts, DeepSeek Sparse Attention, 28.5T pretraining tokens, full benchmark tables

highVerified: 2026-07-09

training data transparency

Review of public disclosures about training data

Evidence

GLM-5 technical disclosure — Pretraining scale (28.5T tokens) and recipe outlined; detailed data sources not disclosed

mediumVerified: 2026-07-09

guardrails

Analysis of built-in safety mechanisms

Evidence

GLM-5 Model Card — Built-in safety tuning; deployers of open weights must layer their own guardrails

mediumVerified: 2026-07-09

⚙️Operational Excellence

Clean MIT licensing and strong ecosystem support. Note the rapid successor cadence: GLM-5.1 shipped within two months and GLM-5.2 (1M context, MIT) within four; evaluate which version your hosts actually serve. GLM-5 weights remain available.

api design quality

Review of API design, consistency, and feature completeness

Evidence

Z.ai API Documentation — OpenAI-compatible API with streaming, tool calling, structured output

highVerified: 2026-07-09

sdk quality

Review of SDK quality, documentation, and maintenance

Evidence

Z.ai GitHub organization — Official repos with deployment recipes; OpenAI-compatible so mainstream SDKs work

mediumVerified: 2026-07-09

versioning policy

Review of versioning practices and weight availability across releases

Evidence

GLM release history — Fast cadence: GLM-5.1 API launched 2026-03-27 with weights on 2026-04-07, same architecture; GLM-5 weights remain available

Z.ai release notes / GLM-5.2 coverage — GLM-5.2 released mid-June 2026: MIT-licensed 744B/40B MoE with a 1M-token context window, priced at $1.40/$4.40 per 1M; GLM-5 remains served and its weights remain available

mediumVerified: 2026-07-09

monitoring observability

Review of available monitoring tools and metrics

Evidence

Z.ai Platform — Basic usage dashboard; self-hosted observability is deployer-built

mediumVerified: 2026-07-09

support quality

Assessment of documentation, community, and support responsiveness

Evidence

Z.ai community channels — Active GitHub support and documentation; limited English-language enterprise support

mediumVerified: 2026-07-09

ecosystem maturity

Analysis of third-party hosting, integrations, and tooling

Evidence

Inference ecosystem — Day-one vLLM/SGLang support, OpenRouter and major Western host availability

highVerified: 2026-07-09

license terms

Review of licensing terms and restrictions

Evidence

MIT License — Unencumbered MIT license, unrestricted commercial use and derivatives

highVerified: 2026-07-09

Strengths

+Frontier-competitive results: 77.8% SWE-bench Verified, 92.7% AIME 2026, 86.0% GPQA-Diamond
+Independently verified open-source leadership on BrowseComp, Vending Bench 2, and MCP-Atlas
+Unencumbered MIT license with full self-hosting rights
+Efficient inference: 40B active of 744B total with DeepSeek Sparse Attention
+Very competitive API pricing (~$0.60/$1.92 per 1M tokens)
+Well-documented training scale (28.5T tokens) and architecture

Limitations

!First-party Z.ai API processes data under Chinese jurisdiction with limited Western compliance certifications
!Text-only — no vision or audio modalities
!Rapid successor cadence (GLM-5.1 within two months, GLM-5.2 with 1M context by June 2026) creates version-tracking overhead
!Limited published bias, safety, and red-team evaluations
!Self-hosting a 744B MoE requires substantial GPU infrastructure
!English-language enterprise support is thin compared to Western providers

Metadata

pricing

input: $0.60 per 1M tokens

output: $1.92 per 1M tokens

notes: First-party Z.ai API pricing, re-confirmed July 2026; third-party hosts vary. Successors are pricier: GLM-5.1 ~$0.97/$3.04 and GLM-5.2 $1.40/$4.40 per 1M, leaving GLM-5 as the budget option in the family.

last verified: 2026-07-09

context window: 200000

languages

0: English

1: Chinese

2: Japanese

3: Korean

4: Spanish

5: French

6: German

modalities

0: text

api endpoint: https://api.z.ai/api/paas/v4/chat/completions

open source: true

license: MIT

architecture: Mixture-of-Experts: 744B total / 40B active parameters, 256 experts, DeepSeek Sparse Attention; 28.5T pretraining tokens

parameters: 744B total / 40B active

release date: 2026-02-11

Use Case Ratings

code generation

77.8% SWE-bench Verified with independent verification; best-in-class open-weight coding at ~$0.60/$1.92 per 1M.

customer support

Capable and inexpensive, but not specialized for support workflows.

content creation

Strong long-form generation at very low cost.

data analysis

Excellent mathematical reasoning (92.7% AIME 2026) and agentic tool use for analysis pipelines.

research assistant

Open-source leader on BrowseComp and MCP-Atlas; strong grounded web research.

legal compliance

China-jurisdiction first-party API and absent Western certifications are blockers unless self-hosted.

healthcare

Not recommended via first-party API; self-hosted deployment in a compliant environment is the only viable path.

financial analysis

Top-tier quantitative reasoning; regulated firms should self-host or use certified Western hosts.

education

Outstanding math and science tutoring (92.7% AIME, 86.0% GPQA-Diamond) at budget pricing.

creative writing

Competent prose; optimized for reasoning and agentic work rather than creative style.

Similar Models

Kimi K2.6

Moonshot AI

DeepSeek-V4

DeepSeek

DeepSeek-V3.2

DeepSeek

MiniMax-M2

MiniMax

Claude Opus 4.8

Anthropic