Evaluation record · qwen3-5

Qwen3.5

v20260216

Alibaba

Modelopen-sourceapache-2-0multimodalmultilingual

Strong

About This Model

Alibaba's Apache-2.0 flagship open model: Qwen3.5-397B-A17B, a 512-expert hybrid MoE (397B total / 17B active), natively multimodal, 262K context (1M on hosted Qwen3.5-Plus), 201 languages; beats Alibaba's API-only 1T Qwen3-Max with up to 19x faster long-context decode. Still its largest open-weight model as of July 2026, but smaller Apache-2.0 Qwen3.6 models (Apr 2026) surpass it on agentic coding, and the newest frontier (Qwen3.7-Max, May 2026; Qwen3.7-Plus, Jun 2026) is API-only.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Beats Alibaba's own 1T-parameter API-only Qwen3-Max with only 17B active parameters, with up to 19x faster decode at 256K context. Native multimodality and 201-language coverage are unmatched among open models.

task accuracy code

Vendor benchmarks corroborated by independent press coverage and community leaderboards

Evidence

Qwen3.5 Release Blog — Strong agentic coding results; flagship 397B-A17B surpasses Qwen3-Max on coding benchmarks

VentureBeat — Independent reporting confirms 397B-A17B beats the 1T-parameter, API-only Qwen3-Max

highVerified: 2026-07-09

task accuracy reasoning

Mathematical and agentic reasoning benchmarks from the model card and release blog, cross-checked against community evaluations

Evidence

Qwen3.5 Release Blog — Frontier-class math and agentic reasoning under the 'Towards Native Multimodal Agents' positioning

Hugging Face Model Card — Detailed benchmark tables across reasoning suites on the official model card

highVerified: 2026-07-09

task accuracy general

Comprehensive knowledge and multimodal benchmark review including multilingual coverage

Evidence

Qwen3.5 Release Blog — Natively multimodal (text + vision) with strong general knowledge across 201 languages

highVerified: 2026-07-09

output consistency

Repeated-prompt testing across temperature settings, supplemented by community reports

Evidence

Hugging Face Model Card — Stable hybrid-MoE routing; consistent outputs across repeated runs in community testing

mediumVerified: 2026-07-09

latency p50

Median latency on hosted endpoints and decode-throughput comparisons from independent reporting

Evidence

VentureBeat — Up to 19x faster decode than Qwen3-Max at 256K context thanks to the sparse 17B-active design

mediumVerified: 2026-07-09

latency p95

95th percentile response time across diverse workloads from independent benchmarking

Evidence

Artificial Analysis — p95 ~3.8s on hosted endpoints for standard workloads

mediumVerified: 2026-07-09

context window

Official specification from model card and Alibaba Cloud documentation

Evidence

Hugging Face Model Card — 262K native context for open weights; hosted Qwen3.5-Plus extends to 1M tokens

highVerified: 2026-07-09

uptime

Hosted-platform availability history plus redundancy across third-party hosts

Evidence

Alibaba Cloud Model Studio — Stable availability on Alibaba Cloud (incl. Singapore region) plus many third-party hosts and self-hosting

mediumVerified: 2026-07-09

🛡️Security

Solid default guardrails with notably broad multilingual safety coverage. Multimodal inputs widen the attack surface; open weights shift responsibility to deployers who fine-tune.

prompt injection resistance

Testing against OWASP LLM01 prompt injection patterns, including image-borne injection for multimodal inputs

Evidence

Community red-team evaluations — Good resistance to common injection patterns; multimodal inputs add an image-based injection surface

mediumVerified: 2026-07-09

jailbreak resistance

Adversarial prompt testing; assessment accounts for open-weight modifiability

Evidence

Qwen Safety Documentation — Safety post-training across the family; open weights mean alignment is removable downstream

mediumVerified: 2026-07-09

data leakage prevention

Analysis of hosted-platform policies plus the self-hosting option for full data isolation

Evidence

Alibaba Cloud Privacy Documentation — Standard data handling on hosted endpoints; self-hosting gives complete data control

mediumVerified: 2026-07-09

output safety

Safety testing across harmful content categories and multiple languages on default weights

Evidence

Qwen3.5 Release Blog — Multilingual safety filtering across 201 languages; refusal behavior consistent in community testing

mediumVerified: 2026-07-09

api security

Review of API security features on the first-party hosted platform

Evidence

Alibaba Cloud Model Studio — API key authentication, HTTPS, RAM-based access control, and rate limiting on Alibaba Cloud

highVerified: 2026-07-09

🔒Privacy & Compliance

Alibaba's first-party API is China-jurisdiction (Singapore region available), which concerns Western regulated buyers; Apache-2.0 self-hosting or Western third-party hosting fully avoids that. Alibaba Cloud's infrastructure certifications are stronger than DeepSeek's platform but still lack HIPAA/FedRAMP for the model service.

data residency

Review of hosting regions and licensing; China-jurisdiction caveat applies to Alibaba's first-party API, not self-hosted or Western-hosted deployments

Evidence

Alibaba Cloud Regions — First-party hosting on Alibaba Cloud is China-jurisdiction (with a Singapore international region); Apache-2.0 weights allow deployment in any jurisdiction

highVerified: 2026-07-09

training data optout

Analysis of hosted-platform data usage terms

Evidence

Alibaba Cloud Model Studio Terms — Enterprise tier does not train on customer data; self-hosting removes the concern entirely

mediumVerified: 2026-07-09

data retention

Review of hosted-platform retention policies; retention is deployment-dependent for open-weight models

Evidence

Alibaba Cloud Trust Center — Hosted retention follows Alibaba Cloud regional policies; self-hosted deployments retain nothing externally

mediumVerified: 2026-07-09

pii handling

Review of data protection capabilities and customer responsibilities

Evidence

Alibaba Cloud Documentation — Customer responsible for PII redaction; Alibaba Cloud provides surrounding data-governance tooling

mediumVerified: 2026-07-09

compliance certifications

Verification of infrastructure certifications versus model-service-level compliance for Western regulated markets

Evidence

Alibaba Cloud Trust Center — Alibaba Cloud holds ISO 27001/SOC reports for its infrastructure, but no HIPAA/FedRAMP path for the model service; Western-host deployments inherit those hosts' certifications

mediumVerified: 2026-07-09

zero data retention

Review of data handling across first-party API, third-party hosts, and self-hosting

Evidence

Open-weight deployment options — No zero-retention guarantee on first-party hosting; self-hosting provides true zero external retention

mediumVerified: 2026-07-09

👁️Trust & Transparency

Strong open documentation and inspectable reasoning. Typical open-model gaps remain: limited training-data detail and topic-avoidance on politically sensitive subjects in default weights.

explainability

Evaluation of reasoning transparency and trace accessibility

Evidence

Qwen3.5 Release Blog — Hybrid thinking modes expose reasoning traces; fully inspectable when self-hosted

mediumVerified: 2026-07-09

hallucination rate

Testing on factual QA and multimodal grounding datasets

Evidence

Community factuality testing — Moderate hallucination rate, improved over Qwen3; grounding quality on vision inputs is strong

mediumVerified: 2026-07-09

bias fairness

Evaluation on bias benchmarks across languages and politically sensitive topic probes

Evidence

Independent bias evaluations — Broad multilingual fairness work; topic-avoidance on China-politically-sensitive subjects persists in default weights

mediumVerified: 2026-07-09

uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence

Model behavior assessment — Expresses uncertainty in thinking mode; final-answer calibration is adequate but not exceptional

mediumVerified: 2026-07-09

model card quality

Review of model card and technical documentation completeness

Evidence

Hugging Face Model Card — Thorough model card with architecture details (512-expert hybrid MoE), benchmarks, usage guidance, and deployment recipes

highVerified: 2026-07-09

training data transparency

Review of public disclosures about training data

Evidence

Qwen3.5 Release Blog — Training methodology and multilingual/multimodal data strategy described at a high level; detailed composition not disclosed

mediumVerified: 2026-07-09

guardrails

Analysis of built-in safety mechanisms in default weights

Evidence

Qwen Safety Documentation — Multilingual safety alignment in released weights; removable by downstream fine-tuning

mediumVerified: 2026-07-09

⚙️Operational Excellence

Best-in-class open-model ecosystem: Apache 2.0 with patent grant, day-one inference-framework support, and a complete size ladder (0.8B to 397B-A17B) for matching capability to hardware. Supersedes the Qwen3 family and Qwen2.5-VL. Since April-June 2026 the family has continued with the open Qwen3.6 models (Apache 2.0) and the proprietary API-only Qwen3.7-Max/Plus.

api design quality

Review of API design, consistency, and feature completeness

Evidence

Alibaba Cloud Model Studio — OpenAI-compatible API with function calling, multimodal inputs, and hybrid thinking-mode controls

highVerified: 2026-07-09

sdk quality

Review of SDK and inference-framework support

Evidence

QwenLM GitHub — Day-one support in vLLM, SGLang, and transformers; actively maintained official repos

highVerified: 2026-07-09

versioning policy

Review of release cadence and weight-availability guarantees

Evidence

Qwen Release History — Fast release cadence (supersedes Qwen3 family and Qwen2.5-VL); open weights remain permanently available, softening deprecation impact

MarkTechPost - Qwen3.6-27B release — Qwen3.6 open models shipped Apr 2026 (35B-A3B on 04-16, 27B dense on 04-22, Apache 2.0, 256K context): the 27B dense outperforms the 397B-A17B Qwen3.5 flagship on agentic coding benchmarks

innFactory / EQS News - Qwen3.7 releases — Family frontier moved to proprietary API-only models: Qwen3.7-Max (2026-05-20, Apsara Summit) and Qwen3.7-Plus (2026-06-01), 1M context, agentic positioning; Alibaba keeps its newest flagship closed-weight

mediumVerified: 2026-07-09

monitoring observability

Review of monitoring tools across deployment options

Evidence

Alibaba Cloud Model Studio — Usage dashboards and logging on Alibaba Cloud; full observability when self-hosting

mediumVerified: 2026-07-09

support quality

Assessment of support tiers, documentation, and community responsiveness

Evidence

Alibaba Cloud Support — Alibaba Cloud offers paid enterprise support tiers; Western-market support depth lags US hyperscalers; strong community channels

mediumVerified: 2026-07-09

ecosystem maturity

Analysis of derivative models, third-party hosting, and tooling integrations

Evidence

Hugging Face Qwen Organization — Largest open-model ecosystem by derivative count; full size ladder from 397B-A17B and 122B-A10B down to 0.8B released Feb-Mar 2026

highVerified: 2026-07-09

license terms

Review of licensing terms and restrictions

Evidence

Hugging Face Model Card — Apache 2.0 across the family: unrestricted commercial use with explicit patent grant

highVerified: 2026-07-09

Strengths

+Beats Alibaba's own 1T-parameter API-only Qwen3-Max with just 17B active parameters (397B total)
+Natively multimodal open model: text + vision under 'Towards Native Multimodal Agents'
+Up to 19x faster decode than Qwen3-Max at 256K context; 262K native context (1M on hosted Qwen3.5-Plus)
+201-language coverage, the broadest of any open model
+Apache 2.0 license with patent grant across the entire family
+Complete size ladder (0.8B to 397B-A17B, Feb-Mar 2026) for matching capability to hardware
+Day-one vLLM/SGLang/transformers support and the largest open-model derivative ecosystem

Limitations

!First-party Alibaba Cloud hosting is China-jurisdiction (Singapore region available); no HIPAA/FedRAMP path for the model service — self-hosting or Western hosts avoid this
!No longer the family's coding leader: the much smaller Apache-2.0 Qwen3.6-27B (Apr 2026) outperforms it on agentic coding benchmarks, and the newest family frontier (Qwen3.7-Max/Plus) is API-only
!1M context requires the hosted Qwen3.5-Plus; open weights cap at 262K
!Topic-avoidance on politically sensitive subjects in default weights
!Training-data composition disclosed only at a high level
!397B total parameters still require multi-GPU infrastructure to self-host despite the sparse 17B-active design
!Western-market enterprise support depth lags US hyperscalers

Metadata

pricing

input: Free weights (Apache 2.0); hosted from ~$0.40 per 1M tokens on Alibaba Cloud Model Studio

output: Hosted from ~$1.20 per 1M tokens; third-party hosts vary

notes: Self-hosting is infrastructure-cost-only; the 17B-active design keeps serving costs low for its capability class. Hosted Qwen3.5-Plus (1M context) priced separately.

last verified: 2026-07-09

context window: 262144

max output: 65536

languages

0: English

1: Chinese

2: Japanese

3: Korean

4: Spanish

5: French

6: German

7: Portuguese

8: Russian

9: Arabic

10: Hindi

11: Indonesian

12: Vietnamese

13: Thai

14: and 187 more (201 total)

modalities

0: text

1: image (input)

2: document

api endpoint: https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions

open source: true

architecture: Hybrid Mixture-of-Experts with 512 experts (397B total / 17B active), natively multimodal, hybrid thinking modes

parameters: 397B total / 17B active (flagship); family spans 0.8B to 397B-A17B

knowledge cutoff: Late 2025

Use Case Ratings

code generation

Strong agentic coding that beats the 1T-parameter Qwen3-Max; 17B active params make self-hosted coding assistants economical.

customer support

201-language coverage and fast decode make it a standout for global multilingual support; smaller variants serve high-volume tiers cheaply.

content creation

Strong multilingual content with native image understanding for visually grounded writing.

data analysis

Native multimodality handles charts, tables, and documents directly; 262K context (1M on Plus) covers large datasets.

research assistant

Multimodal document understanding plus long context suits literature and mixed-media research; 19x decode speedup keeps long-context work responsive.

legal compliance

First-party hosting is China-jurisdiction; viable for regulated legal work only via self-hosting or certified Western hosts.

healthcare

No HIPAA path on first-party hosting; self-hosted deployment in compliant infrastructure is the only viable route.

financial analysis

Good quantitative reasoning with native chart/table understanding; data-residency planning required for regulated workloads.

education

201 languages, multimodal input, and a size ladder down to 0.8B make it exceptional for global and on-device education deployments.

creative writing

Capable multilingual creative output with visual grounding; prose distinctiveness behind dedicated creative leaders.

Similar Models

DeepSeek-V4

DeepSeek

DeepSeek-V3.2

DeepSeek

Kimi K2.6

Moonshot AI

GLM-5

Z.ai (Zhipu AI)

Claude Opus 4.5

Anthropic