Kimi K2.6

v20260420

Moonshot AI

Modelcodingagenticopen-sourcemixture-of-experts
81
Strong
About This Model

Moonshot AI's open-weight 1T-parameter MoE (32B active) with vendor-reported 80.2% SWE-Bench Verified and 58.6 SWE-Bench Pro. Agent Swarm orchestration scales to 300 sub-agents and 4,000 coordinated steps for long-horizon coding.

Last Evaluated: June 10, 2026
Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability
+

Vendor-reported open-weight leadership on agentic coding (80.2% SWE-Bench Verified, 58.6 SWE-Bench Pro). Agent Swarm scales to 300 sub-agents / 4,000 coordinated steps. Most headline scores are vendor-reported and await independent replication.

task accuracy code

Vendor-reported industry-standard coding benchmarks; scores pending broad independent replication

Evidence
Kimi K2.6 Model Card (vendor-reported)SWE-Bench Verified 80.2%, SWE-Bench Pro 58.6 (vs GPT-5.4's 57.7)
LiveCodeBench v6 (vendor-reported)89.6% on competitive programming tasks
MarkTechPost release coverageLong-horizon coding focus; claims open-weight state of the art on agentic coding
mediumVerified: 2026-06-10
task accuracy reasoning

Vendor-reported tool-augmented reasoning benchmarks requiring multi-step problem solving

Evidence
Humanity's Last Exam with tools (vendor-reported)54.0 on HLE-with-tools, frontier-competitive
mediumVerified: 2026-06-10
task accuracy general

Review of vendor benchmark suite and community evaluations across knowledge domains

Evidence
Kimi K2.6 Model CardStrong general performance across knowledge benchmarks; text and vision modalities
mediumVerified: 2026-06-10
output consistency

Community testing of repeated runs and long-horizon agent trajectories

Evidence
Community evaluationConsistent agentic behavior over long trajectories; native INT4 quantization preserves quality
mediumVerified: 2026-06-10
latency p50

Median latency for API requests with standard prompt sizes; self-hosted latency depends on hardware

Evidence
Community benchmarkingTypical first-response time ~3s on first-party API; varies widely by host
lowVerified: 2026-06-10
latency p95

95th percentile response time across diverse workloads

Evidence
Community benchmarkingp95 ~7.5s; long agentic chains take substantially longer by design
lowVerified: 2026-06-10
context window

Official specification from model card

Evidence
Kimi K2.6 Model Card262,144 token context window
highVerified: 2026-06-10
uptime

Review of platform availability and self-hosting fallback options

Evidence
Moonshot AI PlatformFirst-party API generally stable; open weights allow self-hosted redundancy
mediumVerified: 2026-06-10
🛡️Security
+

Standard open-model security posture. No published third-party security audit; self-hosting shifts security responsibility to the deployer.

prompt injection resistance

Review of vendor safety documentation and community red-team reports against OWASP LLM01 patterns

Evidence
Kimi K2.6 Model CardSafety tuning described; no published third-party prompt-injection audit
lowVerified: 2026-06-10
jailbreak resistance

Testing against adversarial prompt datasets; open-weight deployments inherit deployer responsibility

Evidence
Community red-teamingStandard alignment tuning; open weights mean guardrails can be removed in fine-tuned derivatives
mediumVerified: 2026-06-10
data leakage prevention

Analysis of privacy policies and self-hosting data-control options

Evidence
Moonshot AI Privacy PolicyStandard data handling on first-party API; full control when self-hosted
mediumVerified: 2026-06-10
output safety

Safety testing across harmful content categories per vendor card and community reports

Evidence
Kimi K2.6 Model CardSafety post-training applied; refusal behavior comparable to other open frontier models
mediumVerified: 2026-06-10
api security

Review of API security features and best practices

Evidence
Moonshot AI API DocumentationAPI key authentication, HTTPS only, rate limiting; OpenAI-compatible endpoints
mediumVerified: 2026-06-10
🔒Privacy & Compliance
+

First-party API operates under Chinese jurisdiction — a material caveat for Western regulated industries. Open weights fully mitigate this for organizations able to self-host or use Western inference providers.

data residency

Review of provider jurisdiction and third-party hosting options

Evidence
Moonshot AI Platform DocumentationMoonshot AI is a China-based provider; first-party API data processed under Chinese jurisdiction
OpenRouter availabilityAvailable via OpenRouter and Western inference hosts, enabling non-China residency
mediumVerified: 2026-06-10
training data optout

Analysis of privacy policy and data usage terms

Evidence
Moonshot AI Privacy PolicyAPI data usage terms standard for the segment; self-hosting removes the question entirely
mediumVerified: 2026-06-10
data retention

Review of terms of service and deployment-dependent retention

Evidence
Moonshot AI TermsFirst-party retention governed by Chinese data regulations; self-hosted deployments retain nothing externally
mediumVerified: 2026-06-10
pii handling

Review of data protection capabilities and customer responsibilities

Evidence
Moonshot AI DocumentationCustomer responsible for PII redaction; no managed PII tooling
mediumVerified: 2026-06-10
compliance certifications

Verification of compliance certifications and audit reports

Evidence
Moonshot AI public materialsNo published SOC 2 / HIPAA / GDPR attestations for the first-party API; Western hosts may carry their own certifications
mediumVerified: 2026-06-10
zero data retention

Review of self-hosting deployment options enabling zero retention

Evidence
Open weights on Hugging FaceSelf-hosting (vLLM/SGLang, native INT4) gives complete data control and zero external retention
mediumVerified: 2026-06-10
👁️Trust & Transparency
+

Open weights and a detailed model card provide good architectural transparency; training data disclosure and independent benchmark verification remain limited.

explainability

Evaluation of reasoning and agent-trajectory transparency

Evidence
Agent Swarm architectureSub-agent trajectories and tool-call traces are inspectable, aiding auditability of long-horizon runs
mediumVerified: 2026-06-10
hallucination rate

Testing on factual QA datasets and tool-augmented workflows

Evidence
Community testingModerate hallucination rate; tool-use grounding improves factuality in agentic mode
mediumVerified: 2026-06-10
bias fairness

Review of published bias benchmarks and community evaluations

Evidence
Kimi K2.6 Model CardLimited published bias evaluation
lowVerified: 2026-06-10
uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence
Model behavior testingExpresses uncertainty adequately; no calibrated confidence outputs
mediumVerified: 2026-06-10
model card quality

Review of documentation completeness and clarity

Evidence
Hugging Face model cardDetailed card: 1T total / 32B active MoE, 384 experts, MLA attention, native INT4, benchmarks, deployment guides
highVerified: 2026-06-10
training data transparency

Review of public disclosures about training data

Evidence
Moonshot AI publicationsArchitecture well documented; training data composition not disclosed in detail
mediumVerified: 2026-06-10
guardrails

Analysis of built-in safety mechanisms

Evidence
Kimi K2.6 Model CardBuilt-in safety tuning; deployers of open weights must layer their own guardrails
mediumVerified: 2026-06-10
⚙️Operational Excellence
+

Strong open-model ecosystem presence. Modified MIT license is permissive for most users but the attribution clause above 100M MAU / $20M monthly revenue requires legal review at hyperscale.

api design quality

Review of API design, consistency, and feature completeness

Evidence
Moonshot AI API DocumentationOpenAI-compatible API with streaming, tool calling, vision; Agent Swarm orchestration endpoints
highVerified: 2026-06-10
sdk quality

Review of SDK quality, documentation, and maintenance

Evidence
Moonshot AI GitHubOpenAI-compatible so mainstream SDKs work; first-party tooling thinner than Western providers
mediumVerified: 2026-06-10
versioning policy

Review of versioning practices and weight availability

Evidence
Kimi release historyK2.6 supersedes K2.5/K2; prior weights remain available, but cadence is fast
mediumVerified: 2026-06-10
monitoring observability

Review of available monitoring tools and metrics

Evidence
Moonshot AI PlatformBasic usage dashboard; self-hosted observability is deployer-built
mediumVerified: 2026-06-10
support quality

Assessment of documentation, community, and support responsiveness

Evidence
Moonshot AI community channelsGitHub and community support; limited English-language enterprise support
mediumVerified: 2026-06-10
ecosystem maturity

Analysis of third-party hosting, integrations, and tooling

Evidence
OpenRouter and inference ecosystemAvailable on OpenRouter and major open-model hosts; vLLM/SGLang support with native INT4
highVerified: 2026-06-10
license terms

Review of licensing terms and restrictions; attribution clause is trust-relevant for large-scale commercial use

Evidence
Modified MIT LicenseMIT with an attribution-UI requirement for deployments exceeding 100M MAU or $20M/month revenue
highVerified: 2026-06-10
Strengths
  • +Vendor-reported open-weight leadership in agentic coding (80.2% SWE-Bench Verified, 58.6 SWE-Bench Pro vs GPT-5.4's 57.7)
  • +Agent Swarm scales to 300 sub-agents and 4,000 coordinated steps for long-horizon tasks
  • +Open weights with near-MIT license enable full self-hosting and data control
  • +Efficient inference: 32B active of 1T total, MLA attention, native INT4 quantization
  • +262,144-token context with text and vision modalities
  • +Competitive API pricing (~$0.95/$4.00 per 1M tokens) and broad availability via OpenRouter
Limitations
  • !First-party Moonshot API processes data under Chinese jurisdiction with limited Western compliance certifications
  • !Headline benchmarks are vendor-reported and await independent replication
  • !Modified MIT license imposes attribution-UI requirement above 100M MAU or $20M/month revenue
  • !Self-hosting a 1T-parameter MoE requires substantial GPU infrastructure even at INT4
  • !Limited published bias, safety, and red-team evaluations
  • !English-language enterprise support is thin compared to Western providers
Metadata
pricing
input: $0.95 per 1M tokens (approx.)
output: $4.00 per 1M tokens (approx.)
notes: First-party Moonshot API pricing; third-party hosts on OpenRouter vary. Self-hosting cost is infrastructure-dependent.
last verified: 2026-06-10
context window: 262144
languages
0: English
1: Chinese
2: Japanese
3: Korean
4: Spanish
5: French
6: German
modalities
0: text
1: image (input)
api endpoint: https://api.moonshot.ai/v1/chat/completions
open source: true
license: Modified MIT (attribution-UI requirement above 100M MAU or $20M/month revenue)
architecture: Mixture-of-Experts: 1T total / 32B active parameters, 384 experts, Multi-head Latent Attention (MLA), native INT4
parameters: 1T total / 32B active
release date: 2026-04-20

Use Case Ratings

code generation

Vendor-reported 80.2% SWE-Bench Verified and 58.6 SWE-Bench Pro; Agent Swarm excels at long-horizon multi-file engineering.

customer support

Capable but not specialized; agentic latency unnecessary for simple support flows.

content creation

Solid long-form generation with large context; not its differentiator.

data analysis

Strong tool-augmented analysis; Agent Swarm parallelizes multi-source investigation well.

research assistant

54.0 HLE-with-tools and 262K context make it strong for deep, tool-driven research.

legal compliance

China-jurisdiction first-party API and absent Western certifications are blockers unless self-hosted.

healthcare

Not recommended via first-party API; self-hosted deployment in a compliant environment is the only viable path.

financial analysis

Strong quantitative and agentic capability; data residency requires self-hosting for regulated firms.

education

Strong STEM and coding tutoring at competitive pricing.

creative writing

Competent creative output; optimized for agentic engineering rather than prose.