Evaluation record · minimax-m2

MiniMax-M2

v20251027

MiniMax

Modelagentictool-callingopen-sourcemit-license

Strong

About This Model

MiniMax's MIT-licensed 230B MoE with only 10B active parameters, optimized for agentic tool calling and coding. Topped open-model agentic rankings at launch and undercut Claude pricing by roughly 92% while remaining fast due to its small active footprint.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Was the leading open agentic model at its October 2025 launch; still strong, but 2026 releases (GLM-5, Kimi K2.6) have surpassed it on raw benchmarks. Its 10B-active design remains a standout for speed and serving cost. Successor MiniMax-M3 (428B total / 23B active, 1M context, native multimodality) launched 2026-06-01 with open weights published on Hugging Face by 2026-06-07.

task accuracy code

Vendor benchmarks corroborated by independent press and leaderboard coverage; superseded at the top by 2026 releases

Evidence

MiniMax-M2 launch announcement — Strong SWE-bench and Terminal-Bench results for an open model at launch

VentureBeat launch coverage — Ranked the leading open-source model for agentic coding workflows at launch

highVerified: 2026-07-09

task accuracy reasoning

Vendor-reported reasoning benchmarks and community evaluation

Evidence

MiniMax-M2 Model Card — Competitive reasoning for its 10B-active footprint; interleaved thinking format

mediumVerified: 2026-07-09

task accuracy general

Independent composite benchmarking across knowledge domains

Evidence

Artificial Analysis — Highest composite intelligence score among open-weight models at launch window

mediumVerified: 2026-07-09

output consistency

Community testing of repeated runs and agentic trajectories

Evidence

Community evaluation — Stable tool-calling behavior across long agent loops

mediumVerified: 2026-07-09

latency p50

Median latency for API requests with standard prompt sizes

Evidence

Artificial Analysis — Fast responses — 10B active parameters yield roughly 2x the speed of comparable dense models

mediumVerified: 2026-07-09

latency p95

95th percentile response time across diverse workloads

Evidence

Community benchmarking — p95 ~4.0s across diverse workloads

mediumVerified: 2026-07-09

context window

Official specification from model card

Evidence

MiniMax-M2 Model Card — ~205K token context window

highVerified: 2026-07-09

uptime

Review of platform availability and self-hosting fallback options

Evidence

MiniMax Platform — First-party API generally stable; open weights enable self-hosted redundancy

mediumVerified: 2026-07-09

🛡️Security

Standard open-model posture without third-party audits. Self-hosting shifts security responsibility to the deployer.

prompt injection resistance

Review of vendor documentation and community testing against OWASP LLM01 patterns

Evidence

MiniMax-M2 Model Card — Safety tuning described; no published third-party prompt-injection audit

lowVerified: 2026-07-09

jailbreak resistance

Testing against adversarial prompt datasets; deployer-dependent for self-hosted use

Evidence

Community red-teaming — Standard alignment tuning; open weights allow guardrail removal in derivatives

mediumVerified: 2026-07-09

data leakage prevention

Analysis of privacy policies and self-hosting data-control options

Evidence

MiniMax Privacy Policy — Standard data handling on first-party API; full control when self-hosted

mediumVerified: 2026-07-09

output safety

Safety testing across harmful content categories

Evidence

MiniMax-M2 Model Card — Safety post-training applied; refusal behavior in line with peer open models

mediumVerified: 2026-07-09

api security

Review of API security features and best practices

Evidence

MiniMax API Documentation — API key authentication, HTTPS only, rate limiting; OpenAI- and Anthropic-compatible endpoints

mediumVerified: 2026-07-09

🔒Privacy & Compliance

First-party MiniMax API operates under Chinese jurisdiction — a material caveat for Western regulated industries. The small 10B-active footprint makes self-hosted mitigation cheaper than for other frontier-scale open models.

data residency

Review of provider jurisdiction and third-party hosting options

Evidence

MiniMax Platform Documentation — MiniMax is a China-based provider; first-party API data processed under Chinese jurisdiction

OpenRouter availability — MIT weights served by Western inference providers, enabling non-China residency

mediumVerified: 2026-07-09

training data optout

Analysis of privacy policy and data usage terms

Evidence

MiniMax Privacy Policy — Standard API data terms; self-hosting removes the concern entirely

mediumVerified: 2026-07-09

data retention

Review of terms of service and deployment-dependent retention

Evidence

MiniMax Terms of Service — First-party retention governed by Chinese data regulations; self-hosted deployments retain nothing externally

mediumVerified: 2026-07-09

pii handling

Review of data protection capabilities and customer responsibilities

Evidence

MiniMax Documentation — Customer responsible for PII redaction; no managed PII tooling

mediumVerified: 2026-07-09

compliance certifications

Verification of compliance certifications and audit reports

Evidence

MiniMax public materials — No published SOC 2 / HIPAA / GDPR attestations for the first-party API

mediumVerified: 2026-07-09

zero data retention

Review of self-hosting deployment options enabling zero retention

Evidence

Open weights on Hugging Face — MIT-licensed self-hosting gives complete data control and zero external retention; 10B active makes this unusually affordable

mediumVerified: 2026-07-09

👁️Trust & Transparency

Open weights and interleaved-thinking traces provide reasonable transparency; training data disclosure and formal bias/safety evaluations are limited.

explainability

Evaluation of reasoning transparency and trajectory inspectability

Evidence

MiniMax-M2 documentation — Interleaved thinking format exposes reasoning between tool calls, aiding agent-loop auditability

mediumVerified: 2026-07-09

hallucination rate

Testing on factual QA datasets and tool-augmented workflows

Evidence

Community testing — Moderate hallucination rate; tool-grounded workflows perform better than closed-book QA

mediumVerified: 2026-07-09

bias fairness

Review of published bias benchmarks and community evaluations

Evidence

MiniMax-M2 Model Card — Limited published bias evaluation

lowVerified: 2026-07-09

uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence

Model behavior testing — Basic uncertainty expression; no calibrated confidence outputs

mediumVerified: 2026-07-09

model card quality

Review of documentation completeness and clarity

Evidence

Hugging Face model card — Clear documentation of 230B/10B MoE architecture, MIT license, benchmarks, and deployment guidance

highVerified: 2026-07-09

training data transparency

Review of public disclosures about training data

Evidence

MiniMax publications — Architecture documented; training data composition not disclosed in detail

mediumVerified: 2026-07-09

guardrails

Analysis of built-in safety mechanisms

Evidence

MiniMax-M2 Model Card — Built-in safety tuning; deployers of open weights must layer their own guardrails

mediumVerified: 2026-07-09

⚙️Operational Excellence

Clean MIT licensing and dual OpenAI/Anthropic API compatibility lower switching costs. Successor M3's weights shipped 2026-06-07, resolving the earlier roadmap uncertainty; M2 remains available but is now the previous generation.

api design quality

Review of API design, consistency, and feature completeness

Evidence

MiniMax API Documentation — OpenAI- and Anthropic-compatible endpoints with streaming and tool calling, easing migration

highVerified: 2026-07-09

sdk quality

Review of SDK quality, documentation, and maintenance

Evidence

MiniMax GitHub — Compatibility with mainstream OpenAI/Anthropic SDKs; first-party tooling adequate

mediumVerified: 2026-07-09

versioning policy

Review of versioning practices and weight availability across releases

Evidence

MiniMax-M3 announcement and weights release — Successor M3 (428B/23B, 1M context, native multimodal) launched 2026-06-01; open weights published on Hugging Face by 2026-06-07, though training code and inference operators were not released. M2 weights remain available

mediumVerified: 2026-07-09

monitoring observability

Review of available monitoring tools and metrics

Evidence

MiniMax Platform — Basic usage dashboard; self-hosted observability is deployer-built

mediumVerified: 2026-07-09

support quality

Assessment of documentation, community, and support responsiveness

Evidence

MiniMax community channels — GitHub and community support; limited English-language enterprise support

mediumVerified: 2026-07-09

ecosystem maturity

Analysis of third-party hosting, integrations, and tooling

Evidence

Inference ecosystem — vLLM/SGLang support, OpenRouter availability, popular in open-source agent frameworks

highVerified: 2026-07-09

license terms

Review of licensing terms and restrictions

Evidence

MIT License (Hugging Face card) — MIT license per model card, unrestricted commercial use and derivatives

highVerified: 2026-07-09

Strengths

+Topped open-model agentic tool-calling rankings at launch (October 2025)
+Exceptional efficiency: 10B active of 230B total — fast inference and cheap self-hosting
+Launched at roughly 8% of Claude pricing, among the best cost/capability ratios available
+Clean MIT license with full self-hosting rights
+OpenAI- and Anthropic-compatible APIs minimize migration effort
+~205K context window for long-document and long-trajectory work

Limitations

!First-party MiniMax API processes data under Chinese jurisdiction with no published Western compliance certifications
!Surpassed on raw benchmarks by 2026 open-weight releases (GLM-5, Kimi K2.6)
!Superseded within MiniMax's own lineup: M3 (1M context, native multimodality) shipped with open weights in June 2026
!Text-only — no vision or audio modalities
!Limited published bias, safety, and red-team evaluations
!Interleaved thinking format requires prompt-handling care in some frameworks

Metadata

pricing

input: $0.30 per 1M tokens (approx.)

output: $1.20 per 1M tokens (approx.)

notes: Launched at roughly 8% of Claude Sonnet pricing; third-party host pricing varies. First-party rates unchanged in July 2026 checks.

last verified: 2026-07-09

context window: 204800

languages

0: English

1: Chinese

2: Japanese

3: Korean

4: Spanish

5: French

6: German

modalities

0: text

api endpoint: https://api.minimax.io/v1/chat/completions

open source: true

license: MIT (per Hugging Face model card)

architecture: Mixture-of-Experts: 230B total / 10B active parameters, interleaved thinking for agentic tool use

parameters: 230B total / 10B active

release date: 2025-10-27

Use Case Ratings

code generation

Strong agentic coding at exceptional cost-efficiency; no longer the open-weight leader after 2026 releases.

customer support

Fast (10B active) and very cheap — well suited to high-volume conversational workloads.

content creation

Adequate generation quality at minimal cost.

data analysis

Good tool-calling for analysis pipelines; weaker raw reasoning than GLM-5 or Kimi K2.6.

research assistant

Strong agentic search and tool orchestration; 205K context handles long documents.

legal compliance

China-jurisdiction first-party API and absent Western certifications are blockers unless self-hosted.

healthcare

Not recommended via first-party API; self-hosted deployment in a compliant environment is the only viable path.

financial analysis

Capable and cheap; data residency requires self-hosting for regulated firms.

education

Good tutoring at very low cost; well suited to high-volume educational platforms.

creative writing

Serviceable creative output; not its design focus.

Similar Models

GLM-5

Z.ai (Zhipu AI)

Kimi K2.6

Moonshot AI

DeepSeek-V3.2

DeepSeek

GPT-OSS-120B

OpenAI