GPT-OSS-120B
v20250805 · OpenAI
OpenAI's first open-weight model since GPT-2, released August 2025. 117B total parameters (5.1B active per token), Apache 2.0 license. Matches o4-mini on many benchmarks and fits on a single 80 GB GPU.
Trust Vector Analysis
Dimension Breakdown
🚀 Performance & Reliability
Flagship open-weight performance. The mixture-of-experts (MoE) architecture activates 5.1B of 117B parameters per token. Matches or beats o4-mini on most benchmarks.
- Competition coding and tool-use benchmarks
- Math competition benchmarks
- General knowledge and domain-specific testing
- Internal testing
- Median latency estimate
- 95th-percentile latency from community benchmarks
- Official specification
- Self-hosting provides full control
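The MoE numbers above explain why the model fits on a single 80 GB GPU: only 5.1B of 117B parameters are active per token, and the released checkpoint is quantized to roughly 4 bits per weight (MXFP4). A back-of-the-envelope sketch — the 4.25 bits/weight figure and the arithmetic are illustrative assumptions, not official numbers:

```python
# Back-of-the-envelope memory estimate for GPT-OSS-120B.
# BITS_PER_WEIGHT is an assumption: MXFP4 stores 4-bit values plus
# shared per-block scales, so effective bits/weight is slightly above 4.

TOTAL_PARAMS = 117e9      # total parameters
ACTIVE_PARAMS = 5.1e9     # parameters activated per token (MoE routing)
BITS_PER_WEIGHT = 4.25    # assumed effective MXFP4 footprint
GIB = 1024**3

weights_gib = TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / GIB
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"Quantized weights: ~{weights_gib:.0f} GiB")   # ~58 GiB
print(f"Active per token:  {active_fraction:.1%}")    # ~4.4%
print(f"Fits in 80 GiB:    {weights_gib < 80}")       # True
```

Under these assumptions the quantized weights leave headroom on an 80 GiB H100 for the KV cache and activations, which is consistent with the "runs in 80GB" claim.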
🛡️ Security
Good base security. Self-hosting provides complete control over safety guardrails and data handling.
- OWASP LLM01 testing
- Adversarial testing
- Self-hosting analysis
- Safety testing
- Deployment security review
🔒 Privacy & Compliance
Complete data privacy when self-hosted: no data is sent to OpenAI, and compliance controls stay in-house. Ideal for regulated industries.
- Self-hosting analysis
- Privacy model analysis
- Self-hosting review
- Data flow analysis
- Compliance model review
- Privacy architecture review
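The privacy claim is a property of network topology, not of the model: a self-hosted client only ever talks to a local endpoint. A minimal sketch, assuming a vLLM or similar server exposing the OpenAI-compatible chat API on localhost (the port, endpoint path, and model name are assumptions based on vLLM's defaults):

```python
import json
from urllib.request import Request

# Build a chat request against a LOCAL OpenAI-compatible endpoint.
# No OpenAI-hosted URL appears anywhere; data never leaves the machine.
LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed vLLM default

payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Summarize this patient note ..."}],
}

req = Request(
    LOCAL_ENDPOINT,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Sending it (urlopen) requires a running local server; the point here
# is that the request object targets localhost only.
print(req.full_url)
```

Because the client speaks the standard OpenAI wire format, existing SDKs and tooling can be pointed at the local base URL without code changes.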
👁️ Trust & Transparency
Exceptional transparency. Full chain-of-thought access. Complete model weights and architecture disclosed. Open-source enables auditing.
- Reasoning transparency
- QA testing
- Bias benchmarks
- Confidence assessment
- Documentation review
- Training data disclosure review
- Safety mechanism review
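Chain-of-thought access in practice means the raw response carries a reasoning channel alongside the final answer (gpt-oss uses OpenAI's "harmony" response format with `analysis` and `final` channels). A toy splitter — the delimiter strings below are a simplified stand-in modeled on harmony, not guaranteed to match the real tokens; production code should use the `openai-harmony` library:

```python
import re

# Toy example: separate the "analysis" channel (chain of thought) from
# the "final" channel (user-facing answer) in a harmony-style response.
# The raw string and token spelling are illustrative assumptions.
raw = (
    "<|channel|>analysis<|message|>User wants 12*9; compute 108.<|end|>"
    "<|channel|>final<|message|>12 * 9 = 108.<|end|>"
)

channels = dict(
    re.findall(r"<\|channel\|>(\w+)<\|message\|>(.*?)<\|end\|>", raw)
)

print(channels["analysis"])  # the model's visible reasoning
print(channels["final"])     # the user-facing answer
```

Having both channels in hand is what makes the auditing and debugging claims above concrete: the reasoning can be logged, inspected, or redacted before the final answer is shown.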
⚙️ Operational Excellence
Exceptional operational flexibility. Apache 2.0 enables commercial use. Massive deployment ecosystem. Self-host or use managed platforms.
- API compatibility review
- SDK ecosystem review
- Version stability analysis
- Monitoring capability review
- Support ecosystem assessment
- Ecosystem breadth analysis
- License review
- +Apache 2.0 open-weight license enables commercial use without restrictions
- +Matches or beats o4-mini on coding, math, and health benchmarks
- +Complete data privacy when self-hosted (zero external data transmission)
- +Full chain-of-thought reasoning access for transparency and debugging
- +MoE architecture: 5.1B active of 117B total params, runs in 80GB
- +Massive deployment ecosystem (Azure, AWS, Hugging Face, vLLM, Ollama)
- !Requires 80GB GPU memory (H100 or equivalent)
- !Self-hosting complexity and infrastructure costs
- !Community support only; no enterprise SLA included
- !Slightly lower performance than flagship closed models
- !No built-in safety guardrails (customizable but requires setup)
Use Case Ratings
Code Generation
Excellent coding: matches o4-mini, with configurable reasoning effort and full chain-of-thought for debugging.
Customer Support
Good for customer support. Self-host for complete data privacy; configurable reasoning effort for cost control.
Content Creation
Strong content creation. Self-hosting enables unlimited generation without per-token API costs.
Data Analysis
Excellent for data analysis. Keep sensitive data on-premises, with full chain-of-thought for transparency.
Research Assistant
Outstanding for research. 128K context window. Self-host proprietary research data with full reasoning transparency.
Legal Compliance
Strong fit for legal work. Self-host for complete compliance control: no data leaves the premises, and the Apache 2.0 license is unambiguous.
Healthcare
Well suited to healthcare. Self-host for HIPAA compliance: PHI stays on-premises with no external data transmission.
Financial Analysis
Excellent for finance. Outperforms o3-mini on math benchmarks. Self-host proprietary financial data.
Education
Great for education. Full chain-of-thought shows reasoning steps; self-host for institutional control.
Creative Writing
Good creative writing. Unlimited generation when self-hosted, with no API costs for iteration.
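Several of the ratings above mention configurable reasoning effort. On gpt-oss this is a low/medium/high setting passed with the request; the `reasoning_effort` field below follows the OpenAI Chat Completions convention, and whether a self-hosted stack honors it is an assumption that varies by server. A sketch of mapping use cases to effort levels:

```python
# Sketch: pick a reasoning effort per use case when building a request
# for an OpenAI-compatible server. Use-case names and effort choices
# here are illustrative, mirroring the ratings above.

EFFORT_BY_USE_CASE = {
    "customer_support": "low",      # latency- and cost-sensitive
    "content_creation": "medium",
    "financial_analysis": "high",   # math-heavy; worth the extra tokens
}

def build_request(use_case: str, prompt: str) -> dict:
    """Assemble a chat-completions payload with per-use-case effort."""
    return {
        "model": "openai/gpt-oss-120b",
        "reasoning_effort": EFFORT_BY_USE_CASE.get(use_case, "medium"),
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("financial_analysis", "Project Q3 cash flow ...")
print(req["reasoning_effort"])  # high
```

Lower effort trades reasoning depth for latency and token cost, which is why the customer-support rating pairs it with cost control while the finance rating benefits from the high setting.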