Evaluation record · gemini-3-5-flash

Gemini 3.5 Flash

Version: gemini-3.5-flash

Google

Model · workhorse · ga · agentic · fast

Strong

About This Model

Google's GA 'frontier workhorse', launched at I/O 2026. Per Google's launch claims, it beats Gemini 3.1 Pro on agentic and coding suites (76.2% Terminal-Bench 2.1, 83.6% MCP Atlas) at roughly 4x the speed, with a 1M-token context window. Pricier than past Flash tiers at $1.50 input / $9.00 output per 1M tokens.

Last Evaluated: July 9, 2026

Trust Vector Analysis

Dimension Breakdown

🚀 Performance & Reliability

Unusual positioning: a Flash-tier model that beats the flagship 3.1 Pro on agentic/coding suites at ~4x speed. Benchmark figures are official claims from launch (2026-05-19); third-party replication still maturing.

task accuracy code

Official launch benchmarks for agentic coding; vendor-reported, pending broad third-party replication

Evidence

Terminal-Bench 2.1 — 76.2% on terminal/command-line agentic tasks (official claim, beats Gemini 3.1 Pro)

MCP Atlas — 83.6% multi-tool orchestration (official claim, above Gemini 3.1 Pro's 78.2%)

high · Verified: 2026-07-09

task accuracy reasoning

Official agentic and multimodal reasoning benchmarks from launch materials

Evidence

Finance Agent v2 — 57.9% on multi-step financial agent tasks (official claim)

CharXiv — 84.2% on chart reasoning (official claim)

medium · Verified: 2026-07-09

task accuracy general

Cross-benchmark comparison against Gemini 3.1 Pro from official launch claims

Evidence

Google I/O 2026 Launch — Positioned as 'frontier workhorse' matching or beating prior Pro tier on agentic suites

medium · Verified: 2026-07-09

output consistency

Consistency assessment based on GA status and vendor throughput claims

Evidence

Google DeepMind Models Page — GA status with ~4x speed of Gemini 3.1 Pro at consistent quality

medium · Verified: 2026-07-09

latency p50

Relative speed claims from official materials; absolute latency varies by workload

Evidence

Google launch materials — ~4x faster than Gemini 3.1 Pro on agentic workloads

medium · Verified: 2026-07-09

context window

Official specification from provider documentation

Evidence

Gemini API Changelog — 1,048,576 input tokens, 64K output tokens

high · Verified: 2026-07-09

uptime

Historical uptime data from official status page

Evidence

Google Cloud Status — 99.9% uptime (last 90 days, Vertex AI)

high · Verified: 2026-07-09

🛡️ Security

Standard Gemini 3.x-family security posture on Google Cloud infrastructure. High-speed agentic use increases the importance of downstream tool sandboxing.

prompt injection resistance

OWASP LLM01 testing and vendor documentation review

Evidence

Google AI Safety — Gemini 3.x-generation prompt injection defenses

medium · Verified: 2026-07-09

jailbreak resistance

Adversarial prompt testing

Evidence

Google Safety Testing — Adversarial robustness consistent with Gemini 3.x family

medium · Verified: 2026-07-09

data leakage prevention

Evidence

Gemini API Terms — Paid-tier API data not used for training

medium · Verified: 2026-07-09

output safety

Safety filter testing across content categories

Evidence

Google Safety Filters — Configurable multi-category safety filters

high · Verified: 2026-07-09
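
As a rough illustration of what "configurable multi-category" filters can look like in a request, the sketch below builds a JSON body with one threshold per harm category. The `safetySettings` field and the category and threshold names follow the publicly documented Gemini REST API; this record does not confirm that they apply unchanged to gemini-3.5-flash, so treat that as an assumption. The helper name `with_safety_settings` is hypothetical.

```python
import json

# Category names as published for the Gemini REST API. Whether the same set
# applies to gemini-3.5-flash is an assumption, not stated in this record.
CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]


def with_safety_settings(prompt, threshold="BLOCK_MEDIUM_AND_ABOVE"):
    """Build a request body that applies one threshold to every category."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "safetySettings": [
            {"category": category, "threshold": threshold}
            for category in CATEGORIES
        ],
    }


body = with_safety_settings("Summarize this support ticket.")
print(json.dumps(body, indent=2))
```

A stricter or looser policy would pass a different threshold string, or build the list by hand to set thresholds per category.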

api security

API security feature review

Evidence

Google Cloud Security — Google Cloud security standards, Vertex AI IAM

high · Verified: 2026-07-09

🔒 Privacy & Compliance

Same Google Cloud compliance envelope as the Pro tier: SOC/ISO certifications, GDPR, HIPAA via Google Cloud, EU residency options.

data residency

Cloud infrastructure documentation review

Evidence

Google Cloud Locations — Regional deployment options via Vertex AI

high · Verified: 2026-07-09

training data optout

Evidence

Gemini API Terms — Paid API data not used for training

high · Verified: 2026-07-09

data retention

Data retention policy review

Evidence

Google Cloud Service Terms — Enterprise zero-retention configurations available

medium · Verified: 2026-07-09

pii handling

Data protection capability review

Evidence

Google AI Documentation — Customer responsible for PII redaction; Cloud DLP integration available

medium · Verified: 2026-07-09

compliance certifications

Certification verification

Evidence

Google Cloud Compliance — SOC 1/2/3, ISO 27001/27017/27018, GDPR, HIPAA (via Google Cloud)

high · Verified: 2026-07-09

zero data retention

Enterprise feature review

Evidence

Vertex AI Data Governance — Zero-retention configuration available for enterprise customers

medium · Verified: 2026-07-09

👁️ Trust & Transparency

Solid documentation at launch; roughly 7 weeks since GA as of 2026-07-09. Most benchmark figures remain vendor-reported, and independent verification is still accumulating.
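
The "7 weeks" figure can be checked directly from the two dates in this record (release date 2026-05-19, evaluation date 2026-07-09). A minimal sketch of that arithmetic:

```python
from datetime import date

RELEASE_DATE = date(2026, 5, 19)   # release date from this record
EVALUATED_ON = date(2026, 7, 9)    # last-evaluated date from this record

days_since_ga = (EVALUATED_ON - RELEASE_DATE).days
weeks_since_ga = days_since_ga / 7

print(f"{days_since_ga} days, about {weeks_since_ga:.1f} weeks since GA")
```

This gives 51 days, a little over seven weeks, which is consistent with the "under two months" wording used elsewhere in the record.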

explainability

Reasoning transparency evaluation

Evidence

Gemini API Documentation — Thinking traces available, though shallower than Pro-tier extended reasoning

medium · Verified: 2026-07-09

hallucination rate

Benchmark-derived grounding assessment; vendor claims pending independent replication

Evidence

CharXiv (chart grounding) — 84.2% chart reasoning suggests solid grounding; speed-optimized tiers historically hallucinate slightly more

medium · Verified: 2026-07-09

bias fairness

Bias benchmark evaluation and policy review

Evidence

Google AI Principles — Regular bias testing and mitigation per AI Principles

medium · Verified: 2026-07-09

uncertainty quantification

Qualitative assessment; limited data given ~7 weeks since GA

Evidence

Model Behavior — Limited independent assessment so far; recent GA release

low · Verified: 2026-07-09

model card quality

Documentation completeness review

Evidence

Gemini Documentation — Launch documentation with benchmarks, specs, and pricing

high · Verified: 2026-07-09

training data transparency

Public disclosure review

Evidence

Google AI Blog — General training description; detailed sources not disclosed

medium · Verified: 2026-07-09

guardrails

Safety mechanism analysis

Evidence

Safety Settings — Configurable multi-category safety guardrails

high · Verified: 2026-07-09

⚙️ Operational Excellence

Full Google Cloud operational stack from day one. Pricing ($1.50 input / $9.00 output per 1M tokens, cached input $0.15) is notably higher than past Flash tiers, narrowing the cost gap to Pro models.

api design quality

API design and feature completeness review

Evidence

Gemini API — RESTful API with streaming, function calling, multimodal, context caching

high · Verified: 2026-07-09

sdk quality

SDK quality and maintenance assessment

Evidence

Google Gen AI SDKs — Unified Gen AI SDKs across Python, Node.js, Go, Java

high · Verified: 2026-07-09

versioning policy

Versioning and changelog review

Evidence

Gemini API Changelog — GA at I/O 2026 with documented model lifecycle

Gemini API Deprecations — gemini-3.5-flash is active with no shutdown date; it is named as the designated replacement for gemini-2.0-flash (retired 2026-06-01), gemini-2.5-flash (shutdown 2026-10-16), and gemini-3-flash-preview

high · Verified: 2026-07-09
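
For teams tracking the migrations listed above, the sketch below turns the dates in this record into a simple status check. The function name `migration_status` and the status strings are hypothetical; the model IDs and dates are copied from the deprecation evidence, and no date is listed here for gemini-3-flash-preview.

```python
from datetime import date

# Shutdown dates as listed in this record; None means no date is given here.
SHUTDOWN_DATES = {
    "gemini-2.0-flash": date(2026, 6, 1),
    "gemini-2.5-flash": date(2026, 10, 16),
    "gemini-3-flash-preview": None,
}
REPLACEMENT = "gemini-3.5-flash"


def migration_status(model_id, today):
    """Describe where a model stands relative to its listed shutdown date."""
    if model_id not in SHUTDOWN_DATES:
        return "not listed in this record"
    shutdown = SHUTDOWN_DATES[model_id]
    if shutdown is None:
        return f"no date listed; designated replacement is {REPLACEMENT}"
    days_left = (shutdown - today).days
    if days_left <= 0:
        return f"retired; designated replacement is {REPLACEMENT}"
    return f"{days_left} days until shutdown; migrate to {REPLACEMENT}"


as_of = date(2026, 7, 9)  # evaluation date of this record
for model in SHUTDOWN_DATES:
    print(model, "->", migration_status(model, as_of))
```

As of the evaluation date, gemini-2.0-flash is already past its date and gemini-2.5-flash has 99 days remaining.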

monitoring observability

Observability tooling review

Evidence

Google Cloud Console — Comprehensive Cloud Console and Vertex AI monitoring

high · Verified: 2026-07-09

support quality

Support channel assessment

Evidence

Google Cloud Support — Enterprise support tiers with SLAs

high · Verified: 2026-07-09

ecosystem maturity

Ecosystem and integration analysis

Evidence

Google AI Ecosystem — Day-one availability across AI Studio, Vertex AI, and Gemini app

high · Verified: 2026-07-09

license terms

License terms review

Evidence

Google Cloud Terms — Standard commercial terms; enterprise agreements available

high · Verified: 2026-07-09

Strengths

+ Beats Gemini 3.1 Pro on agentic/coding suites: 76.2% Terminal-Bench 2.1, 83.6% MCP Atlas
+ ~4x faster than Gemini 3.1 Pro, giving strong agent-loop economics
+ 1,048,576-token input context with 64K output
+ Context caching at $0.15 per 1M tokens, a 90% discount on the $1.50 input rate for repeated prefixes
+ GA from day one across AI Studio, Vertex AI, and Gemini app
+ Full Google Cloud compliance envelope (SOC/ISO, GDPR, HIPAA via GCP)

Limitations

! Pricier than past Flash tiers ($1.50/$9.00 vs historical sub-dollar Flash pricing)
! Benchmark figures are official claims; independent replication still accumulating (~7 weeks since GA)
! Deepest reasoning tasks still favor Pro-tier models
! Under two months of production track record
! Training data transparency limited (industry standard)

Metadata

pricing

input: $1.50 per 1M tokens

output: $9.00 per 1M tokens

notes: Confirmed on the official Gemini API pricing page 2026-07-09. Cached input $0.15 per 1M; batch API 50% discount ($0.75/$4.50). Pricier than past Flash tiers, reflecting 'frontier workhorse' positioning.
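
To make the rates above concrete, here is a minimal cost estimator using only the prices quoted in this record. The function name `estimate_cost` is hypothetical. Two assumptions are not confirmed by the record: the batch discount is applied to fresh input and output only (not to cached input), and any cache storage fees are ignored.

```python
# Rates copied from this record's pricing notes (USD per 1M tokens).
INPUT_PER_M = 1.50
CACHED_INPUT_PER_M = 0.15
OUTPUT_PER_M = 9.00
BATCH_MULTIPLIER = 0.50  # batch API: 50% discount


def estimate_cost(input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Estimate the USD cost of one request.

    cached_tokens is the portion of input_tokens served from the context cache.
    Assumption: the batch discount does not stack on the cached-input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    multiplier = BATCH_MULTIPLIER if batch else 1.0
    fresh_tokens = input_tokens - cached_tokens
    return (
        fresh_tokens / 1e6 * INPUT_PER_M * multiplier
        + cached_tokens / 1e6 * CACHED_INPUT_PER_M
        + output_tokens / 1e6 * OUTPUT_PER_M * multiplier
    )


uncached = estimate_cost(200_000, 4_000)
cached = estimate_cost(200_000, 4_000, cached_tokens=180_000)
batched = estimate_cost(200_000, 4_000, batch=True)
print(f"uncached ${uncached:.3f}, cached ${cached:.3f}, batch ${batched:.3f}")
```

For a 200,000-token prompt with a 4,000-token reply, this works out to about $0.336 uncached, about $0.093 when 180,000 of the input tokens come from cache, and about $0.168 through the batch API.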

last verified: 2026-07-09

context window: 1048576

max output: 65536
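
The two limits above can be used as a pre-flight check before sending a request. This is a sketch only: the function name `check_budget` is hypothetical, and it assumes the input and output limits are enforced separately, as the record's "1,048,576 input tokens, 64K output tokens" wording suggests.

```python
CONTEXT_WINDOW = 1_048_576  # input token limit from this record (2**20)
MAX_OUTPUT = 65_536         # output token limit from this record (64 * 1024)


def check_budget(prompt_tokens, requested_output_tokens):
    """Return a list of limit violations; an empty list means the request fits."""
    problems = []
    if prompt_tokens > CONTEXT_WINDOW:
        over = prompt_tokens - CONTEXT_WINDOW
        problems.append(f"prompt exceeds the context window by {over} tokens")
    if requested_output_tokens > MAX_OUTPUT:
        over = requested_output_tokens - MAX_OUTPUT
        problems.append(f"requested output exceeds the maximum by {over} tokens")
    return problems


print(check_budget(900_000, 8_192))
print(check_budget(1_100_000, 70_000))
```

Token counts here are whatever the caller's tokenizer reports; the record does not specify how tokens are counted.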

languages: English, 100+ languages

modalities: text, vision, audio, video

api endpoint: https://generativelanguage.googleapis.com/v1beta/models
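
The endpoint above is a base path, not a complete request URL. The sketch below shows one way a text request might be assembled with the standard library, without sending it. The `:generateContent` suffix, the `x-goog-api-key` header, and the `contents`/`parts` body shape follow the publicly documented Gemini REST API; that they apply unchanged to this model is an assumption, and `build_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"  # from this record
MODEL_ID = "gemini-3.5-flash"  # model identifier from this record


def build_request(prompt, api_key):
    """Build (but do not send) a generateContent request for a text prompt."""
    url = f"{BASE_URL}/{MODEL_ID}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
        method="POST",
    )


request = build_request("Say hello.", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here so the example runs without credentials or network access.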

open source: false

architecture: Speed-optimized multimodal transformer with thinking support

parameters: Not disclosed

knowledge cutoff: Early 2026 (not officially confirmed)

release date: 2026-05-19

Use Case Ratings

code generation

76.2% Terminal-Bench 2.1 and 83.6% MCP Atlas — beats Gemini 3.1 Pro on agentic coding at ~4x speed. Excellent agent-loop economics.

customer support

Speed plus frontier quality is ideal for high-volume support. Multimodal input handles screenshots.

data analysis

84.2% CharXiv chart reasoning and 1M context for large datasets at workhorse cost.

financial analysis

57.9% Finance Agent v2 is strong for an agentic suite, but high-stakes analysis still favors Pro-tier reasoning.

research assistant

1M context and fast iteration; deepest reasoning tasks still favor Gemini 3.1 Pro.

content creation

Fast, capable long-form generation for production content pipelines.

education

Low latency suits interactive tutoring; strong multimodal explanations.

Similar Models

Gemini 3.1 Pro

Google

Gemini 3 Flash

Google

Gemini 3 Pro

Google

GPT-5.5

OpenAI