Gemini 3.1 Pro: Google's Reasoning Leap Explained
Google released Gemini 3.1 Pro on February 19, 2026 — scoring 77.1% on ARC-AGI-2, more than doubling Gemini 3 Pro. Full benchmark breakdown, pricing ($2/$12 per M tokens), availability, and what it means for developers.
TL;DR
Google released Gemini 3.1 Pro (preview) on February 19, 2026. The key numbers:
- ARC-AGI-2: 77.1% — more than double Gemini 3 Pro (31.1%), beats Opus 4.6 (68.8%) and GPT-5.2 (52.9%)
- GPQA Diamond: 94.3% — leads all models on graduate-level science
- SWE-bench: 80.6% — matches Opus 4.6 (80.8%) on coding
- Price: $2/$12 per M tokens — cheapest frontier model
- 1M token context — unchanged from Gemini 3 Pro
- Leads on 13 of 16 benchmarks evaluated by Google
- Available now in preview: AI Studio, Vertex AI, Gemini CLI, Gemini app
What Google Announced
On February 19, 2026, Google released Gemini 3.1 Pro — the first ".1" increment in their model versioning. It builds on Gemini 3 Pro (November 2025) by integrating techniques from the Gemini 3 Deep Think series into a more accessible, faster model.
Google's blog describes it as designed for "tasks where a simple answer isn't enough" — complex multi-step reasoning, data synthesis, and agentic workflows.
The headline stat: 77.1% on ARC-AGI-2, the benchmark for novel abstract reasoning. That's more than double Gemini 3 Pro's 31.1%, and significantly ahead of both Opus 4.6 (68.8%) and GPT-5.2 (52.9%). VentureBeat calls it "a Deep Think Mini with adjustable reasoning on demand."
Full Benchmark Breakdown
Where Gemini 3.1 Pro Leads or Ties (13 of 16 benchmarks)
| Benchmark | What It Tests | Gemini 3.1 Pro | Best Competitor |
|---|---|---|---|
| ARC-AGI-2 | Novel reasoning | 77.1% | Opus 4.6: 68.8% |
| GPQA Diamond | Graduate science | 94.3% | GPT-5.2: 92.4% |
| BrowseComp | Agentic web search | 85.9% | Opus 4.6: 84.0% |
| Terminal-Bench 2.0 | Terminal coding | 68.5% | Opus 4.6: 65.4% (general-purpose models; the coding-specialized GPT-5.3-Codex scores 77.3%, see below) |
| APEX-Agents | Agent capabilities | 33.5% | Opus 4.6: 29.8% |
| MCP Atlas | Tool use | 69.2% | — |
| t2-bench Telecom | Domain-specific | 99.3% | — |
| SWE-bench Verified | Coding | 80.6% | Opus 4.6: 80.8% |
| MRCR v2 | Long-context | 84.9% | Sonnet 4.6: 84.9% (tie) |
Where Competitors Still Win
| Benchmark | What It Tests | Winner | Gemini 3.1 Pro |
|---|---|---|---|
| GDPval-AA (Elo) | Office tasks | Sonnet 4.6: 1633 | Not disclosed |
| Terminal-Bench 2.0 | Heavy terminal coding | GPT-5.3-Codex: 77.3% | 68.5% |
| SWE-Bench Pro | Advanced coding | GPT-5.3-Codex: 56.8% | Not disclosed |
| OSWorld | Computer use | Sonnet 4.6: 72.5% | Not benchmarked |
The Reasoning Leap in Context
ARC-AGI-2 measures a model's ability to solve problems it has never seen before — pure abstract reasoning, not pattern matching from training data. Here's how quickly Gemini improved:
| Model | ARC-AGI-2 | Date |
|---|---|---|
| Gemini 3 Pro | 31.1% | Nov 2025 |
| GPT-5.2 | 52.9% | Dec 2025 |
| Claude Opus 4.6 | 68.8% | Feb 2026 |
| Gemini 3.1 Pro | 77.1% | Feb 2026 |
Gemini 3.1 Pro jumped from 31.1% to 77.1% in a single version increment, a 148% relative improvement. Google attributes the gain to integrating Deep Think's extended reasoning techniques into the base model.
What Changed vs. Gemini 3 Pro
1. Deep Think Integration
Gemini 3 Deep Think was a separate, slower model optimized for extended reasoning. Gemini 3.1 Pro bakes those techniques into the standard model, with adjustable reasoning depth. You get Deep Think-level reasoning without the Deep Think latency for most tasks.
2. Dramatically Better Reasoning
The numbers speak for themselves:
| Benchmark | Gemini 3 Pro | Gemini 3.1 Pro | Improvement |
|---|---|---|---|
| ARC-AGI-2 | 31.1% | 77.1% | +148% |
| GPQA Diamond | ~88% | 94.3% | +7% |
| APEX-Agents | 18.4% | 33.5% | +82% |
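The improvement percentages in the table are relative gains over the Gemini 3 Pro scores. A quick sketch of the arithmetic, using the scores from the table above:

```python
def relative_improvement(old: float, new: float) -> float:
    """Percentage improvement of `new` over `old`."""
    return (new - old) / old * 100

# Scores from the comparison table above.
print(round(relative_improvement(31.1, 77.1)))  # ARC-AGI-2 -> 148
print(round(relative_improvement(18.4, 33.5)))  # APEX-Agents -> 82
```

The GPQA Diamond row uses the approximate Gemini 3 Pro score (~88%), which yields roughly +7% by the same formula.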
3. Better Agentic Performance
APEX-Agents (33.5%) and MCP Atlas (69.2%) scores show Gemini 3.1 Pro is significantly more capable as an autonomous agent — tool use, multi-step planning, and self-correction are all improved.
4. Maintained Multimodal Strength
Gemini 3.1 Pro retains Gemini's core advantage: native multimodal processing of text, images, audio, and video within a single context. No other frontier model matches this breadth at this price point.
Pricing
Same price as Gemini 3 Pro — a free upgrade:
| Context Size | Input (per M tokens) | Output (per M tokens) |
|---|---|---|
| ≤200K tokens | $2.00 | $12.00 |
| >200K tokens | $4.00 | $18.00 |
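A minimal sketch of per-request cost under this tiered schedule, with rates taken from the table above (how the higher tier is triggered is an assumption here; I model it as applying when the prompt itself exceeds 200K tokens):

```python
def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request under the tiered schedule above.

    Assumes the >200K rates apply when the prompt exceeds 200K tokens.
    """
    if input_tokens <= 200_000:
        in_rate, out_rate = 2.00, 12.00   # $ per million tokens
    else:
        in_rate, out_rate = 4.00, 18.00
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(f"${request_cost(100_000, 20_000):.2f}")  # standard tier -> $0.44
```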
Comparison with Competitors
| Model | Input | Output | Relative Cost |
|---|---|---|---|
| Gemini 3.1 Pro | $2.00 | $12.00 | 1x |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1.5x |
| GPT-5.2 | $5.00 | $15.00 | 2.5x (input) |
| Claude Opus 4.6 | $15.00 | $75.00 | 7.5x |
Gemini 3.1 Pro is the cheapest frontier model — 33% cheaper than Sonnet 4.6 on input, and 20% cheaper on output.
Cost Per Session (100K in + 20K out)
| Model | Cost |
|---|---|
| Gemini 3.1 Pro | $0.44 |
| Claude Sonnet 4.6 | $0.60 |
| GPT-5.2 | $0.80 |
| Claude Opus 4.6 | $3.00 |
Additional cost optimization:
- Batch mode: 50% discount ($0.22/session)
- Context caching: Cached input reads cost 10% of base price
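The discounts compose straightforwardly with the base rates. A rough sketch, assuming the batch discount applies to the whole bill and caching discounts only the cached portion of input (whether the two stack is my assumption, not stated by Google):

```python
BASE_IN, BASE_OUT = 2.00, 12.00  # $ per million tokens (standard tier)

def session_cost(input_m: float, output_m: float,
                 cached_frac: float = 0.0, batch: bool = False) -> float:
    """Cost in USD; token counts given in millions.

    cached_frac: fraction of input served from the context cache
      (cached reads billed at 10% of the base input rate).
    batch: apply the 50% batch-mode discount to the total.
    """
    in_cost = (input_m * BASE_IN * (1 - cached_frac)
               + input_m * BASE_IN * cached_frac * 0.10)
    total = in_cost + output_m * BASE_OUT
    return total * 0.5 if batch else total

print(round(session_cost(0.1, 0.02), 2))              # 0.44
print(round(session_cost(0.1, 0.02, batch=True), 2))  # 0.22
```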
Availability
Where to Use It
| Platform | Status | Model ID |
|---|---|---|
| Gemini App (consumer) | Rolling out | Auto-selected |
| Google AI Studio | Available now | gemini-3.1-pro-preview |
| Vertex AI | Available now | gemini-3.1-pro-preview |
| Gemini API | Available now | gemini-3.1-pro-preview |
| Gemini CLI | Available now | gemini-3.1-pro-preview |
| Antigravity | Available now | Auto-selected |
| Android Studio | Available now | Auto-selected |
| GitHub Copilot | Public preview | Selectable |
| NotebookLM | Pro/Ultra subscribers | Auto-selected |
API Quick Start
```python
import google.generativeai as genai

# Authenticate with your API key from AI Studio.
genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-3.1-pro-preview")
response = model.generate_content("Your prompt here")
print(response.text)
```
Custom Tools Endpoint
Google also launched a specialized endpoint for better tool performance:
```python
model = genai.GenerativeModel("gemini-3.1-pro-preview-customtools")
```
Use this endpoint when building agents that rely heavily on function calling and tool use.
What This Means
The Reasoning Race Heats Up
Three frontier models released in 13 days:
- Feb 6: Claude Opus 4.6 (Anthropic)
- Feb 17: Claude Sonnet 4.6 (Anthropic)
- Feb 19: Gemini 3.1 Pro (Google)
Each claims leadership in different areas. The model landscape is fragmenting — no single model dominates everything anymore.
Best-in-Class Reasoning at Budget Pricing
Gemini 3.1 Pro's 77.1% ARC-AGI-2 is the highest reasoning score available, at the lowest price ($2/$12). For tasks requiring novel problem-solving, abstract reasoning, or scientific analysis, it's the clear choice.
Coding Parity
With 80.6% on SWE-bench (vs. Opus 4.6's 80.8% and Sonnet 4.6's 79.6%), Gemini 3.1 Pro is now competitive on coding for the first time. Previous Gemini models trailed Claude significantly on this benchmark.
The Missing Piece: Computer Use
Gemini 3.1 Pro doesn't benchmark on OSWorld (computer use). Claude Sonnet 4.6 leads at 72.5% on this capability. If your workflow involves browser automation, form filling, or desktop control, Claude remains the only viable option.
For Developers Building Products
The practical implications:
- Cheapest reasoning: $0.44/session vs $0.60 (Sonnet) vs $0.80 (GPT-5.2)
- Best for scientific/analytical tasks: 94.3% GPQA Diamond is the highest score available
- Competitive on coding: 80.6% SWE-bench closes the gap with Claude
- Multimodal advantage: Native video/audio processing that Claude and GPT don't match
- Preview status: Not yet GA — expect improvements before general availability
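One way to act on those trade-offs is a simple routing table that picks a model per task category. A sketch only: the task categories are illustrative, and every model ID except gemini-3.1-pro-preview is a placeholder, not a confirmed identifier:

```python
# Illustrative routing based on the benchmark picture above:
# reasoning/science/coding -> Gemini 3.1 Pro; computer use -> Claude.
ROUTES = {
    "reasoning":    "gemini-3.1-pro-preview",  # ARC-AGI-2 leader
    "science":      "gemini-3.1-pro-preview",  # GPQA Diamond leader
    "coding":       "gemini-3.1-pro-preview",  # SWE-bench parity, cheaper
    "computer_use": "claude-sonnet-4.6",       # OSWorld leader (ID is a placeholder)
}

def pick_model(task_type: str) -> str:
    """Return a model ID for the task, defaulting to the cheapest frontier model."""
    return ROUTES.get(task_type, "gemini-3.1-pro-preview")

print(pick_model("computer_use"))
```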
Building with AI? Y Build integrates with your preferred AI tools for development, then handles deployment, Demo Cut product videos, AI SEO, and analytics — the full stack from code to growth. Start free.
Sources:
- Google Blog: Gemini 3.1 Pro announcement
- Google DeepMind: Gemini 3.1 Pro Model Card
- 9to5Google: Gemini 3.1 Pro for complex problem-solving
- VentureBeat: Gemini 3.1 Pro first impressions
- MarkTechPost: Gemini 3.1 Pro 77.1% ARC-AGI-2
- OfficeChai: Gemini 3.1 Pro Benchmarks
- GitHub Blog: Gemini 3.1 Pro in GitHub Copilot
- The Decoder: Gemini 3.1 Pro reasoning