AI Model Tracker.
A live comparison of frontier AI models — benchmarks, pricing and capabilities, updated regularly.
AI Model Tracker
GPT-5.5 · Claude Sonnet 4.6 · Gemini 3.1 Pro
Side-by-Side Comparison
| Dimension | GPT-5.4 | Claude Sonnet 4.6 | Gemini 3.1 Pro |
|---|---|---|---|
| Release date | Mar 5, 2026 | Feb 17, 2026 | Feb 19, 2026 |
| Context window | 1.05M tokens | 1M (beta, API only) | 1M tokens |
| Max output | Not disclosed | 128K tokens | 64K tokens |
| Input price / 1M | $2.50 | $3.00 | $2.00 |
| Output price / 1M | $15.00 | $15.00 | $12.00 |
| Cached input price | $0.25 / 1M | 90% savings (est. ~$0.30) | Supported (tiered) |
| Input modalities | Text, Image | Text, Image | Text, Image, Video, Audio |
| Reasoning control | 5-level configurable | Hybrid extended thinking | 3 thinking levels |
| Computer use | Native API | Improved in 4.6 | Not available |
| SWE-bench Verified | ~80.0% | Approaches Opus-level | Not reported |
| ARC-AGI-2 | Not reported | Not reported | 77.1% |
| GPQA (reasoning) | 93.2% (Pro variant) | Not reported | Not reported |
| Batch pricing | 50% off standard | 50% off standard | Available |
| Availability | API + ChatGPT Pro/Plus | API + Claude.ai + Bedrock + Vertex + Foundry | API + Gemini App + Vertex |
| Pro / Premium tier | $30 / $180 per 1M (GPT-5.4 Pro) | $15 / $75 per 1M (Opus 4.6) | Deep Think (Ultra subs) |
What Matters For Your Stack
⚙ Coding
Best pick: GPT-5.4 for complex, multi-file codebases. ~80% SWE-bench and 5-level reasoning control let you dial cost vs. quality per request. Sonnet 4.6 is the value play at $3 input — devs in early access often preferred it over Opus for frontend and financial code. Gemini 3.1 Pro is strong for agentic coding workflows with its efficient token usage.
🤖 Agents
Best pick: GPT-5.4 leads with native computer-use API and configurable reasoning (dial down for fast tool-calling loops, dial up for planning). Sonnet 4.6 has improved computer use and agent planning — great for multi-step browser automation. Gemini 3.1 Pro excels at long-horizon tool orchestration with its medium thinking mode.
📚 RAG
Best pick: Gemini 3.1 Pro — cheapest per-token ($2/$12) with a full 1M context window and native multimodal grounding (text, image, video, audio). Great for document-heavy pipelines. GPT-5.4 at 1.05M context is the largest window but costs more. Sonnet 4.6's 128K max output is ideal when you need long-form synthesis from retrieved docs.
🎨 Multimodal Apps
Best pick: Gemini 3.1 Pro is the clear leader — native text, image, video, and audio input with unified embeddings (gemini-embedding-2). GPT-5.4 handles text + image but no video/audio. Sonnet 4.6 is text + image only. For multimodal RAG or video analysis, Gemini is the only real option at the frontier.
Running Changelog
v1/prompts API and OpenAI-managed reusable prompt objects shut down Nov 30, 2026. Replacement: move prompt content into your application code. What it means: OpenAI is exiting the prompt-registry business — treat prompts as application configuration (Git + config files, DB rows, feature-flag system) rather than an OpenAI resource. Action: inventory any code referencing v1/prompts, move prompts to your own store, and version them in your repo for reproducibility. (OpenAI deprecations)gpt-image-1-mini, gpt-image-1.5, and chatgpt-image-latest on Dec 1, 2026. Replacement is gpt-image-2. This is notable because gpt-image-1 was the migration target for the May 12 DALL·E 2/3 shutdown — so anyone who just migrated to gpt-image-1-mini for cost reasons gets a second migration before year-end. Plan capacity and prompt-compatibility testing for gpt-image-2 over Q3 to avoid a rushed November cutover. No price/quality details on gpt-image-2 surfaced yet. (OpenAI deprecations)gemini-2.0-flash, gemini-2.0-flash-001, gemini-2.0-flash-lite, and gemini-2.0-flash-lite-001. Migrate to gemini-3.5-flash (next-gen mid-tier, went GA May 19) or gemini-3.1-flash-lite (cost/latency-optimized). If anything in production, RAG pipelines, embedding workers, or agent tool-routing still references 2.0 Flash SKUs, it will now error — fix is a string change. Worth auditing today. (Gemini API changelog)claude-opus-4-8. Standard pricing is unchanged from Opus 4.7: $5/1M input, $25/1M output. Fast mode is the headline upgrade — 2.5x speed at $10/1M input, $50/1M output, which Anthropic states is 3x cheaper than the previous fast mode tier. Three additional shipping features: (1) Messages API now accepts system entries inside the messages array, enabling mid-run system updates (useful for agents that need to re-scope mid-task without restarting the conversation); (2) Claude Code dynamic workflows (research preview); (3) effort control in claude.ai and Cowork. Opus 4.8 is now the top-tier choice for high-stakes reasoning and long-context coding agents — reserve for jobs where Sonnet 4.6 hits quality limits. (Anthropic news)gemini-3.1-flash-image (fast/cost-optimized) and gemini-3-pro-image (frontier quality). Notable new capability on Flash Image: video-as-context for video-to-image generation — pass a video and generate keyframes/derivatives without a separate vision pipeline. Preview SKUs deprecated effective May 28 and shut down June 25, 2026: migrate gemini-3.1-flash-image-preview → gemini-3.1-flash-image, and gemini-3-pro-image-preview → gemini-3-pro-image. Worth comparing against OpenAI gpt-image-1 and the older Imagen/Veo workflows. (Gemini API changelog)gemini-3.1-flash-lite. If you missed this and are seeing outages today, the swap is a string change in your model parameter; no schema or pricing differences. (Gemini API changelog)dall-e-2 and dall-e-3 snapshots are shut down as of May 12, 2026. Migrate any remaining image-generation calls to gpt-image-1 (frontier quality) or gpt-image-1-mini (cost/latency-optimized). Anything still pointing at DALL·E snapshots will now error — audit production code, marketing pipelines, and any user-facing image features. (OpenAI deprecations)return_token_budget parameter on the Responses API web search tool allows GPT-5+ to spend more tokens reasoning over web search results before returning an answer. Directly relevant for deep-research agents and any RAG-over-web pipeline where the model was previously cutting reasoning short. Tune higher for harder multi-source questions; expect higher latency and token cost in exchange for answer quality. (OpenAI API changelog)outputs to steps and changing response_format configuration. The new schema becomes default on May 20, 2026, and the legacy schema is removed entirely on June 6, 2026. This is a breaking change for any agent framework built on the Interactions API — plan migration tests before May 20 and confirm production is updated before June 6 to avoid outages. (Gemini API changelog)media_id and page_numbers fields, enabling proper image- and page-level citations in attributed-answer UIs. If you're using Gemini File Search for RAG, consider indexing images and surfacing the new metadata as inline citations. (Gemini API changelog)