AI Models

A reference guide to the major AI models available today — what they're good at, what they cost, and where they stand in the market.

Last updated: April 16, 2026
AI models change fast. Pricing, capabilities, and availability may have shifted since this page was last reviewed. Next review scheduled for June 2026.
Pricing is per 1 million tokens (roughly 750,000 words). Prices shown are standard API rates as of April 16, 2026. Cached/batch pricing is often 50–90% cheaper. Always check provider docs for current rates.
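To make the per-million-token arithmetic concrete, here is a minimal sketch of a cost estimate. The function name `request_cost`, the token counts, and the flat `discount` parameter are illustrative assumptions. Real cached and batch discounts differ by provider and often apply to input and output tokens differently.

```python
TOKENS_PER_MILLION = 1_000_000


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 discount: float = 0.0) -> float:
    """Return the estimated USD cost of one request.

    Prices are USD per 1 million tokens. `discount` is a hypothetical flat
    fraction (0.5 means 50% off) standing in for cached or batch pricing.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    base = (input_tokens * input_price
            + output_tokens * output_price) / TOKENS_PER_MILLION
    return base * (1.0 - discount)


# Claude Sonnet 4 rates from this guide: $3.00 input, $15.00 output.
# The token counts are made up for illustration.
standard = request_cost(10_000, 2_000, 3.00, 15.00)
batched = request_cost(10_000, 2_000, 3.00, 15.00, discount=0.5)

print(f"standard: ${standard:.4f}")  # standard: $0.0600
print(f"batched:  ${batched:.4f}")   # batched:  $0.0300
```

Output tokens cost five times as much as input tokens at these rates, so response length often dominates the bill even when prompts are long.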

Anthropic (Claude)

Prices as of 2026-04-16.

| Model | Model ID | Released | Context | Input (per 1M tokens) | Output (per 1M tokens) | Best For | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4 | claude-opus-4-20250514 | May 2025 | 200K | $15.00 | $75.00 | Complex analysis, research synthesis, application engineering | Active |
| Claude Sonnet 4 | claude-sonnet-4-20250514 | May 2025 | 200K | $3.00 | $15.00 | General-purpose, coding, conversational AI, chat assistants | Active |
| Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | Sep 2025 | 200K | $3.00 | $15.00 | Production chat applications, tool-heavy workflows | Active |
| Claude Haiku 3.5 | claude-3-5-haiku-20241022 | Oct 2024 | 200K | $0.80 | $4.00 | High-volume tasks, classification, routing, quick answers | Active |
| Claude 3 Opus | claude-3-opus-20240229 | Mar 2024 | 200K | $15.00 | $75.00 | Legacy integrations | Superseded by Claude Opus 4 |

Claude Opus 4

Most capable Claude model in this table. Excels at complex reasoning, multi-step analysis, nuanced writing, and code generation. Highest benchmark accuracy among the Claude models listed here.

Claude Sonnet 4

Strong balance of capability and cost. Excellent at coding, analysis, and conversational tasks. Faster than Opus with near-comparable quality on most tasks.

Claude Sonnet 4.5

Improved reasoning and instruction-following over Sonnet 4. Better at structured output, tool use, and complex multi-turn conversations.

Claude Haiku 3.5

Fastest Claude model. Strong for its price tier. Good at classification, extraction, and simple Q&A. Lower quality on complex reasoning.

OpenAI

Prices as of 2026-04-16.

| Model | Model ID | Released | Context | Input (per 1M tokens) | Output (per 1M tokens) | Best For | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4.1 | gpt-4.1 | Apr 2025 | 1M | $2.00 | $8.00 | Coding, long documents, general-purpose | Active (newest flagship) |
| GPT-4.1 Mini | gpt-4.1-mini | Apr 2025 | 1M | $0.40 | $1.60 | Production workloads, cost-conscious deployments | Active |
| GPT-4.1 Nano | gpt-4.1-nano | Apr 2025 | 1M | $0.10 | $0.40 | Classification, extraction, high-volume simple tasks | Active |
| GPT-4o | gpt-4o | May 2024 | 128K | $2.50 | $10.00 | Multimodal tasks, proven production stability | Active (being superseded by GPT-4.1) |
| GPT-4o Mini | gpt-4o-mini | Jul 2024 | 128K | $0.15 | $0.60 | Budget workloads, prototyping, high-volume | Active (being superseded by GPT-4.1 Mini) |
| o3 | o3 | Apr 2025 | 200K | $10.00 | $40.00 | Math, science, complex reasoning, hard problems | Active (premium reasoning) |
| o3-mini | o3-mini | Jan 2025 | 200K | $1.10 | $4.40 | Moderate reasoning tasks, cost-efficient problem solving | Active |
| o4-mini | o4-mini | Apr 2025 | 200K | $1.10 | $4.40 | Agentic tasks, coding with reasoning, tool use | Active (newest reasoning) |

Reasoning models (o-series) work differently from standard chat models. They "think" internally before answering, which uses more tokens but tends to produce more accurate results on hard problems. Pricing reflects the additional compute: the hidden reasoning tokens are billed as output tokens, even though they do not appear in the response.
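
Because hidden reasoning counts toward output tokens, a short visible answer can still carry a large output bill. The sketch below illustrates the effect. The function name `reasoning_request_cost` and all token counts are hypothetical, and actual reasoning-token usage varies widely by task.

```python
def reasoning_request_cost(input_tokens: int, visible_output_tokens: int,
                           reasoning_tokens: int,
                           input_price: float, output_price: float) -> float:
    """Estimate USD cost when hidden reasoning is billed as output.

    Prices are USD per 1 million tokens. Reasoning tokens are added to the
    visible output tokens before applying the output rate.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_price
            + billed_output * output_price) / 1_000_000


# o4-mini rates from the table: $1.10 input, $4.40 output.
# Token counts are invented for illustration.
with_reasoning = reasoning_request_cost(2_000, 500, 4_000, 1.10, 4.40)
answer_only = reasoning_request_cost(2_000, 500, 0, 1.10, 4.40)

print(f"with hidden reasoning: ${with_reasoning:.4f}")  # $0.0220
print(f"visible answer only:   ${answer_only:.4f}")     # $0.0044
```

In this made-up case the hidden reasoning accounts for most of the cost, which is why budgeting for o-series models from visible output length alone tends to underestimate spend.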

GPT-4.1

Latest OpenAI flagship. Excellent at coding, instruction following, and long-context tasks. 1M token context window. Strong all-around performer.

GPT-4.1 Mini

Cost-efficient version of GPT-4.1. Good balance of quality and speed. Strong for most production workloads that don't need peak reasoning.

GPT-4.1 Nano

Ultra-lightweight. Fastest and cheapest GPT-4.1 variant. Good for simple tasks where latency and cost matter more than depth.

o3

Dedicated reasoning model. Uses chain-of-thought internally before answering. Strongest on math, science, logic, and complex multi-step problems.

o3-mini

Lightweight reasoning model. Good for tasks that benefit from step-by-step thinking without the full o3 cost.

o4-mini

Latest small reasoning model. Improved over o3-mini on coding and tool use. Good agentic capabilities.

Google (Gemini)

Prices as of 2026-04-16.

| Model | Model ID | Released | Context | Input (per 1M tokens) | Output (per 1M tokens) | Best For | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Gemini 2.5 Pro | gemini-2.5-pro | Mar 2025 | 1M | $1.25 / $2.50 | $10.00 / $15.00 | Code generation, multimodal, long-context analysis | Active (flagship) |
| Gemini 2.5 Flash | gemini-2.5-flash | Mar 2025 | 1M | $0.15 | $0.60 | Cost-efficient production, long documents, multimodal | Active |
| Gemini 2.0 Flash | gemini-2.0-flash | Feb 2025 | 1M | $0.10 | $0.40 | Budget workloads, agentic tasks | Active (being superseded by 2.5 Flash) |

Gemini pricing has two tiers based on prompt length: standard (up to 200K tokens) and long context (over 200K tokens). The paired prices shown for Gemini 2.5 Pro are the standard / long-context rates. A free tier is available for experimentation.
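
The two-tier scheme can be sketched as a simple threshold check. This is an illustration under stated assumptions, not Google's billing logic: it assumes the tier is chosen by prompt size alone, that a prompt of exactly 200K tokens is billed at the standard rate, and that the selected tier applies to both input and output. The function name `gemini_pro_cost` is hypothetical.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # tokens; boundary handling is an assumption

# Gemini 2.5 Pro rates from the table, USD per 1M tokens: (standard, long)
INPUT_PRICES = (1.25, 2.50)
OUTPUT_PRICES = (10.00, 15.00)


def gemini_pro_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost of one request under two-tier pricing."""
    # Assumption: prompts over the threshold move the whole request
    # (input and output) to the long-context rates.
    tier = 1 if prompt_tokens > LONG_CONTEXT_THRESHOLD else 0
    return (prompt_tokens * INPUT_PRICES[tier]
            + output_tokens * OUTPUT_PRICES[tier]) / 1_000_000


short_request = gemini_pro_cost(100_000, 1_000)
long_request = gemini_pro_cost(400_000, 1_000)

print(f"100K-token prompt: ${short_request:.3f}")  # $0.135
print(f"400K-token prompt: ${long_request:.3f}")   # $1.015
```

Under these assumptions, a prompt four times longer costs roughly 7.5 times as much, because crossing the threshold doubles the input rate on every token in the prompt.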

Open Source / Open Weights

These models are available to download and self-host. Pricing depends on your infrastructure — the table below shows parameter counts instead of per-token pricing. Many are also available via hosted APIs (Together, Fireworks, Groq, etc.) at competitive rates.

| Model | Provider | Released | Context | Parameters | Best For | Status |
| --- | --- | --- | --- | --- | --- | --- |
| Llama 4 Maverick | Meta | Apr 2025 | 1M | 400B (17B active) | Self-hosted production, multilingual, cost control | Active (open source) |
| Llama 4 Scout | Meta | Apr 2025 | 10M | 109B (17B active) | Ultra-long context, document ingestion, self-hosted | Active (open source) |
| DeepSeek R1 | DeepSeek | Jan 2025 | 128K | 671B MoE | Math, coding, reasoning at low cost | Active (open weights) |
| Mistral Large | Mistral | Nov 2024 | 128K | 123B | Multilingual, EU data sovereignty, coding | Active |

Llama 4 Maverick (Meta)

Mixture-of-experts. Competitive with GPT-4o and Gemini 2.0 Flash on benchmarks. 128 experts, 17B active per token. Strong multilingual support (12 languages).

Llama 4 Scout (Meta)

Very large context window (10M tokens). 16 experts; Meta reports that it fits on a single H100 GPU with Int4 quantization. Good for massive document ingestion.
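
The single-H100 claim above can be sanity-checked with back-of-the-envelope arithmetic on weight memory. The sketch below assumes the 80 GB H100 variant and counts weights only, ignoring the KV cache, activations, and runtime overhead, so real deployments need more headroom. It uses total parameters rather than active parameters, since all experts of a mixture-of-experts model normally have to be resident in memory. The helper names are hypothetical.

```python
import math

H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant


def weight_memory_gb(total_params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for weights only, in GB (10^9 bytes)."""
    # billions of parameters * bytes per parameter = gigabytes
    return total_params_billions * bits_per_param / 8


def min_gpus_for_weights(total_params_billions: float, bits_per_param: int) -> int:
    """Smallest number of GPUs whose combined memory holds the weights."""
    return math.ceil(
        weight_memory_gb(total_params_billions, bits_per_param) / H100_MEMORY_GB
    )


# Total parameter counts from the table (in billions).
MODELS = {"Llama 4 Scout": 109, "Llama 4 Maverick": 400}

for name, params_b in MODELS.items():
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params_b, bits)
        gpus = min_gpus_for_weights(params_b, bits)
        print(f"{name} @ {bits}-bit: {gb:.1f} GB weights, >= {gpus} GPU(s)")
```

At 4 bits per parameter, Scout's 109B weights come to about 54.5 GB, which fits within one 80 GB GPU. At 16 bits they need roughly 218 GB, or at least three such GPUs.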

DeepSeek R1 (DeepSeek)

Reasoning-focused model from DeepSeek. Competitive with o1 on math and coding. Open weights. Very cost-effective via DeepSeek API.

Mistral Large (Mistral)

European-built model. Strong at multilingual tasks, coding, and instruction following. Good for EU compliance requirements.

Quick Comparison by Use Case

Best for Chat / Assistants

Claude Sonnet 4.5, GPT-4.1, Gemini 2.5 Flash

Best for Coding

Claude Sonnet 4.5, GPT-4.1, Gemini 2.5 Pro

Best for Hard Reasoning

Claude Opus 4, o3, Gemini 2.5 Pro

Best on a Budget

GPT-4.1 Nano, Gemini 2.0 Flash, Claude Haiku 3.5

Best for Long Documents

GPT-4.1 (1M), Gemini 2.5 Pro (1M), Llama 4 Scout (10M)

Best for Self-Hosting

Llama 4 Maverick, DeepSeek R1, Mistral Large