AI Models

A reference guide to the major AI models available today — what they're good at, what they cost, and where they stand in the market.

Last updated: April 16, 2026

AI models change fast. Pricing, capabilities, and availability may have shifted since this page was last reviewed. Next review scheduled for June 2026.

Pricing is per 1 million tokens (roughly 750,000 words). Prices shown are standard API rates as of April 16, 2026. Cached/batch pricing is often 50–90% cheaper. Always check provider docs for current rates.
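As a rough illustration of how per-million-token pricing turns into a per-request cost, the sketch below uses the Claude Sonnet 4 rates from the Anthropic table on this page ($3.00 input, $15.00 output). The function name `estimate_cost` and its `discount` parameter are hypothetical, and the 50% discount is only an assumed value inside the 50 to 90% range mentioned above, not a quoted provider rate.

```python
def estimate_cost(input_tokens, output_tokens, input_price, output_price, discount=0.0):
    """Return the estimated USD cost of one request.

    Prices are USD per 1 million tokens. `discount` is a fraction (0.0 to 1.0)
    standing in for cached/batch pricing; it is an illustrative assumption,
    not a real provider parameter.
    """
    full = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return full * (1.0 - discount)


# Claude Sonnet 4 rates from this page: $3.00 input, $15.00 output per 1M tokens.
# The token counts (10,000 in, 1,000 out) are made up for the example.
standard = estimate_cost(10_000, 1_000, 3.00, 15.00)
batched = estimate_cost(10_000, 1_000, 3.00, 15.00, discount=0.5)

print(f"standard: ${standard:.4f}")
print(f"with assumed 50% discount: ${batched:.4f}")
```

With these example numbers the request costs about $0.045 at standard rates and about $0.0225 with the assumed discount. Output tokens dominate quickly because they are priced several times higher than input tokens.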

Anthropic (Claude)

Prices as of 2026-04-16

Model	Released	Context	Input	Output	Best For	Status
Claude Opus 4 claude-opus-4-20250514	May 2025	200K	$15.00	$75.00	Complex analysis, research synthesis, application engineering	Active
Claude Sonnet 4 claude-sonnet-4-20250514	May 2025	200K	$3.00	$15.00	General-purpose, coding, conversational AI, chat assistants	Active
Claude Sonnet 4.5 claude-sonnet-4-5-20250929	Sep 2025	200K	$3.00	$15.00	Production chat applications, tool-heavy workflows	Active
Claude Haiku 3.5 claude-3-5-haiku-20241022	Oct 2024	200K	$0.80	$4.00	High-volume tasks, classification, routing, quick answers	Active
Claude 3 Opus claude-3-opus-20240229	Mar 2024	200K	$15.00	$75.00	Legacy integrations	Superseded by Claude Opus 4

Claude Opus 4

Most capable Claude model. Excels at complex reasoning, multi-step analysis, nuanced writing, and code generation. Highest accuracy on benchmarks.

Claude Sonnet 4

Strong balance of capability and cost. Excellent at coding, analysis, and conversational tasks. Faster than Opus with near-comparable quality on most tasks.

Claude Sonnet 4.5

Improved reasoning and instruction-following over Sonnet 4. Better at structured output, tool use, and complex multi-turn conversations.

Claude Haiku 3.5

Fastest Claude model. Strong for its price tier. Good at classification, extraction, and simple Q&A. Lower quality on complex reasoning.

OpenAI

Prices as of 2026-04-16

Model	Released	Context	Input	Output	Best For	Status
GPT-4.1 gpt-4.1	Apr 2025	1M	$2.00	$8.00	Coding, long documents, general-purpose	Active — newest flagship
GPT-4.1 Mini gpt-4.1-mini	Apr 2025	1M	$0.40	$1.60	Production workloads, cost-conscious deployments	Active
GPT-4.1 Nano gpt-4.1-nano	Apr 2025	1M	$0.10	$0.40	Classification, extraction, high-volume simple tasks	Active
GPT-4o gpt-4o	May 2024	128K	$2.50	$10.00	Multimodal tasks, proven production stability	Active — being superseded by GPT-4.1
GPT-4o Mini gpt-4o-mini	Jul 2024	128K	$0.15	$0.60	Budget workloads, prototyping, high-volume	Active — being superseded by GPT-4.1 Mini
o3 o3	Apr 2025	200K	$10.00	$40.00	Math, science, complex reasoning, hard problems	Active — premium reasoning
o3-mini o3-mini	Jan 2025	200K	$1.10	$4.40	Moderate reasoning tasks, cost-efficient problem solving	Active
o4-mini o4-mini	Apr 2025	200K	$1.10	$4.40	Agentic tasks, coding with reasoning, tool use	Active — newest reasoning

Reasoning models (o-series) work differently from standard chat models. They "think" internally before answering, using more tokens but producing more accurate results on hard problems. Pricing reflects the additional compute — output tokens include the model's hidden reasoning chain.
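To make the billing effect of hidden reasoning concrete, here is a minimal sketch using the o3 rates from the OpenAI table on this page ($10.00 input, $40.00 output). The function name and the token counts, including the 4,000 reasoning tokens, are invented for illustration; real reasoning-token usage varies by prompt and should be read from the usage data the API returns.

```python
O3_INPUT_PRICE = 10.00   # USD per 1M input tokens (from the table on this page)
O3_OUTPUT_PRICE = 40.00  # USD per 1M output tokens (from the table on this page)


def reasoning_request_cost(input_tokens, visible_output_tokens, reasoning_tokens):
    """Estimate the USD cost of one request to a reasoning model.

    Hidden reasoning tokens are billed at the output rate, so they are
    added to the visible output tokens before pricing.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * O3_INPUT_PRICE + billed_output * O3_OUTPUT_PRICE) / 1_000_000


# Hypothetical request: 2,000 input tokens and a 500-token visible answer.
without = reasoning_request_cost(2_000, 500, 0)
with_reasoning = reasoning_request_cost(2_000, 500, 4_000)

print(f"no reasoning tokens:    ${without:.2f}")
print(f"4,000 reasoning tokens: ${with_reasoning:.2f}")
```

In this made-up case the same visible answer costs about $0.04 without reasoning tokens and about $0.20 with them, which is why budgeting by visible output length alone tends to underestimate o-series spend.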

GPT-4.1

Latest OpenAI flagship. Excellent at coding, instruction following, and long-context tasks. 1M token context window. Strong all-around performer.

GPT-4.1 Mini

Cost-efficient version of GPT-4.1. Good balance of quality and speed. Strong for most production workloads that don't need peak reasoning.

GPT-4.1 Nano

Ultra-lightweight. Fastest and cheapest GPT-4.1 variant. Good for simple tasks where latency and cost matter more than depth.

o3

Dedicated reasoning model. Uses chain-of-thought internally before answering. Strongest on math, science, logic, and complex multi-step problems.

o3-mini

Lightweight reasoning model. Good for tasks that benefit from step-by-step thinking without the full o3 cost.

o4-mini

Latest small reasoning model. Improved over o3-mini on coding and tool use. Good agentic capabilities.

Google (Gemini)

Prices as of 2026-04-16

Model	Released	Context	Input	Output	Best For	Status
Gemini 2.5 Pro gemini-2.5-pro	Mar 2025	1M	$1.25 / $2.50	$10.00 / $15.00	Code generation, multimodal, long-context analysis	Active — flagship
Gemini 2.5 Flash gemini-2.5-flash	Mar 2025	1M	$0.15	$0.60	Cost-efficient production, long documents, multimodal	Active
Gemini 2.0 Flash gemini-2.0-flash	Feb 2025	1M	$0.10	$0.40	Budget workloads, agentic tasks	Active — being superseded by 2.5 Flash

Gemini 2.5 Pro pricing has two tiers based on prompt length: standard (prompts up to 200K tokens) and long context (prompts over 200K tokens). The dual prices shown for Gemini 2.5 Pro are the standard / long-context rates. A free tier is available for experimentation.
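The two-tier scheme can be sketched as a simple rate lookup. The example below uses the Gemini 2.5 Pro rates from the table on this page and assumes that the tier is chosen by prompt size and then applies to both input and output tokens for that request; the function name `gemini_pro_cost` is hypothetical, and the exact boundary rule should be confirmed against Google's pricing page.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # tokens; assumed boundary between the two tiers

# Gemini 2.5 Pro rates from this page, USD per 1M tokens.
STANDARD_RATES = {"input": 1.25, "output": 10.00}
LONG_RATES = {"input": 2.50, "output": 15.00}


def gemini_pro_cost(prompt_tokens, output_tokens):
    """Estimate the USD cost of one Gemini 2.5 Pro request.

    Assumption: a prompt longer than the threshold moves the whole
    request (input and output) onto the long-context rates.
    """
    rates = LONG_RATES if prompt_tokens > LONG_CONTEXT_THRESHOLD else STANDARD_RATES
    return (prompt_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


short_prompt = gemini_pro_cost(100_000, 1_000)
long_prompt = gemini_pro_cost(300_000, 1_000)

print(f"100K-token prompt: ${short_prompt:.3f}")
print(f"300K-token prompt: ${long_prompt:.3f}")
```

Under these assumptions, tripling the prompt from 100K to 300K tokens raises the cost from about $0.135 to about $0.765, more than a threefold increase, because the longer prompt is also billed at the higher rate.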

Open Source / Open Weights

These models are available to download and self-host. Pricing depends on your infrastructure — the table below shows parameter counts instead of per-token pricing. Many are also available via hosted APIs (Together, Fireworks, Groq, etc.) at competitive rates.

Model	Provider	Released	Context	Parameters	Best For	Status
Llama 4 Maverick	Meta	Apr 2025	1M	400B (17B active)	Self-hosted production, multilingual, cost control	Active — open source
Llama 4 Scout	Meta	Apr 2025	10M	109B (17B active)	Ultra-long context, document ingestion, self-hosted	Active — open source
DeepSeek R1	DeepSeek	Jan 2025	128K	671B MoE	Math, coding, reasoning at low cost	Active — open weights
Mistral Large	Mistral	Nov 2024	128K	123B	Multilingual, EU data sovereignty, coding	Active

Llama 4 Maverick Meta

Mixture-of-experts. Competitive with GPT-4o and Gemini 2.0 Flash on benchmarks. 128 experts, 17B active per token. Strong multilingual support (12 languages).

Llama 4 Scout Meta

Very large context window (10M tokens). 16 experts; per Meta, fits on a single H100 GPU with Int4 quantization. Good for massive document ingestion.
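A back-of-the-envelope check of whether Scout's weights fit in one H100 GPU: the sketch below multiplies the 109B total parameter count from the table by the bytes per parameter at different precisions. It assumes the 80 GB H100 variant and counts weights only, ignoring the KV cache, activations, and runtime overhead, so treat it as a lower bound rather than a deployment guide. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(total_params_billions, bits_per_param):
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    bytes_total = total_params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant

# All 109B parameters must be resident even though only 17B are active
# per token, because any expert may be selected for a given token.
for bits in (16, 8, 4):
    gb = weight_memory_gb(109, bits)
    verdict = "fits" if gb <= H100_MEMORY_GB else "does not fit"
    print(f"{bits}-bit weights: {gb:.1f} GB ({verdict} in {H100_MEMORY_GB} GB)")
```

By this estimate the weights need roughly 218 GB at 16-bit, 109 GB at 8-bit, and 54.5 GB at 4-bit, so only the 4-bit case leaves room on a single 80 GB GPU. Very long contexts add substantial KV-cache memory on top of that.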

DeepSeek R1 DeepSeek

Reasoning-focused model from DeepSeek. Competitive with OpenAI's o1 on math and coding benchmarks. Open weights. Very cost-effective via the DeepSeek API.

Mistral Large Mistral

European-built model. Strong at multilingual tasks, coding, and instruction following. Good for EU compliance requirements.

Quick Comparison by Use Case

Best for Chat / Assistants

Claude Sonnet 4.5, GPT-4.1, Gemini 2.5 Flash

Best for Coding

Claude Sonnet 4.5, GPT-4.1, Gemini 2.5 Pro

Best for Hard Reasoning

Claude Opus 4, o3, Gemini 2.5 Pro

Best on a Budget

GPT-4.1 Nano, Gemini 2.0 Flash, Claude Haiku 3.5

Best for Long Documents

GPT-4.1 (1M), Gemini 2.5 Pro (1M), Llama 4 Scout (10M)

Best for Self-Hosting

Llama 4 Maverick, DeepSeek R1, Mistral Large
