AI Models

A reference guide to the major AI models available today — what they're good at, what they cost, and where they stand in the market.

Last updated: May 12, 2026. AI models change fast: pricing, capabilities, and availability may have shifted since this page was last reviewed. Next review scheduled for July 2026.
Pricing is per 1 million tokens (roughly 750,000 words). Prices shown are standard API rates as of May 12, 2026. Cached/batch pricing is often 50–90% cheaper. Always check provider docs for current rates.
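To turn these per-million rates into a per-request estimate, multiply each token count by its rate and divide by 1,000,000. The sketch below does that at standard rates for a few models, with prices copied from the tables on this page. The `PRICES` dict, `request_cost`, and `approx_tokens` are illustrative names, not part of any provider SDK. Cached and batch discounts are deliberately left out because they vary by provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Standard API rates in USD per 1 million tokens."""
    input_per_m: float
    output_per_m: float


# Illustrative subset of the standard rates listed on this page (2026-05-12).
# Cached and batch discounts are not modeled here.
PRICES = {
    "claude-opus-4-7": Price(5.00, 25.00),
    "claude-sonnet-4-6": Price(3.00, 15.00),
    "claude-haiku-4-5": Price(1.00, 5.00),
    "gpt-5": Price(1.25, 10.00),
    "gpt-5-mini": Price(0.25, 2.00),
    "gpt-5-nano": Price(0.05, 0.40),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at standard (non-cached, non-batch) rates."""
    p = PRICES[model]
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


def approx_tokens(words: int) -> int:
    """Rule of thumb used on this page: 1M tokens is roughly 750,000 words."""
    return round(words / 0.75)


if __name__ == "__main__":
    # A 20K-token prompt with a 2K-token reply on each model.
    for name in PRICES:
        print(f"{name:>18}: ${request_cost(name, 20_000, 2_000):.4f}")
```

For a 20,000-token prompt and a 2,000-token reply, this gives $0.09 on Sonnet 4.6 and $0.045 on GPT-5. The word-to-token ratio is only an average. Real counts depend on each model's tokenizer and on the text itself.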

Anthropic (Claude)

Prices as of 2026-05-12
Model | API ID | Released | Context | Input ($/1M tokens) | Output ($/1M tokens) | Best For | Status
Claude Opus 4.7 | claude-opus-4-7 | Mar 2026 | 1M | $5.00 | $25.00 | Complex reasoning, agentic coding, deep research synthesis | Active (current flagship)
Claude Sonnet 4.6 | claude-sonnet-4-6 | Jan 2026 | 1M | $3.00 | $15.00 | Production chat, structured analysis, tool-heavy workflows | Active
Claude Opus 4.6 | claude-opus-4-6 | Dec 2025 | 1M | $5.00 | $25.00 | Legacy integrations awaiting migration | Superseded by Opus 4.7
Claude Haiku 4.5 | claude-haiku-4-5 | Oct 2025 | 200K | $1.00 | $5.00 | High-volume tasks, classification, batch processing, fast assistants | Active
Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | Sep 2025 | 200K | $3.00 | $15.00 | Legacy integrations awaiting migration | Superseded by Sonnet 4.6
Claude Opus 4.1 | claude-opus-4-1-20250805 | Aug 2025 | 200K | $15.00 | $75.00 | Nothing; it costs 3× as much as Opus 4.7 | Legacy (migrate to Opus 4.7)
Claude Opus 4.7

Anthropic's current flagship: a step change in agentic coding over Opus 4.6, with a 1M-token context window. Notably, Opus 4.6 and 4.7 are priced at $5/$25, down from $15/$75 for Opus 4 and Opus 4.1, a 3× price cut. Opus 4.7 powers the stocks pipeline's ai-findings narrative and the extended forensic "final boss" lenses.

Claude Sonnet 4.6

Best balance of speed and intelligence. 1M context. Extended and adaptive thinking. About 40% cheaper than Opus 4.7 on both input and output ($3/$15 vs $5/$25), which makes it the natural target if you need to bring narrative costs down without dropping to Haiku.

Claude Haiku 4.5

Fastest Claude with near-frontier intelligence. Supports extended thinking. 3× cheaper than Sonnet on both input and output ($1/$5 vs $3/$15). Strong for classification, extraction, batch jobs, and quick narrative.

OpenAI

Prices as of 2026-05-12
Model | API ID | Released | Context | Input ($/1M tokens) | Output ($/1M tokens) | Best For | Status
GPT-5 | gpt-5 | Aug 2025 | 400K | $1.25 | $10.00 | Reasoning, agentic workflows, second-opinion critiques | Active (current flagship)
GPT-5 mini | gpt-5-mini | Aug 2025 | 400K | $0.25 | $2.00 | Production workloads, cost-conscious deployments | Active
GPT-5 nano | gpt-5-nano | Aug 2025 | 400K | $0.05 | $0.40 | Classification, extraction, simple high-volume tasks | Active
o3 | o3 | Apr 2025 | 200K | $2.00 | $8.00 | Math, science, multi-step reasoning, hard problems | Active (reasoning)
o4-mini | o4-mini | Apr 2025 | 200K | $1.10 | $4.40 | Agentic tasks, coding with reasoning, tool use | Active
GPT-4.1 | gpt-4.1 | Apr 2025 | 1M | $2.00 | $8.00 | Long-context tasks where 400K is not enough | Superseded by GPT-5
GPT-4o | gpt-4o | May 2024 | 128K | $2.50 | $10.00 | Multimodal fallback, legacy integrations | Legacy
Reasoning models (the o-series, and GPT-5 as well) work differently from standard chat models. They "think" internally before answering, using more tokens but producing more accurate results on hard problems. Those hidden reasoning tokens are billed as output tokens, so the cost of a call depends on how much the model reasons, not just on the length of the visible answer.
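Because hidden reasoning tokens are billed at the output rate, a cost estimate based only on the visible answer can be badly off. The sketch below adds them back in. The token counts in the example are hypothetical, and `reasoning_call_cost` is an illustrative helper. Providers report usage in different shapes, and some already fold reasoning tokens into their output total, so check your provider's usage fields to avoid counting reasoning tokens twice.

```python
def reasoning_call_cost(
    input_tokens: int,
    visible_output_tokens: int,
    reasoning_tokens: int,
    input_per_m: float,
    output_per_m: float,
) -> float:
    """USD cost of one call to a reasoning model.

    Hidden reasoning tokens are billed as output tokens, so they are added to
    the visible answer before applying the output rate.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_per_m + billed_output * output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical o3 call at $2.00 / $8.00: 1K-token prompt, 500-token answer,
    # 4K tokens of hidden reasoning.
    naive = reasoning_call_cost(1_000, 500, 0, 2.00, 8.00)
    actual = reasoning_call_cost(1_000, 500, 4_000, 2.00, 8.00)
    print(f"visible-only estimate: ${naive:.4f}, with reasoning: ${actual:.4f}")
```

In this made-up case the visible-only estimate is $0.006, but the billed cost is $0.038. That is more than six times higher, driven entirely by tokens the user never sees.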
GPT-5

OpenAI's current flagship. Strong reasoning + agentic tool use. Used in the stocks pipeline's gpt-critique step as a devil's-advocate voice on Claude's narrative. ~2.5× cheaper on output than Opus 4.7.

GPT-5 mini

Cost-efficient GPT-5. Near-flagship quality at one-fifth of GPT-5's price for both input and output. Best buy in the OpenAI lineup for most production workloads.

GPT-5 nano

Ultra-cheap GPT-5 variant. 25× cheaper than the flagship. Great for classification, extraction, high-volume routing.

o3

Dedicated reasoning model that works through a chain of thought internally before answering. Output tokens include that hidden reasoning chain, so the cost per call is higher than the per-token rate suggests.

o4-mini

Lightweight reasoning model with strong agentic capabilities and tool use. Delivers most of o3's reasoning lift at a little over half the price ($1.10/$4.40 vs $2.00/$8.00).

Google (Gemini)

Prices as of 2026-05-12
Model | API ID | Released | Context | Input ($/1M tokens) | Output ($/1M tokens) | Best For | Status
Gemini 3.1 Pro (preview) | gemini-3.1-pro-preview | 2026 (preview) | 1M | $2.00 / $4.00 | $12.00 / $18.00 | Frontier reasoning, multimodal, long-context analysis | Preview (newest flagship)
Gemini 2.5 Pro | gemini-2.5-pro | Mar 2025 | 1M | $1.25 / $2.50 | $10.00 / $15.00 | Code generation, multimodal, long-context analysis | Active
Gemini 2.5 Flash | gemini-2.5-flash | Apr 2025 | 1M | $0.30 | $2.50 | Cost-efficient production, long documents, multimodal | Active
Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 2025 | 1M | $0.10 | $0.40 | Budget workloads, classification, simple agentic tasks | Active
Gemini Pro pricing has two tiers based on prompt length: standard (prompts up to 200K tokens) and long (prompts over 200K tokens). The dual prices shown for Gemini 3.1 Pro and Gemini 2.5 Pro are standard / long rates. Flash and Flash-Lite use a single rate. A generous free tier is available for experimentation.
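Here is a sketch of how the two tiers play out. It assumes the tier is chosen by prompt length and then applies to the whole request, output included. Confirm this against Google's current pricing page. The rates are the Gemini 2.5 Pro figures from the table above. The function and constant names are illustrative.

```python
# Gemini 2.5 Pro rates from the table above, in USD per 1M tokens: (input, output).
STANDARD_RATES = (1.25, 10.00)  # prompts up to 200K tokens
LONG_RATES = (2.50, 15.00)      # prompts over 200K tokens
LONG_CONTEXT_THRESHOLD = 200_000


def gemini_pro_cost(prompt_tokens: int, output_tokens: int,
                    standard_rates=STANDARD_RATES, long_rates=LONG_RATES) -> float:
    """USD cost of one request under two-tier, prompt-length-based pricing.

    Assumption: once the prompt exceeds the threshold, the long-context rates
    apply to every input and output token, not just the overflow.
    """
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = long_rates
    else:
        in_rate, out_rate = standard_rates
    return (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    print(gemini_pro_cost(150_000, 5_000))  # standard tier
    print(gemini_pro_cost(300_000, 5_000))  # long tier
```

Under that assumption, a 150K-token prompt with a 5K-token reply costs $0.2375. Doubling the prompt to 300K pushes it into the long tier at $0.825, more than three times the bill.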

Open Source / Open Weights

These models are available to download and self-host. Pricing depends on your infrastructure — the table below shows parameter counts instead of per-token pricing. Many are also available via hosted APIs (Together, Fireworks, Groq, etc.) at competitive rates.

Model | Provider | Released | Context | Parameters | Best For | Status
Llama 4 Maverick | Meta | Apr 2025 | 1M | 400B total (17B active) | Self-hosted production, multilingual, cost control | Active (open source)
Llama 4 Scout | Meta | Apr 2025 | 10M | 109B total (17B active) | Ultra-long context, document ingestion, self-hosted | Active (open source)
DeepSeek R1 | DeepSeek | Jan 2025 | 128K | 671B MoE (37B active) | Math, coding, reasoning at low cost | Active (open weights)
Mistral Large | Mistral | Nov 2024 | 128K | 123B | Multilingual, EU data sovereignty, coding | Active
Llama 4 Maverick (Meta)

Mixture-of-experts. Competitive with GPT-4o and Gemini 2.0 Flash on benchmarks. 128 experts, 17B active per token. Strong multilingual support (12 languages).
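The gap between total and active parameters comes from routing. A small router scores the experts for each token, and only the top-scoring ones run. The toy below shows generic top-k gating in plain Python. It illustrates the idea and is not Meta's implementation. Production MoE models add details such as shared experts and interleaved dense layers, which are omitted here. The experts are hypothetical stand-in functions.

```python
import math
from typing import Callable, Sequence

Vector = list[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router_logits: Sequence[float],
                experts: Sequence[Expert], k: int = 1) -> tuple[Vector, list[int]]:
    """Send one token to its top-k experts and mix their outputs.

    Only the selected experts are evaluated, which is why a model's "active"
    parameter count per token is far below its total parameter count.
    """
    chosen = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in chosen])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    # Four hypothetical experts that just scale the input by 1, 2, 3 or 4.
    experts = [lambda v, s=s: [s * e for e in v] for s in (1.0, 2.0, 3.0, 4.0)]
    out, chosen = moe_forward([1.0, -1.0], [0.1, 2.0, 0.3, -1.0], experts, k=1)
    print(chosen, out)  # [1] [2.0, -2.0]: only expert 1 runs
```

With k=1, three of the four experts never execute for this token. Scaled up to 128 experts, the same mechanism is how a 400B-parameter model can spend only about 17B parameters' worth of compute per token.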

Llama 4 Scout (Meta)

Extraordinary context window (10M tokens). 16 experts, fits on a single H100 node. Good for massive document ingestion.
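Whether a model fits on given hardware starts with weight memory: total parameters times bytes per parameter. For MoE models that means the total count, not the active count, because every expert must be resident even though only 17B parameters are active per token. The rough sketch below assumes an 80 GB H100 and counts weights only. KV cache, activations, and runtime overhead come on top, and a context as long as Scout's makes the KV cache substantial. The function names are illustrative.

```python
import math

H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant


def weight_memory_gb(total_params_billions: float, bits_per_param: float) -> float:
    """Approximate GB (1e9 bytes) needed to hold the weights alone."""
    return total_params_billions * bits_per_param / 8


def min_gpus_for_weights(total_params_billions: float, bits_per_param: float,
                         gpu_memory_gb: float = H100_MEMORY_GB) -> int:
    """Lower bound on GPU count: ignores KV cache, activations and overhead."""
    return math.ceil(weight_memory_gb(total_params_billions, bits_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    # Parameter counts from the table: use total, not active, parameters.
    for name, params in [("Llama 4 Scout", 109), ("Llama 4 Maverick", 400)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{name} @ {bits}-bit: {gb:.1f} GB, >= {min_gpus_for_weights(params, bits)} x H100")
```

At 4 bits, Scout's weights come to about 54.5 GB, which is under a single H100. At 16 bits they need about 218 GB, so at least three GPUs, which an 8-GPU H100 node covers comfortably.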

DeepSeek R1 (DeepSeek)

Reasoning-focused model from DeepSeek. Competitive with o1 on math and coding. Open weights. Very cost-effective via DeepSeek API.

Mistral Large (Mistral)

European-built model. Strong at multilingual tasks, coding, and instruction following. Good for EU compliance requirements.

Quick Comparison by Use Case

Best for Chat / Assistants

Claude Sonnet 4.6, GPT-5, Gemini 2.5 Flash

Best for Coding

Claude Opus 4.7, Claude Sonnet 4.6, GPT-5

Best for Hard Reasoning

Claude Opus 4.7, GPT-5 + o3, Gemini 3.1 Pro

Best on a Budget

GPT-5 nano, Gemini 2.5 Flash-Lite, Claude Haiku 4.5

Best for Long Documents

Claude Opus 4.7 (1M), Claude Sonnet 4.6 (1M), Llama 4 Scout (10M)

Best for Self-Hosting

Llama 4 Maverick, DeepSeek R1, Mistral Large
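The use-case picks above weigh quality, price, and context together. The price-and-context part can be sketched as a filter: drop models whose context window is too small, then take the cheapest survivor. This illustrative helper uses a subset of the rates and context sizes listed on this page and knows nothing about quality. It also treats each context window as one budget shared by prompt and output, whereas some providers cap input or output separately.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Model:
    api_id: str
    context_tokens: int
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens

    def cost(self, prompt_tokens: int, output_tokens: int) -> float:
        return (prompt_tokens * self.input_per_m + output_tokens * self.output_per_m) / 1_000_000


# Subset of the tables on this page (standard rates, 2026-05-12).
CATALOG = (
    Model("claude-opus-4-7", 1_000_000, 5.00, 25.00),
    Model("claude-sonnet-4-6", 1_000_000, 3.00, 15.00),
    Model("claude-haiku-4-5", 200_000, 1.00, 5.00),
    Model("gpt-5", 400_000, 1.25, 10.00),
    Model("gpt-5-nano", 400_000, 0.05, 0.40),
    Model("gpt-4.1", 1_000_000, 2.00, 8.00),
)


def cheapest_that_fits(prompt_tokens: int, output_tokens: int,
                       catalog: Sequence[Model] = CATALOG) -> Model:
    """Cheapest model whose context window holds the prompt plus the output."""
    fits = [m for m in catalog if prompt_tokens + output_tokens <= m.context_tokens]
    if not fits:
        raise ValueError("no model in the catalog has a large enough context window")
    return min(fits, key=lambda m: m.cost(prompt_tokens, output_tokens))


if __name__ == "__main__":
    print(cheapest_that_fits(500_000, 4_000).api_id)
```

For a 500K-token prompt only the 1M-context models qualify, and GPT-4.1 comes out cheapest. That matches its "where 400K is not enough" role in the OpenAI table. For small prompts the filter simply returns the cheapest model overall, which is why quality has to be judged separately.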