AI Models
A reference guide to the major AI models available today — what they're good at, what they cost, and where they stand in the market.
| Model | Released | Context | Input ($/1M tokens) | Output ($/1M tokens) | Best For | Status |
|---|---|---|---|---|---|---|
| Claude Opus 4.7 (`claude-opus-4-7`) | Mar 2026 | 1M | $5.00 | $25.00 | Complex reasoning, agentic coding, deep research synthesis | Active (current flagship) |
| Claude Sonnet 4.6 (`claude-sonnet-4-6`) | Jan 2026 | 1M | $3.00 | $15.00 | Production chat, structured analysis, tool-heavy workflows | Active |
| Claude Opus 4.6 (`claude-opus-4-6`) | Dec 2025 | 1M | $5.00 | $25.00 | Legacy integrations awaiting migration | Superseded by Opus 4.7 |
| Claude Haiku 4.5 (`claude-haiku-4-5`) | Oct 2025 | 200K | $1.00 | $5.00 | High-volume tasks, classification, batch processing, fast assistants | Active |
| Claude Sonnet 4.5 (`claude-sonnet-4-5-20250929`) | Sep 2025 | 200K | $3.00 | $15.00 | Legacy integrations awaiting migration | Superseded by Sonnet 4.6 |
| Claude Opus 4.1 (`claude-opus-4-1-20250805`) | Aug 2025 | 200K | $15.00 | $75.00 | Nothing; costs 3× as much as Opus 4.7 | Legacy (migrate to Opus 4.7) |
Claude Opus 4.7: Anthropic's current flagship. A step change in agentic coding over Opus 4.6, with a 1M-token context. Notably, Opus pricing is now $5/$25 per 1M tokens (input/output), down from the $15/$75 of Opus 4 and Opus 4.1, a 3× price cut. This is the model that powers the stocks pipeline's ai-findings narrative and the extended forensic "final boss" lenses.
Claude Sonnet 4.6: Best balance of speed and intelligence. 1M-token context, with extended and adaptive thinking. About 1.7× cheaper than Opus 4.7 on both input and output ($3/$15 vs. $5/$25), which makes it the natural target if you need to bring narrative costs down without dropping to Haiku.
Claude Haiku 4.5: Fastest Claude, with near-frontier intelligence. Extended thinking supported. 3× cheaper than Sonnet on output ($5 vs. $15). Strong for classification, extraction, batch jobs, and quick narrative.
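Output-price ratios between tiers can be checked directly from the table's rates. The following is a minimal sketch of that arithmetic, assuming the listed rates are USD per 1M tokens; the `PRICES` dict and the `call_cost` and `output_ratio` helpers are hypothetical names for illustration, not part of any SDK, and the 20K-in / 2K-out workload is made up.

```python
# Rates copied from the table above, assumed to be USD per 1M tokens.
PRICES = {
    "claude-opus-4-7": {"input": 5.00, "output": 25.00},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4-5": {"input": 1.00, "output": 5.00},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call at the listed rates."""
    rate = PRICES[model]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


def output_ratio(expensive: str, cheap: str) -> float:
    """How many times more the first model charges per output token."""
    return PRICES[expensive]["output"] / PRICES[cheap]["output"]


if __name__ == "__main__":
    # Hypothetical workload: 20K tokens in, 2K tokens out per call.
    for model in PRICES:
        print(f"{model}: ${call_cost(model, 20_000, 2_000):.4f} per call")
    print("Opus/Sonnet output ratio:",
          round(output_ratio("claude-opus-4-7", "claude-sonnet-4-6"), 2))
    print("Sonnet/Haiku output ratio:",
          round(output_ratio("claude-sonnet-4-6", "claude-haiku-4-5"), 2))
```

Because real workloads mix input and output tokens, the blended saving from moving down a tier depends on the input/output split, not on the output ratio alone.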
| Model | Released | Context | Input ($/1M tokens) | Output ($/1M tokens) | Best For | Status |
|---|---|---|---|---|---|---|
| GPT-5 (`gpt-5`) | Aug 2025 | 400K | $1.25 | $10.00 | Reasoning, agentic workflows, second-opinion critiques | Active (current flagship) |
| GPT-5 mini (`gpt-5-mini`) | Aug 2025 | 400K | $0.25 | $2.00 | Production workloads, cost-conscious deployments | Active |
| GPT-5 nano (`gpt-5-nano`) | Aug 2025 | 400K | $0.05 | $0.40 | Classification, extraction, simple high-volume tasks | Active |
| o3 (`o3`) | Apr 2025 | 200K | $2.00 | $8.00 | Math, science, multi-step reasoning, hard problems | Active (reasoning) |
| o4-mini (`o4-mini`) | Apr 2025 | 200K | $1.10 | $4.40 | Agentic tasks, coding with reasoning, tool use | Active |
| GPT-4.1 (`gpt-4.1`) | Apr 2025 | 1M | $2.00 | $8.00 | Long-context tasks where 400K is not enough | Superseded by GPT-5 |
| GPT-4o (`gpt-4o`) | May 2024 | 128K | $2.50 | $10.00 | Multimodal fallback, legacy integrations | Legacy |
GPT-5: OpenAI's current flagship. Strong reasoning and agentic tool use. Used in the stocks pipeline's gpt-critique step as a devil's-advocate voice on Claude's narrative. About 2.5× cheaper on output than Opus 4.7 ($10 vs. $25).
GPT-5 mini: Cost-efficient GPT-5. Near-flagship quality at one-fifth the price. Best buy in the OpenAI lineup for most production workloads.
GPT-5 nano: Ultra-cheap GPT-5 variant, 25× cheaper than the flagship. Great for classification, extraction, and high-volume routing.
o3: Dedicated reasoning model. It runs an internal chain of thought before answering. The hidden reasoning tokens are billed as output tokens, so per-call cost is higher than the listed rate suggests.
o4-mini: Lightweight reasoning model. Strong agentic capabilities and tool use. Most of o3's reasoning lift at roughly half the price ($1.10/$4.40 vs. $2.00/$8.00).
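The effect of hidden reasoning tokens on per-call cost can be sketched as follows. This is an illustration under stated assumptions: the rates are o3's listed prices read as USD per 1M tokens, the token counts are invented, and `reasoning_call_cost` and `effective_output_rate` are hypothetical helper names, not API calls.

```python
# o3 rates from the table above, assumed to be USD per 1M tokens.
O3_INPUT_RATE = 2.00
O3_OUTPUT_RATE = 8.00


def reasoning_call_cost(
    input_tokens: int,
    visible_output_tokens: int,
    reasoning_tokens: int,
    input_rate: float = O3_INPUT_RATE,
    output_rate: float = O3_OUTPUT_RATE,
) -> float:
    """USD cost of one call when hidden reasoning tokens are billed as output."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_rate + billed_output * output_rate) / 1_000_000


def effective_output_rate(
    visible_output_tokens: int,
    reasoning_tokens: int,
    output_rate: float = O3_OUTPUT_RATE,
) -> float:
    """Output spend expressed per 1M *visible* tokens."""
    if visible_output_tokens <= 0:
        raise ValueError("visible_output_tokens must be positive")
    billed_output = visible_output_tokens + reasoning_tokens
    return output_rate * billed_output / visible_output_tokens


if __name__ == "__main__":
    # Hypothetical call: 5K prompt, 500-token answer, 4K hidden reasoning tokens.
    naive = reasoning_call_cost(5_000, 500, 0)
    actual = reasoning_call_cost(5_000, 500, 4_000)
    print(f"cost ignoring reasoning tokens: ${naive:.4f}")
    print(f"cost including reasoning tokens: ${actual:.4f}")
    print(f"effective rate per 1M visible output tokens: "
          f"${effective_output_rate(500, 4_000):.2f}")
```

The ratio of reasoning to visible tokens varies widely by task, so budgeting from measured usage figures is safer than budgeting from the listed rate.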
| Model | Released | Context | Input ($/1M tokens) | Output ($/1M tokens) | Best For | Status |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (preview) (`gemini-3.1-pro-preview`) | 2026 preview | 1M | $2.00 / $4.00 | $12.00 / $18.00 | Frontier reasoning, multimodal, long-context analysis | Preview (newest flagship) |
| Gemini 2.5 Pro (`gemini-2.5-pro`) | Mar 2025 | 1M | $1.25 / $2.50 | $10.00 / $15.00 | Code generation, multimodal, long-context analysis | Active |
| Gemini 2.5 Flash (`gemini-2.5-flash`) | Mar 2025 | 1M | $0.30 | $2.50 | Cost-efficient production, long documents, multimodal | Active |
| Gemini 2.5 Flash-Lite (`gemini-2.5-flash-lite`) | 2025 | 1M | $0.10 | $0.40 | Budget workloads, classification, simple agentic tasks | Active |
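The paired prices listed for the Pro models look like prompt-length tiers. The sketch below shows how such a tier could be applied, assuming the first figure covers prompts up to 200K tokens and the second covers longer prompts. That 200K threshold is an assumption not stated in the table, so confirm it against the provider's pricing page. `TieredRate` and `tiered_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieredRate:
    """Two-tier pricing in USD per 1M tokens (assumed unit)."""
    input_low: float
    input_high: float
    output_low: float
    output_high: float
    # ASSUMPTION: the higher tier applies above 200K prompt tokens.
    threshold: int = 200_000


# Figures copied from the Gemini 2.5 Pro row above.
GEMINI_2_5_PRO = TieredRate(1.25, 2.50, 10.00, 15.00)


def tiered_cost(rate: TieredRate, prompt_tokens: int, output_tokens: int) -> float:
    """USD cost of one call; the whole call is billed at the tier its prompt falls in."""
    long_prompt = prompt_tokens > rate.threshold
    in_rate = rate.input_high if long_prompt else rate.input_low
    out_rate = rate.output_high if long_prompt else rate.output_low
    return (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    print(f"100K prompt: ${tiered_cost(GEMINI_2_5_PRO, 100_000, 1_000):.3f}")
    print(f"300K prompt: ${tiered_cost(GEMINI_2_5_PRO, 300_000, 1_000):.3f}")
```

Under these assumptions a prompt three times as long costs more than three times as much, which matters when deciding whether to send a whole document or a retrieved subset.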
These models are available to download and self-host. Pricing depends on your infrastructure — the table below shows parameter counts instead of per-token pricing. Many are also available via hosted APIs (Together, Fireworks, Groq, etc.) at competitive rates.
| Model | Provider | Released | Context | Parameters | Best For | Status |
|---|---|---|---|---|---|---|
| Llama 4 Maverick | Meta | Apr 2025 | 1M | 400B (17B active) | Self-hosted production, multilingual, cost control | Active — open source |
| Llama 4 Scout | Meta | Apr 2025 | 10M | 109B (17B active) | Ultra-long context, document ingestion, self-hosted | Active — open source |
| DeepSeek R1 | DeepSeek | Jan 2025 | 128K | 671B MoE | Math, coding, reasoning at low cost | Active — open weights |
| Mistral Large | Mistral | Nov 2024 | 128K | 123B | Multilingual, EU data sovereignty, coding | Active |
Llama 4 Maverick: Mixture-of-experts model with 128 experts and 17B parameters active per token. Competitive with GPT-4o and Gemini 2.0 Flash on benchmarks. Strong multilingual support (12 languages).
Llama 4 Scout: Extraordinary context window (10M tokens). Mixture-of-experts with 16 experts; fits on a single H100 node. Good for massive document ingestion.
DeepSeek R1: Reasoning-focused model from DeepSeek. Competitive with o1 on math and coding. Open weights. Very cost-effective via the DeepSeek API.
Mistral Large: European-built model. Strong at multilingual tasks, coding, and instruction following. Good for EU compliance requirements.
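Since self-hosting cost is driven by hardware, a rough first step is estimating how much memory the weights alone need. The sketch below uses the parameter counts from the table and assumes 80 GB per GPU. It counts total parameters rather than active ones, because a mixture-of-experts model must keep every expert loaded. KV cache, activations, and runtime overhead are ignored, so the GPU counts are lower bounds. `weight_memory_gb` and `min_gpus` are hypothetical helper names.

```python
import math

# Bytes per parameter at common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

# Total parameter counts in billions, from the table above.
TOTAL_PARAMS_B = {
    "Llama 4 Maverick": 400,
    "Llama 4 Scout": 109,
    "DeepSeek R1": 671,
    "Mistral Large": 123,
}

# ASSUMPTION: 80 GB of memory per GPU (e.g. an 80 GB H100).
GPU_MEMORY_GB = 80


def weight_memory_gb(total_params_billions: float, precision: str) -> float:
    """Approximate GB for weights alone (1 GB = 1e9 bytes)."""
    return total_params_billions * BYTES_PER_PARAM[precision]


def min_gpus(total_params_billions: float, precision: str) -> int:
    """Lower bound on GPU count; ignores KV cache and activations."""
    return math.ceil(weight_memory_gb(total_params_billions, precision) / GPU_MEMORY_GB)


if __name__ == "__main__":
    for name, params in TOTAL_PARAMS_B.items():
        for precision in ("bf16", "int4"):
            gb = weight_memory_gb(params, precision)
            print(f"{name} @ {precision}: {gb:.1f} GB weights, "
                  f">= {min_gpus(params, precision)} GPU(s)")
```

Long contexts add substantial KV-cache memory on top of these figures, so a model that fits at short context may not fit when serving its full advertised window.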