# AI Models
A reference guide to the major AI models available today — what they're good at, what they cost, and where they stand in the market.
## Anthropic

| Model | Released | Context (tokens) | Input ($/MTok) | Output ($/MTok) | Best For | Status |
|---|---|---|---|---|---|---|
| Claude Opus 4 (`claude-opus-4-20250514`) | May 2025 | 200K | $15.00 | $75.00 | Complex analysis, research synthesis, application engineering | Active |
| Claude Sonnet 4 (`claude-sonnet-4-20250514`) | May 2025 | 200K | $3.00 | $15.00 | General-purpose, coding, conversational AI, chat assistants | Active |
| Claude Sonnet 4.5 (`claude-sonnet-4-5-20250929`) | Sep 2025 | 200K | $3.00 | $15.00 | Production chat applications, tool-heavy workflows | Active |
| Claude Haiku 3.5 (`claude-haiku-3-5-20241022`) | Oct 2024 | 200K | $0.80 | $4.00 | High-volume tasks, classification, routing, quick answers | Active |
| Claude 3 Opus (`claude-3-opus-20240229`) | Mar 2024 | 200K | $15.00 | $75.00 | Legacy integrations | Superseded by Claude Opus 4 |
- **Claude Opus 4:** Most capable Claude model. Excels at complex reasoning, multi-step analysis, nuanced writing, and code generation. Highest accuracy on benchmarks.
- **Claude Sonnet 4:** Strong balance of capability and cost. Excellent at coding, analysis, and conversational tasks. Faster than Opus with near-comparable quality on most tasks.
- **Claude Sonnet 4.5:** Improved reasoning and instruction-following over Sonnet 4. Better at structured output, tool use, and complex multi-turn conversations.
- **Claude Haiku 3.5:** Fastest Claude model. Strong for its price tier. Good at classification, extraction, and simple Q&A. Lower quality on complex reasoning.
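All prices in these tables are US dollars per million tokens, so the cost of a single request is simple arithmetic. A minimal sketch (the `PRICES` dict and `estimate_cost` helper are illustrative, not part of any SDK; the rates are copied from the table above):

```python
# Per-million-token prices (USD), copied from the table above.
PRICES = {
    "claude-opus-4":    {"input": 15.00, "output": 75.00},
    "claude-sonnet-4":  {"input": 3.00,  "output": 15.00},
    "claude-haiku-3-5": {"input": 0.80,  "output": 4.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request: tokens / 1M * price per MTok."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] + \
           (output_tokens / 1_000_000) * p["output"]

# A 2,000-token prompt with a 500-token reply on Sonnet 4:
# 2000/1M * $3.00 + 500/1M * $15.00 = $0.006 + $0.0075 = $0.0135
print(f"${estimate_cost('claude-sonnet-4', 2000, 500):.4f}")
```

Note that output tokens cost five times as much as input tokens across this lineup, so long generations, not long prompts, tend to dominate the bill.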
## OpenAI

| Model | Released | Context (tokens) | Input ($/MTok) | Output ($/MTok) | Best For | Status |
|---|---|---|---|---|---|---|
| GPT-4.1 (`gpt-4.1`) | Apr 2025 | 1M | $2.00 | $8.00 | Coding, long documents, general-purpose | Active — newest flagship |
| GPT-4.1 Mini (`gpt-4.1-mini`) | Apr 2025 | 1M | $0.40 | $1.60 | Production workloads, cost-conscious deployments | Active |
| GPT-4.1 Nano (`gpt-4.1-nano`) | Apr 2025 | 1M | $0.10 | $0.40 | Classification, extraction, high-volume simple tasks | Active |
| GPT-4o (`gpt-4o`) | May 2024 | 128K | $2.50 | $10.00 | Multimodal tasks, proven production stability | Active — being superseded by GPT-4.1 |
| GPT-4o Mini (`gpt-4o-mini`) | Jul 2024 | 128K | $0.15 | $0.60 | Budget workloads, prototyping, high-volume | Active — being superseded by GPT-4.1 Mini |
| o3 (`o3`) | Apr 2025 | 200K | $10.00 | $40.00 | Math, science, complex reasoning, hard problems | Active — premium reasoning |
| o3-mini (`o3-mini`) | Jan 2025 | 200K | $1.10 | $4.40 | Moderate reasoning tasks, cost-efficient problem solving | Active |
| o4-mini (`o4-mini`) | Apr 2025 | 200K | $1.10 | $4.40 | Agentic tasks, coding with reasoning, tool use | Active — newest reasoning |
- **GPT-4.1:** Latest OpenAI flagship. Excellent at coding, instruction following, and long-context tasks. 1M-token context window. Strong all-around performer.
- **GPT-4.1 Mini:** Cost-efficient version of GPT-4.1. Good balance of quality and speed. Strong for most production workloads that don't need peak reasoning.
- **GPT-4.1 Nano:** Ultra-lightweight. Fastest and cheapest GPT-4.1 variant. Good for simple tasks where latency and cost matter more than depth.
- **o3:** Dedicated reasoning model. Uses chain-of-thought internally before answering. Strongest on math, science, logic, and complex multi-step problems.
- **o3-mini:** Lightweight reasoning model. Good for tasks that benefit from step-by-step thinking without the full o3 cost.
- **o4-mini:** Latest small reasoning model. Improved over o3-mini on coding and tool use. Good agentic capabilities.
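With a 100x spread between the cheapest and most expensive rates in this table, a common deployment pattern is routing each request to the cheapest model that can handle it. A toy sketch, assuming context-window fit is the only hard constraint (`cheapest_fit` is a hypothetical helper; real routers also gate on task difficulty and capability tier):

```python
# Context window (tokens) and input price ($/MTok), copied from the table above.
MODELS = {
    "gpt-4.1":      {"context": 1_000_000, "input": 2.00},
    "gpt-4.1-mini": {"context": 1_000_000, "input": 0.40},
    "gpt-4.1-nano": {"context": 1_000_000, "input": 0.10},
    "gpt-4o":       {"context": 128_000,   "input": 2.50},
    "o3":           {"context": 200_000,   "input": 10.00},
    "o4-mini":      {"context": 200_000,   "input": 1.10},
}

def cheapest_fit(prompt_tokens: int, candidates=MODELS) -> str:
    """Return the cheapest (by input price) model whose context window
    can hold the prompt. Raises ValueError if none fits."""
    fitting = {m: v for m, v in candidates.items()
               if v["context"] >= prompt_tokens}
    if not fitting:
        raise ValueError("no candidate model can hold a prompt this large")
    return min(fitting, key=lambda m: fitting[m]["input"])

# A 500K-token prompt rules out everything but the 1M-context GPT-4.1 family.
print(cheapest_fit(500_000))
```

In practice you would restrict `candidates` to models capable enough for the task before comparing prices; this sketch only shows the mechanical filter-then-minimize step.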
## Google

| Model | Released | Context (tokens) | Input ($/MTok) | Output ($/MTok) | Best For | Status |
|---|---|---|---|---|---|---|
| Gemini 2.5 Pro (`gemini-2.5-pro`) | Mar 2025 | 1M | $1.25 / $2.50 | $10.00 / $15.00 | Code generation, multimodal, long-context analysis | Active — flagship |
| Gemini 2.5 Flash (`gemini-2.5-flash`) | Mar 2025 | 1M | $0.15 | $0.60 | Cost-efficient production, long documents, multimodal | Active |
| Gemini 2.0 Flash (`gemini-2.0-flash`) | Feb 2025 | 1M | $0.10 | $0.40 | Budget workloads, agentic tasks | Active — being superseded by 2.5 Flash |

Gemini 2.5 Pro pricing is tiered: the lower rate applies to prompts up to 200K tokens, the higher rate to prompts above that.
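Gemini 2.5 Pro is the only model in this guide with tiered pricing: the lower rate applies to prompts up to 200K tokens and the higher rate above that, with the whole request billed at the selected tier. A sketch of the resulting cost function (the 200K boundary reflects Google's published tier split; `gemini_25_pro_cost` is an illustrative name, not an SDK call):

```python
# Gemini 2.5 Pro tiered rates ($/MTok) from the table above. The tier is
# chosen by prompt size; 200K tokens is the published boundary.
TIER_THRESHOLD = 200_000

def gemini_25_pro_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost; both input and output are billed at the
    tier selected by the prompt size."""
    if prompt_tokens <= TIER_THRESHOLD:
        in_rate, out_rate = 1.25, 10.00
    else:
        in_rate, out_rate = 2.50, 15.00
    return prompt_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# 100K-token prompt, 2K output: 0.1 * $1.25 + 0.002 * $10.00 = $0.145
print(f"${gemini_25_pro_cost(100_000, 2_000):.3f}")
```

The practical consequence: a prompt that crosses the 200K boundary doubles the input rate on every token, so chunking very long inputs can be cheaper than sending them whole.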
## Open Models

These models are available to download and self-host. Pricing depends on your infrastructure — the table below shows parameter counts instead of per-token pricing. Many are also available via hosted APIs (Together, Fireworks, Groq, etc.) at competitive rates.
| Model | Provider | Released | Context (tokens) | Parameters | Best For | Status |
|---|---|---|---|---|---|---|
| Llama 4 Maverick | Meta | Apr 2025 | 1M | 400B (17B active) | Self-hosted production, multilingual, cost control | Active — open source |
| Llama 4 Scout | Meta | Apr 2025 | 10M | 109B (17B active) | Ultra-long context, document ingestion, self-hosted | Active — open source |
| DeepSeek R1 | DeepSeek | Jan 2025 | 128K | 671B MoE | Math, coding, reasoning at low cost | Active — open weights |
| Mistral Large | Mistral | Nov 2024 | 128K | 123B | Multilingual, EU data sovereignty, coding | Active |
- **Llama 4 Maverick:** Mixture-of-experts. Competitive with GPT-4o and Gemini 2.0 Flash on benchmarks. 128 experts, 17B active per token. Strong multilingual support (12 languages).
- **Llama 4 Scout:** Extraordinary context window (10M tokens). 16 experts; fits on a single H100 GPU with quantization. Good for massive document ingestion.
- **DeepSeek R1:** Reasoning-focused model from DeepSeek. Competitive with o1 on math and coding. Open weights. Very cost-effective via the DeepSeek API.
- **Mistral Large:** European-built model. Strong at multilingual tasks, coding, and instruction following. Good for EU compliance requirements.
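For self-hosted models, the first budgeting question is whether the weights fit in GPU memory at all. A back-of-the-envelope sketch (weights only; KV cache, activations, and runtime overhead come on top, and `weight_memory_gb` is an illustrative helper):

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Rough GPU memory needed just to hold the weights, in GB
    (1 GB = 1e9 bytes): params * bits / 8 bytes per parameter."""
    return params_billion * bits_per_param / 8

# Llama 4 Scout (109B total parameters) at common precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(109, bits):.1f} GB")
```

At 4-bit quantization Scout's 109B parameters need roughly 54.5 GB of weight memory, which is how it can fit on a single 80 GB H100; at 16-bit the same model needs about 218 GB and spills across multiple GPUs.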