Usage Limits
Deep research projects are powered by LLM APIs (Claude, GPT, Gemini). These APIs have rate limits — caps on how much you can use per minute, per day, or per month. Understanding these limits helps you plan projects and troubleshoot failed runs.
How Rate Limits Work
Every LLM provider restricts API usage across several dimensions. When any one of these limits is exceeded, the API returns an error (HTTP 429 — "Too Many Requests") and your project run may fail or pause.
Requests Per Minute (RPM)
How many individual API calls you can make per minute. Each message sent to the model counts as one request.
Tokens Per Minute (TPM)
The total number of tokens (input + output) processed per minute. Long prompts and long responses both count.
Requests Per Day (RPD)
Some providers cap daily request volume, especially on lower tiers. This resets on a rolling 24-hour window.
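Taken together, the per-minute caps put a hard floor on how long a batch of calls must take: whichever limit binds first (RPM or TPM) determines your minimum runtime. A rough feasibility sketch (the limit values below are placeholders; substitute your tier's actual numbers):

```python
def minutes_to_complete(num_requests: int, tokens_per_request: int,
                        rpm_limit: int, tpm_limit: int) -> float:
    """Lower bound on wall-clock minutes for a batch of calls,
    given per-minute request and token caps."""
    by_requests = num_requests / rpm_limit
    by_tokens = (num_requests * tokens_per_request) / tpm_limit
    return max(by_requests, by_tokens)

# Example: 200 calls averaging 6,000 tokens each on a 50 RPM / 30K TPM tier.
# The token cap binds first, so the batch needs at least 40 minutes.
print(minutes_to_complete(200, 6_000, rpm_limit=50, tpm_limit=30_000))  # 40.0
```

If the result is longer than you expected, the token cap (not the request cap) is usually the binding constraint for long research prompts.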
Understanding Tiers
All major LLM providers use a tier system. Your tier determines your rate limits, and you advance to higher tiers by spending more with the provider. New accounts start at the lowest paid tier.
Anthropic

| Model | Tier 1 | Tier 2 | Tier 3 | Tier 4 |
|---|---|---|---|---|
| Claude Opus 4.6 | 50 RPM<br>30K input TPM<br>8K output TPM | 1,000 RPM<br>450K input TPM<br>90K output TPM | 2,000 RPM<br>800K input TPM<br>160K output TPM | 4,000 RPM<br>2M input TPM<br>400K output TPM |
| Claude Sonnet 4.5 | 50 RPM<br>30K input TPM<br>8K output TPM | 1,000 RPM<br>450K input TPM<br>90K output TPM | 2,000 RPM<br>800K input TPM<br>160K output TPM | 4,000 RPM<br>2M input TPM<br>400K output TPM |
| Claude Haiku 4.5 | 50 RPM<br>50K input TPM<br>10K output TPM | 1,000 RPM<br>450K input TPM<br>90K output TPM | 2,000 RPM<br>1M input TPM<br>200K output TPM | 4,000 RPM<br>4M input TPM<br>800K output TPM |
Anthropic uses a token bucket algorithm — limits replenish continuously, not on a fixed clock. Cached input tokens do not count toward your input token limit on most models, which can effectively multiply your throughput.
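The replenishing behavior can be pictured with a minimal token-bucket sketch. This is illustrative only, with hypothetical capacity and refill numbers, not Anthropic's actual implementation:

```python
import time

class TokenBucket:
    """Minimal token bucket: capacity refills continuously over time,
    not on a fixed per-minute clock. Illustrative sketch only."""

    def __init__(self, capacity: float, refill_per_second: float):
        self.capacity = capacity
        self.tokens = capacity          # bucket starts full
        self.refill = refill_per_second
        self.last = time.monotonic()

    def try_consume(self, amount: float) -> bool:
        now = time.monotonic()
        # Credit tokens accrued since the last call, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False  # caller should wait; capacity replenishes continuously

# A 30K-tokens-per-minute limit behaves like a bucket refilling at 500 tokens/sec.
bucket = TokenBucket(capacity=30_000, refill_per_second=500)
print(bucket.try_consume(8_000))  # True: the bucket starts full
```

The practical upshot: after a burst that drains the bucket, you don't have to wait a full minute; capacity comes back gradually, so smaller follow-up requests succeed sooner.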
OpenAI

| Model | Tier 1 | Tier 3 | Tier 5 |
|---|---|---|---|
| GPT-4o | 500 RPM<br>30K TPM | 5,000 RPM<br>800K TPM | 10,000 RPM<br>30M TPM |
| GPT-4o Mini | 500 RPM<br>200K TPM | 5,000 RPM<br>4M TPM | 30,000 RPM<br>150M TPM |
| o3 | 500 RPM<br>30K TPM | 5,000 RPM<br>800K TPM | 10,000 RPM<br>150M TPM |
OpenAI also enforces monthly spend caps per tier (e.g., Tier 1 = $100/month max). Tier advancement requires both a minimum total spend and a minimum account age.
Google

| Model | Free | Tier 1 | Tier 2 |
|---|---|---|---|
| Gemini 2.5 Pro | 5 RPM<br>250K TPM<br>100 RPD | 150 RPM<br>2M TPM<br>10K RPD | 1,000 RPM<br>4M TPM<br>Unlimited RPD |
| Gemini 2.5 Flash | 15 RPM<br>250K TPM<br>250 RPD | 1,000 RPM<br>4M TPM<br>10K RPD | 2,000 RPM<br>4M TPM<br>Unlimited RPD |
Google's free tier has strict daily request limits (100–250 RPD), making it unsuitable for deep research projects. Limits are applied per Google Cloud project, not per API key.
What Happens When You Hit a Limit
When a research project exceeds your API key's rate limit, here's the chain of events:
1. The LLM provider rejects the request with a "Too Many Requests" error (HTTP 429) and a `Retry-After` header indicating the cooldown period.
2. Depending on the severity, Delvantic will either retry after the cooldown period or mark the run as failed.
3. Check the Logs tab on your project detail page. Rate limit errors are clearly labeled so you can identify the cause.
4. Rate limits reset quickly (usually within a minute). Wait for the cooldown, then clone or re-run your project.
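The chain of events above is what any well-behaved client automates. A sketch of a retry loop that honors the server's cooldown hint (the `send` callable here is a hypothetical stand-in for your actual HTTP call; real SDKs surface the status code and `Retry-After` header in their own ways):

```python
import random
import time

def call_with_backoff(send, max_retries: int = 5):
    """Retry `send` on HTTP 429, honoring the Retry-After cooldown.

    `send` is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return body
        # Prefer the server's cooldown hint; fall back to exponential backoff.
        wait = float(headers.get("retry-after", 2 ** attempt))
        time.sleep(wait + random.uniform(0, 0.1))  # jitter spreads out retries
    raise RuntimeError("still rate limited after retries")
```

The jitter matters when many requests fail at once: if every caller sleeps exactly the cooldown, they all retry in the same instant and hit the limit again.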
Tips to Avoid Hitting Limits
Upgrade your tier
The single biggest improvement. Tier 1 limits are very tight — most deep research projects need at least Tier 2. Add credits to your provider account to advance.
Run during off-peak hours
Providers are typically under less load during low-traffic periods, so large projects are less likely to hit transient capacity errors then. Consider scheduling them for overnight or early morning runs.
Break large projects into versions
Instead of one massive prompt, iterate in versions (V1, V2, V3). Each run stays within limits and you get to steer direction between runs.
Use the right model for the job
Smaller models (Haiku, GPT-4o Mini, Flash) have higher rate limits and lower costs. Use them for exploratory V1 runs, then upgrade to Opus or GPT-4o for deep dives.
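One way to encode this tip in your own tooling is a simple phase-to-model mapping. The phase names and model identifier strings below are illustrative placeholders, not Delvantic configuration:

```python
# Hypothetical routing: cheaper, higher-limit models for exploratory runs,
# larger models reserved for the final deep dive.
MODEL_BY_PHASE = {
    "v1_exploration": "claude-haiku-4-5",
    "v2_refinement":  "claude-sonnet-4-5",
    "v3_deep_dive":   "claude-opus-4-6",
}

def pick_model(phase: str) -> str:
    # Fall back to the mid-tier model for unrecognized phases.
    return MODEL_BY_PHASE.get(phase, "claude-sonnet-4-5")

print(pick_model("v1_exploration"))  # claude-haiku-4-5
```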
Delvantic Limits vs. Provider Limits
Delvantic
- Controls project structure, prompts, and output format
- Manages your project queue and run scheduling
- Tracks costs and token usage per project
- Does not impose token or request limits
Your LLM Provider
- Enforces RPM, TPM, and RPD limits on your API key
- Controls tier advancement based on your spend
- Returns rate limit errors (HTTP 429) when exceeded
- Sets pricing per token for input and output
Check Your Current Limits
Visit your provider's dashboard to see your current tier and rate limits: