Usage Limits

Deep research projects are powered by LLM APIs (Claude, GPT, Gemini). These APIs have rate limits — caps on how much you can use per minute, per day, or per month. Understanding these limits helps you plan projects and troubleshoot failed runs.

Why this matters: When a research project fails mid-run, the most common cause is hitting an API rate limit. Your Delvantic account doesn't impose these limits — they come directly from your LLM provider (Anthropic, OpenAI, or Google) based on your API key's tier.

How Rate Limits Work

Every LLM provider restricts API usage across several dimensions. When any one of these limits is exceeded, the API returns an error (HTTP 429 — "Too Many Requests") and your project run may fail or pause.

Requests Per Minute (RPM)

How many individual API calls you can make per minute. Each message sent to the model counts as one request.

Tokens Per Minute (TPM)

The total number of tokens (input + output) processed per minute. Long prompts and long responses both count.

Requests Per Day (RPD)

Some providers cap daily request volume, especially on lower tiers. Depending on the provider, daily counts reset either on a rolling 24-hour window or at a fixed time each day.

What's a token? Tokens are the units LLMs use to process text. Roughly: 1 token ≈ 4 characters or ¾ of a word. A 1,000-word document is about 1,300 tokens. A deep research project might use 50,000–200,000+ tokens depending on complexity.
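These rules of thumb can be sketched as a back-of-the-envelope estimator. This is a rough heuristic only, not a real tokenizer (actual token counts vary by model and content):

```python
def estimate_tokens_from_chars(text: str) -> int:
    """Rough estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough estimate: ~3/4 of a word per token."""
    return round(word_count / 0.75)

# A 1,000-word document lands at roughly 1,300 tokens.
print(estimate_tokens_from_words(1000))  # → 1333
```

For exact counts, use the tokenizer or token-counting endpoint your provider ships; the heuristic above is only for quick capacity planning.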

Understanding Tiers

All major LLM providers use a tier system. Your tier determines your rate limits, and you advance to higher tiers by spending more with the provider. New accounts start at the lowest paid tier.

Data last verified: February 2026 — Providers may update limits at any time. Check the links below for the latest.
Anthropic (Claude)
View your limits

Tiers: Tier 1 ($5 credit purchase), Tier 2 ($40 spent), Tier 3 ($200 spent), Tier 4 ($400 spent).

| Model | Tier 1 | Tier 2 | Tier 3 | Tier 4 |
|---|---|---|---|---|
| Claude Opus 4.6 | 50 RPM, 30K input TPM, 8K output TPM | 1,000 RPM, 450K input TPM, 90K output TPM | 2,000 RPM, 800K input TPM, 160K output TPM | 4,000 RPM, 2M input TPM, 400K output TPM |
| Claude Sonnet 4.5 | 50 RPM, 30K input TPM, 8K output TPM | 1,000 RPM, 450K input TPM, 90K output TPM | 2,000 RPM, 800K input TPM, 160K output TPM | 4,000 RPM, 2M input TPM, 400K output TPM |
| Claude Haiku 4.5 | 50 RPM, 50K input TPM, 10K output TPM | 1,000 RPM, 450K input TPM, 90K output TPM | 2,000 RPM, 1M input TPM, 200K output TPM | 4,000 RPM, 4M input TPM, 800K output TPM |

Anthropic uses a token bucket algorithm — limits replenish continuously, not on a fixed clock. Cached input tokens do not count toward your input token limit on most models, which can effectively multiply your throughput.
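To make the continuous-replenishment behavior concrete, here is a minimal token bucket sketch. It is illustrative only; the provider's actual implementation is not public, and the capacity and refill numbers are placeholders:

```python
import time

class TokenBucket:
    """Minimal token bucket: capacity refills continuously at a fixed rate."""

    def __init__(self, capacity: float, refill_per_second: float):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity          # start full
        self.last = time.monotonic()

    def try_consume(self, amount: float) -> bool:
        now = time.monotonic()
        # Replenish based on elapsed time, never exceeding capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_per_second,
        )
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False
```

The practical consequence: after a burst that drains the bucket, you regain a little capacity every fraction of a second rather than waiting for a minute boundary to tick over.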

OpenAI (GPT)
View your limits

Tiers: Tier 1 ($5 paid), Tier 2 ($50 + 7 days), Tier 3 ($100 + 7 days), Tier 4 ($250 + 14 days), Tier 5 ($1,000 + 30 days).

| Model | Tier 1 | Tier 3 | Tier 5 |
|---|---|---|---|
| GPT-4o | 500 RPM, 30K TPM | 5,000 RPM, 800K TPM | 10,000 RPM, 30M TPM |
| GPT-4o Mini | 500 RPM, 200K TPM | 5,000 RPM, 4M TPM | 30,000 RPM, 150M TPM |
| o3 | 500 RPM, 30K TPM | 5,000 RPM, 800K TPM | 10,000 RPM, 150M TPM |

OpenAI also enforces monthly spend caps per tier (e.g., Tier 1 = $100/month max). Tier advancement requires both a minimum total spend and a minimum account age.

Google (Gemini)
View your limits

Tiers: Free (eligible regions), Tier 1 (billing enabled), Tier 2 ($250 + 30 days), Tier 3 ($1,000 + 30 days).

| Model | Free | Tier 1 | Tier 2 |
|---|---|---|---|
| Gemini 2.5 Pro | 5 RPM, 250K TPM, 100 RPD | 150 RPM, 2M TPM, 10K RPD | 1,000 RPM, 4M TPM, unlimited RPD |
| Gemini 2.5 Flash | 15 RPM, 250K TPM, 250 RPD | 1,000 RPM, 4M TPM, 10K RPD | 2,000 RPM, 4M TPM, unlimited RPD |

Google's free tier has strict daily request limits (100–250 RPD), making it unsuitable for deep research projects. Limits are applied per Google Cloud project, not per API key.

What Happens When You Hit a Limit

When a research project exceeds your API key's rate limit, here's the chain of events:

1. The API returns HTTP 429. The LLM provider rejects the request with a "Too Many Requests" error and a Retry-After header.

2. The project run pauses or fails. Depending on the severity, Delvantic will either retry after the cooldown period or mark the run as failed.

3. Your run log captures the error. Check the Logs tab on your project detail page. Rate limit errors are clearly labeled so you can identify the cause.

4. Wait, then re-run. Rate limits reset quickly (usually within 1 minute). Wait for the cooldown, then clone or re-run your project.
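A client-side retry loop for this flow might look like the following sketch. Here `send_request` is a hypothetical caller-supplied function returning a response object with `status_code` and `headers` (e.g. a `requests.Response`); the retry count and backoff constants are illustrative:

```python
import random
import time

def call_with_backoff(send_request, max_retries=5):
    """Retry a request on HTTP 429, honoring Retry-After when present.

    `send_request` is a zero-argument function returning a response
    object that exposes `.status_code` and `.headers`.
    """
    for attempt in range(max_retries):
        response = send_request()
        if response.status_code != 429:
            return response
        # Prefer the provider's hint; otherwise back off exponentially with jitter.
        retry_after = response.headers.get("retry-after")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            delay = min(60, 2 ** attempt) + random.random()
        time.sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")
```

Delvantic performs this kind of retry for you; the sketch only shows what "retry after the cooldown period" means at the HTTP level.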

Tips to Avoid Hitting Limits

Upgrade your tier

The single biggest improvement. Tier 1 limits are very tight — most deep research projects need at least Tier 2. Add credits to your provider account to advance.

Run during off-peak hours

Your tier's published limits are fixed, but providers can also reject requests when their systems are under heavy load, which happens more often at peak times. Consider scheduling large projects for overnight or early morning runs.

Break large projects into versions

Instead of one massive prompt, iterate in versions (V1, V2, V3). Each run stays within limits and you get to steer direction between runs.

Use the right model for the job

Smaller models (Haiku, GPT-4o Mini, Flash) have higher rate limits and lower costs. Use them for exploratory V1 runs, then upgrade to Opus or GPT-4o for deep dives.

Delvantic Limits vs. Provider Limits

Delvantic

  • Controls project structure, prompts, and output format
  • Manages your project queue and run scheduling
  • Tracks costs and token usage per project
  • Does not impose token or request limits

Your LLM Provider

  • Enforces RPM, TPM, and RPD limits on your API key
  • Controls tier advancement based on your spend
  • Returns rate limit errors (HTTP 429) when exceeded
  • Sets pricing per token for input and output

Your API key, your limits. Delvantic uses the API key you provide in your account settings. Your rate limits and costs are determined entirely by your relationship with the LLM provider — not by Delvantic. We simply pass your requests through and report back.

Check Your Current Limits

Visit your provider's dashboard to see your current tier and rate limits: