Usage Limits
Deep research projects are powered by LLM APIs (Claude, GPT, Gemini). These APIs have rate limits — caps on how much you can use per minute, per day, or per month. Understanding these limits helps you plan projects and troubleshoot failed runs.
How Rate Limits Work
Every LLM provider restricts API usage across several dimensions. When any one of these limits is exceeded, the API returns an error (HTTP 429 — "Too Many Requests") and your project run may fail or pause.
Requests Per Minute (RPM)
How many individual API calls you can make per minute. Each message sent to the model counts as one request.
Tokens Per Minute (TPM)
The total number of tokens (input + output) processed per minute. Long prompts and long responses both count.
Requests Per Day (RPD)
Some providers cap daily request volume, especially on lower tiers. This resets on a rolling 24-hour window.
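Taken together, the per-minute caps put a hard floor on how long a batch of calls must take: whichever limit binds first (RPM or TPM) determines your minimum runtime. A rough feasibility sketch (the limit values below are placeholders; substitute your tier's actual numbers):

```python
def minutes_to_complete(num_requests: int, tokens_per_request: int,
                        rpm_limit: int, tpm_limit: int) -> float:
    """Lower bound on wall-clock minutes for a batch of calls,
    given per-minute request and token caps."""
    by_requests = num_requests / rpm_limit
    by_tokens = (num_requests * tokens_per_request) / tpm_limit
    return max(by_requests, by_tokens)

# Example: 200 calls averaging 6,000 tokens each on a 50 RPM / 30K TPM tier.
# The token cap binds first, so the batch needs at least 40 minutes.
print(minutes_to_complete(200, 6_000, rpm_limit=50, tpm_limit=30_000))  # 40.0
```

If the result is longer than you expected, the token cap (not the request cap) is usually the binding constraint for long research prompts.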
Understanding Tiers
All major LLM providers use a tier system. Your tier determines your rate limits, and you advance to higher tiers by spending more with the provider. New accounts start at the lowest paid tier.
Anthropic

| Model | Tier 1 | Tier 2 | Tier 3 | Tier 4 |
|---|---|---|---|---|
| Claude Opus 4.6 | 50 RPM<br>30K input TPM<br>8K output TPM | 1,000 RPM<br>450K input TPM<br>90K output TPM | 2,000 RPM<br>800K input TPM<br>160K output TPM | 4,000 RPM<br>2M input TPM<br>400K output TPM |
| Claude Sonnet 4.5 | 50 RPM<br>30K input TPM<br>8K output TPM | 1,000 RPM<br>450K input TPM<br>90K output TPM | 2,000 RPM<br>800K input TPM<br>160K output TPM | 4,000 RPM<br>2M input TPM<br>400K output TPM |
| Claude Haiku 4.5 | 50 RPM<br>50K input TPM<br>10K output TPM | 1,000 RPM<br>450K input TPM<br>90K output TPM | 2,000 RPM<br>1M input TPM<br>200K output TPM | 4,000 RPM<br>4M input TPM<br>800K output TPM |
Anthropic uses a token bucket algorithm — limits replenish continuously, not on a fixed clock. Cached input tokens do not count toward your input token limit on most models, which can effectively multiply your throughput.
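The replenishing behavior can be pictured with a minimal token-bucket sketch. This is illustrative only, with hypothetical capacity and refill numbers, not Anthropic's actual implementation:

```python
import time

class TokenBucket:
    """Minimal token bucket: capacity refills continuously over time,
    not on a fixed per-minute clock. Illustrative sketch only."""

    def __init__(self, capacity: float, refill_per_second: float):
        self.capacity = capacity
        self.tokens = capacity          # bucket starts full
        self.refill = refill_per_second
        self.last = time.monotonic()

    def try_consume(self, amount: float) -> bool:
        now = time.monotonic()
        # Credit tokens accrued since the last call, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False  # caller should wait; capacity replenishes continuously

# A 30K-tokens-per-minute limit behaves like a bucket refilling at 500 tokens/sec.
bucket = TokenBucket(capacity=30_000, refill_per_second=500)
print(bucket.try_consume(8_000))  # True: the bucket starts full
```

The practical upshot: after a burst that drains the bucket, you don't have to wait a full minute; capacity comes back gradually, so smaller follow-up requests succeed sooner.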
OpenAI

| Model | Tier 1 | Tier 3 | Tier 5 |
|---|---|---|---|
| GPT-4o | 500 RPM<br>30K TPM | 5,000 RPM<br>800K TPM | 10,000 RPM<br>30M TPM |
| GPT-4o Mini | 500 RPM<br>200K TPM | 5,000 RPM<br>4M TPM | 30,000 RPM<br>150M TPM |
| o3 | 500 RPM<br>30K TPM | 5,000 RPM<br>800K TPM | 10,000 RPM<br>150M TPM |
OpenAI also enforces monthly spend caps per tier (e.g., Tier 1 = $100/month max). Tier advancement requires both a minimum total spend and a minimum account age.
Google

| Model | Free | Tier 1 | Tier 2 |
|---|---|---|---|
| Gemini 2.5 Pro | 5 RPM<br>250K TPM<br>100 RPD | 150 RPM<br>2M TPM<br>10K RPD | 1,000 RPM<br>4M TPM<br>Unlimited RPD |
| Gemini 2.5 Flash | 15 RPM<br>250K TPM<br>250 RPD | 1,000 RPM<br>4M TPM<br>10K RPD | 2,000 RPM<br>4M TPM<br>Unlimited RPD |
Google's free tier has strict daily request limits (100–250 RPD), making it unsuitable for deep research projects. Limits are applied per Google Cloud project, not per API key.
What Happens When You Hit a Limit
When a research project exceeds your API key's rate limit, here's the chain of events:
1. The LLM provider rejects the request with a "Too Many Requests" error (HTTP 429) and a `Retry-After` header indicating the cooldown period.
2. Depending on the severity, Delvantic will either retry after the cooldown period or mark the run as failed.
3. Check the Logs tab on your project detail page. Rate limit errors are clearly labeled so you can identify the cause.
4. Rate limits reset quickly (usually within a minute). Wait for the cooldown, then clone or re-run your project.
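The chain of events above is what any well-behaved client automates. A sketch of a retry loop that honors the server's cooldown hint (the `send` callable here is a hypothetical stand-in for your actual HTTP call; real SDKs surface the status code and `Retry-After` header in their own ways):

```python
import random
import time

def call_with_backoff(send, max_retries: int = 5):
    """Retry `send` on HTTP 429, honoring the Retry-After cooldown.

    `send` is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return body
        # Prefer the server's cooldown hint; fall back to exponential backoff.
        wait = float(headers.get("retry-after", 2 ** attempt))
        time.sleep(wait + random.uniform(0, 0.1))  # jitter spreads out retries
    raise RuntimeError("still rate limited after retries")
```

The jitter matters when many requests fail at once: if every caller sleeps exactly the cooldown, they all retry in the same instant and hit the limit again.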
Tips to Avoid Hitting Limits
Upgrade your tier
The single biggest improvement. Tier 1 limits are very tight — most deep research projects need at least Tier 2. Add credits to your provider account to advance.
Run during off-peak hours
Providers are typically under less load during low-traffic periods, so large projects are less likely to hit transient capacity errors then. Consider scheduling them for overnight or early morning runs.
Break large projects into versions
Instead of one massive prompt, iterate in versions (V1, V2, V3). Each run stays within limits and you get to steer direction between runs.
Use the right model for the job
Smaller models (Haiku, GPT-4o Mini, Flash) have higher rate limits and lower costs. Use them for exploratory V1 runs, then upgrade to Opus or GPT-4o for deep dives.
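One way to encode this tip in your own tooling is a simple phase-to-model mapping. The phase names and model identifier strings below are illustrative placeholders, not Delvantic configuration:

```python
# Hypothetical routing: cheaper, higher-limit models for exploratory runs,
# larger models reserved for the final deep dive.
MODEL_BY_PHASE = {
    "v1_exploration": "claude-haiku-4-5",
    "v2_refinement":  "claude-sonnet-4-5",
    "v3_deep_dive":   "claude-opus-4-6",
}

def pick_model(phase: str) -> str:
    # Fall back to the mid-tier model for unrecognized phases.
    return MODEL_BY_PHASE.get(phase, "claude-sonnet-4-5")

print(pick_model("v1_exploration"))  # claude-haiku-4-5
```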
Delvantic Limits vs. Provider Limits
Delvantic
- Controls project structure, prompts, and output format
- Manages your project queue and run scheduling
- Tracks costs and token usage per project
- Does not impose token or request limits
Your LLM Provider
- Enforces RPM, TPM, and RPD limits on your API key
- Controls tier advancement based on your spend
- Returns rate limit errors (HTTP 429) when exceeded
- Sets pricing per token for input and output
Check Your Current Limits
Visit your provider's dashboard to see your current tier and rate limits: