Rate Limits

NimbusOS API rate limits by authentication type and endpoint category, with 429 handling, Retry-After headers, and best practices for high-volume integrations.

Updated April 23, 2026

Rate limits protect the NimbusOS platform and every workspace on it from runaway integrations. Limits are applied at three layers: per authentication credential, per endpoint category, and per workspace aggregate. This article covers the limits, the headers that expose remaining quota, the 429 handling pattern, and the best practices for integrations that need sustained throughput.

The Three Limit Layers

Layer 1: Per Credential

Every authenticated request counts against the credential it came in with.

User JWT. 120 requests per minute per user. Covers all endpoints collectively.

API key, default. 600 requests per minute per key. Higher than user JWT because API keys are used by server-side integrations that tend to burst.

API key, enterprise tier. Configurable up to 3,000 per minute. Provisioned per workspace by support.

Layer 2: Per Endpoint Category

On top of the per-credential limit, specific endpoint categories have their own caps so that one kind of expensive traffic cannot use up the whole budget.

Read endpoints (GET). Generous. Up to the full per-credential limit.

Write endpoints (POST, PATCH, DELETE). 300 per minute per credential for API keys, 60 for user JWTs.

Bulk endpoints. 30 per minute per credential. Each bulk request can carry up to 1,000 sub-operations, so the effective throughput is 30,000 operations per minute.

Analytics endpoints. 120 per minute per credential. Analytics aggregations are expensive; the lower limit prevents dashboard scripts from overloading the database.

Campaign launch and resume. 10 per minute per workspace. Expensive operations with downstream pipeline effects.
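
The way the first two layers combine can be sketched as taking the smaller of the two caps. The snippet below is only an illustration built from the numbers in this article: the dictionaries and the `effective_limit` helper are hypothetical names, not part of the NimbusOS API, and the sketch ignores enterprise overrides and the per-workspace campaign launch cap.

```python
from typing import Dict, Optional

# Illustrative values copied from this article; all names here are hypothetical.
CREDENTIAL_LIMITS: Dict[str, int] = {"user_jwt": 120, "api_key": 600}

# Per-category caps by credential type. None means "no extra cap" (reads).
CATEGORY_LIMITS: Dict[str, Dict[str, Optional[int]]] = {
    "read": {"user_jwt": None, "api_key": None},
    "write": {"user_jwt": 60, "api_key": 300},
    "bulk": {"user_jwt": 30, "api_key": 30},
    "analytics": {"user_jwt": 120, "api_key": 120},
}


def effective_limit(credential: str, category: str) -> int:
    """Return the tighter of the per-credential and per-category limits."""
    base = CREDENTIAL_LIMITS[credential]
    cap = CATEGORY_LIMITS[category][credential]
    return base if cap is None else min(base, cap)
```

Under these assumptions, `effective_limit("api_key", "write")` gives 300 and `effective_limit("api_key", "read")` gives 600.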

Layer 3: Per Workspace Aggregate

This layer protects the platform from a single workspace overwhelming shared resources. It is very hard to hit in practice.

Total requests. 5,000 per minute per workspace. Summed across all credentials.

Write operations. 2,000 per minute per workspace.

Import jobs running concurrently. 5 per workspace. Queued after that.

Active webhook endpoints. 20 per workspace.

Rate Limit Headers

Every response includes rate limit headers.

X-RateLimit-Limit: 600
X-RateLimit-Remaining: 547
X-RateLimit-Reset: 1714150800
  • X-RateLimit-Limit: the per-minute limit that applies to this request (the most-constrained one, as described below).
  • X-RateLimit-Remaining: requests remaining in the current window.
  • X-RateLimit-Reset: Unix timestamp when the window resets.

The headers reflect the most-constrained limit that applies to the request. If the endpoint-category limit is tighter than the per-credential limit, the header shows the category limit.
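
A small parsing helper makes these headers easier to act on. This is a minimal sketch using only the three header names documented above; the `RateLimitState` class and `parse_rate_limit_headers` function are hypothetical names chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int      # X-RateLimit-Limit
    remaining: int  # X-RateLimit-Remaining
    reset_at: int   # X-RateLimit-Reset, Unix timestamp

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds until the window resets, never negative."""
        now = time.time() if now is None else now
        return max(0.0, float(self.reset_at) - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitState]:
    """Parse the rate limit headers; return None if any is missing or malformed.

    Lookup is case-sensitive for a plain dict. HTTP client libraries usually
    pass a case-insensitive mapping, which also works here.
    """
    try:
        return RateLimitState(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_at=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```

Returning `None` rather than raising lets a caller fall back to plain 429 handling when the headers are absent.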

429 Responses

Exceeding the limit returns 429 with a response body:

{
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded",
    "limit": 600,
    "retry_after_seconds": 23
  }
}

The Retry-After header is also set. Honor it.
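
One way to honor the wait is to read Retry-After first and fall back to `retry_after_seconds` from the body shown above. The sketch below assumes Retry-After is sent as a number of seconds (the HTTP specification also allows a date, which this helper skips); `wait_seconds_from_429` is a hypothetical helper name.

```python
import json
from typing import Mapping


def wait_seconds_from_429(
    headers: Mapping[str, str], body_text: str, default: float = 1.0
) -> float:
    """Work out how long to wait after a 429 response."""
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # Not a number (for example an HTTP date); try the body.
    try:
        body = json.loads(body_text)
        return max(0.0, float(body["error"]["retry_after_seconds"]))
    except (ValueError, KeyError, TypeError):
        # Body was not JSON or did not have the expected shape.
        return default
```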

Best Practices for Rate Limit Management

Pattern 1: Exponential backoff on 429

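import time

import requests

def call_api(url, headers, max_attempts=5):
    """GET url, backing off exponentially while the API returns 429."""
    for attempt in range(max_attempts):
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.status_code != 429:
            return resp
        try:
            # Retry-After is the minimum wait in seconds.
            retry_after = int(resp.headers.get("Retry-After", "1"))
        except ValueError:
            retry_after = 1  # Header was not an integer; use a safe default
        # Double the wait on each consecutive 429.
        time.sleep(retry_after * (2 ** attempt))
    raise RuntimeError("Rate limit exceeded persistently")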

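This works, but it is purely reactive: the integration only slows down after a request has already been rejected. The patterns below avoid the 429 in the first place.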

Pattern 2: Proactive throttling

Read the rate limit headers and slow down before hitting 429.

import time

import requests

def call_and_throttle(url, headers):
    resp = requests.get(url, headers=headers, timeout=30)
    # Fall back to the default API key limit if the headers are missing.
    remaining = int(resp.headers.get("X-RateLimit-Remaining", "600"))
    limit = int(resp.headers.get("X-RateLimit-Limit", "600"))
    if remaining < limit * 0.1:
        time.sleep(2)  # Under 10% of the window left: pause before the next call
    return resp

Keeps throughput high while avoiding the hard limit.

Pattern 3: Bulk endpoints instead of loops

Instead of iterating contacts and calling POST /api/v1/contacts/ per contact, use POST /api/v1/contacts/bulk/ with up to 1,000 contacts at a time. A single bulk call counts as one request against the rate limit but processes many records.

A naive loop at 10k contacts hits 10k requests. A bulk loop at 10 bulk requests (1,000 each) hits 10 requests. The difference is orders of magnitude.
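
The batching step can be sketched as follows. This shows only how to split records into groups of at most 1,000; the request payload for the bulk endpoint is not described in this article, so it is left out. `chunked` and `MAX_BULK_SIZE` are hypothetical names.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_BULK_SIZE = 1000  # Sub-operation cap per bulk request, from this article


def chunked(items: Sequence[T], size: int = MAX_BULK_SIZE) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each yielded batch would become one bulk request, so 10,000 contacts produce 10 batches.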

Pattern 4: Webhook subscriptions over polling

Polling for contact updates at 1 request per second burns 60 requests per minute. Subscribing to the contact.updated webhook consumes zero API budget. Prefer webhooks for event-driven workflows.

Pattern 5: Caching

Cache read results that do not change frequently. Contact lookups, campaign definitions, and workspace settings are good cache targets. TTL 5 to 15 minutes is usually safe for these.
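
A time-based cache along these lines is one possible approach. It is a minimal in-memory sketch, not a NimbusOS client feature: `TTLCache` and `get_or_fetch` are hypothetical names, and the `fetch` callable stands in for whatever API call you would otherwise make.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # 5 minutes, the low end of the suggested range
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() only if it expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = fetch()
        self._entries[key] = (now + self._ttl, value)
        return value
```

Repeated lookups for the same key within the TTL then cost no API requests at all.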

Pattern 6: Separate credentials per integration

One API key per integration. Each integration has its own rate limit budget. A runaway integration does not take down the others.

Specific Operation Notes

Large imports

A CSV import of 100,000 rows does not count as 100,000 API requests. The import endpoint takes one request to upload and start the job; the actual row processing happens on the server. Use imports for large ingests, not the contacts create endpoint.

Enrichment jobs

Similar to imports. One request starts the job; the per-contact enrichment happens server-side. Does not consume per-request rate limit for the individual enrichments.

Analytics dashboards

If you are building a dashboard that refreshes every 5 minutes and pulls 10 analytics endpoints, that is 120 analytics calls per hour (12 refreshes × 10 endpoints). That is well within the 120 per minute analytics limit, but if you scale the dashboard to many workspaces, each workspace consumes its own budget.

For very large dashboards consider the streaming analytics option (enterprise tier) which pushes updates via webhook instead of requiring polling.

Webhook retries

Webhook delivery is outbound from NimbusOS and does not count against your rate limit. Your endpoint being slow does not affect your ability to make other API calls.

Rate Limit Debugging

The Integrations -> API Keys page shows per-key usage: requests in the last minute, hour, day. Useful for spotting:

  • Which integration is consuming the most budget
  • Whether usage is spiky or sustained
  • Whether you are approaching the limit under normal load

Per-endpoint usage breakdown is available on request for enterprise workspaces.

Burst Allowances

The rate limit is a sliding window, not a fixed-bucket reset. A 600 per minute limit means "no more than 600 requests in any 60-second rolling window", not "600 requests between 10:00 and 10:01 then 600 more between 10:01 and 10:02".

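Bursts are tolerated as long as the rolling 60-second total stays within the limit. With a 600 per minute limit, 100 requests in a single second is fine, and so is 600 requests in 10 seconds provided the rest of the window is idle. But if you sustain more than 10 requests per second (600 / 60) for the full minute, you will hit 429.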
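
A client can mirror the rolling window locally to stay under it. The limiter below is a sketch of the general sliding-window technique, not a description of how NimbusOS implements it server-side; the class name and the exact boundary handling are assumptions, so leave some headroom below the real limit.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side tracker for "at most `limit` requests in any rolling window"."""

    def __init__(
        self,
        limit: int = 600,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._sent: Deque[float] = deque()  # Timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before one more request fits in the window."""
        now = self._clock()
        # Drop timestamps that have left the rolling window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) < self._limit:
            return 0.0
        # The oldest request must age out before another one fits.
        return self._window - (now - self._sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self._sent.append(self._clock())
```

A caller would sleep for `wait_time()` seconds when it is non-zero, send the request, then call `record()`.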

Raising Rate Limits

On the default plan, limits are fixed. On the enterprise tier, limits are provisioned per workspace based on usage patterns.

Common triggers for requesting a raise:

  • You are running an import pipeline that needs 10k+ creates per minute.
  • You are building a dashboard that polls every 30 seconds across 100 workspaces.
  • You are doing a migration from another platform and need to cold-start with millions of records.

Contact support with the use case and the expected sustained throughput.

Rate Limits and Webhooks

Webhooks are bidirectional. Inbound from providers has its own rate handling (the provider's limits). Outbound to your endpoints does not count against your rate limit.

However, your endpoint's response time matters. If your endpoint is slow, delivery latency increases, which is a different problem from rate limiting.

Error Handling Summary

| Status | Meaning | Action |
| ------ | ------- | ------ |
| 200/201/204 | Success | Continue |
| 429 | Rate limited | Wait for Retry-After, then retry |
| 5xx | Server error | Retry with backoff, up to 3 attempts |
| 4xx other | Client error | Do not retry; fix the request |

A well-built integration handles all four cases without operator intervention.
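
The table above maps naturally onto a small dispatch function. This is a sketch: the `Action` enum and `action_for_status` are hypothetical names, and treating every 2xx code as success (the table lists only 200, 201, and 204) is an assumption.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    WAIT_AND_RETRY = "wait_and_retry"          # Honor Retry-After, then retry
    RETRY_WITH_BACKOFF = "retry_with_backoff"  # Up to 3 attempts
    FIX_REQUEST = "fix_request"                # Do not retry


def action_for_status(status: int) -> Action:
    """Map an HTTP status code to the action from the summary table."""
    if status == 429:  # Check before the general 4xx branch
        return Action.WAIT_AND_RETRY
    if 500 <= status <= 599:
        return Action.RETRY_WITH_BACKOFF
    if 400 <= status <= 499:
        return Action.FIX_REQUEST
    if 200 <= status <= 299:  # Assumption: any 2xx counts as success
        return Action.CONTINUE
    raise ValueError(f"Unhandled status code: {status}")
```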

Frequently Asked Questions

Do read-only endpoints consume rate limit?

Yes, but they have generous limits. The per-credential limit applies uniformly; the per-category limit is higher for reads than writes.

Is there a way to pre-allocate capacity?

No. Rate limits are dynamic. For guaranteed capacity, use the enterprise tier with dedicated allocation.

Can I check my rate limit without making a request?

Yes. GET /api/v1/auth/rate-limit-status/ returns the current limit and usage for the authenticated credential without counting against the limit.

Does canceling a request help?

No. If the server has accepted the request and started processing, it counts. Abort your client-side wait, but do not expect the server to roll back the count.

How do limits apply across API keys in the same workspace?

Each API key has its own budget. The workspace aggregate limit applies on top. Splitting into separate keys gives each integration its own budget but contributes to the aggregate.
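
When planning how to split traffic across keys, a quick check against both layers can help. The helper below is a hypothetical planning aid using the default numbers from this article (600 per key, 5,000 per workspace); it does not call the API.

```python
from typing import Dict, List

PER_KEY_LIMIT = 600     # Default API key limit, requests per minute
WORKSPACE_LIMIT = 5000  # Workspace aggregate, requests per minute


def budget_problems(planned_rpm: Dict[str, int]) -> List[str]:
    """List every limit that a per-key traffic plan would exceed."""
    problems: List[str] = []
    for name, rpm in planned_rpm.items():
        if rpm > PER_KEY_LIMIT:
            problems.append(
                f"{name}: {rpm}/min exceeds the per-key limit of {PER_KEY_LIMIT}"
            )
    total = sum(planned_rpm.values())
    if total > WORKSPACE_LIMIT:
        problems.append(
            f"workspace: {total}/min exceeds the aggregate limit of {WORKSPACE_LIMIT}"
        )
    return problems
```

For example, nine keys each running at 600 per minute stay within their own budgets but total 5,400 per minute, which is over the workspace aggregate.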

Useful next pages after this one: Authentication for the token and API key model, Contacts API for the endpoint patterns, and Webhooks API for the alternative to polling.

