
Deduplication

How NimbusOS deduplicates contacts at platform scale through GlobalContact, IdentityGraph, and ClientAssignment, so the same person is never targeted by duplicate outreach.

8 min read
Updated April 23, 2026
1,850 words

Deduplication in NimbusOS is enforced at three layers: within a single workspace, across workspaces on the same platform instance, and across agency sub-clients. The workhorse model is GlobalContact: keyed on email, it holds cross-workspace engagement history, and ClientAssignment records attach to it. IdentityGraph extends this with cross-identifier matching for phones, cookies, and device IDs. This article explains how each layer works and what happens when a duplicate is detected during import, manual entry, or webhook intake.

The Three Layers

Layer 1: Workspace Scope

Within a single workspace, duplicates are prevented by a unique constraint on (workspace_id, email) on the Contact table. You cannot have two contacts in the same workspace with the same email address. The database enforces this.
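A minimal sketch of this rule, assuming a relational store. SQLite and the simplified `contact` table here are stand-ins chosen for illustration, not NimbusOS's actual schema; the point is that the composite unique constraint, not application code, rejects the second insert.

```python
import sqlite3

# Hypothetical, simplified Contact table: only the columns needed to show
# the (workspace_id, email) uniqueness rule.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE contact (
        id INTEGER PRIMARY KEY,
        workspace_id INTEGER NOT NULL,
        email TEXT NOT NULL,
        UNIQUE (workspace_id, email)
    )
    """
)

def insert_contact(workspace_id: int, email: str) -> bool:
    """Return True if inserted, False if the database rejected a duplicate."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute(
                "INSERT INTO contact (workspace_id, email) VALUES (?, ?)",
                (workspace_id, email),
            )
        return True
    except sqlite3.IntegrityError:
        return False

print(insert_contact(1, "alice@acme.com"))  # True
print(insert_contact(1, "alice@acme.com"))  # False: same workspace, same email
print(insert_contact(2, "alice@acme.com"))  # True: different workspace
```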

Layer 2: Platform Scope via GlobalContact

Across workspaces on the same NimbusOS instance, the GlobalContact table is the dedup ledger. One row per unique email across the platform.

When any workspace imports or creates a contact, the email is looked up in GlobalContact. If it exists, the new contact links to the existing GlobalContact row. If not, a new GlobalContact row is created.
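The lookup-or-create step can be sketched as below, again with SQLite standing in for the real database and with hypothetical table and function names. Inserting the ledger row with `INSERT OR IGNORE` and then selecting it means a second workspace importing the same email simply links to the existing row instead of creating another.

```python
import sqlite3

# Hypothetical schema: global_contact holds one row per normalized email;
# each workspace's contact rows point at it via global_contact_id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE global_contact (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE contact (
        id INTEGER PRIMARY KEY,
        workspace_id INTEGER NOT NULL,
        email TEXT NOT NULL,
        global_contact_id INTEGER NOT NULL REFERENCES global_contact(id),
        UNIQUE (workspace_id, email)
    );
    """
)

def create_contact(workspace_id: int, email: str) -> int:
    """Create a workspace Contact linked to its GlobalContact; return the GlobalContact id."""
    email = email.strip().lower()
    with conn:  # one transaction for the ledger row and the Contact row
        # No-op if the email already exists, thanks to the UNIQUE constraint.
        conn.execute("INSERT OR IGNORE INTO global_contact (email) VALUES (?)", (email,))
        (gc_id,) = conn.execute(
            "SELECT id FROM global_contact WHERE email = ?", (email,)
        ).fetchone()
        conn.execute(
            "INSERT INTO contact (workspace_id, email, global_contact_id) VALUES (?, ?, ?)",
            (workspace_id, email, gc_id),
        )
    return gc_id

a = create_contact(1, "Alice@Acme.com")  # creates the GlobalContact row
b = create_contact(2, "alice@acme.com")  # links to the same row
print(a == b)  # True
```

On a server database such as PostgreSQL, the same shape is usually written with `INSERT ... ON CONFLICT DO NOTHING`.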

GlobalContact carries cross-workspace engagement data:

  • total_sent: Every email ever sent to this address from any workspace
  • total_responses: Every reply received from this address
  • response_rate: Derived from the above
  • bounce_status: Aggregate bounce state from any workspace
  • last_engagement_at: Most recent engagement timestamp

This data is useful context. Before your workspace contacts someone, the Contact detail might show, for example, that the email has received 12 prior sends from other workspaces with a 0 percent response rate. That is a signal to skip.
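A small sketch of how the derived field and a skip signal could be computed. It assumes response_rate is total_responses divided by total_sent (0 when nothing has been sent); the `looks_cold` helper and its threshold of 10 sends are hypothetical, not a NimbusOS setting.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GlobalEngagement:
    # Mirrors the GlobalContact fields listed above; the types are assumptions.
    total_sent: int
    total_responses: int
    bounce_status: Optional[str] = None

    @property
    def response_rate(self) -> float:
        # Derived, not stored: responses per send, 0.0 when nothing was sent.
        if self.total_sent == 0:
            return 0.0
        return self.total_responses / self.total_sent

def looks_cold(e: GlobalEngagement, min_sends: int = 10) -> bool:
    """Hypothetical skip heuristic: many prior sends and zero replies."""
    return e.total_sent >= min_sends and e.response_rate == 0.0

history = GlobalEngagement(total_sent=12, total_responses=0)
print(history.response_rate)  # 0.0
print(looks_cold(history))    # True
```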

Layer 3: Agency Scope via ClientAssignment

Agencies running multiple sub-clients on the same workspace use ClientAssignment. Each GlobalContact can have at most one active assignment at a time, across all clients. An assignment records:

  • global_contact_id
  • client_id
  • priority (integer)
  • assigned_at, expires_at
  • outcome (enum: active, converted, unsubscribed, expired, released)

The rule: a contact with an active unexpired assignment to client A cannot be assigned to client B. An import that would violate this is rejected at the row level with reason client_assignment_conflict.

This prevents two sub-clients within an agency from both contacting the same person. To move the contact, the agency must either release client A's assignment or wait for it to expire.
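The conflict rule can be sketched as a pure check over in-memory assignments. The dataclass mirrors the fields listed above, but `check_assignment`, `is_active`, and the string form of `outcome` are illustrative assumptions rather than NimbusOS's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

@dataclass
class ClientAssignment:
    global_contact_id: int
    client_id: str
    priority: int
    assigned_at: datetime
    expires_at: datetime
    outcome: str = "active"  # active | converted | unsubscribed | expired | released

    def is_active(self, now: datetime) -> bool:
        # "Active unexpired": outcome still active and expiry not yet reached.
        return self.outcome == "active" and now < self.expires_at

def check_assignment(
    assignments: List[ClientAssignment],
    global_contact_id: int,
    client_id: str,
    now: datetime,
) -> Optional[str]:
    """Return a rejection reason, or None if client_id may take this contact."""
    for a in assignments:
        if (
            a.global_contact_id == global_contact_id
            and a.is_active(now)
            and a.client_id != client_id
        ):
            return "client_assignment_conflict"
    return None

now = datetime(2026, 4, 1)
existing = [ClientAssignment(42, "client_a", 1, now - timedelta(days=10), now + timedelta(days=80))]
print(check_assignment(existing, 42, "client_b", now))  # client_assignment_conflict
existing[0].outcome = "released"                         # client A releases the contact
print(check_assignment(existing, 42, "client_b", now))  # None
```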

What "Duplicate" Actually Means

Five patterns show up in real data, and not all of them are true duplicates.

Exact email match. The cleanest case. alice@acme.com equals alice@acme.com. Hash match on lowercase email.

Casing difference. Alice@Acme.com vs alice@acme.com. NimbusOS normalizes email to lowercase before any hash, so these are the same contact.

Plus alias. alice+newsletter@acme.com vs alice@acme.com. Gmail treats these as the same inbox. NimbusOS does not canonicalize plus aliases by default because not all providers collapse them. You can enable alias normalization per workspace if you confirm the recipient provider is Gmail.

Subdomain difference. alice@mail.acme.com vs alice@acme.com. Usually two different inboxes. NimbusOS treats them as distinct.

Different email, same person. alice.smith@acme.com vs alice@acme.com. Hard case; email equality does not catch this. The manual merge flow solves it.
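The normalization rules above can be sketched as follows. The function names, the `strip_plus_alias` flag standing in for the per-workspace setting, and SHA-256 as the hash are illustrative assumptions, not NimbusOS's documented implementation.

```python
import hashlib

def normalize_email(email: str, strip_plus_alias: bool = False) -> str:
    """Lowercase and trim an address; optionally drop a +alias suffix.

    strip_plus_alias stands in for the per-workspace alias setting. It is off
    by default because not every provider delivers alice+x@ mail to alice@.
    Subdomains are never touched, so mail.acme.com and acme.com stay distinct.
    """
    email = email.strip().lower()
    local, sep, domain = email.rpartition("@")
    if not sep:
        return email  # no "@": leave it for address validation to reject
    if strip_plus_alias:
        local = local.split("+", 1)[0]
    return f"{local}@{domain}"

def email_hash(email: str, strip_plus_alias: bool = False) -> str:
    # SHA-256 is an assumption; what matters is hashing after normalizing.
    normalized = normalize_email(email, strip_plus_alias)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

print(normalize_email("  Alice@Acme.com "))                                  # alice@acme.com
print(normalize_email("alice+newsletter@acme.com"))                          # alice+newsletter@acme.com
print(normalize_email("alice+newsletter@acme.com", strip_plus_alias=True))   # alice@acme.com
```

Different-email, same-person cases are deliberately out of scope here; no string normalization can tell that alice.smith@ and alice@ are one human.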

The Import Dedup Flow

When an ImportJob processes rows, the dedup step happens after parsing and before enrichment.

For each row:

  1. Normalize email (lowercase, trim).
  2. Query GlobalContact by email hash.
  3. If no match, create a new GlobalContact and a new Contact in the current workspace. Row status: new.
  4. If match and the Contact exists in the current workspace, update the existing Contact with any non-blank fields from the import. Row status: updated.
  5. If match and no Contact exists in the current workspace, check ClientAssignment. If there is no active assignment to another client, create a new Contact and optionally a new ClientAssignment. Row status: new_linked.
  6. If match and an active assignment exists to another client, reject the row. Row status: skipped_assignment_conflict.
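The six steps can be sketched as one function over in-memory stores. Everything here is hypothetical: dicts and sets stand in for the GlobalContact hash index, Contact table, and ClientAssignment table, expiry is ignored, and step 6's conflict check runs before step 5's insert so a conflicting row is never linked.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

def classify_row(
    email: str,
    workspace_id: int,
    client_id: Optional[str],
    global_contacts: Dict[str, int],           # normalized email -> GlobalContact id
    workspace_contacts: Set[Tuple[int, int]],  # (workspace_id, GlobalContact id)
    active_assignments: Dict[int, str],        # GlobalContact id -> client_id
) -> str:
    """Return the row status for one parsed import row, mutating the stores."""
    key = email.strip().lower()                      # 1. normalize
    gc_id = global_contacts.get(key)                 # 2. look up (dict stands in for the hash index)
    if gc_id is None:                                # 3. brand-new email
        gc_id = len(global_contacts) + 1
        global_contacts[key] = gc_id
        workspace_contacts.add((workspace_id, gc_id))
        return "new"
    if (workspace_id, gc_id) in workspace_contacts:  # 4. already in this workspace
        return "updated"                             #    (field merge omitted here)
    owner = active_assignments.get(gc_id)
    if owner is not None and owner != client_id:     # 6. held by another client
        return "skipped_assignment_conflict"
    workspace_contacts.add((workspace_id, gc_id))    # 5. link into this workspace
    if client_id is not None and owner is None:
        active_assignments[gc_id] = client_id        #    optional new ClientAssignment
    return "new_linked"

gcs = {"alice@acme.com": 1, "bob@acme.com": 2, "carol@acme.com": 3}
ws = {(7, 1)}
assignments = {3: "client_a"}
rows = ["dan@acme.com", "Alice@Acme.com", "bob@acme.com", "carol@acme.com"]
summary = Counter(classify_row(r, 7, "client_b", gcs, ws, assignments) for r in rows)
print(summary)  # one row in each of the four statuses
```

The `Counter` at the end plays the role of the import job summary described next.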

The import job summary shows counts for each status so you can see exactly how many rows went where.

Manual Merge

Sometimes dedup misses a true duplicate because the emails differ. There are two ways to merge manually.

From the Contact detail. Click Find Duplicates. The platform shows candidates scored by similarity: same first name and last name, similar LinkedIn profile, same company, overlapping engagement history. You can merge any candidate into the current contact.

From the Duplicates dashboard. A dashboard that surfaces high-likelihood duplicates across the workspace. Useful for periodic cleanup runs.
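A sketch of a similarity score in the spirit of Find Duplicates. The actual scoring model is not documented here, so the weights, the exact-match comparison, and the `duplicate_score` name are all hypothetical; engagement-history overlap is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    first_name: str
    last_name: str
    company: str
    linkedin_url: Optional[str] = None
    phone: Optional[str] = None

# Hypothetical weights; they sum to 1.0 so a full match scores 1.0.
WEIGHTS = {"name": 0.4, "company": 0.2, "linkedin": 0.3, "phone": 0.1}

def _same(a: Optional[str], b: Optional[str]) -> bool:
    # Case- and whitespace-insensitive equality; a missing value never matches.
    return bool(a) and bool(b) and a.strip().lower() == b.strip().lower()

def duplicate_score(a: Candidate, b: Candidate) -> float:
    score = 0.0
    if _same(a.first_name, b.first_name) and _same(a.last_name, b.last_name):
        score += WEIGHTS["name"]
    if _same(a.company, b.company):
        score += WEIGHTS["company"]
    if _same(a.linkedin_url, b.linkedin_url):
        score += WEIGHTS["linkedin"]
    if _same(a.phone, b.phone):
        score += WEIGHTS["phone"]
    return round(score, 2)  # rounding hides float noise such as 0.6000000000000001

a = Candidate("Alice", "Smith", "Acme", "linkedin.com/in/alicesmith")
b = Candidate("alice", "smith", "ACME", "linkedin.com/in/alicesmith", "+1 555 0100")
print(duplicate_score(a, b))  # 0.9
```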

Merge behavior:

  • The older contact (by creation date) is kept as the primary.
  • The newer contact is archived, and its ID redirects to the primary.
  • Tags from both are merged.
  • Custom field values: if both had a value, the primary's value wins unless you override per field during merge.
  • Engagement history is combined.
  • Sequence enrollments: active enrollments from the archived contact continue under the primary.

Merge is reversible for 7 days. After that it is permanent.
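The merge rules and the undo window can be sketched as follows. The `Contact` dataclass is a simplified, hypothetical stand-in; two further assumptions are labeled in the comments: fields that only the archived contact has are copied onto the primary, and all of the archived contact's enrollments are treated as active.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set

UNDO_WINDOW = timedelta(days=7)

@dataclass
class Contact:
    id: int
    created_at: datetime
    tags: Set[str] = field(default_factory=set)
    custom_fields: Dict[str, str] = field(default_factory=dict)
    engagements: List[str] = field(default_factory=list)
    enrollments: List[str] = field(default_factory=list)
    archived: bool = False
    redirect_to: Optional[int] = None

def merge(a: Contact, b: Contact, overrides: Optional[Dict[str, str]] = None) -> Contact:
    """Merge two contacts per the rules above and return the primary."""
    primary, other = (a, b) if a.created_at <= b.created_at else (b, a)  # older wins
    primary.tags |= other.tags
    for key, value in other.custom_fields.items():
        # Primary wins on conflicts; keys only the other contact has are copied (assumption).
        primary.custom_fields.setdefault(key, value)
    primary.custom_fields.update(overrides or {})  # per-field overrides chosen at merge time
    primary.engagements += other.engagements
    primary.enrollments += other.enrollments       # enrollments continue under the primary
    other.enrollments = []
    other.archived = True
    other.redirect_to = primary.id
    return primary

def can_unmerge(merged_at: datetime, now: datetime) -> bool:
    return now - merged_at < UNDO_WINDOW

old = Contact(1, datetime(2025, 1, 1), {"vip"}, {"title": "CTO"}, ["sent-1"], ["seq-a"])
new = Contact(2, datetime(2026, 1, 1), {"lead"}, {"title": "VP", "city": "Austin"}, ["reply-1"], ["seq-b"])
primary = merge(new, old)
print(primary.id, primary.custom_fields)  # 1 {'title': 'CTO', 'city': 'Austin'}
```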

IdentityGraph for Cross-Identifier Matching

For workflows that span channels (email, phone, LinkedIn, device), IdentityGraph extends dedup beyond email.

An IdentityGraph row represents a single identity resolved across multiple IdentityIdentifier rows. Identifier types: email, phone, linkedin_url, cookie, device_id, twitter_handle.

Two contacts with the same LinkedIn URL and different emails can be linked as one identity. Two contacts with the same phone can be linked. The graph is probabilistic (confidence scored) rather than strict-match.
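A simplified sketch of identity resolution using union-find. NimbusOS's confidence model is not described here, so this swaps in a plain stand-in: each identifier type gets a hypothetical confidence, and two contacts are linked when they share any identifier whose confidence meets a threshold. A real probabilistic graph would combine evidence rather than link on a single match.

```python
from typing import Dict, List, Tuple

# Hypothetical per-type confidence that a shared identifier means the same person.
CONFIDENCE = {
    "email": 0.95, "linkedin_url": 0.9, "twitter_handle": 0.85,
    "phone": 0.8, "device_id": 0.7, "cookie": 0.5,
}

def resolve_identities(
    identifiers: List[Tuple[int, str, str]],  # (contact_id, identifier type, value)
    threshold: float = 0.75,
) -> Dict[int, int]:
    """Group contacts into identities; return contact_id -> identity root id."""
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smallest id as the root

    first_seen: Dict[Tuple[str, str], int] = {}
    for contact_id, id_type, value in identifiers:
        find(contact_id)  # register every contact, even unlinked ones
        key = (id_type, value.strip().lower())
        if key in first_seen and CONFIDENCE.get(id_type, 0.0) >= threshold:
            union(first_seen[key], contact_id)
        first_seen.setdefault(key, contact_id)
    return {c: find(c) for c in list(parent)}

ids = [
    (1, "email", "alice@acme.com"), (1, "linkedin_url", "linkedin.com/in/alice"),
    (2, "email", "alice.smith@acme.com"), (2, "linkedin_url", "linkedin.com/in/alice"),
    (3, "cookie", "abc"), (4, "cookie", "abc"),
]
print(resolve_identities(ids))  # contacts 1 and 2 share a root; 3 and 4 stay separate
```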

IdentityGraph is useful mainly for multichannel outreach where email is one of several touch points. For pure cold email workflows, GlobalContact dedup is usually enough.

Deduplication at Webhook Intake

Webhook sources (like the Apify LinkedIn Jobs Scraper) use the same dedup as file imports: every inbound batch creates an ImportJob and flows through the same pipeline, with identical dedup semantics.

One gotcha: some webhook sources send the same contact multiple times in a short window. NimbusOS handles this with a per-source deduplication buffer that collapses duplicates arriving within a 60-second window. If the same contact arrives in a second batch 10 minutes later, it goes through normal dedup.
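A minimal sketch of such a buffer. The class name and API are hypothetical, and it assumes the window is measured from the last time a contact was accepted from that source; a production buffer would also evict expired keys so memory does not grow without bound.

```python
from typing import Dict, Tuple

class DedupBuffer:
    """Collapse repeat deliveries of the same contact from one source.

    A contact seen again within `window` seconds of its last accepted arrival
    from the same source is dropped; later arrivals pass through to normal dedup.
    """

    def __init__(self, window: float = 60.0) -> None:
        self.window = window
        self._accepted_at: Dict[Tuple[str, str], float] = {}

    def accept(self, source: str, email: str, now: float) -> bool:
        key = (source, email.strip().lower())
        last = self._accepted_at.get(key)
        if last is not None and now - last < self.window:
            return False  # duplicate inside the window: collapse it
        self._accepted_at[key] = now
        return True

buf = DedupBuffer()
print(buf.accept("apify", "a@x.com", now=0.0))    # True: first arrival
print(buf.accept("apify", "A@x.com", now=30.0))   # False: same contact, 30 s later
print(buf.accept("apify", "a@x.com", now=600.0))  # True: 10 minutes later, normal dedup
```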

Bounce Propagation

When a contact hard bounces in workspace A, the bounce is recorded on the GlobalContact. Workspace B sending to the same email inherits the bounce status and will be warned on enrollment.

The warning is non-blocking by default. You can upgrade it to a hard block in Account Settings: "Block sends to addresses with bounce_status == 'hard' from any workspace". This is strong list hygiene protection for agencies.
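The enrollment-time decision can be sketched as a small function. `bounce_check` and `EnrollDecision` are hypothetical names; `block_hard_bounces` models the Account Settings toggle, and treating anything other than a hard bounce as OK is an assumption.

```python
from enum import Enum
from typing import Optional

class EnrollDecision(Enum):
    OK = "ok"
    WARN = "warn"
    BLOCK = "block"

def bounce_check(bounce_status: Optional[str], block_hard_bounces: bool) -> EnrollDecision:
    """bounce_status comes from the GlobalContact; block_hard_bounces is the account toggle."""
    if bounce_status != "hard":
        return EnrollDecision.OK  # assumption: only hard bounces trigger the warning
    return EnrollDecision.BLOCK if block_hard_bounces else EnrollDecision.WARN

print(bounce_check("hard", block_hard_bounces=False))  # EnrollDecision.WARN (default)
print(bounce_check("hard", block_hard_bounces=True))   # EnrollDecision.BLOCK
```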

Suppression Propagation

Unlike bounces, global suppression is enforced across workspaces automatically. When a contact marks an email from workspace A as spam, the address is added to the suppression list with is_global=true, and sends to it are blocked from every workspace on the platform.

Workspace-level suppression (an opt-out from your specific emails) does not propagate. An opt-out recorded in your workspace does not prevent another workspace from sending.
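The two suppression scopes can be sketched as a single lookup. The `Suppression` record and `is_suppressed` function are hypothetical; the only rule taken from the text is that global entries block every workspace while workspace entries block only their own.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Suppression:
    email: str
    workspace_id: int  # workspace where the spam report or opt-out happened
    is_global: bool

def is_suppressed(email: str, workspace_id: int, entries: Iterable[Suppression]) -> bool:
    email = email.strip().lower()
    for s in entries:
        if s.email != email:
            continue
        if s.is_global or s.workspace_id == workspace_id:
            return True  # global entries block everyone; local ones block their workspace
    return False

entries = [
    Suppression("spam@x.com", workspace_id=1, is_global=True),     # marked as spam
    Suppression("optout@x.com", workspace_id=1, is_global=False),  # workspace opt-out
]
print(is_suppressed("spam@x.com", 2, entries))    # True
print(is_suppressed("optout@x.com", 2, entries))  # False
```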

Outcome Tracking

ClientAssignment.outcome tracks what happened after the assignment. Values:

  • active: ongoing
  • converted: became a customer
  • unsubscribed: opted out
  • expired: assignment timed out without conversion
  • released: manually released by the client or agency

Outcome tracking informs assignment rules. You can configure a rule: "Do not re-assign a contact whose previous outcome was converted within the last 12 months". This prevents accidentally prospecting an existing customer.
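A sketch of that rule as a predicate. The outcome list above does not include a timestamp, so this assumes a hypothetical `outcome_at` recorded per past assignment, and it approximates 12 months as 365 days.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

def may_reassign(
    history: Iterable[Tuple[str, datetime]],  # (outcome, hypothetical outcome_at) per past assignment
    now: datetime,
    lookback: timedelta = timedelta(days=365),  # "12 months", approximated
) -> bool:
    """False if any previous assignment converted within the lookback window."""
    for outcome, outcome_at in history:
        if outcome == "converted" and now - outcome_at <= lookback:
            return False
    return True

now = datetime(2026, 4, 23)
print(may_reassign([("converted", datetime(2025, 12, 1))], now))  # False: recent customer
print(may_reassign([("expired", datetime(2026, 1, 1))], now))     # True
```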

Cleanup Patterns

Two cleanup patterns show up in agency workflows.

Dormant assignment release. A weekly job that releases ClientAssignment rows whose last engagement was more than 180 days ago and whose outcome is still active. This frees the contact for another client.

Stale GlobalContact purge. A quarterly job that deletes GlobalContact rows that have had no Contact references in any workspace for more than 365 days. This keeps the dedup table lean.

Both are optional and configured as AutomationRule objects.
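The selection logic behind both jobs can be sketched as two filters. The row types and function names are hypothetical, and this does not show how AutomationRule objects are configured; it assumes assignments with no recorded engagement are skipped and that an `unreferenced_since` timestamp is tracked for GlobalContact rows.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

@dataclass
class AssignmentRow:
    id: int
    outcome: str
    last_engagement_at: Optional[datetime]

@dataclass
class GlobalContactRow:
    id: int
    contact_refs: int                       # linked Contact rows in any workspace
    unreferenced_since: Optional[datetime]  # hypothetical: when refs last dropped to 0

def dormant_assignments(rows: List[AssignmentRow], now: datetime,
                        idle: timedelta = timedelta(days=180)) -> List[int]:
    """IDs of still-active assignments with no engagement for more than `idle`."""
    return [
        r.id for r in rows
        if r.outcome == "active"
        and r.last_engagement_at is not None  # never-engaged rows are skipped here
        and now - r.last_engagement_at > idle
    ]

def stale_global_contacts(rows: List[GlobalContactRow], now: datetime,
                          age: timedelta = timedelta(days=365)) -> List[int]:
    """IDs of GlobalContact rows unreferenced by any Contact for more than `age`."""
    return [
        g.id for g in rows
        if g.contact_refs == 0
        and g.unreferenced_since is not None
        and now - g.unreferenced_since > age
    ]
```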

Frequently Asked Questions

Can I disable cross-workspace dedup?

Not recommended. Dedup is a protection against cross-workspace collisions. Disabling is possible on request for enterprise workspaces with specific compliance requirements that need isolation.

What happens if two workspaces import the same email simultaneously?

The GlobalContact lookup is atomic. Whichever workspace's transaction commits first creates the GlobalContact row. The second workspace's import sees the existing GlobalContact and creates a linked Contact.

How many identifiers can an IdentityGraph row have?

No hard limit. Typical identities have 2 to 5 identifiers. Very active multichannel prospects can have 10 or more.

Is there a way to find contacts that are likely duplicates but not caught by email match?

Yes, the Duplicates dashboard. It runs a similarity heuristic across first name, last name, company, LinkedIn URL, and phone.

Can I reassign a contact from one client to another?

Yes. Release the current assignment and create a new one. The audit log records both actions.

Useful next pages after this one: Contact Management for editing and merging contacts, Importing Contacts for the intake pipeline where dedup runs, and Segments and Filters for querying across the deduplicated contact database.

