Data De-identification Services

Get critical data de-identified & anonymized by credentialed & certified domain experts

Data De-identification & Anonymization

Data de-identification, data masking, and data anonymization ensure the removal of all PHI/PII (protected health information and personally identifiable information), such as names and Social Security numbers, that may directly or indirectly connect an individual to their data. Shaip also provides proprietary APIs that can anonymize sensitive data in text content with extremely high accuracy. These APIs transform, mask, delete, or otherwise obscure the sensitive data.
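
As a rough illustration of the transform, mask, and delete strategies described above, the following Python sketch applies each one to Social Security numbers and email addresses in free text. It is not Shaip's API: the function name `deidentify`, the `mode` parameter, and the two regular expressions are hypothetical, and real de-identification needs far broader coverage than two patterns.

```python
import hashlib
import re

# Illustrative patterns only; real systems cover many more identifier types.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def deidentify(text: str, mode: str = "mask") -> str:
    """Obscure SSNs and email addresses in text (hypothetical helper).

    mode="mask"      replaces each match with a label such as [SSN]
    mode="delete"    removes the match entirely
    mode="transform" replaces the match with a short one-way hash token
    """
    if mode not in {"mask", "delete", "transform"}:
        raise ValueError(f"unknown mode: {mode}")

    for label, pattern in PATTERNS.items():
        def replace(match: re.Match, label: str = label) -> str:
            if mode == "mask":
                return f"[{label}]"
            if mode == "delete":
                return ""
            # Unsalted hash for illustration only; a keyed hash is safer.
            digest = hashlib.sha256(match.group().encode("utf-8")).hexdigest()
            return f"{label}_{digest[:8]}"

        text = pattern.sub(replace, text)
    return text


note = "Patient SSN 123-45-6789, contact jane.doe@example.com."
print(deidentify(note, "mask"))
# Patient SSN [SSN], contact [EMAIL].
```

Masking keeps a visible placeholder so readers can tell what kind of value was removed, while the transform mode keeps a stable token that can still be counted or grouped.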

Personally Identifiable Information (PII)

Personally Identifiable Information (PII) is any data that can be used to contact, locate, or identify a specific individual. Some of the data elements that might be used to identify an individual include:

PII includes: name, email, home address, phone number

If standalone:

  • Social Security Number
  • Driver’s License or State ID
  • Passport Number
  • Alien Registration Number
  • Financial Account Number
  • Biometric Identifiers
  • Telephone numbers
  • Email addresses
  • Full face pictures

If paired with another identifier:

  • Citizenship or Immigration status
  • Mother’s Maiden name
  • Ethnic or religious affiliation
  • Sexual orientation
  • Account Passwords
  • Last 4 digits of SSN
  • Date of birth
  • Criminal History
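
The standalone versus paired distinction above can be read as a simple rule: a standalone identifier is sensitive on its own, while a paired identifier becomes sensitive only when another identifier accompanies it. The Python sketch below encodes that reading. The field names and the `contains_pii` function are hypothetical, only some of the identifiers listed above are included, and the pairing rule is an assumption rather than a legal definition.

```python
# Hypothetical field names covering only some of the identifiers listed above.
STANDALONE = {"name", "email", "phone", "ssn", "passport_number", "drivers_license"}
PAIRED = {"date_of_birth", "mothers_maiden_name", "ssn_last4", "citizenship_status"}


def contains_pii(record: dict) -> bool:
    """Return True if the record's populated fields could identify a person.

    Assumed rule: any standalone identifier is enough on its own; a paired
    identifier counts only when another listed identifier is also present.
    """
    present = {field for field, value in record.items() if value}
    if present & STANDALONE:
        return True
    # No standalone identifier: require at least two paired identifiers.
    return len(present & PAIRED) >= 2


print(contains_pii({"ssn": "123-45-6789"}))                      # True
print(contains_pii({"date_of_birth": "1980-01-01"}))             # False
print(contains_pii({"date_of_birth": "1980-01-01",
                    "mothers_maiden_name": "Smith"}))            # True
```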

Protected Health Information (PHI)

Protected Health Information (PHI) is any health-related data that can be used to contact, locate, or identify a specific individual. It includes:

  • Medical images and records, as well as health plan beneficiary, certificate, Social Security, and account numbers
  • Past, present, or future health or condition of an individual
  • Past, present, or future payment for the provision of healthcare to an individual
  • Every date linked directly to a person, such as date of birth, admission date, discharge date, and date of death
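
One common way to handle the dates in the last bullet is to generalize them by keeping only the year, which the HIPAA Safe Harbor method permits for dates tied to an individual (with an exception for ages over 89, not handled here). The sketch below shows the idea in Python. The function name `generalize_dates` and the ISO `YYYY-MM-DD` input format are assumptions; real records use many more date formats.

```python
import re

# Matches ISO-style dates such as 1957-03-14 (assumed input format).
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def generalize_dates(text: str) -> str:
    """Replace each YYYY-MM-DD date with its year only."""
    return ISO_DATE.sub(lambda m: m.group(1), text)


print(generalize_dates("DOB 1957-03-14, discharged 2021-11-02."))
# DOB 1957, discharged 2021.
```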


When you need data in real time, you should be able to access it just as quickly. This is why Shaip APIs provide real-time, on-demand access to the records you need. With Shaip APIs, your teams have fast and scalable access to de-identified records and quality, contextualized medical data to complete their AI projects right the first time.

De-Identification API

Patient data is essential in developing the best possible healthcare AI projects, but protecting patients' personal information is just as essential. Shaip is a known industry leader in data de-identification, data masking, and data anonymization to remove all PHI/PII (protected health information and personally identifiable information).

  • De-identify, tokenize, and anonymize sensitive data for PHI, PII, and PCI
  • Conform with HIPAA and Safe Harbor guidelines
  • Redact all 18 identifiers covered in HIPAA and Safe Harbor guidelines
  • Expert certification and auditing of de-identification quality
  • Follow comprehensive PHI annotation guidelines to uniformly de-identify PHI data and adhere to the Safe Harbor guidelines

Data De-identification Key Features

Human-in-the-Loop


High-quality data, backed by multiple levels of quality control with humans in the loop.

Single Optimized Platform for Data Integrity

Data masking through a single platform that integrates production, test, and development environments preserves database integrity across geographies.

100+ million documents de-identified

A proven platform that facilitates effective de-identification of data, reducing the risk of compromised PII/PHI.


Enhanced Data Security

Enhanced data security ensures that data formats are policy-controlled and preserved.

Enhanced Scalability

An automated, repeatable de-identification solution with a human-in-the-loop process that scales as data grows.

Availability & Delivery

High network uptime and on-time delivery of data, services, and solutions.

Data De-identification in Action

De-identify structured data

De-identify Protected Health Information (PHI) in structured datasets while enforcing HIPAA and GDPR compliance and preserving the linkage of clinical data across files.
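
Preserving linkage across files usually means the same patient identifier must map to the same replacement value everywhere it appears. One way to get that property is a keyed hash, sketched below with Python's standard `hmac` module. This illustrates the general technique, not Shaip's implementation; the `pseudonymize` function, the `patient_id` column, the sample rows, and the key are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored and rotated securely.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable token using HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


# Two "files" that share a patient_id column (made-up rows).
visits = [{"patient_id": "MRN-001", "department": "cardiology"}]
labs = [{"patient_id": "MRN-001", "test": "HbA1c"}]

for table in (visits, labs):
    for row in table:
        row["patient_id"] = pseudonymize(row["patient_id"])

# The same source ID yields the same token, so the rows still join.
print(visits[0]["patient_id"] == labs[0]["patient_id"])  # True
```

Because the token depends on the secret key, someone without the key cannot rebuild the mapping simply by hashing candidate identifiers.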

De-identify free text

De-identify free-text documents by obscuring or masking PHI with high accuracy using our patented Healthcare API.

De-identify DICOM

De-identify DICOM images by masking or obscuring PHI.

Use Case

Goal: Remove PII from financial documents, including W-2s, bank statements, 1099s, and 1040s.

Challenge: De-identification of 18 predefined identifiers in 10,000+ financial documents.

Our Contribution: De-identified PII in 10,000+ financial documents on the client’s platform using onshore personnel.

End Result: The client developed an AI-driven information extraction model to pull crucial data from financial documents.

Goal: Remove PHI from clinical documents.

Challenge: De-identification of 30,000+ clinical documents that can be used for developing AI models.

Our Contribution: De-identified PHI in clinical documents in adherence with HIPAA and Safe Harbor guidelines.

End Result: The client leveraged a well-annotated, gold-standard dataset to solve their use case.

Comprehensive Compliance Coverage

Scale data de-identification across regulatory jurisdictions, including GDPR and HIPAA, and follow Safe Harbor guidelines to reduce the risk of compromised PII/PHI.

Safe Harbor De-identification by Shaip
GDPR Compliant De-Identification by Shaip
HIPAA Compliant Data Masking by Shaip

Our Capability



Dedicated and trained teams:

  • 7,000+ collaborators for Data Creation, Labeling & QA
  • Credentialed Project Management Team
  • Experienced Product Development Team
  • Talent Pool Sourcing & Onboarding Team



Highest process efficiency is assured with:

  • Robust Six Sigma Stage-Gate Process
  • A dedicated team of Six Sigma Black Belts serving as key process owners and handling quality compliance
  • Continuous Improvement & Feedback Loop



The patented platform offers these benefits:

  • Web-based end-to-end platform
  • Impeccable Quality
  • Faster turnaround time (TAT)
  • Seamless Delivery

Featured Clients

Empowering teams to build world-leading AI products.

  • Amazon
  • Google
  • Microsoft

Start de-identifying your AI Data today