Electronic Health Records (EHR) datasets for AI & ML Projects
Off-the-shelf Electronic Health Records (EHR) Datasets to Jumpstart your Healthcare AI project.
Plug in the medical data you’ve been missing, today
Find the right Electronic Health Records (EHR) Data For Your Healthcare AI
Improve your machine learning models with best-in-class training data. Electronic Health Records (EHRs) are medical records that contain a patient’s medical history, diagnoses, prescriptions, treatment plans, vaccination or immunization dates, allergies, radiology images (CT scans, MRIs, X-rays), laboratory tests, and more. Our off-the-shelf data catalog makes it easy for you to get medical training data you can trust.
Off-the-Shelf Electronic Health Records (EHR):
- 5.1M+ Records and physician audio files in 31 specialties
- Real-world gold-standard medical records to train Clinical NLP and other Document AI models
- Metadata information like MRN (Anonymized), Admission Date, Discharge Date, Length of Stay days, Gender, Patient Class, Payer, Financial Class, State, Discharge Disposition, Age, DRG, DRG Description, $ Reimbursement, AMLOS, GMLOS, Risk of mortality, Severity of illness, Grouper, Hospital Zip Code, etc.
- Medical records from various US states and regions: Northeast (46%), South (9%), Midwest (3%), West (28%), Others (14%)
- Medical records covering all patient classes: Inpatient, Outpatient (Clinical, Rehab, Recurring, Surgical Day Care), and Emergency
- Medical records covering all patient age groups: <10 yrs (7.9%), 11-20 yrs (5.7%), 21-30 yrs (10.9%), 31-40 yrs (11.7%), 41-50 yrs (10.4%), 51-60 yrs (13.8%), 61-70 yrs (16.1%), 71-80 yrs (13.3%), 81-90 yrs (7.8%), 90+ yrs (2.4%)
- Patient Gender ratio of 46% (Male) and 54% (Female)
- PII-redacted documents that follow the Safe Harbor guidelines, in conformance with HIPAA
EHR Data by Location
Location | Text Documents |
---|---|
NorthEast | 4,473,573 |
South | 1,801,716 |
MidWest | 781,701 |
West | 1,509,109 |
EHR Data by Major Diagnosis Category
Major Diagnosis Category | Text Documents |
---|---|
Circulatory System | 589,730 |
Infectious & Parasitic Diseases | 559,244 |
Respiratory System | 561,983 |
Musculoskeletal System & Connective Tissue | 329,344 |
Digestive System | 346,369 |
Nervous System | 316,243 |
Mental Diseases & Disorders | 282,501 |
Kidney & Urinary Tract | 209,561 |
Pregnancy, Childbirth & the Puerperium | 165,303 |
Newborns & Other Neonates with Conditions Originating in the Perinatal Period | 163,605 |
Endocrine, Nutritional & Metabolic Diseases & Disorders | 142,808 |
Hepatobiliary System & Pancreas | 127,172 |
Skin, Subcutaneous Tissue & Breast | 89,577 |
Injuries, Poisonings & Toxic Effects of Drugs | 64,097 |
Blood, Blood Forming Organs, Immunologic Disorders | 48,990 |
Alcohol/Drug Use & Alcohol/Drug-Induced Organic Mental Disorders | 48,717 |
Multiple Significant Trauma | 27,902 |
Ear, Nose, Mouth & Throat | 22,987 |
Female Reproductive System | 17,010 |
Factors Influencing Health Status & Other Contacts with Health Services | 21,294 |
Myeloproliferative Diseases & Disorders, Poorly Differentiated Neoplasms | 15,620 |
Human Immunodeficiency Virus Infections | 12,422 |
Male Reproductive System | 9,230 |
Eye | 3,549 |
Burns | 444 |
Total with MDC | 4,175,702 |
Cases using a specialty grouper such as 3M (MDC not specified) | 1,619,682 |
Outpatient Cases (MDC not specified) | 1,980,606 |
Cases without reimbursement generated (MDC not specified) | 790,697 |
Total including everything (Cases with & without MDC category) | 8,566,687 |
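
For teams planning a train/evaluation split, it can help to see how the subtotals in the table reconcile with the overall count. The following is a minimal sketch in Python that uses only the figures listed above; the variable names are illustrative and are not part of any dataset schema.

```python
# Subtotals copied from the table above (variable names are illustrative).
with_mdc = 4_175_702            # Total with MDC
specialty_grouper = 1_619_682   # Cases using a specialty grouper (MDC not specified)
outpatient = 1_980_606          # Outpatient cases (MDC not specified)
no_reimbursement = 790_697      # Cases without reimbursement generated (MDC not specified)

# Sum the four groups to reproduce the overall document count.
total = with_mdc + specialty_grouper + outpatient + no_reimbursement

# Share of documents that carry an MDC label.
share_with_mdc = with_mdc / total

print(f"Total documents: {total:,}")
print(f"Share with MDC:  {share_with_mdc:.1%}")
```

The sum matches the "Total including everything" row (8,566,687), so a little under half of the documents carry an MDC label.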
We license all types of data: text, audio, video, and image. Our medical datasets for ML include the Physician Dictation Dataset, Physician Clinical Notes, Medical Conversation Dataset, Medical Transcription Dataset, Doctor-Patient Conversation, Medical Text Data, and Medical Images such as CT scans, MRIs, and ultrasound (collected based on custom requirements).
Can’t find what you are looking for?
New off-the-shelf medical datasets are being collected across all data types
Contact us now to let go of your healthcare training data collection worries