License High-Quality Healthcare/Medical Data for AI/ML Models

Data Catalog – Medical

Our de-identified healthcare dataset includes audio files from 31 specialties, dictated by physicians describing patients’ clinical conditions and plans of care following physician-patient encounters in hospital and clinic settings.

Audio Data by Gender

Gender | Playtime (Hours) | No. of Audio Files
Total | 221,000 | 4,306,000
Male | 80,000 | 1,600,000
Female | 25,000 | 475,000
Unknown | 116,000 | 2,231,000

Audio Data by Specialty

Specialty | Playtime (Hours) | No. of Audio Files
Total | 221,000 | 4,306,000
Cardiology | 45,000 | 900,000
Internal Medicine | 40,000 | 750,000
Family Medicine | 20,000 | 340,000
Surgery | 15,750 | 365,000
Diagnostic Radiology | 15,000 | 330,000
Hematology-Oncology | 15,000 | 290,000
Physician Assistant/Nurse Practitioner | 10,000 | 200,000
Psychiatry | 9,000 | 175,000
Emergency Medicine | 6,000 | 120,000
Nephrology | 6,000 | 100,000
Pulmonology | 5,500 | 110,000
Gastroenterology | 5,000 | 100,000
OB/GYN | 3,500 | 66,000
Orthopedics | 3,000 | 60,000
Medical Oncology | 2,500 | 50,000
Urology | 2,000 | 40,000
Neurology | 1,500 | 30,000
Pathology | 1,500 | 30,000
Pediatrics | 1,500 | 30,000
Otolaryngology | 800 | 16,000
Infectious Disease | 500 | 10,000
Podiatry | 500 | 10,000
Family Nurse Practitioner | 500 | 10,000
Geriatric Medicine | 500 | 10,000
Osteopathic | 400 | 8,000
Anesthesiology | 400 | 8,000
Ophthalmology | 400 | 8,000
Dermatology | 400 | 8,000
Family Practice | 250 | 5,000
Rehabilitation | 100 | 2,000
Physical Medicine & Rehabilitation | 100 | 2,000
Others | 8,400 | 123,000
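Because the table reports both total playtime and file count, the average dictation length can be derived for any row. A rough Python sketch, assuming playtime is spread evenly over files (the table gives only totals, so this is a mean, not a distribution); by this estimate the dataset as a whole averages roughly 3 minutes of audio per file.

```python
# (playtime hours, number of audio files), copied from a few rows of the
# "Audio Data by Specialty" table; the remaining rows can be added the same way.
AUDIO_BY_SPECIALTY = {
    "Total": (221_000, 4_306_000),
    "Cardiology": (45_000, 900_000),
    "Internal Medicine": (40_000, 750_000),
    "Family Medicine": (20_000, 340_000),
    "Surgery": (15_750, 365_000),
}


def avg_minutes_per_file(hours: float, files: int) -> float:
    """Mean dictation length in minutes, assuming playtime is divided evenly over files."""
    if files <= 0:
        raise ValueError("file count must be positive")
    return hours * 60 / files


for name, (hours, files) in AUDIO_BY_SPECIALTY.items():
    print(f"{name:<18} {avg_minutes_per_file(hours, files):.2f} min/file")
```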

Audio Data by Device

Device | Playtime (Hours) | No. of Audio Files
Total | 221,000 | 4,306,000
Telephone Dictation | 120,000 | 2,400,000
Digital Recorder | 55,000 | 850,000
SpeechMic | 12,000 | 235,000
Smartphone | 6,000 | 120,000
Unknown | 28,000 | 701,000
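The gender and device tables partition the same 221,000 hours and 4,306,000 files, so each breakdown should add up to its Total row (both do). A minimal Python consistency check, with the values copied from the two tables:

```python
from typing import Dict, Tuple

# (playtime hours, audio files) from the "Total" rows of the audio tables
TOTAL: Tuple[int, int] = (221_000, 4_306_000)

BY_GENDER: Dict[str, Tuple[int, int]] = {
    "Male": (80_000, 1_600_000),
    "Female": (25_000, 475_000),
    "Unknown": (116_000, 2_231_000),
}

BY_DEVICE: Dict[str, Tuple[int, int]] = {
    "Telephone Dictation": (120_000, 2_400_000),
    "Digital Recorder": (55_000, 850_000),
    "SpeechMic": (12_000, 235_000),
    "Smartphone": (6_000, 120_000),
    "Unknown": (28_000, 701_000),
}


def column_sums(breakdown: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the hours column and the file-count column of one breakdown."""
    hours = sum(h for h, _ in breakdown.values())
    files = sum(f for _, f in breakdown.values())
    return hours, files


for label, breakdown in (("gender", BY_GENDER), ("device", BY_DEVICE)):
    sums = column_sums(breakdown)
    print(f"{label}: {sums[0]:,} h, {sums[1]:,} files, {'OK' if sums == TOTAL else 'MISMATCH'}")
```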

Text Data by Specialty

Specialty | Approx. No. of Medical Records | Approx. No. of Characters
Total | 4,306,000 | 9,236,370,000
Cardiology | 900,000 | 1,930,500,000
Internal Medicine | 750,000 | 1,608,750,000
Family Medicine | 340,000 | 729,300,000
Surgery | 365,000 | 782,925,000
Diagnostic Radiology | 330,000 | 707,850,000
Hematology-Oncology | 290,000 | 622,050,000
Physician Assistant/Nurse Practitioner | 200,000 | 429,000,000
Psychiatry | 175,000 | 375,375,000
Emergency Medicine | 120,000 | 257,400,000
Nephrology | 100,000 | 214,500,000
Pulmonology | 110,000 | 235,950,000
Gastroenterology | 100,000 | 214,500,000
OB/GYN | 66,000 | 141,570,000
Orthopedics | 60,000 | 128,700,000
Medical Oncology | 50,000 | 107,250,000
Urology | 40,000 | 85,800,000
Neurology | 30,000 | 64,350,000
Pathology | 30,000 | 64,350,000
Pediatrics | 30,000 | 64,350,000
Otolaryngology | 16,000 | 34,320,000
Infectious Disease | 10,000 | 21,450,000
Podiatry | 10,000 | 21,450,000
Family Nurse Practitioner | 10,000 | 21,450,000
Geriatric Medicine | 10,000 | 21,450,000
Osteopathic | 8,000 | 17,160,000
Anesthesiology | 8,000 | 17,160,000
Ophthalmology | 8,000 | 17,160,000
Dermatology | 8,000 | 17,160,000
Family Practice | 5,000 | 10,725,000
Rehabilitation | 2,000 | 4,290,000
Physical Medicine & Rehabilitation | 2,000 | 4,290,000
Unknown | 123,000 | 263,835,000
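Dividing the character column by the record column gives the average record length. Every row of the table of approximate medical records and characters works out to exactly 2,145 characters per record, which suggests the character counts were estimated from a single average length rather than measured per specialty (consistent with the "Approx." headers). A short Python check over a subset of rows:

```python
# (approx. medical records, approx. characters), copied from a few rows of the
# specialty table; the remaining rows can be added the same way.
TEXT_BY_SPECIALTY = {
    "Total": (4_306_000, 9_236_370_000),
    "Cardiology": (900_000, 1_930_500_000),
    "Internal Medicine": (750_000, 1_608_750_000),
    "Surgery": (365_000, 782_925_000),
    "OB/GYN": (66_000, 141_570_000),
    "Unknown": (123_000, 263_835_000),
}


def chars_per_record(records: int, chars: int) -> float:
    """Average record length in characters."""
    return chars / records


ratios = {name: chars_per_record(r, c) for name, (r, c) in TEXT_BY_SPECIALTY.items()}
for name, ratio in ratios.items():
    print(f"{name:<18} {ratio:,.0f} characters/record")
```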

EHR Data by Location

Location | Text Documents
Northeast | 2,179,310
South | 384,871
Midwest | 145,820
West | 1,317,120

EHR Data by Major Diagnostic Category (MDC)

Major Diagnostic Category | Text Documents
Circulatory System | 307,535
Infectious & Parasitic Diseases | 273,815
Respiratory System | 271,434
Musculoskeletal System & Connective Tissue | 184,300
Digestive System | 182,112
Nervous System | 171,184
Mental Diseases & Disorders | 137,070
Kidney & Urinary Tract | 113,632
Pregnancy, Childbirth & the Puerperium | 91,321
Newborns & Other Neonates with Conditions Originating in the Perinatal Period | 91,316
Endocrine, Nutritional & Metabolic Diseases & Disorders | 76,449
Hepatobiliary System & Pancreas | 61,006
Skin, Subcutaneous Tissue & Breast | 50,516
Injuries, Poisonings & Toxic Effects of Drugs | 36,249
Blood, Blood Forming Organs, Immunologic Disorders | 28,170
Alcohol/Drug Use & Alcohol/Drug-Induced Organic Mental Disorders | 20,821
Multiple Significant Trauma | 17,236
Ear, Nose, Mouth & Throat | 14,460
Female Reproductive System | 12,949
Factors Influencing Health Status & Other Contacts with Health Services | 9,674
Myeloproliferative Diseases & Disorders, Poorly Differentiated Neoplasms | 8,544
Human Immunodeficiency Virus Infections | 7,913
Male Reproductive System | 5,650
Eye | 2,352
Burns | 315
Ungroupable | 57
Total with MDC | 2,176,080
Cases using a specialty grouper such as 3M (MDC not specified) | 516,837
Outpatient cases (MDC not specified) | 2,183,951
Cases with no reimbursement generated (MDC not specified) | 18,446

Total, all cases (with and without an MDC) | 4,895,257
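Summing the category rows, the 57 ungroupable cases included, reproduces the "Total with MDC" figure, and the same counts give each category's share of the MDC-coded documents. A small Python sketch with the counts copied from the table of EHR documents by Major Diagnostic Category (some long category names are shortened as dictionary keys):

```python
# Text-document counts per Major Diagnostic Category, copied from the table.
MDC_COUNTS = {
    "Circulatory System": 307_535,
    "Infectious & Parasitic Diseases": 273_815,
    "Respiratory System": 271_434,
    "Musculoskeletal System & Connective Tissue": 184_300,
    "Digestive System": 182_112,
    "Nervous System": 171_184,
    "Mental Diseases & Disorders": 137_070,
    "Kidney & Urinary Tract": 113_632,
    "Pregnancy, Childbirth & the Puerperium": 91_321,
    "Newborns & Other Neonates (Perinatal)": 91_316,
    "Endocrine, Nutritional & Metabolic": 76_449,
    "Hepatobiliary System & Pancreas": 61_006,
    "Skin, Subcutaneous Tissue & Breast": 50_516,
    "Injuries, Poisonings & Toxic Effects": 36_249,
    "Blood, Blood Forming Organs, Immunologic": 28_170,
    "Alcohol/Drug Use & Induced Disorders": 20_821,
    "Multiple Significant Trauma": 17_236,
    "Ear, Nose, Mouth & Throat": 14_460,
    "Female Reproductive System": 12_949,
    "Factors Influencing Health Status": 9_674,
    "Myeloproliferative Diseases & Disorders": 8_544,
    "HIV Infections": 7_913,
    "Male Reproductive System": 5_650,
    "Eye": 2_352,
    "Burns": 315,
    "Ungroupable": 57,
}

total_with_mdc = sum(MDC_COUNTS.values())
print(f"Total with MDC: {total_with_mdc:,}")  # Total with MDC: 2,176,080

# Five largest categories and their share of MDC-coded documents
for name, count in sorted(MDC_COUNTS.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{name:<45} {count / total_with_mdc:6.1%}")
```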

Detailed Cohort Building: Parsing Data with Our Proprietary Clinical NLP

Entity Type | Definition (patient cohorts can be identified using the following attributes)
Problem | e.g., breast cancer, CRC, leukemia, melanoma, myeloma, NHL, non-small-cell lung cancer, prostate cancer, etc.
Procedure | e.g., CT scan, biopsy, MRI, bone marrow aspiration and biopsy, etc.
Medicine | e.g., Herceptin, cyclophosphamide, carboplatin, etc.
Medical Device | e.g., catheters, pacemakers, etc.
Lab Data | e.g., CA-125, PSA, AFP, beta-hCG, CA 15-3/CA 27.29, HE4, etc.
Body Measurements | e.g., pulse, HR, blood pressure, etc.
Anatomical Structure | e.g., breast, lungs, liver, etc.
Body Function | e.g., breathing, sleep
Medication Attributes | e.g., medication dose, form, route, frequency, etc.
Taxonomies | SNOMED, RxNorm, LOINC
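One way to hold the output of a clinical NLP pass is a list of typed entities per patient, which cohort queries can then search. The sketch below is hypothetical: the class and field names (`EntityType`, `Entity`, `PatientRecord`, `system`, `code`, `negated`) are illustrative assumptions, not the format of the proprietary NLP output.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class EntityType(Enum):
    """Entity types from the cohort-building table."""
    PROBLEM = "Problem"
    PROCEDURE = "Procedure"
    MEDICINE = "Medicine"
    MEDICAL_DEVICE = "Medical Device"
    LAB_DATA = "Lab Data"
    BODY_MEASUREMENT = "Body Measurements"
    ANATOMICAL_STRUCTURE = "Anatomical Structure"
    BODY_FUNCTION = "Body Function"
    MEDICATION_ATTRIBUTE = "Medication Attributes"


@dataclass(frozen=True)
class Entity:
    kind: EntityType
    text: str                     # surface form found in the note, e.g. "carboplatin"
    system: Optional[str] = None  # hypothetical: taxonomy name, e.g. "RxNorm"
    code: Optional[str] = None    # hypothetical: code within that taxonomy
    negated: bool = False         # hypothetical: True for mentions such as "no history of ..."


@dataclass
class PatientRecord:
    patient_id: str
    entities: List[Entity] = field(default_factory=list)

    def has(self, kind: EntityType, text: str) -> bool:
        """True if a non-negated entity of this type matches `text` (case-insensitive)."""
        target = text.casefold()
        return any(
            e.kind is kind and e.text.casefold() == target and not e.negated
            for e in self.entities
        )


record = PatientRecord(
    "p001",
    [
        Entity(EntityType.MEDICINE, "Herceptin"),
        Entity(EntityType.PROBLEM, "breast cancer", negated=True),
    ],
)
print(record.has(EntityType.MEDICINE, "herceptin"))    # True
print(record.has(EntityType.PROBLEM, "Breast cancer"))  # False: the mention is negated
```

Keeping negation as a flag, rather than dropping negated mentions, lets the same record answer both "has X" and "X explicitly ruled out" queries.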

Possible Combination Queries
Smokers under the age of 50 with a malignant lung neoplasm
Patients with a specific type of cancer in a specific disease state, e.g., chronic myelomonocytic leukemia in remission OR chronic myelomonocytic leukemia in relapse
Patients on chemotherapy, hormonal therapy, radiation therapy, immunotherapy, etc.
Patients with stage II cancer undergoing adjuvant chemotherapy
Patients with a family history of breast cancer
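Combination queries like these amount to AND-ing simple criteria over the extracted attributes. A minimal Python sketch, assuming a simplified, hypothetical patient model (`OncologyPatient` with `smoker`, `stage`, and `therapies` fields) rather than the catalog's actual schema; plain string matching stands in for the taxonomy-code matching a production system would more likely use.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class OncologyPatient:
    patient_id: str
    age: int
    smoker: bool
    problems: FrozenSet[str] = frozenset()
    stage: Optional[str] = None              # e.g. "II"
    therapies: FrozenSet[str] = frozenset()  # e.g. {"adjuvant chemotherapy"}


Predicate = Callable[[OncologyPatient], bool]


def all_of(*criteria: Predicate) -> Predicate:
    """Combine criteria with AND: a patient qualifies only if every criterion holds."""
    return lambda p: all(c(p) for c in criteria)


def cohort(patients: Iterable[OncologyPatient], criterion: Predicate) -> List[str]:
    """Return the IDs of patients matching the criterion."""
    return [p.patient_id for p in patients if criterion(p)]


# "Smokers under the age of 50 with a malignant lung neoplasm"
young_smokers_lung = all_of(
    lambda p: p.smoker,
    lambda p: p.age < 50,
    lambda p: "malignant lung neoplasm" in p.problems,
)

# "Patients with stage II cancer undergoing adjuvant chemotherapy"
stage2_adjuvant = all_of(
    lambda p: p.stage == "II",
    lambda p: "adjuvant chemotherapy" in p.therapies,
)

# Made-up sample patients for illustration only.
patients = [
    OncologyPatient("a", 44, True, frozenset({"malignant lung neoplasm"})),
    OncologyPatient("b", 61, True, frozenset({"malignant lung neoplasm"})),
    OncologyPatient("c", 55, False, frozenset({"breast cancer"}), stage="II",
                    therapies=frozenset({"adjuvant chemotherapy"})),
]
print(cohort(patients, young_smokers_lung))  # ['a']
print(cohort(patients, stage2_adjuvant))     # ['c']
```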

Additional Cohort Building Metadata Available

Entity Type | Definition (patient cohorts can be identified using the following attributes)
Patient Class | Inpatient, Outpatient, ED
Service Line | Medicine, Surgery, Cardiology, etc.
Date | Service date, admit date, discharge date
Length of Stay | Days a patient stayed in the hospital
Payor/Financial Class | Medicare, Medicaid, BC/BS, etc.
Patient Demographics | Name, age, DOB, gender, race, ethnicity, location, reason for visit
Physician Demographics | Age, DOB, gender, race, ethnicity, location, specialty, education
Discharge Disposition | Expired, left AMA, etc.
Type of Notes | Clinic notes, consultation, follow-up, progress
Taxonomies/Codes | ICD-10-CM/PCS, HCPCS, CPT, E&M
DRG Parameters | Severity of illness, risk of mortality, DRG group, MDC, expected LOS
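Several of these metadata fields combine naturally. For instance, length of stay can be derived from the admit and discharge dates and compared against the DRG expected LOS to find inpatient stays that ran long. A Python sketch, assuming a hypothetical `Encounter` record and a midnight-count LOS convention (facilities differ on whether a same-day stay counts as 0 or 1 day):

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Encounter:
    encounter_id: str
    patient_class: str   # "Inpatient", "Outpatient", or "ED"
    admit_date: date
    discharge_date: date
    expected_los: float  # hypothetical field: DRG expected length of stay, in days

    @property
    def length_of_stay(self) -> int:
        """Days between admission and discharge (a same-day stay counts as 0 here)."""
        return (self.discharge_date - self.admit_date).days


def inpatient_los_outliers(encounters: Iterable[Encounter]) -> List[str]:
    """IDs of inpatient encounters whose actual stay exceeded the DRG expected LOS."""
    return [
        e.encounter_id
        for e in encounters
        if e.patient_class == "Inpatient" and e.length_of_stay > e.expected_los
    ]


# Made-up sample encounters for illustration only.
encounters = [
    Encounter("e1", "Inpatient", date(2023, 1, 1), date(2023, 1, 8), expected_los=4.5),
    Encounter("e2", "Inpatient", date(2023, 2, 1), date(2023, 2, 3), expected_los=3.0),
    Encounter("e3", "Outpatient", date(2023, 3, 1), date(2023, 3, 1), expected_los=0.0),
]
print(inpatient_los_outliers(encounters))  # ['e1']
```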

Detailed Cohort Building: Parsing Data with Our Proprietary Clinical NLP

Entity Type | Definition (patient cohorts can be identified using the following attributes)
Problem | e.g., hypertension, CHF, chest pain, etc.
Procedure | e.g., CABG, EKG, bypass, etc.
Medicine | e.g., metoprolol, beta-blockers, aspirin, etc.
Medical Device | e.g., catheters, pacemakers, etc.
Lab Data | e.g., FBS, AFB, H&H, GFR, etc.
Body Measurements | e.g., pulse, HR, blood pressure, etc.
Anatomical Structure | e.g., heart, lungs, left hand, etc.
Body Function | e.g., breathing, sleep
Medication Attributes | e.g., medication dose, form, route, frequency, etc.
Taxonomies | SNOMED, RxNorm, LOINC

Possible Combination Queries
Patients with CHF and EF < 40% who are taking a beta-blocker but have no documented systolic heart failure
Patients with CHF and EF < 40% who are taking neither a beta-blocker nor aspirin
Patients with diabetes mellitus documented but no HbA1c value documented
Smokers under the age of 50 with a malignant lung neoplasm
Patients with a specific type of cancer in a specific disease state, e.g., chronic myelomonocytic leukemia in remission OR chronic myelomonocytic leukemia in relapse
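These queries mix positive criteria with absence criteria ("not taking", "not documented"). A Python sketch of how they might be expressed, assuming a hypothetical `Chart` model in which medications have already been normalized to drug classes (so metoprolol is recorded as "beta-blocker") and EF is stored as a percentage. An absence test can only show that something was not documented, not that it did not happen.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Chart:
    patient_id: str
    problems: Set[str] = field(default_factory=set)
    medications: Set[str] = field(default_factory=set)    # assumed normalized, e.g. "beta-blocker"
    labs: Dict[str, float] = field(default_factory=dict)  # lab name -> latest documented value
    ejection_fraction: Optional[float] = None             # percent; None if not documented


def _chf_low_ef(chart: Chart) -> bool:
    # An undocumented EF does not qualify as EF < 40%.
    return (
        "CHF" in chart.problems
        and chart.ejection_fraction is not None
        and chart.ejection_fraction < 40
    )


def chf_low_ef_on_bb_no_systolic_hf(chart: Chart) -> bool:
    """CHF with EF < 40%, on a beta-blocker, systolic heart failure not documented."""
    return (
        _chf_low_ef(chart)
        and "beta-blocker" in chart.medications
        and "systolic heart failure" not in chart.problems
    )


def chf_low_ef_without_bb_or_aspirin(chart: Chart) -> bool:
    """CHF with EF < 40%, not taking a beta-blocker, not taking aspirin."""
    return (
        _chf_low_ef(chart)
        and "beta-blocker" not in chart.medications
        and "aspirin" not in chart.medications
    )


def diabetes_without_hba1c(chart: Chart) -> bool:
    """Diabetes mellitus documented, but no HbA1c value documented."""
    return "diabetes mellitus" in chart.problems and "HbA1c" not in chart.labs


# Made-up sample charts for illustration only.
charts = [
    Chart("p1", {"CHF"}, {"lisinopril"}, {}, 35.0),
    Chart("p2", {"CHF"}, {"beta-blocker"}, {}, 30.0),
    Chart("p3", {"diabetes mellitus"}, set(), {"GFR": 80.0}),
    Chart("p4", {"diabetes mellitus"}, set(), {"HbA1c": 7.1}),
]
print([c.patient_id for c in charts if chf_low_ef_on_bb_no_systolic_hf(c)])   # ['p2']
print([c.patient_id for c in charts if chf_low_ef_without_bb_or_aspirin(c)])  # ['p1']
print([c.patient_id for c in charts if diabetes_without_hba1c(c)])            # ['p3']
```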


Tell us how we can help with your next AI initiative.