License High-Quality Healthcare/Medical Data for AI/ML Models
Data Catalog – Medical
Physician Audio
Our de-identified healthcare dataset includes audio files from 31 different specialties, dictated by physicians describing patients’ clinical condition and plan of care, based on physician-patient encounters in hospital and clinical settings.
Audio Data by Gender
Gender | Patient Audio Files (Playtime in Hours) | Total No. of Audio Files |
---|---|---|
Total | 221,000 | 4,306,000 |
Male | 80,000 | 1,600,000 |
Female | 25,000 | 475,000 |
Unknown | 116,000 | 2,231,000 |
Audio Data by Specialty
Specialty | Patient Audio Files (Playtime in Hours) | Total No. of Audio Files |
---|---|---|
Total | 221,000 | 4,306,000 |
Cardiology | 45,000 | 900,000 |
Internal Medicine | 40,000 | 750,000 |
Family Medicine | 20,000 | 340,000 |
Surgery | 15,750 | 365,000 |
Diagnostic Radiology | 15,000 | 330,000 |
Hematology – Oncology | 15,000 | 290,000 |
Physician Assistant/Nurse Practitioner | 10,000 | 200,000 |
Psychiatry | 9,000 | 175,000 |
Emergency Medicine | 6,000 | 120,000 |
Nephrology | 6,000 | 100,000 |
Pulmonology | 5,500 | 110,000 |
Gastroenterology | 5,000 | 100,000 |
OB/GYN | 3,500 | 66,000 |
Orthopedics | 3,000 | 60,000 |
Medical Oncology | 2,500 | 50,000 |
Urology | 2,000 | 40,000 |
Neurology | 1,500 | 30,000 |
Pathology | 1,500 | 30,000 |
Pediatrics | 1,500 | 30,000 |
Otolaryngology | 800 | 16,000 |
Infectious Disease | 500 | 10,000 |
Podiatry | 500 | 10,000 |
Family Nurse Practitioner | 500 | 10,000 |
Geriatric Medicine | 500 | 10,000 |
Osteopathic | 400 | 8,000 |
Anesthesiology | 400 | 8,000 |
Ophthalmology | 400 | 8,000 |
Dermatology | 400 | 8,000 |
Family Practice | 250 | 5,000 |
Rehabilitation | 100 | 2,000 |
Physical Medicine & Rehabilitation | 100 | 2,000 |
Others | 8,400 | 123,000 |
Audio Data by Device
Device | Patient Audio Files (Playtime in Hours) | Total No. of Audio Files |
---|---|---|
Total | 221,000 | 4,306,000 |
Telephone Dictation | 120,000 | 2,400,000 |
Digital Recorder | 55,000 | 850,000 |
SpeechMic | 12,000 | 235,000 |
Smartphone | 6,000 | 120,000 |
Unknown | 28,000 | 701,000 |
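Because the tables above list both playtime and file counts, an average dictation length per file follows directly. A minimal sketch using the device figures from this catalog; the variable and function names are illustrative and not part of any delivered schema:

```python
# Figures copied from the "Audio Data by Device" table: (hours, number of files).
AUDIO_BY_DEVICE = {
    "Telephone Dictation": (120_000, 2_400_000),
    "Digital Recorder": (55_000, 850_000),
    "SpeechMic": (12_000, 235_000),
    "Smartphone": (6_000, 120_000),
    "Unknown": (28_000, 701_000),
}


def average_minutes_per_file(hours: float, files: int) -> float:
    """Average playtime of one audio file, in minutes."""
    return hours * 60 / files


# Sum the per-device rows to reproduce the "Total" row of the table.
total_hours = sum(hours for hours, _ in AUDIO_BY_DEVICE.values())
total_files = sum(files for _, files in AUDIO_BY_DEVICE.values())

for device, (hours, files) in AUDIO_BY_DEVICE.items():
    print(f"{device}: {average_minutes_per_file(hours, files):.2f} min/file")
print(f"Overall: {average_minutes_per_file(total_hours, total_files):.2f} min/file")
```

On these figures, the average dictation runs to roughly three minutes per file overall.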
Transcribed Medical Record
Specialty | Approx. No. of Medical Records | Approx. No. of Characters |
---|---|---|
Total | 4,306,000 | 9,236,370,000 |
Cardiology | 900,000 | 1,930,500,000 |
Internal Medicine | 750,000 | 1,608,750,000 |
Family Medicine | 340,000 | 729,300,000 |
Surgery | 365,000 | 782,925,000 |
Diagnostic Radiology | 330,000 | 707,850,000 |
Hematology – Oncology | 290,000 | 622,050,000 |
Physician Assistant/Nurse Practitioner | 200,000 | 429,000,000 |
Psychiatry | 175,000 | 375,375,000 |
Emergency Medicine | 120,000 | 257,400,000 |
Nephrology | 100,000 | 214,500,000 |
Pulmonology | 110,000 | 235,950,000 |
Gastroenterology | 100,000 | 214,500,000 |
OB/GYN | 66,000 | 141,570,000 |
Orthopedics | 60,000 | 128,700,000 |
Rehabilitation | 2,000 | 4,290,000 |
Medical Oncology | 50,000 | 107,250,000 |
Urology | 40,000 | 85,800,000 |
Neurology | 30,000 | 64,350,000 |
Pathology | 30,000 | 64,350,000 |
Pediatrics | 30,000 | 64,350,000 |
Physical Medicine & Rehabilitation | 2,000 | 4,290,000 |
Otolaryngology | 16,000 | 34,320,000 |
Infectious Disease | 10,000 | 21,450,000 |
Podiatry | 10,000 | 21,450,000 |
Family Nurse Practitioner | 10,000 | 21,450,000 |
Geriatric Medicine | 10,000 | 21,450,000 |
Osteopathic | 8,000 | 17,160,000 |
Anesthesiology | 8,000 | 17,160,000 |
Ophthalmology | 8,000 | 17,160,000 |
Dermatology | 8,000 | 17,160,000 |
Family Practice | 5,000 | 10,725,000 |
Others | 123,000 | 263,835,000 |
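The character counts in this table scale linearly with the record counts: dividing one by the other gives 2,145 characters per record. That ratio can be used to estimate the text volume of any subset. A minimal sketch, in which the function names and the 25,000-record subset are hypothetical:

```python
# (records, characters) for a few rows of the "Transcribed Medical Record" table.
TRANSCRIPTS = {
    "Total": (4_306_000, 9_236_370_000),
    "Cardiology": (900_000, 1_930_500_000),
    "Internal Medicine": (750_000, 1_608_750_000),
    "Psychiatry": (175_000, 375_375_000),
}


def chars_per_record(records: int, characters: int) -> float:
    """Average number of characters in one transcribed record."""
    return characters / records


def estimate_characters(records: int, avg_chars: float) -> int:
    """Approximate character volume for a given number of records."""
    return round(records * avg_chars)


avg = chars_per_record(*TRANSCRIPTS["Total"])
for specialty, (records, characters) in TRANSCRIPTS.items():
    print(f"{specialty}: {chars_per_record(records, characters):,.0f} chars/record")

# Hypothetical subset: 25,000 records, sized with the overall average.
subset_chars = estimate_characters(25_000, avg)
print(f"Estimated characters in subset: {subset_chars:,}")
```

Since the catalog labels these figures as approximate, any estimate derived this way is a planning number rather than an exact count.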
EHR
EHR Data by Location
Location | Text Documents |
---|---|
NorthEast | 2,179,310 |
South | 384,871 |
MidWest | 145,820 |
West | 1,317,120 |
EHR Data by Major Diagnostic Category
Major Diagnostic Category (MDC) | Text Documents |
---|---|
Circulatory System | 307,535 |
Infectious & Parasitic Diseases | 273,815 |
Respiratory System | 271,434 |
Musculoskeletal System & Connective Tissue | 184,300 |
Digestive System | 182,112 |
Nervous System | 171,184 |
Mental Diseases & Disorders | 137,070 |
Kidney & Urinary Tract | 113,632 |
Pregnancy, Childbirth & the Puerperium | 91,321 |
Newborns & Other Neonates with Conditions Originating in the Perinatal Period | 91,316 |
Endocrine, Nutritional & Metabolic Diseases & Disorders | 76,449 |
Hepatobiliary System & Pancreas | 61,006 |
Skin, Subcutaneous Tissue & Breast | 50,516 |
Injuries, Poisonings & Toxic Effects of Drugs | 36,249 |
Blood, Blood Forming Organs, Immunologic Disorders | 28,170 |
Alcohol/Drug Use & Alcohol/Drug-Induced Organic Mental Disorders | 20,821 |
Multiple Significant Trauma | 17,236 |
Ear, Nose, Mouth & Throat | 14,460 |
Female Reproductive System | 12,949 |
Factors Influencing Health Status & Other Contacts with Health Services | 9,674 |
Myeloproliferative Diseases & Disorders, Poorly Differentiated Neoplasms | 8,544 |
Human Immunodeficiency Virus Infections | 7,913 |
Male Reproductive System | 5,650 |
Eye | 2,352 |
Burns | 315 |
Ungroupable | 57 |
Total with MDC | 2,176,080 |
Cases using a specialty grouper such as 3M (MDC not specified) | 516,837 |
Outpatient Cases (MDC not specified) | 2,183,951 |
Cases without reimbursement generated (MDC not specified) | 18,446 |
Grand Total (cases with and without an MDC) | 4,895,257 |
Oncology Cohorts
Detailed Cohort building – by parsing data using our proprietary Clinical NLP
Entity Type | Definition (patient cohorts can be identified using the following attributes) |
---|---|
Problem | e.g. Breast cancer, CRC, leukemia, melanoma, myeloma, NHL, non-small-cell lung cancer, prostate cancer, etc. |
Procedure | e.g. CT scan, biopsy, MRI, bone marrow aspiration and biopsy, etc. |
Medicine | e.g. Herceptin, cyclophosphamide, carboplatin, etc. |
Medical Device | e.g. Catheters, pacemakers, etc. |
Lab Data | e.g. CA-125, PSA, AFP, beta-hCG, CA15-3/CA27.29, HE4, etc. |
Body Measurements | e.g. Pulse, HR, blood pressure, etc. |
Anatomical Structure | e.g. Breast, lungs, liver, etc. |
Body Function | e.g. Breathing, sleep |
Medication Attributes | e.g. Medication dose, form, route, frequency, etc. |
Taxonomies | SNOMED, RxNorm, LOINC |
Possible Combination Queries |
---|
Smokers under age of 50 with malignant lung neoplasm |
Patients having a specific type of cancer at a specific stage, e.g. "Chronic myelomonocytic leukemia, in remission" OR "Chronic myelomonocytic leukemia, in relapse" |
Patients on chemotherapy, hormonal therapy, radiation therapy, immunotherapy, etc. |
Patients with stage II cancer undergoing adjuvant chemotherapy |
Patients with family history of breast cancer |
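A combination query such as "Smokers under age of 50 with malignant lung neoplasm" amounts to applying several attribute filters at once to the extracted entities. The sketch below assumes a hypothetical per-patient record layout (`PatientRecord`, `problems`, `social_history`); the actual output schema of the Clinical NLP is not described in this catalog.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of the entities extracted for one patient."""
    patient_id: str
    age: int
    problems: set = field(default_factory=set)        # "Problem" entities
    social_history: set = field(default_factory=set)  # assumed field, e.g. smoking status


def smokers_under_50_with_lung_neoplasm(records):
    """Return IDs of patients matching all three criteria of the example query."""
    return [
        r.patient_id
        for r in records
        if r.age < 50
        and "smoker" in r.social_history
        and "malignant lung neoplasm" in r.problems
    ]


# Made-up sample records, for illustration only.
sample = [
    PatientRecord("P1", 46, {"malignant lung neoplasm"}, {"smoker"}),
    PatientRecord("P2", 62, {"malignant lung neoplasm"}, {"smoker"}),
    PatientRecord("P3", 41, {"breast cancer"}, {"smoker"}),
    PatientRecord("P4", 39, {"malignant lung neoplasm"}, set()),
]

cohort = smokers_under_50_with_lung_neoplasm(sample)
print(cohort)
```

Only `P1` satisfies all three conditions in this sample; each of the other records fails exactly one of them.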
Additional Cohort building metadata available
Entity Type | Definition (patient cohorts can be identified using the following attributes) |
---|---|
Patient Class | Inpatient, Outpatient, ED |
Service Line | Medicine, Surgery, Cardiology, etc. |
Date | Service Date, Admit Date, Discharge Date |
Length of Stay | Days a patient stayed in the hospital |
Payor/Financial Class | Medicare, Medicaid, BC/BS, etc. |
Patient Demographics | Name, Age, DOB, Gender, Race, Ethnicity, Location, Reason for Visit |
Physician Demographics | Age, DOB, Gender, Race, Ethnicity, Location, Specialty, Education |
Discharge Disposition | Expired, Left AMA, etc. |
Type of Notes | Clinic Notes, Consultation, Follow-up, Progress |
Taxonomies/Codes | ICD-10 CM/PCS, HCPCS, CPT, E&M |
DRG Parameters | Severity of Illness, Risk of Mortality, DRG Group, MDC, Expected LOS |
Cardiology Cohorts
Detailed Cohort building – by parsing data using our proprietary Clinical NLP
Entity Type | Definition (patient cohorts can be identified using the following attributes) |
---|---|
Problem | e.g. Hypertension, CHF, chest pain, etc. |
Procedure | e.g. CABG, EKG, bypass, etc. |
Medicine | e.g. Metoprolol, beta-blockers, aspirin, etc. |
Medical Device | e.g. Catheters, pacemakers, etc. |
Lab Data | e.g. FBS, AFB, H&H, GFR, etc. |
Body Measurements | e.g. Pulse, HR, blood pressure, etc. |
Anatomical Structure | e.g. Heart, lungs, left hand, etc. |
Body Function | e.g. Breathing, sleep |
Medication Attributes | e.g. Medication dose, form, route, frequency, etc. |
Taxonomies | SNOMED, RxNorm, LOINC |
Possible Combination Queries |
---|
Patients having CHF, with EF<40, taking beta-blocker, and systolic heart failure not documented |
Patients having CHF, with EF<40, not taking beta-blocker, not taking aspirin |
Patients having diabetes mellitus documented but no HbA1c value documented |
Smokers under age of 50 with malignant lung neoplasm |
Patients having a specific type of cancer at a specific stage, e.g. "Chronic myelomonocytic leukemia, in remission" OR "Chronic myelomonocytic leukemia, in relapse" |
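Queries like "Patients having CHF, with EF<40, not taking beta-blocker, not taking aspirin" combine a presence test, a numeric threshold, and absence tests. A minimal sketch under stated assumptions: the `Encounter` layout is hypothetical, and the small `BETA_BLOCKERS` set stands in for a proper drug-class lookup (for example via RxNorm), which this catalog does not specify.

```python
from typing import Optional, TypedDict


class Encounter(TypedDict):
    """Hypothetical flattened view of the entities extracted for one patient."""
    patient_id: str
    problems: set
    medications: set
    ejection_fraction: Optional[float]  # percent; None when not documented


# Illustrative subset only; a real pipeline would resolve drug classes
# through a terminology service rather than a hard-coded set.
BETA_BLOCKERS = {"metoprolol", "carvedilol", "bisoprolol"}


def chf_low_ef_untreated(encounters, ef_threshold: float = 40.0):
    """IDs of patients with CHF, EF below threshold, no beta-blocker, no aspirin."""
    cohort = []
    for e in encounters:
        ef = e["ejection_fraction"]
        if "CHF" not in e["problems"]:
            continue
        if ef is None or ef >= ef_threshold:
            continue  # an undocumented EF cannot satisfy "EF<40"
        if e["medications"] & BETA_BLOCKERS:
            continue
        if "aspirin" in e["medications"]:
            continue
        cohort.append(e["patient_id"])
    return cohort


# Made-up sample data, for illustration only.
sample = [
    {"patient_id": "C1", "problems": {"CHF"}, "medications": {"lisinopril"}, "ejection_fraction": 35.0},
    {"patient_id": "C2", "problems": {"CHF"}, "medications": {"metoprolol"}, "ejection_fraction": 30.0},
    {"patient_id": "C3", "problems": {"CHF"}, "medications": set(), "ejection_fraction": None},
    {"patient_id": "C4", "problems": {"CHF"}, "medications": set(), "ejection_fraction": 55.0},
    {"patient_id": "C5", "problems": {"CHF"}, "medications": {"aspirin"}, "ejection_fraction": 38.0},
]

cohort = chf_low_ef_untreated(sample)
print(cohort)
```

Treating a missing ejection fraction as a non-match is a design choice in this sketch; a documentation-gap query, such as the HbA1c example above, would instead select exactly those records where the value is absent.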
Additional Cohort building metadata available
Entity Type | Definition (patient cohorts can be identified using the following attributes) |
---|---|
Patient Class | Inpatient, Outpatient, ED |
Service Line | Medicine, Surgery, Cardiology, etc. |
Date | Service Date, Admit Date, Discharge Date |
Length of Stay | Days a patient stayed in the hospital |
Payor/Financial Class | Medicare, Medicaid, BC/BS, etc. |
Patient Demographics | Name, Age, DOB, Gender, Race, Ethnicity, Location, Reason for Visit |
Physician Demographics | Age, DOB, Gender, Race, Ethnicity, Location, Specialty, Education |
Discharge Disposition | Expired, Left AMA, etc. |
Type of Notes | Clinic Notes, Consultation, Follow-up, Progress |
Taxonomies/Codes | ICD-10 CM/PCS, HCPCS, CPT, E&M |
DRG Parameters | Severity of Illness, Risk of Mortality, DRG Group, MDC, Expected LOS |
Tell us how we can help with your next AI initiative.