License High-quality Healthcare/Medical Data
for AI & ML Models

Off-the-shelf Healthcare/Medical Datasets to jump-start your Healthcare AI project

Physician Dictation Audio Datasets

Plug in the medical data you’ve been missing today

Physician Dictation Audio Datasets for Machine Learning

Our de-identified healthcare dataset includes audio files from 31 specialties, dictated by physicians describing patients’ clinical conditions and plans of care based on physician-patient encounters in hospital and clinical settings.

Off-the-Shelf Physician Dictation Audio Files:

  • 257,977 hours of real-world physician dictation speech from 31 specialties to train healthcare speech models
  • Dictation audio captured from various devices: Telephone Dictation (54.3%), Digital Recorder (24.9%), Speech Mic (5.4%), Smart Phone (2.7%), and Unknown (12.7%)
  • PII-redacted audio and transcripts adhering to Safe Harbor guidelines in conformance with HIPAA

Audio Data by Gender

Gender | Patient Audio Files (Playtime in Hours) | Total No. of Audio Files
Total | 257,977 | 5,172,766
Male | 58,850 | 2,444,910
Female | 113,406 | 1,290,900
Unknown | 85,721 | 1,436,956

Audio Data by Specialty

Specialty | Patient Audio Files (Playtime in Hours) | Total No. of Audio Files
Total | 257,977 | 5,172,766
Accident & emergency9359
Allergy and Immunology115222202
Anesthesiology67722280
Anesthetics19
APRN1631693
Cardiology675041566721
Cardiothoracic17122
Cardiothoracic surgery110
Clinical hematology02
Clinical physiology50160
Critical Care7079645
Dental551233
Dermatology1483474
Diagnostic Radiology2557591
Ear, Nose And Throat51658
ED Physician Assistant070
Emergency367562256
Emergency Room Specialist30378
Endocrinology2193212
Family Medicine13639263480
Family Nurse Practitioner4249018
Family Practice2622498
Gastroenterology312762158
General26313
General dental practice225
General medicine30327
General Psychiatry336
General Surgeon27893
General surgery2372220
Geriatric Medicine4615323
GI55550
Gynecology425
Hematology – Oncology22394
HIM019
Hospice & Palliative Medicine441
Hospitalist991493
IH-Industrial Health73945
Internal Medicine42604623072
Internal Medicine And Nephrology15111
Internal Medicine, Pulmonary Medicine, Critical Care Medicine And Sleep Medicine5102
Medical oncology1667
Medicine5122
Nephrology243139821
Neuro/TBI1731157
Neurology147617786
Neurosurgery86755
Nurse Practitioner81432
Nurse Practitioner – Family9113
OB/GYN242442739
Occupational medicine79763
Occupational Therapist868
Oncology681682300
OPERATIVE CARE05
Ophthalmology60919299
Oral & Maxillofacial Surgeon18
Oral surgery113
Orthopaedics & Sports Medicine1493165
Orthopedic4849145053
Osteopathic3105566
Otolaryngology99519548
Pain Management230
Pain Medicine111
PANP10760145960
Pathology114343462
Pediatric Dentistry15420
Pediatric pulmonology440
Pediatric specialty35682
Pediatric surgery223
Pediatrics8779271
Physical Medicine & Rehabilitation134723523
Physical Therapist1141713
Physician Asst.638
Plastic surgery – specialty13183
Podiatric Surgery424
Podiatry89212056
Preventive Medicine21191
PRIMARY CARE ATTENDING17
Psychiatry887170269
Psychotherapy (specialty)50229
Pulmonary380964368
Radiology10962630983
Rehabilitation251530078
Resident46641
Rheumatology13124
Speech Therapy29327
Sports Medicine349
Surgery14431236788
Surgery Physician Assistant03
Surgical specialty22290
Thoracic medicine527
Thoracic surgery437
Transplant332
Trauma & orthopedics1401308
Unknown42269748054
Upper gastrointestinal surgery458
Urology317096934
VASCULAR SURGERY19156
Vascular/General9268
Wound Care15211

Audio Data by Device

Device | Patient Audio Files (Playtime in Hours) | Total No. of Audio Files
Total | 257,977 | 5,172,766
iPhone | 666 | 32,382
Digital Recorder | 1,659 | 22,377
Mixed type | 69,818 | 1,408,679
SmartPhone | 51,533 | 1,306,405
SpeechMic | 10,329 | 257,730
Telephone Dictation | 120,867 | 2,071,557
Unknown | 3,104 | 73,636
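
If you are sizing a training set, it can help to turn the device breakdown into shares and average clip lengths. The following is a minimal Python sketch, not an official tool: the figures are copied from the Audio Data by Device table above, and the names `DEVICE_STATS` and `summarize` are hypothetical, introduced only for illustration.

```python
# Figures copied from the "Audio Data by Device" table: (hours, audio files).
# DEVICE_STATS and summarize() are illustrative names, not part of any Shaip API.
DEVICE_STATS = {
    "IPHONE": (666, 32_382),
    "Digital Recorder": (1_659, 22_377),
    "Mixed type": (69_818, 1_408_679),
    "SmartPhone": (51_533, 1_306_405),
    "SpeechMic": (10_329, 257_730),
    "Telephone Dictation": (120_867, 2_071_557),
    "Unknown": (3_104, 73_636),
}


def summarize(stats):
    """Return total hours, total files, and per-device share and clip length."""
    total_hours = sum(hours for hours, _ in stats.values())
    total_files = sum(files for _, files in stats.values())
    rows = {}
    for device, (hours, files) in stats.items():
        rows[device] = {
            # Share of total playtime contributed by this device type.
            "hours_share_pct": round(100 * hours / total_hours, 1),
            # Mean length of one audio file, in minutes.
            "avg_minutes_per_file": round(60 * hours / files, 2),
        }
    return total_hours, total_files, rows


if __name__ == "__main__":
    total_hours, total_files, rows = summarize(DEVICE_STATS)
    print(f"{total_hours:,} hours across {total_files:,} files")
    # List devices from largest to smallest share of playtime.
    for device, row in sorted(rows.items(), key=lambda kv: -kv[1]["hours_share_pct"]):
        print(f"{device:<20} {row['hours_share_pct']:>5.1f}% of hours, "
              f"{row['avg_minutes_per_file']:.2f} min/file")
```

On these figures, the per-device file counts add up to the published 5,172,766, while the per-device hours add up to 257,976, one hour short of the published total, presumably because the per-device hours are rounded.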

We handle data licensing for all data types: text, audio, video, and image. Our medical datasets for ML include the Physician Dictation Dataset, Physician Clinical Notes, Medical Conversation Dataset, Medical Transcription Dataset, Doctor-Patient Conversations, Medical Text Data, and Medical Images (CT scan, MRI, ultrasound; collected based on custom requirements).

Can’t find what you are looking for?

New off-the-shelf medical datasets are being collected across all data types 

Contact us now to let go of your healthcare training data collection worries
