License High-Quality Healthcare/Medical Data for AI & ML Models

Off-the-shelf Healthcare/Medical Datasets to jump-start your Healthcare AI project

Physician dictation audio datasets

Plug in the medical data you’ve been missing, today

Physician Dictation Audio Datasets for Machine Learning

Our de-identified healthcare dataset includes audio files from 31 different specialties, dictated by physicians describing patients’ clinical conditions and plans of care based on physician-patient encounters in hospital and clinical settings.

Off-the-Shelf Physician Dictation Audio Files:

  • 257,977 hours of real-world physician dictation speech from 31 specialties to train healthcare speech models
  • Dictation audio captured from a range of devices: Telephone Dictation (54.3%), Digital Recorder (24.9%), Speech Mic (5.4%), Smart Phone (2.7%) and Unknown (12.7%)
  • PII-redacted audio and transcripts that adhere to the Safe Harbor guidelines, in conformance with HIPAA
Audio Data by Gender

| Gender | Patient Audio Files (Playtime in Hours) | Total No. of Audio Files |
|---|---|---|
| Total | 257,977 | 5,172,766 |
| Male | 58,850 | 2,444,910 |
| Female | 113,406 | 1,290,900 |
| Unknown | 85,721 | 1,436,956 |

Audio Data by Specialty

| Specialty | Patient Audio Files (Playtime in Hours) | Total No. of Audio Files |
|---|---|---|
| Total | 257,977 | 5,172,766 |
| Pain Medicine | 1 | 11 |
| Podiatric Surgery | 4 | 24 |
| Plastic surgery – specialty | 13 | 183 |
| Physician Asst. | 6 | 38 |
| Physical Therapist | 114 | 1,713 |
| Physical Medicine & Rehabilitation | 1,347 | 23,523 |
| Pediatrics | 877 | 9,271 |
| Pediatric surgery | 2 | 23 |
| Pediatric specialty | 35 | 682 |
| Pediatric pulmonology | 4 | 40 |
| Pediatric Dentistry | 15 | 420 |
| Pathology | 1,143 | 43,462 |
| PANP | 10,760 | 145,960 |
| Podiatry | 892 | 12,056 |
| Pain Management | 2 | 30 |
| Otolaryngology | 995 | 19,548 |
| Osteopathic | 310 | 5,566 |
| Orthopedic | 4,849 | 145,053 |
| Orthopaedics & Sports Medicine | 149 | 3,165 |
| Oral surgery | 1 | 13 |
| Oral & Maxillofacial Surgeon | 1 | 8 |
| Ophthalmology | 609 | 19,299 |
| OPERATIVE CARE | 0 | 5 |
| Oncology | 6,816 | 82,300 |
| Occupational Therapist | 8 | 68 |
| Surgery | 14,431 | 236,788 |
| Wound Care | 15 | 211 |
| Vascular/General | 9 | 268 |
| VASCULAR SURGERY | 19 | 156 |
| Urology | 3,170 | 96,934 |
| Upper gastrointestinal surgery | 4 | 58 |
| Unknown | 42,269 | 748,054 |
| Trauma & orthopedics | 140 | 1,308 |
| Transplant | 3 | 32 |
| Thoracic surgery | 4 | 37 |
| Thoracic medicine | 5 | 27 |
| Surgical specialty | 22 | 290 |
| Surgery Physician Assistant | 0 | 3 |
| Occupational medicine | 79 | 763 |
| Sports Medicine | 3 | 49 |
| Speech Therapy | 29 | 327 |
| Rheumatology | 13 | 124 |
| Resident | 46 | 641 |
| Rehabilitation | 2,515 | 30,078 |
| Radiology | 10,962 | 630,983 |
| Pulmonary | 3,809 | 64,368 |
| Psychotherapy (specialty) | 50 | 229 |
| Psychiatry | 8,871 | 70,269 |
| PRIMARY CARE ATTENDING | 1 | 7 |
| Preventive Medicine | 21 | 191 |
| Dental | 55 | 1,233 |
| General | 26 | 313 |
| Gastroenterology | 3,127 | 62,158 |
| Family Practice | 262 | 2,498 |
| Family Nurse Practitioner | 424 | 9,018 |
| Family Medicine | 13,639 | 263,480 |
| Endocrinology | 219 | 3,212 |
| Emergency Room Specialist | 30 | 378 |
| Emergency | 3,675 | 62,256 |
| ED Physician Assistant | 0 | 70 |
| Ear, Nose And Throat | 51 | 658 |
| Diagnostic Radiology | 255 | 7,591 |
| Dermatology | 148 | 3,474 |
| General dental practice | 2 | 25 |
| Critical Care | 707 | 9,645 |
| Clinical physiology | 50 | 160 |
| Clinical hematology | 0 | 2 |
| Cardiothoracic surgery | 1 | 10 |
| Cardiothoracic | 17 | 122 |
| Cardiology | 67,504 | 1,566,721 |
| APRN | 163 | 1,693 |
| Anesthetics | 1 | 9 |
| Anesthesiology | 677 | 22,280 |
| Allergy and Immunology | 1,152 | 22,202 |
| Accident & emergency | 9 | 359 |
| IH-Industrial Health | 73 | 945 |
| OB/GYN | 2,424 | 42,739 |
| Nurse Practitioner – Family | 9 | 113 |
| Nurse Practitioner | 81 | 432 |
| Neurosurgery | 86 | 755 |
| Neurology | 1,476 | 17,786 |
| Neuro/TBI | 173 | 1,157 |
| Nephrology | 2,431 | 39,821 |
| Medicine | 5 | 122 |
| Medical oncology | 16 | 67 |
| Internal Medicine, Pulmonary Medicine, Critical Care Medicine And Sleep Medicine | 5 | 102 |
| Internal Medicine And Nephrology | 15 | 111 |
| Internal Medicine | 42,604 | 623,072 |
| Hospitalist | 99 | 1,493 |
| Hospice & Palliative Medicine | 4 | 41 |
| HIM | 0 | 19 |
| Hematology – Oncology | 22 | 394 |
| Gynecology | 4 | 25 |
| GI | 55 | 550 |
| Geriatric Medicine | 461 | 5,323 |
| General surgery | 237 | 2,220 |
| General Surgeon | 27 | 893 |
| General Psychiatry | 3 | 36 |
| General medicine | 30 | 327 |

Audio Data by Device

| Device | Patient Audio Files (Playtime in Hours) | Total No. of Audio Files |
|---|---|---|
| Total | 257,977 | 5,172,766 |
| IPHONE | 666 | 32,382 |
| Digital Recorder | 1,659 | 22,377 |
| Mixed type | 69,818 | 1,408,679 |
| SmartPhone | 51,533 | 1,306,405 |
| SpeechMic | 10,329 | 257,730 |
| Telephone Dictation | 120,867 | 2,071,557 |
| Unknown | 3,104 | 73,636 |
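
For readers who want to sanity-check the device figures before licensing, the short Python sketch below derives the average playtime per audio file for each device and compares the row sums with the Total row. The hours and file counts are copied from the device table; the names `DEVICE_ROWS`, `mean_minutes_per_file` and `summarize` are illustrative only and are not part of any Shaip tooling or API.

```python
# Rows copied from the "Audio Data by Device" table: (hours, files).
# All names in this sketch are hypothetical and exist only for illustration.
DEVICE_ROWS = {
    "IPHONE": (666, 32_382),
    "Digital Recorder": (1_659, 22_377),
    "Mixed type": (69_818, 1_408_679),
    "SmartPhone": (51_533, 1_306_405),
    "SpeechMic": (10_329, 257_730),
    "Telephone Dictation": (120_867, 2_071_557),
    "Unknown": (3_104, 73_636),
}
TOTAL_HOURS, TOTAL_FILES = 257_977, 5_172_766  # the table's Total row


def mean_minutes_per_file(hours: int, files: int) -> float:
    """Average playtime of one audio file, in minutes."""
    return hours * 60 / files


def summarize(rows):
    """Return per-device mean minutes per file, plus summed hours and files."""
    means = {name: mean_minutes_per_file(h, f) for name, (h, f) in rows.items()}
    hours = sum(h for h, _ in rows.values())
    files = sum(f for _, f in rows.values())
    return means, hours, files


if __name__ == "__main__":
    means, hours, files = summarize(DEVICE_ROWS)
    for name, minutes in means.items():
        print(f"{name:20s} {minutes:5.1f} min/file")
    # File counts are exact, so they should equal the Total row.
    print("files match total:", files == TOTAL_FILES)
    # Hours are rounded per row, so allow up to one hour of drift per row.
    print("hours within rounding:", abs(hours - TOTAL_HOURS) <= len(DEVICE_ROWS))
```

The file counts add up to the Total row exactly, while the per-device hours add up to 257,976, one hour short of the stated total, which is consistent with each row being rounded to a whole hour.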

We license all types of data: text, audio, video, and image. Our medical datasets for ML include the Physician Dictation Dataset, Physician Clinical Notes, Medical Conversation Dataset, Medical Transcription Dataset, Doctor-Patient Conversation, Medical Text Data, and Medical Images (CT scan, MRI, and ultrasound, collected on the basis of custom requirements).


Can’t find what you are looking for?

New off-the-shelf medical datasets are being collected across all data types 

Contact us now to let go of your healthcare training data collection worries
