High-Quality Curated Data to Train Your AI Model

Download our sample datasets for your machine learning models

| Datasets | File | Use Case | Description |
| --- | --- | --- | --- |
| Physician Dictation | Audio Files | Healthcare | An hour of audio, dictated by physicians describing patients’ clinical condition and plan of care in the hospital or clinical setting. |
| Physician Dictation | Verbatim Transcribed Text Files | Healthcare | A set of transcribed documents corresponding to the dictation audio dataset. The transcription is verbatim, as required to train speech recognition acoustic and vocabulary models. |
| Physician Clinical Notes | Dictation Notes | Healthcare | A set of clinical documents as dictated by the physician describing patients’ clinical condition. |
| Physician Clinical Notes | De-identified Dictation Notes | Healthcare | A set of formatted clinical documents as dictated by the physicians to train medical AI models. |
| Human-Bot Conversations | Canadian French | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Human-Bot Conversations | Australian English | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Human-Bot Conversations | UK English | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Conversations Datasets | Danish | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Conversations Datasets | Hindi | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Conversations Datasets | Telugu | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Conversations Datasets | Indonesian | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Conversations Datasets | Hebrew | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Conversations Datasets | Malay | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Conversations Datasets | Afrikaans | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Conversations Datasets | Arabic | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Conversations Datasets | Irish | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Conversations Datasets | Scottish | Conversational AI | An hour of audio conversation and transcribed JSON files. |
| Conversations Datasets | Welsh | Conversational AI | An hour of audio conversation and transcribed JSON files. |
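
Because the conversational datasets ship audio together with transcribed JSON files, a first step after downloading is usually to match each recording to its transcript. The sketch below is only an illustration under stated assumptions: the page does not document the archive layout, so the function name `pair_audio_with_transcripts`, the audio extensions, and the idea that an audio file and its transcript share a file stem are all hypothetical.

```python
from pathlib import Path

# Assumed audio extensions; the actual sample archives may use others.
AUDIO_EXTS = {".wav", ".mp3", ".flac"}


def pair_audio_with_transcripts(root):
    """Return (audio_path, json_path) pairs that share a file stem.

    Assumption: "call_001.wav" is transcribed in "call_001.json".
    Audio files without a matching transcript are skipped.
    """
    root = Path(root)
    # Index every JSON transcript under root by its stem.
    transcripts = {p.stem: p for p in root.rglob("*.json")}
    pairs = []
    for audio in sorted(root.rglob("*")):
        if audio.suffix.lower() in AUDIO_EXTS and audio.stem in transcripts:
            pairs.append((audio, transcripts[audio.stem]))
    return pairs
```

If the downloaded files follow a different naming scheme, only the stem-matching rule needs to change; the returned pairs can then be fed to whatever audio and JSON loaders your training pipeline already uses.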