High-Quality Off-the-Shelf AI Training Datasets for Your AI Models

Sample Datasets

Get professional, scalable, & reliable sample datasets to train the ML models behind your chatbot, conversational AI, & healthcare applications.

Datasets | File | Use Case | Description
--- | --- | --- | ---
Physician Dictation | Audio Files | Healthcare | An hour of audio dictated by physicians describing patients’ clinical conditions & plans of care in a hospital/clinical setting.
Physician Dictation | Verbatim Transcribed Text Files | Healthcare | A set of transcribed documents corresponding to the dictation audio dataset, transcribed verbatim as required to train speech recognition acoustic & vocabulary models.
Physician Clinical Notes | Dictation Notes | Healthcare | A set of clinical documents dictated by physicians describing patients’ clinical conditions.
Physician Clinical Notes | De-identified Dictation Notes | Healthcare | A set of formatted clinical documents dictated by physicians, for training medical AI models.
Human-Bot Conversations | Australian English | Conversational AI | An hour of conversational audio & transcribed JSON files.
Human-Bot Conversations | UK English | Conversational AI | An hour of conversational audio & transcribed JSON files.
Conversation Dataset | Danish | Conversational AI | An hour of conversational audio & transcribed JSON files.
Conversation Dataset | Hindi | Conversational AI | An hour of conversational audio & transcribed JSON files.
Conversation Dataset | Telugu | Conversational AI | An hour of conversational audio & transcribed JSON files.
Conversation Dataset | Indonesian | Conversational AI | An hour of conversational audio & transcribed JSON files.
Conversation Dataset | Hebrew | Conversational AI | An hour of conversational audio & transcribed JSON files.
Conversation Dataset | Malay | Conversational AI | An hour of conversational audio & transcribed JSON files.
Conversation Dataset | Afrikaans | Conversational AI | An hour of conversational audio & transcribed JSON files.
Conversation Dataset | Arabic | Conversational AI | An hour of conversational audio & transcribed JSON files.
Conversation Dataset | Irish | Conversational AI | An hour of conversational audio & transcribed JSON files.
Conversation Dataset | Scottish | Conversational AI | An hour of conversational audio & transcribed JSON files.
Conversation Dataset | Welsh | Conversational AI | An hour of conversational audio & transcribed JSON files.

We handle data licensing for every data type: text, audio, video, and image. The sample datasets above cover human-bot conversations, chatbot training data, conversational AI, physician dictation, physician clinical notes, medical conversations, medical transcription, doctor-patient conversations, and more.

Can’t find what you are looking for? We are collecting new off-the-shelf datasets across all data types, including text, audio, image, & video. Contact us today.