High-Quality Curated Data to Train Your AI Model

Download a sample to see the kind of data we can deliver.

Human-Bot Conversations
(Audio and Transcribed JSON)

Canadian French
5 hours of Canadian French human-bot audio conversations and transcribed JSON files
Australian English
5 hours of Australian English human-bot audio conversations and transcribed JSON files
UK English
5 hours of UK English human-bot audio conversations and transcribed JSON files

Conversations to Train Your AI Model
(Audio and JSON)

Danish Audio & Transcribed Files
A set of 5 hours of Danish-language audio and transcribed files.
Hindi Audio & Transcribed Files
A set of 5 hours of Hindi-language audio and transcribed files.
Telugu Audio & Transcribed Files
A set of 5 hours of Telugu-language audio and transcribed files.
Indonesian Audio & Transcribed Files
A set of 5 hours of Indonesian-language audio and transcribed files.
Hebrew Audio & Transcribed Files
A set of 5 hours of Hebrew-language audio and transcribed files.
Malay Audio & Transcribed Files
A set of 5 hours of Malay-language audio and transcribed files.
Afrikaans Audio & Transcribed Files
A set of 5 hours of Afrikaans-language audio and transcribed files.
Arabic Audio & Transcribed Files
A set of 5 hours of Arabic-language audio and transcribed files.
Irish Audio & Transcribed Files
A set of 5 hours of Irish-language audio and transcribed files.
Scottish Audio & Transcribed Files
A set of 5 hours of Scottish-language audio and transcribed files.
Welsh Audio & Transcribed Files
A set of 5 hours of Welsh-language audio and transcribed files.

Physician Dictation
Audio & Transcribed Reports

Physician Dictation Audio Files
A set of 16 hours of audio dictated by physicians, describing patients’ clinical conditions and plans of care based on physician-patient encounters in hospital and clinical settings.
Verbatim Transcribed Text Files
A set of transcribed documents corresponding to the dictation audio dataset. The transcription is verbatim, as required for training the acoustic and vocabulary models used in speech recognition.

Still have questions about shAIp Data Services?