High-quality Audio / Speech / Voice Datasets to Train Your Conversational AI Model 

Off-the-shelf Voice / Speech / Audio Datasets in multiple languages to jump-start your automatic speech recognition (ASR) models

Speech Datasets

Plug in the audio data catalog you've been missing, today

DetailsLanguage DatasetSample RateDataset TypeTotal Audio HoursShort DescriptionDataset DescriptionAudio ChannelRecording PlatformWER (%)Audio FormatTranscription FormatUse CaseNumber of SpeakersCTA
Speechen_US_CC_8African American VernacularAfrican American Vernacularen_US8 kHzCall-center211African American Vernacular Call-center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 612, Male: 1242, and Unknown: 12
Speechen_US_MA_16African American VernacularAfrican American Vernacularen_US16 kHzMedia Audio154African American Vernacular Media dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 151, Male: 150, and Unknown: 10
SpeechAfrikaans_GC_8AfrikaansAfrikaansaf_ZA8 kHzGeneral Conversation368Afrikaans General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Afrikaans spoken in AfricaDualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 502, Male: 390, and Unknown: 2
SpeechAfrikaans_MA_16AfrikaansAfrikaansaf_ZA16 kHzMedia Audio658Afrikaans Media FilesLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 750, Male: 1278, and Unknown: 52
SpeechArabic_GC_8ArabicArabicar_AE8 kHzGeneral Conversation292Arabic General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Arabic from Gulf countriesDualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 171, Male: 534, and Unknown: 1
SpeechArabic_SM_48ArabicArabicar-SA48 kHzScripted Monologue1,947Arabic Scripted MonologueSingle-utterance recordings, which tend to fall in the 5 to 30 second rangeMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 838 Male 1209 Unknown 78
SpeechAssamese_CC_8AssameseAssamese (In Pipeline) as_INCall-Center60Assamese (In Pipeline) Call-Center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechAssamese_GCAssameseAssamese (In Pipeline) as_INGeneral Conversation100Assamese (In Pipeline) General Conversation dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechAssamese_MAAssameseAssamese (In Pipeline) as_INMedia Audio40Assamese (In Pipeline) Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechBengali_CC_8BengaliBengali (In Pipeline) bn_INCall-Center60Bengali (In Pipeline) Call-Center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechBengali_GCBengaliBengali (In Pipeline) bn_INGeneral Conversation100Bengali (In Pipeline) General Conversation dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechBengali_MABengaliBengali (In Pipeline) bn_INMedia Audio40Bengali (In Pipeline) Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechBoston_CC_8Boston EnglishBoston Englishen_US8 kHzCall-center177Boston Call-center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 605, Male: 711, and Unknown: 0
SpeechBoston_GC_8Boston EnglishBoston Englishen_US8 kHzGeneral Conversation32Boston General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 53, Male: 83, and Unknown: 0
SpeechBoston_MA_16Boston EnglishBoston Englishen_US16 kHzMedia Audio93Boston Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 43, Male: 181, and Unknown: 2
SpeechCanadian_SM_48Canadian FrenchCanadian Frenchfr-CA48 kHzScripted Monologue1,222Canadian FrenchSingle-utterance recordings, which tend to fall in the 5 to 30 second rangeMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 974 Male 631 Unknown 1
SpeechChinese_CC_8Chinese EnglishChinese Englishen_US8 kHzCall-center169Chinese Call-center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 1790, Male: 523 and Unknown: 13
SpeechChinese_MA_16Chinese EnglishChinese Englishen_US16 kHzMedia Audio249Chinese Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 126, Male: 346 and Unknown: 6
SpeechChinese Simplified_SM_48Chinese SimplifiedChinese Simplifiedzh-CN48 kHzScripted Monologue2,762Chinese SimplifiedSingle-utterance recordings, which tend to fall in the 5 to 30 second rangeMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1920 Male 1535 Unknown 270
SpeechChinese Traditional_SM_48Chinese TraditionalChinese Traditionalzh-TW48 kHzScripted Monologue1,028Chinese TraditionalSingle-utterance recordings, which tend to fall in the 5 to 30 second rangeMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1069 Male 262 Unknown 3
SpeechDanish_GC_8DanishDanishda_DK8 kHzGeneral Conversation372Danish General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 311, Male: 417, Unknown: 0
SpeechDanish_MA_16DanishDanishda_DK16 kHzMedia Audio664Danish Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 369, Male: 864, Unknown: 27
SpeechDanish_SM_48DanishDanishda-DK48 kHzScripted Monologue2,579Danish Scripted MonologueSingle-utterance recordings, which tend to fall in the 5 to 30 second range, Danish from DenmarkMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1551 Male 1233 Unknown 42
SpeechEnglish Deep South_CC_8English Deep SouthEnglish Deep Southen_US8 kHzCall-center151English Deep South Call-center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 221 , Male 1004 , Unknown 7
SpeechEnglish Deep South_GC_8English Deep SouthEnglish Deep Southen_US8 kHzGeneral Conversation56English Deep South General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 99, Male 31, Unknown 0
SpeechEnglish Deep South_MA_16English Deep SouthEnglish Deep Southen_US16 kHzMedia Audio266English Deep South Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 204, Male 356, Unknown 21
SpeechGerman_CC_8GermanGermande-De8 kHzCall-center64German Call-center data Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,MonoDesktop.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 478 Male 1440 Unknown 0
SpeechGerman_IVR_8GermanGermande-De8 kHz IVR200German IVR dataHuman to Machine. An IVR type of flow where there is a TTS prompt (e.g. ”How may I help you”) followed by a spontaneous human responseMonoDesktop.wav .jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling Female 10115 Male 8750 Unknown 0
SpeechGujarati_CC_8GujaratiGujarati (In Pipeline) gu_INCall-Center60Gujarati (In Pipeline) Call-Center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechGujarati_GCGujaratiGujarati (In Pipeline) gu_INGeneral Conversation100Gujarati (In Pipeline) General Conversation dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechGujarati_MAGujaratiGujarati (In Pipeline) gu_INMedia Audio40Gujarati (In Pipeline) Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechHebrew_General Conversation_8HebrewHebrewhe_IL8 kHzGeneral Conversation399Hebrew General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Hebrew in IsraelDualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 414 , Male 399 , Unknown 1
SpeechHebrew_MA_16HebrewHebrewhe_IL16 kHzMedia Audio427Hebrew Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 361 , Male 513, Unknown 13
SpeechHindi_MA_16HindiHindihi_IN16 kHzMedia Audio219Hindi Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 83 , Male 309, Unknown 0
SpeechHindi_SM_48HindiHindihi-IN48 kHzScripted Monologue2,867Hindi Scripted MonologueSingle-utterance recordings, which tend to fall in the 5 to 30 second rangeMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1977 Male 1864 Unknown 147
SpeechHINGLISH_CC_8HinglishHinglishhg_IN8 kHzCall-center208HINGLISH Call-center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 822, Male 1262 , Unknown 0
SpeechHINGLISH_MA_16HinglishHinglishhg_IN16 kHzMedia Audio216HINGLISH Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 75, Male 380, Unknown 0
SpeechHispanic_CC_8Hispanic EnglishHispanic Englishen_US8 kHzCall-center212Hispanic Call-center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 822, Male 1262, Unknown 0
SpeechHispanic_MA_16Hispanic EnglishHispanic Englishen_US16 kHzMedia Audio155Hispanic Call Media audioLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 140, Male 219, Unknown 5
SpeechIndonesian_GC_8IndonesianIndonesianid_ID8 kHzGeneral Conversation496Indonesian General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Bahasa IndonesianDualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 524, Male 454, Unknown 2
SpeechIndonesian_MA_16IndonesianIndonesianid_ID16 kHzMedia Audio643Indonesian Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 746, Male 1507, Unknown 129
SpeechIrish_GC_8IrishIrishen_IE8 kHzGeneral Conversation192Irish General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 213 , Male 153 , Unknown 0
SpeechJapanese_SM_48JapaneseJapaneseja-JP48 kHzScripted Monologue2,335Japanese Scripted MonologueSingle-utterance recordings, which tend to fall in the 5 to 30 second rangeMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1460 Male 1221 Unknown 194
SpeechKannada_CC_8KannadaKannada (In Pipeline) kn_INCall-Center60Kannada (In Pipeline) Call-Center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechKannada_GCKannadaKannada (In Pipeline) kn_INGeneral Conversation100Kannada (In Pipeline) General Conversation dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechKannada_MAKannadaKannada (In Pipeline) kn_INMedia Audio40Kannada (In Pipeline) Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechKorean_CC_8KoreanKoreanko_KR8 kHzCall-center107Korean Call-center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1086, Male 210 , Unknown 4
SpeechKorean_MA_16KoreanKoreanko_KR16 kHzMedia Audio204Korean media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 70 Male 303, Unknown 25
SpeechKorean_SM_48KoreanKoreanko-KR48 kHzScripted Monologue1,955Korean Scripted MonologueSingle-utterance recordings, which tend to fall in the 5 to 30 second rangeMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1195 Male 1134 Unknown 122
SpeechMalay_GC_8MalayMalayms_MY8 kHzGeneral Conversation266Malay General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Malay in MalaysiaDualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 316, Male 176 , Unknown 0
SpeechMalay_MA_16MalayMalayms_MY16 kHzMedia Audio344Malay Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 236, Male 626, Unknown 47
SpeechMalayalam_CC_8MalayalamMalayalam (In Pipeline) ml_INCall-Center60Malayalam (In Pipeline) Call-Center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechMalayalam_GCMalayalamMalayalam (In Pipeline) ml_INGeneral Conversation100Malayalam (In Pipeline) General Conversation dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechMalayalam_MAMalayalamMalayalam (In Pipeline) ml_INMedia Audio40Malayalam (In Pipeline) Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechMarathi_CC_8MarathiMarathi (In Pipeline) mr_INCall-Center60Marathi (In Pipeline) Call-Center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechMarathi_GCMarathiMarathi (In Pipeline) mr_INGeneral Conversation100Marathi (In Pipeline) General Conversation dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechMarathi_MAMarathiMarathi (In Pipeline) mr_INMedia Audio40Marathi (In Pipeline) Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechMexican_SM_48Spanish (Mexico)Spanish (Mexico)es-MX48 kHzScripted Monologue1,492Mexican Spanish Scripted MonologueSingle-utterance recordings, which tend to fall in the 5 to 30 second rangeMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1016 Male 1069 Unknown 95
SpeechNetherlands_SM_48DutchDutchnl-NL48 kHzScripted Monologue1,205Dutch Scripted MonologueSingle-utterance recordings, which tend to fall in the 5 to 30 second rangeMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1285 Male 531 Unknown 3
SpeechNew York English_CC_8New York EnglishNew York Englishen_US8 kHzCall-center103New York English Call-center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 610, Male 532, Unknow 0
SpeechNew York English_GC_8New York EnglishNew York Englishen_US8 kHzGeneral Conversation107New York English General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 118, Male 114, Unknown 0
SpeechNew York English_MA_16New York EnglishNew York Englishen_US16 kHzMedia Audio140New York English Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 66, Male 230, Unknown 11
SpeechNew Zealand_GC_8New Zealand English New Zealand English en_NZ8 kHzGeneral Conversation148New Zealand English General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 167, male 121, Unknown 4
SpeechNew Zealand_MA_16New Zealand English New Zealand English en_NZ16 kHzMedia Audio400New Zealand English Media audioLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 367, male 678, Unknown 26
SpeechOriya_CC_8OriyaOriya (In Pipeline) or_INCall-Center60Oriya (In Pipeline) Call-Center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechOriya_GCOriyaOriya (In Pipeline) or_INGeneral Conversation100Oriya (In Pipeline) General Conversation dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechOriya_MAOriyaOriya (In Pipeline) or_INMedia Audio40Oriya (In Pipeline) Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechPolish_MA_16PolishPolishpl_PL16 kHzMedia Audio269Polish Media audioLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 173 Male 354 Unknown 6
SpeechPolish Poland_SM_48Polish (Poland)Polish (Poland)pl-PL48 kHzScripted Monologue1,482Polish Poland - Scripted MonologueSingle-utterance recordings, which tend to fall in the 5 to 30 second rangeMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1324 Male 701 Unknown 24
SpeechPunjabi_CC_8PunjabiPunjabi (In Pipeline) PunjabiCall-Center60Punjabi (In Pipeline) Call-Center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechPunjabi_GCPunjabiPunjabi (In Pipeline) PunjabiGeneral Conversation100Punjabi (In Pipeline) General Conversation dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechPunjabi_MAPunjabiPunjabi (In Pipeline) Punjabi Media Audio40Punjabi (In Pipeline) Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechRussian_SM_48RussianRussianru-RU48 kHzScripted Monologue2,398Russian Scripted MonologueSingle-utterance recordings, which tend to fall in the 5 to 30 second rangeMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1689 Male 1937 Unknown 214
SpeechScottish_GC_8Scottish (English Accent)Scottish (English Accent)en_AB8 kHzGeneral Conversation292Scottish General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 285 , Male 260, Unknown 3
SpeechSingapore_CC_8Singapore EnglishSingapore Englishen_SG8 kHzCall-center218Singapore Call-center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 2139 , Male 884, Unknown 21
SpeechSingapore_MA_16Singapore EnglishSingapore Englishen_SG16 kHzMedia Audio247Singapore Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 160, Male 455, Unknown 37
SpeechSouth African English_CC_8South African EnglishSouth African Englishen_ZA8 kHzCall-center261South African English Call-center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1274 , Male 935 , Unknown 1
SpeechSouth African English_MA_16South African EnglishSouth African Englishen_ZA16 kHzMedia Audio251South African English Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 235, Male 432, Unknown 36
SpeechSwahili_CC_8SwahiliSwahilisw_KE8 kHzCall-center230Swahili Call-center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 611, Male 833, Unknown 0
SpeechSwahili_MA_16SwahiliSwahilisw_KE16 kHzMedia Audio265Swahili Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 118, Male 493, Unknown 25
SpeechSwedish_CC_8SwedishSwedishsv_SE8 kHzCall-center250Swedish Call-center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 1581, male 727, Unknown 2
SpeechSwedish_MA_16SwedishSwedishsv_SE16 kHzMedia Audio278Swedish Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 195, male 500, Unknown 21
SpeechTamil_CC_8TamilTamil (In Pipeline) ta_INCall-Center60Tamil (In Pipeline) Call-Center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechTamil_GCTamilTamil (In Pipeline) ta_INGeneral Conversation100Tamil (In Pipeline) General Conversation dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechTamil_MATamil Tamil (In Pipeline) ta_INMedia Audio40Tamil (In Pipeline) Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechTelugu_GC_8TeluguTelugute_IN8 kHzGeneral Conversation553Telugu General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 574 , Male 564, Unknown 0
SpeechTelugu_MA_16TeluguTelugute_IN16 kHzMedia Audio648Telugu Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale 207, Male 963, Unknown 2
SpeechTelugu_CC_8TeluguTelugu (In Pipeline) te_INCall-Center30Telugu (In Pipeline) Call-Center dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechTelugu_GCTeluguTelugu (In Pipeline) te_INGeneral Conversation50Telugu (In Pipeline) General Conversation dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,Desktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechTelugu_MATeluguTelugu (In Pipeline) te_INMedia Audio20Telugu (In Pipeline) Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
SpeechThai_GC_8ThaiThaith_TH8 kHzGeneral Conversation183Thai General Conversation dataUnscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, An informal register used between friendsDualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 338, Male: 96, and Unknown: 8
SpeechThai_MA_8ThaiThaith_TH16 kHzMedia Audio173Thai Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 143, Male: 502, and Unknown: 26
SpeechTurkish Turkey_SM_48Turkish TurkeyTurkish Turkeytr_TR48 kHzScripted Monologue2,027Turkish Turkey Scripted Monologue dataSingle-utterance recordings, which tend to fall in the 5 to 30 second rangeMonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 1561, Male: 1241, and Unknown: 31
SpeechVietnamese_GC_8VietnameseVietnamesevi_VN8 kHzGeneral Conversation295Vietnamese General Conversation dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, Northern (e.g., Hanoi), Central, and Southern (e.g., Ho Chi Minh City).DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 400, Male: 380, and Unknown: 2
SpeechVietnamese_MA_16VietnameseVietnamesevi_VN16 kHzMedia Audio257Vietnamese Media audio dataLicensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutesMonoWeb Sourcing5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 249, Male: 200, and Unknown: 45
SpeechWelsh_GC_8Welsh (English Accent)Welsh (English Accent)en_WL8 kHzGeneral Conversation278Welsh General Conversation dataUnscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes,DualDesktop5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingFemale: 270, Male: 324, and Unknown: 0
SpeechUK English_WW_16UK EnglishUK Englishen_uk16 kHzWake Word200Wake Word UK English dataKeyphrase collection data
  • 200 speakers
  • 4 unique keyphrases per speaker
  • 25-30 repeated recordings per unique keyphrase
  • 25-30 audio files per unique keyphrase
  • 120 total recorded utterances per speaker
MonoMobile App5.0.wav.jsonASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language ModellingGender: 50% male, 50% female (±10%)
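The wake-word row mixes per-keyphrase and per-speaker counts, so it helps to see how they combine. The minimal sketch below uses only the figures listed in that row (200 speakers, 4 keyphrases, 25-30 recordings per keyphrase). It assumes the "120 total recorded utterances per speaker" figure corresponds to the upper bound of 30 recordings per keyphrase; the constant and function names are illustrative, not part of any Shaip tooling.

```python
# Illustrative arithmetic for the UK English wake-word spec (names are hypothetical).
SPEAKERS = 200
KEYPHRASES_PER_SPEAKER = 4
MIN_REPEATS, MAX_REPEATS = 25, 30  # recordings per unique keyphrase


def utterances_per_speaker(repeats: int) -> int:
    """Recordings one speaker contributes if each keyphrase is repeated `repeats` times."""
    return KEYPHRASES_PER_SPEAKER * repeats


low = utterances_per_speaker(MIN_REPEATS)
high = utterances_per_speaker(MAX_REPEATS)
print(f"per speaker: {low}-{high}")  # per speaker: 100-120
# Assumption: every speaker records the maximum of 30 repeats per keyphrase.
print(f"whole dataset, upper bound: {SPEAKERS * high}")  # whole dataset, upper bound: 24000
```

Under this reading, 4 keyphrases × 30 recordings gives the stated 120 utterances per speaker, and roughly 24,000 utterances across all 200 speakers at most.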

Ground Truth Audio & Speech Data to accelerate your Conversational AI Development

With over 40,000 hours of audio and voice data, Shaip can help you scale your conversational AI models with high-quality speech datasets. Our gold-standard voice datasets are collected across multiple languages and dialects, demographics, speaker traits, dialogue types, environments, and scenarios. Can’t find what you are looking for? Shaip can help you source a voice dataset for any gender, age, language, or setting.

We have datasets in all major languages and dialects. Some of our most popular languages include:

African Voice Datasets

Arabic Voice Datasets

Canadian Voice Datasets

Chinese Voice Datasets

Danish Voice Datasets

English Voice Datasets

German Voice Datasets

Hebrew Voice Datasets

Indonesian Voice Datasets

Irish Voice Datasets

Japanese Voice Datasets

Korean Voice Datasets

Mexican Voice Datasets

Polish Voice Datasets

Russian Voice Datasets

Scottish Voice Datasets

Spanish Voice Datasets

Swedish Voice Datasets

Thai Voice Datasets

Turkish Voice Datasets

Vietnamese Voice Datasets

Dataset Description

Call Center Conversations 8 kHz: Unscripted, synthetic telephone conversations between an “agent” and a “customer”

General Conversations 8 kHz: Unscripted telephone conversations between two people

Media & Podcasts 16 kHz: Public domain audio/video interviews, podcasts, etc., featuring 1-5 people

Utterance/Scripted Monologue 16 kHz: Recordings based on prompts

Shaip Contact Us

Can’t find what you are looking for?

We are continually collecting new off-the-shelf audio and speech datasets across all data types.

Contact us now and let us take care of your audio/speech training data collection.
