High-quality Audio / Speech / Voice Datasets to Train Your Conversational AI Model 

Off-the-shelf Voice / Speech / Audio Datasets in multiple languages to jump-start your automatic speech recognition (ASR) models

Speech Datasets

Plug in the audio data catalog you’ve been missing today

All datasets below are speech datasets delivered as .wav audio with .json transcriptions. Each one is listed for the same use cases: ASR, virtual assistants, chatbots, conversational AI, speech analytics, TTS, and language modelling.

| Dataset | Sample rate | Dataset type | Total audio hours | Total speech hours | Description | Audio channel | Recording platform | WER (%) |
|---|---|---|---|---|---|---|---|---|
| African American Vernacular | 8 kHz | Call-Center | 214 | 211 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| African American Vernacular | 16 kHz | Media Audio | 159 | 149 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Afrikaans | 8 kHz | General Conversation | 368 | 404 | Unscripted phone call between two people; approx. 15-60 min; Afrikaans spoken in Africa | Dual | Desktop | 5 |
| Afrikaans | 16 kHz | Media Audio | 658 | 615 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Arabic | 8 kHz | General Conversation | 293 | 297 | Unscripted phone call between two people; approx. 15-60 min; Arabic from Gulf countries | Dual | Desktop | 5 |
| Boston | 8 kHz | Call-Center | 177 | 175 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| Boston | 8 kHz | General Conversation | 32 | 32 | Unscripted phone call between two people; approx. 15-60 min | Dual | Desktop | 5 |
| Boston | 16 kHz | Media Audio | 93 | 93 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Chinese English | 8 kHz | Call-Center | 169 | 130 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| Chinese English | 16 kHz | Media Audio | 249 | 236 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Danish | 8 kHz | General Conversation | 372 | 395 | Unscripted phone call between two people; approx. 15-60 min | Dual | Desktop | 5 |
| Danish | 16 kHz | Media Audio | 664 | 603 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| English | 16 kHz | Media Audio | 10 | 9 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| English Deep South | 8 kHz | Call-Center | 151 | 149 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| English Deep South | 8 kHz | General Conversation | 56 | 56 | Unscripted phone call between two people; approx. 15-60 min | Dual | Desktop | 5 |
| English Deep South | 16 kHz | Media Audio | 266 | 248 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Hebrew | 8 kHz | General Conversation | 399 | 397 | Unscripted phone call between two people; approx. 15-60 min; Hebrew in Israel | Dual | Desktop | 5 |
| Hebrew | 16 kHz | Media Audio | 427 | 400 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Hinglish | 8 kHz | Call-Center | 208 | 185 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| Hinglish | 16 kHz | Media Audio | 216 | 219 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Hispanic English | 8 kHz | Call-Center | 212 | 209 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| Hispanic English | 16 kHz | Media Audio | 155 | 150 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Indian English | 16 kHz | Media Audio | 137 | 87 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Indonesian | 8 kHz | General Conversation | 496 | 598 | Unscripted phone call between two people; approx. 15-60 min; Bahasa Indonesia | Dual | Desktop | 5 |
| Indonesian | 16 kHz | Media Audio | 643 | 610 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Irish | 8 kHz | General Conversation | 192 | 180 | Unscripted phone call between two people; approx. 15-60 min | Dual | Desktop | 5 |
| Korean | 8 kHz | Call-Center | 107 | 103 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| Korean | 16 kHz | Media Audio | 204 | 197 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Malay | 8 kHz | General Conversation | 266 | 302 | Unscripted phone call between two people; approx. 15-60 min; Malay in Malaysia | Dual | Desktop | 5 |
| Malay | 16 kHz | Media Audio | 344 | 305 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| New Zealand English | 8 kHz | General Conversation | 148 | 142 | Unscripted phone call between two people; approx. 15-60 min | Dual | Desktop | 5 |
| New Zealand English | 16 kHz | Media Audio | 400 | 400 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| New York English | 8 kHz | Call-Center | 103 | 103 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| New York English | 8 kHz | General Conversation | 107 | 106 | Unscripted phone call between two people; approx. 15-60 min | Dual | Desktop | 5 |
| New York English | 16 kHz | Media Audio | 140 | 140 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Polish | 16 kHz | Media Audio | 269 | 255 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Scottish | 8 kHz | General Conversation | 292 | 267 | Unscripted phone call between two people; approx. 15-60 min | Dual | Desktop | 5 |
| Singapore English | 8 kHz | Call-Center | 218 | 194 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| Singapore English | 16 kHz | Media Audio | 247 | 240 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| South African English | 8 kHz | Call-Center | 261 | 204 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| South African English | 16 kHz | Media Audio | 251 | 245 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Spanish | 16 kHz | Media Audio | 3 | 2 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Swahili | 8 kHz | Call-Center | 184 | 165 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| Swahili | 8 kHz | Call-Center | 46 | 44 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| Swahili | 16 kHz | Media Audio | 203 | 191 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Swahili | 16 kHz | Media Audio | 62 | 58 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Swedish | 8 kHz | Call-Center | 250 | 224 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| Swedish | 16 kHz | Media Audio | 278 | 255 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Telugu | 8 kHz | General Conversation | 553 | 582 | Unscripted phone call between two people; approx. 15-60 min | Dual | Desktop | 5 |
| Telugu | 16 kHz | Media Audio | 648 | 599 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Thai | 8 kHz | General Conversation | 183 | 201 | Unscripted phone call between two people; approx. 15-60 min; an informal register used between friends | Dual | Desktop | 5 |
| Thai | 16 kHz | Media Audio | 173 | 167 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Vietnamese | 8 kHz | General Conversation | 295 | 293 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min; Northern (e.g., Hanoi), Central, and Southern (e.g., Ho Chi Minh City) | Dual | Desktop | 5 |
| Vietnamese | 16 kHz | Media Audio | 257 | 248 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Welsh | 8 kHz | General Conversation | 278 | 299 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| Indian English | 8 kHz | Call-Center | 200 | 200 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Mono | Desktop | 5 |
| Telugu | NA | Call-Center | 30 | 30 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Tamil | NA | Call-Center | 60 | 60 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Kannada | NA | Call-Center | 60 | 60 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Malayalam | NA | Call-Center | 60 | 60 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Bengali | NA | Call-Center | 60 | 60 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Gujarati | NA | Call-Center | 60 | 60 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Marathi | NA | Call-Center | 60 | 60 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Assamese | NA | Call-Center | 60 | 60 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Oriya | NA | Call-Center | 60 | 60 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Punjabi | NA | Call-Center | 60 | 60 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Telugu | NA | General Conversation | 50 | 50 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Tamil | NA | General Conversation | 100 | 100 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Kannada | NA | General Conversation | 100 | 100 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Malayalam | NA | General Conversation | 100 | 100 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Bengali | NA | General Conversation | 100 | 100 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Gujarati | NA | General Conversation | 100 | 100 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Marathi | NA | General Conversation | 100 | 100 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Assamese | NA | General Conversation | 100 | 100 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Oriya | NA | General Conversation | 100 | 100 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Punjabi | NA | General Conversation | 100 | 100 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | NA | Desktop | 5 |
| Telugu | NA | Media Audio | 20 | 20 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | NA | Desktop | 5 |
| Tamil | NA | Media Audio | 40 | 40 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | NA | Desktop | 5 |
| Kannada | NA | Media Audio | 40 | 40 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | NA | Desktop | 5 |
| Malayalam | NA | Media Audio | 40 | 40 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | NA | Desktop | 5 |
| Bengali | NA | Media Audio | 40 | 40 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | NA | Desktop | 5 |
| Gujarati | NA | Media Audio | 40 | 40 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | NA | Desktop | 5 |
| Marathi | NA | Media Audio | 40 | 40 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | NA | Desktop | 5 |
| Assamese | NA | Media Audio | 40 | 40 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | NA | Desktop | 5 |
| Oriya | NA | Media Audio | 40 | 40 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | NA | Desktop | 5 |
| Punjabi | NA | Media Audio | 40 | 40 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | NA | Desktop | 5 |
| English US | 48 kHz | Scripted Monologue | 5 | 4 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Spanish Spain | 48 kHz | Scripted Monologue | 10 | 8 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Mexican | 48 kHz | Scripted Monologue | 1,492 | 1,228 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Canadian | 48 kHz | Scripted Monologue | 1,222 | 1,049 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Netherlands | 48 kHz | Scripted Monologue | 1,205 | 1,021 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Polish Poland | 48 kHz | Scripted Monologue | 1,482 | 1,266 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Turkish Turkey | 48 kHz | Scripted Monologue | 2,027 | 1,735 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Chinese Traditional | 48 kHz | Scripted Monologue | 1,028 | 891 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Arabic | 48 kHz | Scripted Monologue | 1,947 | 1,594 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Danish | 48 kHz | Scripted Monologue | 2,579 | 2,041 | Single-utterance recordings, typically 5-30 s; Danish from Denmark | Mono | Mobile App | 5 |
| Hindi | 8 kHz | Call-Center | 122 | 131 | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | 5 |
| Hindi | 16 kHz | Media Audio | 219 | 202 | Licensable public-domain audio/video (interviews, podcasts, etc.), 1-5 people; approx. 15-60 min | Mono | Desktop | 5 |
| Hindi | 48 kHz | Scripted Monologue | 2,867 | 2,105 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Japanese | 48 kHz | Scripted Monologue | 2,335 | 2,029 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Korean | 48 kHz | Scripted Monologue | 1,955 | 1,548 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Russian | 48 kHz | Scripted Monologue | 2,398 | 2,046 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| Chinese Simplified | 48 kHz | Scripted Monologue | 2,762 | 2,181 | Single-utterance recordings, typically 5-30 s | Mono | Mobile App | 5 |
| German | 8 kHz | Call-Center | 640* | * | Unscripted, synthetic agent/customer phone call; approx. 5-15 min | Dual | Desktop | |

* The German row gives a single figure, 640, without separating total audio hours from total speech hours, and lists no WER.
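The WER (%) column reports word error rate. The table does not say how that figure was measured; the usual definition is WER = (S + D + I) / N, where S, D, and I count the word substitutions, deletions, and insertions in a minimum edit-distance alignment and N is the number of words in the reference transcript. A minimal Python sketch of that standard computation, assuming plain whitespace tokenization and no normalization of case or punctuation (real evaluations usually normalize first); the function name is illustrative:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: (substitutions + deletions + insertions) / N * 100.

    Assumes whitespace tokenization with no case or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return 100.0 * prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("the agent will call you back", "the agent will call back")` counts one deletion against six reference words, about 16.67%.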

Ground-Truth Audio & Speech Data to Accelerate Your Conversational AI Development

With over 40,000 hours of audio and voice data, Shaip can help you scale your conversational AI models with high-quality speech datasets. Our gold-standard voice datasets are collected across multiple languages, dialects, demographics, speaker traits, dialogue types, environments, and scenarios. Can’t find what you are looking for? Shaip can help you with a voice dataset for any gender, age, language, or setting.

Language datasets we support: We have datasets in all major languages and dialects. Some of our most popular languages include:

Afrikaans Voice Datasets

Arabic Voice Datasets

Canadian Voice Datasets

Chinese Voice Datasets

Danish Voice Datasets

English Voice Datasets

German Voice Datasets

Hebrew Voice Datasets

Indonesian Voice Datasets

Irish Voice Datasets

Japanese Voice Datasets

Korean Voice Datasets

Mexican Voice Datasets

Polish Voice Datasets

Russian Voice Datasets

Scottish Voice Datasets

Spanish Voice Datasets

Swedish Voice Datasets

Thai Voice Datasets

Turkish Voice Datasets

Vietnamese Voice Datasets

Dataset Description

Call-Center Conversations (8 kHz): Unscripted, synthetic telephone conversations between an “agent” and a “customer”

General Conversations (8 kHz): Unscripted telephone conversations between two people

Media & Podcasts (16 kHz): Public-domain audio/video interviews, podcasts, etc., with 1-5 people

Utterance/Scripted Monologue (48 kHz): Recordings based on prompts
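Because the dataset types differ in sample rate (8, 16, or 48 kHz) and channel layout (dual or mono), it can help to check delivered files before training. A minimal sketch using only Python's standard `wave` and `array` modules, assuming the .wav files are uncompressed 16-bit PCM (the catalog does not state bit depth) and that a dual-channel call-center recording keeps each speaker on its own channel (which channel holds the agent is not documented here). The function names are illustrative, not part of any Shaip tooling.

```python
import array
import wave


def describe_wav(path: str) -> dict:
    """Report the sample rate, channel count, sample width and duration of a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / rate,
        }


def split_stereo_pcm16(path: str, first_out: str, second_out: str) -> None:
    """Write each channel of a 2-channel, 16-bit PCM WAV file to its own mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 2-channel, 16-bit PCM audio")
        rate = wav.getframerate()
        # "h" = 2-byte signed integers. Samples are only copied, never
        # interpreted, so the file's byte order is preserved unchanged.
        samples = array.array("h", wav.readframes(wav.getnframes()))

    # Frames are interleaved (ch0, ch1, ch0, ch1, ...), so a step-2 slice
    # selects one channel.
    for out_path, channel in ((first_out, samples[0::2]), (second_out, samples[1::2])):
        with wave.open(out_path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(channel.tobytes())
```

On an 8 kHz dual-channel call-center file, `describe_wav` should report `sample_rate_hz` 8000 and `channels` 2; splitting the channels first lets each speaker's audio be aligned with its own transcript.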


Can’t find what you are looking for?

New off-the-shelf audio & speech datasets are being collected across all data types 

Contact us now to let go of your audio/speech training data collection worries
