High-quality Audio / Speech / Voice Datasets to Train Your Conversational AI Model 

Off-the-shelf Voice / Speech / Audio Datasets in multiple languages to jump-start your automatic speech recognition (ASR) models

Speech Datasets

Plug in the audio data catalog you’ve been missing today

| Details | Language | Dataset | Sample Rate | Dataset Type | Total Audio Hours | Total Speech Hours | Dataset Description | Audio Channel | Recording Platform | WER (%) | Audio Format | Transcription Format | Use Case |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Speech | African American | African American Vernacular | 8 kHz | Call-Center | 214 | 211 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | African American | African American Vernacular | 16 kHz | Media Audio | 159 | 149 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Afrikaans | Afrikaans | 8 kHz | General Conversation | 368 | 404 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Afrikaans spoken in Africa | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Afrikaans | Afrikaans | 16 kHz | Media Audio | 658 | 615 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Arabic | Arabic | 8 kHz | General Conversation | 293 | 297 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Arabic from Gulf countries | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Boston | Boston | 8 kHz | Call-Center | 177 | 175 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Boston | Boston | 8 kHz | General Conversation | 32 | 32 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Boston | Boston | 16 kHz | Media Audio | 93 | 93 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Chinese English | Chinese English | 8 kHz | Call-Center | 169 | 130 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Chinese English | Chinese English | 16 kHz | Media Audio | 249 | 236 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Danish | Danish | 8 kHz | General Conversation | 372 | 395 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Danish | Danish | 16 kHz | Media Audio | 664 | 603 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | English | English | 16 kHz | Media Audio | 10 | 9 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | English Deep South | English Deep South | 8 kHz | Call-Center | 151 | 149 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | English Deep South | English Deep South | 8 kHz | General Conversation | 56 | 56 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | English Deep South | English Deep South | 16 kHz | Media Audio | 266 | 248 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Hebrew | Hebrew | 8 kHz | General Conversation | 399 | 397 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Hebrew in Israel | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Hebrew | Hebrew | 16 kHz | Media Audio | 427 | 400 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Hinglish | Hinglish | 8 kHz | Call-Center | 208 | 185 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Hinglish | Hinglish | 16 kHz | Media Audio | 216 | 219 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Hispanic English | Hispanic English | 8 kHz | Call-Center | 212 | 209 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Hispanic English | Hispanic English | 16 kHz | Media Audio | 155 | 150 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Indian English | Indian English | 16 kHz | Media Audio | 137 | 87 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Indonesian | Indonesian | 8 kHz | General Conversation | 496 | 598 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Bahasa Indonesian | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Indonesian | Indonesian | 16 kHz | Media Audio | 643 | 610 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Irish | Irish | 8 kHz | General Conversation | 192 | 180 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Korean | Korean | 8 kHz | Call-Center | 107 | 103 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Korean | Korean | 16 kHz | Media Audio | 204 | 197 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Malay | Malay | 8 kHz | General Conversation | 266 | 302 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Malay in Malaysia | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Malay | Malay | 16 kHz | Media Audio | 344 | 305 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | New Zealand English | New Zealand English | 8 kHz | General Conversation | 148 | 142 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | New Zealand English | New Zealand English | 16 kHz | Media Audio | 400 | 400 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | New York English | New York English | 8 kHz | Call-Center | 103 | 103 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | New York English | New York English | 8 kHz | General Conversation | 107 | 106 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | New York English | New York English | 16 kHz | Media Audio | 140 | 140 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Polish | Polish | 16 kHz | Media Audio | 269 | 255 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Scottish | Scottish | 8 kHz | General Conversation | 292 | 267 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Singapore English | Singapore English | 8 kHz | Call-Center | 218 | 194 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Singapore English | Singapore English | 16 kHz | Media Audio | 247 | 240 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | South African English | South African English | 8 kHz | Call-Center | 261 | 204 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | South African English | South African English | 16 kHz | Media Audio | 251 | 245 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Spanish | Spanish | 16 kHz | Media Audio | 3 | 2 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Swahili | Swahili | 8 kHz | Call-Center | 184 | 165 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Swahili | Swahili | 8 kHz | Call-Center | 46 | 44 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Swahili | Swahili | 16 kHz | Media Audio | 203 | 191 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Swahili | Swahili | 16 kHz | Media Audio | 62 | 58 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Swedish | Swedish | 8 kHz | Call-Center | 250 | 224 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Swedish | Swedish | 16 kHz | Media Audio | 278 | 255 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Telugu | Telugu | 8 kHz | General Conversation | 553 | 582 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Telugu | Telugu | 16 kHz | Media Audio | 648 | 599 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Thai | Thai | 8 kHz | General Conversation | 183 | 201 | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, An informal register used between friends | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Thai | Thai | 16 kHz | Media Audio | 173 | 167 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Vietnamese | Vietnamese | 8 kHz | General Conversation | 295 | 293 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, Northern (e.g., Hanoi), Central, and Southern (e.g., Ho Chi Minh City). | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Vietnamese | Vietnamese | 16 kHz | Media Audio | 257 | 248 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Welsh | Welsh | 8 kHz | General Conversation | 278 | 299 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Indian English | Indian English | 8 kHz | Call-Center | 200 | 200 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Telugu | Telugu | NA | Call-Center | 30 | 30 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Tamil | Tamil | NA | Call-Center | 60 | 60 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Kannada | Kannada | NA | Call-Center | 60 | 60 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Malayalam | Malayalam | NA | Call-Center | 60 | 60 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Bengali | Bengali | NA | Call-Center | 60 | 60 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Gujarati | Gujarati | NA | Call-Center | 60 | 60 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Marathi | Marathi | NA | Call-Center | 60 | 60 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Assamese | Assamese | NA | Call-Center | 60 | 60 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Oriya | Oriya | NA | Call-Center | 60 | 60 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Punjabi | Punjabi | NA | Call-Center | 60 | 60 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Telugu | Telugu | NA | General Conversation | 50 | 50 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Tamil | Tamil | NA | General Conversation | 100 | 100 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Kannada | Kannada | NA | General Conversation | 100 | 100 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Malayalam | Malayalam | NA | General Conversation | 100 | 100 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Bengali | Bengali | NA | General Conversation | 100 | 100 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Gujarati | Gujarati | NA | General Conversation | 100 | 100 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Marathi | Marathi | NA | General Conversation | 100 | 100 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Assamese | Assamese | NA | General Conversation | 100 | 100 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Oriya | Oriya | NA | General Conversation | 100 | 100 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Punjabi | Punjabi | NA | General Conversation | 100 | 100 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Telugu | Telugu | NA | Media Audio | 20 | 20 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Tamil | Tamil | NA | Media Audio | 40 | 40 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Kannada | Kannada | NA | Media Audio | 40 | 40 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Malayalam | Malayalam | NA | Media Audio | 40 | 40 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Bengali | Bengali | NA | Media Audio | 40 | 40 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Gujarati | Gujarati | NA | Media Audio | 40 | 40 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Marathi | Marathi | NA | Media Audio | 40 | 40 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Assamese | Assamese | NA | Media Audio | 40 | 40 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Oriya | Oriya | NA | Media Audio | 40 | 40 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Punjabi | Punjabi | NA | Media Audio | 40 | 40 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | NA | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | English US | English US | 48 kHz | Scripted Monologue | 5 | 4 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Spanish Spain | Spanish Spain | 48 kHz | Scripted Monologue | 10 | 8 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Mexican | Mexican | 48 kHz | Scripted Monologue | 1,492 | 1,228 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Canadian | Canadian | 48 kHz | Scripted Monologue | 1,222 | 1,049 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Netherlands | Netherlands | 48 kHz | Scripted Monologue | 1,205 | 1,021 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Polish Poland | Polish Poland | 48 kHz | Scripted Monologue | 1,482 | 1,266 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Turkish Turkey | Turkish Turkey | 48 kHz | Scripted Monologue | 2,027 | 1,735 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Chinese Traditional | Chinese Traditional | 48 kHz | Scripted Monologue | 1,028 | 891 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Arabic | Arabic | 48 kHz | Scripted Monologue | 1,947 | 1,594 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Danish | Danish | 48 kHz | Scripted Monologue | 2,579 | 2,041 | Single-utterance recordings, which tend to fall in the 5 to 30 second range, Danish from Denmark | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Hindi | Hindi | 8 kHz | Call-Center | 122 | 131 | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Hindi | Hindi | 16 kHz | Media Audio | 219 | 202 | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Desktop | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Hindi | Hindi | 48 kHz | Scripted Monologue | 2,867 | 2,105 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Japanese | Japanese | 48 kHz | Scripted Monologue | 2,335 | 2,029 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Korean | Korean | 48 kHz | Scripted Monologue | 1,955 | 1,548 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Russian | Russian | 48 kHz | Scripted Monologue | 2,398 | 2,046 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | Chinese Simplified | Chinese Simplified | 48 kHz | Scripted Monologue | 2,762 | 2,181 | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
| Speech | German | German | 8 kHz | Call-Center | 640 | | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes | Dual | Desktop | | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |

Description

Call Center Conversations, 8 kHz: Unscripted, synthetic telephonic conversation between “agent” and “customer”

Generic Conversations, 8 kHz: Unscripted telephonic conversation between 2 people

Media & Podcasts, 16 kHz: Public domain audio/video interviews, podcasts, etc., with 1-5 people

Utterance/Scripted Monologue, 16 kHz: Recordings based on a prompt
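
The dataset types above are distinguished mainly by sample rate and channel layout, so a quick sanity check on a delivered .wav file might look like the following. This is a minimal sketch, not part of the catalog: the function names `describe_wav` and `matches_spec` are hypothetical, "Dual" is assumed to mean a two-channel file, and the file is assumed to be PCM-encoded (Python's standard `wave` module does not read compressed telephony encodings such as mu-law).

```python
import wave


def describe_wav(path):
    """Return sample rate, channel count and duration of a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }


def matches_spec(path, expected_rate_hz, expected_channels):
    """Check a file against the sample rate and channel count you expect.

    Both expected values are supplied by the caller, for example 8000 Hz and
    2 channels for a call-center recording listed as 8 kHz / Dual
    (assuming "Dual" means two channels).
    """
    info = describe_wav(path)
    return (
        info["sample_rate_hz"] == expected_rate_hz
        and info["channels"] == expected_channels
    )
```

For instance, `matches_spec("call_0001.wav", 8000, 2)` would return `True` for a two-channel 8 kHz recording; the file name here is purely illustrative.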

Can’t find what you are looking for?

New off-the-shelf audio & speech datasets are being collected across all data types 

Contact us now to let go of your audio/speech training data collection worries
