Most Trusted Speech Data Collection Services for your AI
Train your NLP models, virtual assistants (VAs), TTS prototypes, and more on quality conversational data with our audio and speech data collection services.
Discover audio data pipelines without bottlenecks.
Why Is a Speech Training Dataset Needed for Natural Language Processing?
Have you ever noticed how your smartphone's virtual assistant (VA), such as Siri or Bixby, interacts with you? It answers your questions, analyzes them, and presents results that match your requirements.
As much as these VAs intrigue us, they need to be trained progressively before they can respond that accurately. This is why you should consider outsourcing speech, audio, and voice data collection to specialized data collection companies with proven professional expertise.
Investing in audio data collection prepares your NLP system to cater to a multilingual audience. Beyond that, speech data collection for NLP, when handled by an expert, also takes in-field collection, semantic analysis, and audio transcription into account. With professional speech data collection solutions, you can:
- Procure high-quality audio datasets to improve accuracy
- Target diverse scenario setups
- Collect multilingual AI training data
- Scale your ML model to suit diverse demographics and verticals
Professional Audio / Voice Data Collection Services for NLP
Any subject. Any scenario.
Intelligent NLP systems are anything but generic. Depending on the functionality of the program, you might have to focus on spatial and multilingual audio data services, which only reputed voice/audio data collection companies can offer. This is where Shaip comes in: a highly reliable data collection service provider that takes pride in doing the heavy lifting for your AI systems.
At Shaip, our primary focus is on feeding models with the highest possible volume of custom speech samples in the shortest possible time. With us on board, you can expect:
- Curated audio / voice data collection for NLP
- Tailor-made programs that respond as per specific use cases
- Mining-ready audio datasets
- Pattern-specific and automated data processing
- Highest possible level of domain specificity
- Faster time to market with accelerated AI models
Our Expertise
Align Audio Data to Prepare Smart NLP Models
Shaip offers end-to-end speech/audio data collection services in 100+ languages so that voice-enabled technologies can cater to diverse audiences across the globe. We can work on projects of any scope and size, from licensing existing off-the-shelf audio datasets, to managing custom audio data collection, to audio transcription and annotation. No matter how big your speech data collection project is, we can customize our audio collection services to build high-quality NLP datasets that target dialects, tones, and languages. Choose from our wide range of speech datasets and audio data collection resources for voice-enabling intelligent setups.
Monologue Speech Collection
Handle speech-based requirements for a standalone speaker, such as Text-to-Speech prototypes and transcription-specific projects, with scripted prompt feeding via single-channel files.
Dialogue Speech Collection
Set up intelligent Virtual Assistants, speech-specific chatbots, and Automatic Speech Recognition models with multilingual exposure via dual-channel files and transcribed resources.
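As a rough illustration of the single-channel and dual-channel distinction above, the following Python sketch reads a .wav file header with the standard-library `wave` module and reports its channel count, sample rate, and duration. The function names and the monologue/dialogue mapping are assumptions made for this example; they are not part of any Shaip tooling.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return channels, rate, frames / rate


def matches_collection_type(path, collection_type):
    """Check whether a file has the channel count expected for a collection type.

    The mapping below is an assumption for illustration:
    monologue recordings are single-channel, dialogue recordings dual-channel.
    """
    expected_channels = {"monologue": 1, "dialogue": 2}
    channels, _, _ = describe_wav(path)
    return channels == expected_channels[collection_type]
```

A check like this could run when files are received, so that a dialogue recording delivered as a single mixed channel is caught before training.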
Acoustic Data Collection
We can professionally record studio-quality audio data in restaurants, offices, homes, and other environments, in various languages, through our global network of collaborators, while covering a wider acoustic range.
Natural Language Utterance Collection
Train smart commercial setups to identify differently uttered customer phrases with similar meanings, making your AI more autonomous over time.
Digital / Virtual Assistants
Build your upcoming Virtual Assistant by training models on the nuances of human speech, multilingual exposure, contextual analysis, and NLU.
Automatic Speech Recognition (ASR)
Improve the accuracy of your automatic speech recognition (ASR) systems with access to state-of-the-art, diversified speech/audio datasets from a wide array of demographics.
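ASR accuracy is commonly reported as word error rate (WER), which also appears as a column in the dataset listing on this page. The Python sketch below shows the standard definition: the word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words. It is a minimal illustration only. The function name is hypothetical, and this page does not state how the listed WER figures were measured.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One deleted word ("the") and one substituted word ("lights" -> "light")
# out of five reference words.
print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))
```

Because WER depends on text normalization (casing, punctuation, number formatting), figures from different sources are only comparable when the same normalization is applied.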
Multilingual Speech/Audio Training Data
Our highly skilled language professionals across the globe offer multilingual audio/speech training data in multiple languages and dialects, including Arabic, Danish, Chinese, Afrikaans, Singapore, New Zealand, Hebrew, Indonesian, Irish, Korean, Malay, Polish, Scottish, Swedish, French, German, Vietnamese, Thai, Italian, Spanish & more.
Text-to-Speech (TTS)
To offer a better user experience with TTS, it is critical to develop a system that sounds natural. Build a multilingual text-to-speech (TTS) model with the help of our global workforce, who can help you collect speech data in 150+ languages and dialects. High-quality audio data enhances your AI models for everything from in-car controls to chatbots and learning solutions.
Reasons to choose Shaip as your Trustworthy Speech Data Collection Partner
People
Dedicated and trained teams:
- 30,000+ collaborators for Data Creation, Labeling & QA
- Credentialed Project Management Team
- Experienced Product Development Team
- Talent Pool Sourcing & Onboarding Team
Process
Highest process efficiency is assured with:
- Robust Six Sigma Stage-Gate Process
- A dedicated team of Six Sigma black belts: key process owners & quality compliance
- Continuous Improvement & Feedback Loop
Platform
Our patented platform offers the following benefits:
- Web-based end-to-end platform
- Impeccable Quality
- Faster TAT
- Seamless Delivery
Language: Audio Datasets Collected
Off-the-Shelf Speech / Audio Datasets
Corpus ID (Unique) | Keyword | Language Dataset | Language Code | Sample Rate | Dataset Type | Total Audio Hours | Short Description | Dataset Description | Audio Channel | Recording Platform | WER (%) | Audio Format | Transcription Format | Use Case | Number of Speakers | CTA | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
en_US_CC_8 | African American Vernacular | African American Vernacular | en_US | 8 kHz | Call-center | 211 | African American Vernacular Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 612, Male: 1242, and Unknown: 12 | Contact | |
en_US_MA_16 | African American Vernacular | African American Vernacular | en_US | 16 kHz | Media Audio | 154 | African American Vernacular Media data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 151, Male: 150, and Unknown: 10 | Contact | |
Afrikaans_GC_8 | Afrikaans | Afrikaans | af_ZA | 8 kHz | General Conversation | 368 | Afrikaans General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Afrikaans spoken in Africa | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 502, Male: 390, and Unknown: 2 | Contact | |
Afrikaans_MA_16 | Afrikaans | Afrikaans | af_ZA | 16 kHz | Media Audio | 658 | Afrikaans Media Files | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 750, Male: 1278, and Unknown: 52 | Contact | |
Arabic_GC_8 | Arabic | Arabic | ar_AE | 8 kHz | General Conversation | 292 | Arabic General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Arabic from Gulf countries | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 171, Male: 534, and Unknown: 1 | Contact | |
Arabic_SM_48 | Arabic | Arabic | ar-SA | 48 kHz | Scripted Monologue | 1,947 | Arabic Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 838 Male 1209 Unknown 78 | Contact | |
Assamese_CC_8 | Assamese | Assamese (In Pipeline) | as_IN | | Call-Center | 60 | Assamese (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Assamese_GC | Assamese | Assamese (In Pipeline) | as_IN | | General Conversation | 100 | Assamese (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Assamese_MA | Assamese | Assamese (In Pipeline) | as_IN | | Media Audio | 40 | Assamese (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Bengali_CC_8 | Bengali | Bengali (In Pipeline) | bn_IN | | Call-Center | 60 | Bengali (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Bengali_GC | Bengali | Bengali (In Pipeline) | bn_IN | | General Conversation | 100 | Bengali (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Bengali_MA | Bengali | Bengali (In Pipeline) | bn_IN | | Media Audio | 40 | Bengali (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Boston_CC_8 | Boston English | Boston English | en_US | 8 kHz | Call-Center | 177 | Boston Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 605, Male: 711, and Unknown: 0 | Contact | |
Boston_GC_8 | Boston English | Boston English | en_US | 8 kHz | General Conversation | 32 | Boston General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 53, Male: 83, and Unknown: 0 | Contact | |
Boston_MA_16 | Boston English | Boston English | en_US | 16 kHz | Media Audio | 93 | Boston Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 43, Male: 181, and Unknown: 2 | Contact | |
Canadian_SM_48 | Canadian French | Canadian French | fr-CA | 48 kHz | Scripted Monologue | 1,222 | Canadian French | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 974 Male 631 Unknown 1 | Contact | |
Chinese_CC_8 | Chinese English | Chinese English | en_US | 8 kHz | Call-Center | 169 | Chinese Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 1790, Male: 523 and Unknown: 13 | Contact | |
Chinese_MA_16 | Chinese English | Chinese English | en_US | 16 kHz | Media Audio | 249 | Chinese Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 126, Male: 346 and Unknown: 6 | Contact | |
Chinese Simplified_SM_48 | Chinese Simplified | Chinese Simplified | zh-CN | 48 kHz | Scripted Monologue | 2,762 | Chinese Simplified | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1920 Male 1535 Unknown 270 | Contact | |
Chinese Traditional_SM_48 | Chinese Traditional | Chinese Traditional | zh-TW | 48 kHz | Scripted Monologue | 1,028 | Chinese Traditional | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1069 Male 262 Unknown 3 | Contact | |
Danish_GC_8 | Danish | Danish | da_DK | 8 kHz | General Conversation | 372 | Danish General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 311, Male: 417, Unknown: 0 | Contact | |
Danish_MA_16 | Danish | Danish | da_DK | 16 kHz | Media Audio | 664 | Danish Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 369, Male: 864, Unknown: 27 | Contact | |
Danish_SM_48 | Danish | Danish | da-DK | 48 kHz | Scripted Monologue | 2,579 | Danish Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range, Danish from Denmark | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1551 Male 1233 Unknown 42 | Contact | |
English Deep South_CC_8 | English Deep South | English Deep South | en_US | 8 kHz | Call-Center | 151 | English Deep South Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 221 , Male 1004 , Unknown 7 | Contact | |
English Deep South_GC_8 | English Deep South | English Deep South | en_US | 8 kHz | General Conversation | 56 | English Deep South General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 99, Male 31, Unknown 0 | Contact | |
English Deep South_MA_16 | English Deep South | English Deep South | en_US | 16 kHz | Media Audio | 266 | English Deep South Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 204, Male 356, Unknown 21 | Contact | |
German_CC_8 | German | German | de-DE | 8 kHz | Call-Center | 64 | German Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Mono | Desktop | | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 478 Male 1440 Unknown 0 | Contact | |
German_IVR_8 | German | German | de-DE | 8 kHz | IVR | 200 | German IVR data | Human to Machine. An IVR type of flow where there is a TTS prompt (e.g. "How may I help you") followed by a spontaneous human response | Mono | Desktop | | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 10115 Male 8750 Unknown 0 | Contact | |
Gujarati_CC_8 | Gujarati | Gujarati (In Pipeline) | gu_IN | | Call-Center | 60 | Gujarati (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Gujarati_GC | Gujarati | Gujarati (In Pipeline) | gu_IN | | General Conversation | 100 | Gujarati (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Gujarati_MA | Gujarati | Gujarati (In Pipeline) | gu_IN | | Media Audio | 40 | Gujarati (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Hebrew_General Conversation_8 | Hebrew | Hebrew | he_IL | 8 kHz | General Conversation | 399 | Hebrew General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Hebrew in Israel | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 414 , Male 399 , Unknown 1 | Contact | |
Hebrew_MA_16 | Hebrew | Hebrew | he_IL | 16 kHz | Media Audio | 427 | Hebrew Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 361 , Male 513, Unknown 13 | Contact | |
Hindi_MA_16 | Hindi | Hindi | hi_IN | 16 kHz | Media Audio | 219 | Hindi Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 83 , Male 309, Unknown 0 | Contact | |
Hindi_SM_48 | Hindi | Hindi | hi-IN | 48 kHz | Scripted Monologue | 2,867 | Hindi Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1977 Male 1864 Unknown 147 | Contact | |
HINGLISH_CC_8 | Hinglish | Hinglish | hg_IN | 8 kHz | Call-Center | 208 | HINGLISH Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 822, Male 1262 , Unknown 0 | Contact | |
HINGLISH_MA_16 | Hinglish | Hinglish | hg_IN | 16 kHz | Media Audio | 216 | HINGLISH Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 75, Male 380, Unknown 0 | Contact | |
Hispanic_CC_8 | Hispanic English | Hispanic English | en_US | 8 kHz | Call-Center | 212 | Hispanic Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 822, Male 1262, Unknown 0 | Contact | |
Hispanic_MA_16 | Hispanic English | Hispanic English | en_US | 16 kHz | Media Audio | 155 | Hispanic Call Media audio | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 140, Male 219, Unknown 5 | Contact | |
Indonesian_GC_8 | Indonesian | Indonesian | id_ID | 8 kHz | General Conversation | 496 | Indonesian General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Bahasa Indonesian | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 524, Male 454, Unknown 2 | Contact | |
Indonesian_MA_16 | Indonesian | Indonesian | id_ID | 16 kHz | Media Audio | 643 | Indonesian Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 746, Male 1507, Unknown 129 | Contact | |
Irish_GC_8 | Irish | Irish | en_IE | 8 kHz | General Conversation | 192 | Irish General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 213 , Male 153 , Unknown 0 | Contact | |
Japanese_SM_48 | Japanese | Japanese | ja-JP | 48 kHz | Scripted Monologue | 2,335 | Japanese Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1460 Male 1221 Unknown 194 | Contact | |
Kannada_CC_8 | Kannada | Kannada (In Pipeline) | kn_IN | | Call-Center | 60 | Kannada (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Kannada_GC | Kannada | Kannada (In Pipeline) | kn_IN | | General Conversation | 100 | Kannada (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Kannada_MA | Kannada | Kannada (In Pipeline) | kn_IN | | Media Audio | 40 | Kannada (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Korean_CC_8 | Korean | Korean | ko_KR | 8 kHz | Call-Center | 107 | Korean Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1086, Male 210 , Unknown 4 | Contact | |
Korean_MA_16 | Korean | Korean | ko_KR | 16 kHz | Media Audio | 204 | Korean media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 70 Male 303, Unknown 25 | Contact | |
Korean_SM_48 | Korean | Korean | ko-KR | 48 kHz | Scripted Monologue | 1,955 | Korean Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1195 Male 1134 Unknown 122 | Contact | |
Malay_GC_8 | Malay | Malay | ms_MY | 8 kHz | General Conversation | 266 | Malay General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Malay in Malaysia | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 316, Male 176 , Unknown 0 | Contact | |
Malay_MA_16 | Malay | Malay | ms_MY | 16 kHz | Media Audio | 344 | Malay Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 236, Male 626, Unknown 47 | Contact | |
Malayalam_CC_8 | Malayalam | Malayalam (In Pipeline) | ml_IN | | Call-Center | 60 | Malayalam (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Malayalam_GC | Malayalam | Malayalam (In Pipeline) | ml_IN | | General Conversation | 100 | Malayalam (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Malayalam_MA | Malayalam | Malayalam (In Pipeline) | ml_IN | | Media Audio | 40 | Malayalam (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Marathi_CC_8 | Marathi | Marathi (In Pipeline) | mr_IN | | Call-Center | 60 | Marathi (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Marathi_GC | Marathi | Marathi (In Pipeline) | mr_IN | | General Conversation | 100 | Marathi (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Marathi_MA | Marathi | Marathi (In Pipeline) | mr_IN | | Media Audio | 40 | Marathi (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Mexican_SM_48 | Spanish (Mexico) | Spanish (Mexico) | es-MX | 48 kHz | Scripted Monologue | 1,492 | Mexican Spanish Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1016 Male 1069 Unknown 95 | Contact | |
Netherlands_SM_48 | Dutch | Dutch | nl-NL | 48 kHz | Scripted Monologue | 1,205 | Dutch Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1285 Male 531 Unknown 3 | Contact | |
New York English_CC_8 | New York English | New York English | en_US | 8 kHz | Call-Center | 103 | New York English Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 610, Male 532, Unknown 0 | Contact | |
New York English_GC_8 | New York English | New York English | en_US | 8 kHz | General Conversation | 107 | New York English General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 118, Male 114, Unknown 0 | Contact | |
New York English_MA_16 | New York English | New York English | en_US | 16 kHz | Media Audio | 140 | New York English Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 66, Male 230, Unknown 11 | Contact | |
New Zealand_GC_8 | New Zealand English | New Zealand English | en_NZ | 8 kHz | General Conversation | 148 | New Zealand English General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 167, male 121, Unknown 4 | Contact | |
New Zealand_MA_16 | New Zealand English | New Zealand English | en_NZ | 16 kHz | Media Audio | 400 | New Zealand English Media audio | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 367, male 678, Unknown 26 | Contact | |
Oriya_CC_8 | Oriya | Oriya (In Pipeline) | or_IN | Call-Center | 60 | Oriya (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Oriya_GC | Oriya | Oriya (In Pipeline) | or_IN | General Conversation | 100 | Oriya (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Oriya_MA | Oriya | Oriya (In Pipeline) | or_IN | Media Audio | 40 | Oriya (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Polish_MA_16 | Polish | Polish | pl_PL | 16 kHz | Media Audio | 269 | Polish Media audio | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 173 Male 354 Unknown 6 | Contact | |
Polish Poland_SM_48 | Polish (Poland) | Polish (Poland) | pl-PL | 48 kHz | Scripted Monologue | 1,482 | Polish Poland - Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1324 Male 701 Unknown 24 | Contact | |
Punjabi_CC_8 | Punjabi | Punjabi (In Pipeline) | Punjabi | Call-Center | 60 | Punjabi (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Punjabi_GC | Punjabi | Punjabi (In Pipeline) | Punjabi | General Conversation | 100 | Punjabi (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Punjabi_MA | Punjabi | Punjabi (In Pipeline) | Punjabi | Media Audio | 40 | Punjabi (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Russian_SM_48 | Russian | Russian | ru-RU | 48 kHz | Scripted Monologue | 2,398 | Russian Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1689 Male 1937 Unknown 214 | Contact | |
Scottish_GC_8 | Scottish (English Accent) | Scottish (English Accent) | en_AB | 8 kHz | General Conversation | 292 | Scottish General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 285 , Male 260, Unknown 3 | Contact | |
Singapore_CC_8 | Singapore English | Singapore English | en_SG | 8 kHz | Call-Center | 218 | Singapore Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 2139 , Male 884, Unknown 21 | Contact | |
Singapore_MA_16 | Singapore English | Singapore English | en_SG | 16 kHz | Media Audio | 247 | Singapore Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 160, Male 455, Unknown 37 | Contact | |
South African English_CC_8 | South African English | South African English | en_ZA | 8 kHz | Call-Center | 261 | South African English Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1274 , Male 935 , Unknown 1 | Contact | |
South African English_MA_16 | South African English | South African English | en_ZA | 16 kHz | Media Audio | 251 | South African English Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 235, Male 432, Unknown 36 | Contact | |
Swahili_CC_8 | Swahili | Swahili | sw_KE | 8 kHz | Call-Center | 230 | Swahili Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 611, Male 833, Unknown 0 | Contact | |
Swahili_MA_16 | Swahili | Swahili | sw_KE | 16 kHz | Media Audio | 265 | Swahili Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 118, Male 493, Unknown 25 | Contact | |
Swedish_CC_8 | Swedish | Swedish | sv_SE | 8 kHz | Call-Center | 250 | Swedish Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1581, male 727, Unknown 2 | Contact | |
Swedish_MA_16 | Swedish | Swedish | sv_SE | 16 kHz | Media Audio | 278 | Swedish Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 195, male 500, Unknown 21 | Contact | |
Tamil_CC_8 | Tamil | Tamil (In Pipeline) | ta_IN | Call-Center | 60 | Tamil (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Tamil_GC | Tamil | Tamil (In Pipeline) | ta_IN | General Conversation | 100 | Tamil (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Tamil_MA | Tamil | Tamil (In Pipeline) | ta_IN | Media Audio | 40 | Tamil (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Telugu_GC_8 | Telugu | Telugu | te_IN | 8 kHz | General Conversation | 553 | Telugu General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 574 , Male 564, Unknown 0 | Contact | |
Telugu_MA_16 | Telugu | Telugu | te_IN | 16 kHz | Media Audio | 648 | Telugu Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 207, Male 963, Unknown 2 | Contact | |
Telugu_CC_8 | Telugu | Telugu (In Pipeline) | te_IN | Call-Center | 30 | Telugu (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Telugu_GC | Telugu | Telugu (In Pipeline) | te_IN | General Conversation | 50 | Telugu (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Telugu_MA | Telugu | Telugu (In Pipeline) | te_IN | Media Audio | 20 | Telugu (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Thai_GC_8 | Thai | Thai | th_TH | 8 kHz | General Conversation | 183 | Thai General Conversation | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, An informal register used between friends | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 338, Male 96, Unknown 8 | Contact | |
Thai_MA_8 | Thai | Thai | th_TH | 16 kHz | Media Audio | 173 | Thai Media audio | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 143, Male 502, Unknown 26 | Contact | |
Turkish Turkey_SM_48 | Turkish Turkey | Turkish Turkey | tr-TR | 48 kHz | Scripted Monologue | 2,027 | Turkish Turkey | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1561 Male 1241 Unknown 31 | Contact | |
Vietnamese_GC_8 | Vietnamese | Vietnamese | vi_VN | 8 kHz | General Conversation | 295 | Vietnamese General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, Northern (e.g., Hanoi), Central, and Southern (e.g., Ho Chi Minh City). | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 400, Male 380, Unknown 2 | Contact | |
Vietnamese_MA_16 | Vietnamese | Vietnamese | vi_VN | 16 kHz | Media Audio | 257 | Vietnamese Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 249, male 200, Unknowns 45 | Contact | |
Welsh_GC_8 | Welsh (English Accent) | Welsh (English Accent) | en_WL | 8 kHz | General Conversation | 278 | Welsh General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 270, Male 324, Unknown 0 | Contact | |
UK English_WW_16 | UK English | UK English | en_uk | 16 kHz | Wake Word | 200 Speakers | Wake Word UK English | keyphrases collection of data | 1 channel | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Gender: 50% male, 50% female, +/- 10%. | Contact |
Services Offered
Audio data collection alone rarely covers everything a comprehensive AI setup needs. At Shaip, you can also consider the following services to broaden what your models can handle:
Text Data Collection Services
The true value of Shaip's cognitive data collection services is that they give organizations the key to unlocking critical information found within unstructured data.
Image Data Collection Services
Make sure that your computer vision model identifies every image accurately, so you can seamlessly train next-gen AI models.
Video Data Collection Services
Focus on computer vision alongside NLP to train your models to identify objects, individuals, deterrents, and other visual elements accurately.
Recommended Resources
Offering
Audio Annotation for Intelligent AIs
Audio annotation services have been a forte of Shaip since the beginning. Develop, train & improve conversational AI, chatbots & speech recognition engines with our state-of-the-art audio annotation services.
Buyer’s Guide
Buyer’s Guide: Complete Guide to Conversational AI
The chatbot you conversed with runs on an advanced conversational AI system that is built, trained, and tested using large volumes of speech recognition data.
Data Catalog
Off-the-Shelf Speech Data Catalog & Licensing
There is a wide variety of common applications for speech data in AI projects. We offer vast amounts of high-quality data ready for your voice recognition projects.
Want to build your own audio dataset?
Connect with our in-house speech data collection expert to set up an audio repository that best fits your requirements.
Frequently Asked Questions (FAQ)
Speech Data Collection for an ML Model refers to the process of gathering audio recordings of spoken language. This collection aids in training and refining machine learning algorithms, particularly those centered on understanding and processing human voices.
When aiming to collect audio data for Automatic Speech Recognition (ASR), you should start by defining your project’s specific needs, including the desired language, accent, and type of speech. After setting these parameters, ensure you obtain all necessary permissions to respect user privacy. Then, use appropriate recording devices or software to capture clear audio samples. Each recording should be meticulously annotated with its transcription or other pertinent metadata and stored systematically for effortless access.
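As a rough illustration of the annotation and storage step, the sketch below writes a JSON metadata file next to each recording, mirroring the .wav and .json pairing listed in the catalog above. The function name `write_sidecar` and the field names are hypothetical choices for this example, not a Shaip format.

```python
import json
from pathlib import Path


def write_sidecar(wav_path, transcript, language, accent, speaker_gender="unknown"):
    """Write a JSON metadata file next to a recording and return its path.

    The field names are illustrative assumptions, not a published schema.
    """
    wav_path = Path(wav_path)
    record = {
        "audio_file": wav_path.name,
        "transcript": transcript,
        "language": language,
        "accent": accent,
        "speaker_gender": speaker_gender,
    }
    # Same base name as the audio file, with a .json extension.
    sidecar = wav_path.with_suffix(".json")
    sidecar.write_text(
        json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return sidecar
```

Keeping the metadata in a file that shares the recording's base name is one simple way to store samples systematically; a database or manifest file would work equally well.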
A speech dataset in machine learning is pivotal for training, testing, and validating models tailored to recognize, transcribe, or interpret spoken language. Such datasets pave the way for a myriad of applications, from voice assistants and transcription services to voice biometrics.
For collecting precise data from diverse languages and accents, collaboration with native speakers of the desired linguistic backgrounds is vital. Aim for a varied and representative sample to cover a broad spectrum of demographic nuances. Employ standardized recording equipment in uniform environments to ensure audio consistency. And importantly, annotate each data piece with detailed transcriptions and metadata, denoting the specific language and accent.
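One way to check audio consistency across a batch is to compare each file's sample rate and channel count against a target, such as the 16 kHz mono format used by several catalog entries. The following is a minimal sketch using Python's standard `wave` module; it assumes uncompressed PCM WAV files, and the helper names are hypothetical.

```python
import wave


def wav_properties(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration


def find_mismatches(paths, expected_rate, expected_channels):
    """List files whose sample rate or channel count differs from the target."""
    mismatches = []
    for path in paths:
        rate, channels, _ = wav_properties(path)
        if rate != expected_rate or channels != expected_channels:
            mismatches.append((str(path), rate, channels))
    return mismatches
```

For example, `find_mismatches(files, 16000, 1)` would flag any recording in `files` that is not 16 kHz mono, so it can be re-recorded or resampled before training.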