Most Trusted Speech Data Collection Services for your AI
Train your NLP models, VAs, TTS prototypes, and more on quality conversational data from our audio and speech data collection services
Discover audio data pipelines without bottlenecks
Featured Clients
Professional Audio / Voice Data Collection Services
Any subject. Any scenario.
At Shaip, our expertise lies in creating high-quality speech datasets designed for varied AI/ML requirements. We offer an expansive range of languages and record in diverse settings, making our datasets comprehensive and adaptable. Our focus is on feeding models with the highest volume of custom speech data in the least possible time. With us on board, you can expect:
- Curated high-quality multilingual audio / voice data to improve accuracy
- Highest possible level of domain specificity to target diverse scenario setup
- Scale your ML model to suit diverse demographics and verticals
- Recording Environments: Studio Quality, featuring crystal-clear audio with minimal background noise, & Natural Environments, where recordings incorporate ambient sounds to mimic real-world situations.
100+
55K+
Hours of Speech Data
250+
Projects
60+
Languages (100+ Dialects)
8/16/44/48 kHz
Sampling rate
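The sampling rate and channel count of a delivered recording can be read straight from the file header. Below is a minimal sketch, assuming the audio arrives as uncompressed PCM .wav files; it uses Python's standard `wave` module, and the function names (`wav_properties`, `matches_spec`, `make_silent_wav`) are illustrative, not part of any Shaip tooling.

```python
import io
import wave


def wav_properties(source):
    """Return (sample_rate_hz, channels) read from a WAV header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def matches_spec(source, expected_rate_hz, expected_channels):
    """True if the file has the expected sample rate and channel count."""
    return wav_properties(source) == (expected_rate_hz, expected_channels)


def make_silent_wav(rate_hz, channels, seconds=1):
    """Build an in-memory 16-bit PCM WAV of silence (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * channels * rate_hz * seconds)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Hypothetical example: an 8 kHz, two-channel telephony-style recording.
    demo = make_silent_wav(8000, 2)
    print(wav_properties(demo))
```

In practice, `wav_properties` would be pointed at a real file path, and `matches_spec` could flag files whose header does not match the rate and channel layout agreed for the project.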
Our Expertise
Align Audio Data for Smarter NLP Models
Shaip offers end-to-end speech/audio data collection services in 100+ languages, enabling voice-enabled technologies to cater to diverse audiences across the globe. We can work on projects of any scope and size, from licensing existing off-the-shelf audio datasets, to managing custom audio data collection, to audio transcription and annotation. No matter how big your speech data collection project is, we can customize our audio collection services to build high-quality NLP datasets that target dialects, tones, and languages. Choose from our wide range of speech datasets and audio data collection resources for voice-enabling intelligent setups.
Monologue Scripted & Spontaneous Speech
It focuses on processing speech from a single speaker. Utilize scripted prompts to feed into single-channel audio files, ensuring the capture of unique speech patterns, tones, and nuances specific to that individual.
Dialogue Scripted & Spontaneous Speech
Two-person interaction, replicating real-world conversations and dialogues with multilingual exposure via dual-channel files and transcribed resources.
Group / Multi-party Conversations
Multi-person discussions, capturing group dynamics, overlaps, and varied tones so as to accurately train speech models.
Wake-word / Key Phrase / Utterances Collection
Train AIs to identify key phrases or wake words or utterances with similar meanings using diverse, rich, and authentic utterances for advanced natural language processing and understanding.
Acoustic Data Collection
We can professionally record studio-quality audio data in restaurants, offices, homes, and other environments, in various languages, while covering a wider acoustic range (Comprehensive Sound Datasets).
Automatic Speech Recognition (ASR)
Improve the accuracy of your automatic speech recognition (ASR) systems with access to state-of-the-art, diversified speech/audio datasets from a wide array of demographics.
Multilingual Speech/Audio Training Data
Our skilled language professionals across the globe offer multilingual audio/speech data in various languages and dialects. This effort fosters global communication and bridges language barriers, contributing to more inclusive and effective AI solutions.
Text-to-Speech (TTS)
Build a text-to-speech (TTS) multilingual model with the help of our global workforce, who help you collect speech data in 150+ languages & dialects to enhance your AI models from in-car controls to chatbots and learning solutions with high-quality audio data.
Call Center Conversations
Genuine exchanges between agents and clients, supporting numerous languages such as Spanish, German, American English, Bengali, Japanese, Chinese, and Hindi.
Success Stories
Conversational AI datasets with over 3k hours of data across 8 languages
Looking to build a multilingual platform for Indian languages, the client partnered with Shaip to collect, segment and transcribe large datasets in multiple Indian languages. This would help develop effective speech models that could power the client’s innovative new platform.
Problem: Over 3,000 hours of audio data collected in 8 Indian languages, segmented and transcribed to develop automatic speech recognition.
Solution: We provided data collection, segmentation, and transcription, and delivered JSON files with metadata. We collected 3,000 hours of audio data in 8 Indian languages at scale for the client’s speech technology project.
Reasons to choose Shaip as your Trustworthy Speech Data Collection Partner
People
Dedicated and trained teams:
- 30,000+ collaborators for Data Creation, Labeling & QA
- Credentialed Project Management Team
- Experienced Product Development Team
- Talent Pool Sourcing & Onboarding Team
Process
Highest process efficiency is assured with:
- Robust 6 Sigma Stage-Gate Process
- A dedicated team of 6 Sigma black belts – Key process owners & Quality compliance
- Continuous Improvement & Feedback Loop
Platform
The patented platform offers benefits:
- Web-based end-to-end platform
- Impeccable Quality
- Faster TAT
- Seamless Delivery
Off-the-Shelf Speech / Audio Datasets
Corpus ID (Unique) | Keyword | Language Dataset | Language code | Sample Rate | Dataset Type | Total Audio Hours | Short Description | Dataset Description | Audio Channel | Recording Platform | WER (%) | Audio Format | Transcription Format | Use Case | Number of Speakers | CTA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
New York English_GC_8 | New York English | New York English | en_US | 8 kHz | General Conversation | 107 | New York English General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 118, Male 114, Unknown 0 | Contact | |
Russian_SM_48 | Russian | Russian | ru-RU | 48 kHz | Scripted Monologue | 2,398 | Russian Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1689 Male 1937 Unknown 214 | Contact | |
Punjabi_MA | Punjabi | Punjabi (In Pipeline) | Punjabi | | Media Audio | 40 | Punjabi (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Punjabi_GC | Punjabi | Punjabi (In Pipeline) | Punjabi | | General Conversation | 100 | Punjabi (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Punjabi_CC_8 | Punjabi | Punjabi (In Pipeline) | Punjabi | | Call-Center | 60 | Punjabi (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Polish Poland_SM_48 | Polish (Poland) | Polish (Poland) | pl-PL | 48 kHz | Scripted Monologue | 1,482 | Polish Poland - Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1324 Male 701 Unknown 24 | Contact | |
Polish_MA_16 | Polish | Polish | pl_PL | 16 kHz | Media Audio | 269 | Polish Media audio | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 173 Male 354 Unknown 6 | Contact | |
Oriya_MA | Oriya | Oriya (In Pipeline) | or_IN | | Media Audio | 40 | Oriya (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Oriya_GC | Oriya | Oriya (In Pipeline) | or_IN | | General Conversation | 100 | Oriya (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Oriya_CC_8 | Oriya | Oriya (In Pipeline) | or_IN | | Call-Center | 60 | Oriya (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
New Zealand_MA_16 | New Zealand English | New Zealand English | en_NZ | 16 kHz | Media Audio | 400 | New Zealand English Media audio | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 367, male 678, Unknown 26 | Contact | |
New Zealand_GC_8 | New Zealand English | New Zealand English | en_NZ | 8 kHz | General Conversation | 148 | New Zealand English General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 167, male 121, Unknown 4 | Contact | |
New York English_MA_16 | New York English | New York English | en_US | 16 kHz | Media Audio | 140 | New York English Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 66, Male 230, Unknown 11 | Contact | |
Scottish_GC_8 | Scottish (English Accent) | Scottish (English Accent) | en_AB | 8 kHz | General Conversation | 292 | Scottish General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 285 , Male 260, Unknown 3 | Contact | |
New York English_CC_8 | New York English | New York English | en_US | 8 kHz | Call-Center | 103 | New York English Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 610, Male 532, Unknown 0 | Contact | |
Netherlands_SM_48 | Dutch | Dutch | nl-NL | 48 kHz | Scripted Monologue | 1,205 | Dutch Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1285 Male 531 Unknown 3 | Contact | |
Mexican_SM_48 | Spanish (Mexico) | Spanish (Mexico) | es-MX | 48 kHz | Scripted Monologue | 1,492 | Mexican Spanish Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1016 Male 1069 Unknown 95 | Contact | |
Marathi_MA | Marathi | Marathi (In Pipeline) | mr_IN | | Media Audio | 40 | Marathi (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Marathi_GC | Marathi | Marathi (In Pipeline) | mr_IN | | General Conversation | 100 | Marathi (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Marathi_CC_8 | Marathi | Marathi (In Pipeline) | mr_IN | | Call-Center | 60 | Marathi (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Malayalam_MA | Malayalam | Malayalam (In Pipeline) | ml_IN | | Media Audio | 40 | Malayalam (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Malayalam_GC | Malayalam | Malayalam (In Pipeline) | ml_IN | | General Conversation | 100 | Malayalam (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Malayalam_CC_8 | Malayalam | Malayalam (In Pipeline) | ml_IN | | Call-Center | 60 | Malayalam (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Malay_MA_16 | Malay | Malay | ms_MY | 16 kHz | Media Audio | 344 | Malay Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 236, Male 626, Unknown 47 | Contact | |
Malay_GC_8 | Malay | Malay | ms_MY | 8 kHz | General Conversation | 266 | Malay General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Malay in Malaysia | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 316, Male 176 , Unknown 0 | Contact | |
Telugu_GC_8 | Telugu | Telugu | te_IN | 8 kHz | General Conversation | 553 | Telugu General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 574 , Male 564, Unknown 0 | Contact | |
UK English_WW_16 | UK English | UK English | en_uk | 16 kHz | Wake Word | 200 Speakers | Wake Word UK English | keyphrases collection of data | 1 channel | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Gender: 50% male, 50% female, +/- 10%. | Contact | |
Welsh_GC_8 | Welsh (English Accent) | Welsh (English Accent) | en_WL | 8 kHz | General Conversation | 278 | Welsh General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 270, Male 324, Unknown 0 | Contact | |
Vietnamese_MA_16 | Vietnamese | Vietnamese | vi_VN | 16 kHz | Media Audio | 257 | Vietnamese Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 249, male 200, Unknowns 45 | Contact | |
Vietnamese_GC_8 | Vietnamese | Vietnamese | vi_VN | 8 kHz | General Conversation | 295 | Vietnamese General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, Northern (e.g.,Hanoi), Central, and Southern (e.g., Ho Chi Minh City). | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 400, male 380, Unknowns 2 | Contact | |
Turkish Turkey_SM_48 | Turkish Turkey | Turkish Turkey | tr-TR | 48 kHz | Scripted Monologue | 2,027 | Turkish Turkey | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1561 Male 1241 Unknown 31 | Contact | |
Thai_MA_8 | Thai | Thai | th_TH | 16 kHz | Media Audio | 173 | Thai Media audio | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 143, Male 502, Unknown 26 | Contact | |
Thai_GC_8 | Thai | Thai | th_TH | 8 kHz | General Conversation | 183 | Thai General Conversation | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, An informal register used between friends | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 338, Male 96, Unknown 8 | Contact | |
Telugu_MA | Telugu | Telugu (In Pipeline) | te_IN | | Media Audio | 20 | Telugu (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Telugu_GC | Telugu | Telugu (In Pipeline) | te_IN | | General Conversation | 50 | Telugu (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Telugu_CC_8 | Telugu | Telugu (In Pipeline) | te_IN | | Call-Center | 30 | Telugu (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Telugu_MA_16 | Telugu | Telugu | te_IN | 16 kHz | Media Audio | 648 | Telugu Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 207, Male 963, Unknown 2 | Contact | |
Korean_SM_48 | Korean | Korean | ko-KR | 48 kHz | Scripted Monologue | 1,955 | Korean Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1195 Male 1134 Unknown 122 | Contact | |
Tamil_MA | Tamil | Tamil (In Pipeline) | ta_IN | | Media Audio | 40 | Tamil (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Tamil_GC | Tamil | Tamil (In Pipeline) | ta_IN | | General Conversation | 100 | Tamil (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Tamil_CC_8 | Tamil | Tamil (In Pipeline) | ta_IN | | Call-Center | 60 | Tamil (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Swedish_MA_16 | Swedish | Swedish | sv_SE | 16 kHz | Media Audio | 278 | Swedish Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 195, male 500, Unknown 21 | Contact | |
Swedish_CC_8 | Swedish | Swedish | sv_SE | 8 kHz | Call-Center | 250 | Swedish Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1581, male 727, Unknown 2 | Contact | |
Swahili_MA_16 | Swahili | Swahili | sw_KE | 16 kHz | Media Audio | 265 | Swahili Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 118, Male 493, Unknown 25 | Contact | |
Swahili_CC_8 | Swahili | Swahili | sw_KE | 8 kHz | Call-Center | 230 | Swahili Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 611, Male 833, Unknown 0 | Contact | |
South African English_MA_16 | South African English | South African English | en_ZA | 16 kHz | Media Audio | 251 | South African English Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 235, Male 432, Unknown 36 | Contact | |
South African English_CC_8 | South African English | South African English | en_ZA | 8 kHz | Call-Center | 261 | South African English Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1274 , Male 935 , Unknown 1 | Contact | |
Singapore_MA_16 | Singapore English | Singapore English | en_SG | 16 kHz | Media Audio | 247 | Singapore Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 160, Male 455, Unknown 37 | Contact | |
Singapore_CC_8 | Singapore English | Singapore English | en_SG | 8 kHz | Call-Center | 218 | Singapore Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 2139 , Male 884, Unknown 21 | Contact | |
Boston_CC_8 | Boston English | Boston English | en_US | 8 kHz | Call-Center | 177 | Boston Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 605, Male: 711, and Unknown: 0 | Contact | |
English Deep South_CC_8 | English Deep South | English Deep South | en_US | 8 kHz | Call-Center | 151 | English Deep South Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 221 , Male 1004 , Unknown 7 | Contact | |
Danish_SM_48 | Danish | Danish | da-DK | 48 kHz | Scripted Monologue | 2,579 | Danish Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range, Danish from Denmark | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1551 Male 1233 Unknown 42 | Contact | |
Danish_MA_16 | Danish | Danish | da_DK | 16 kHz | Media Audio | 664 | Danish Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 369, Male: 864, Unknown: 27 | Contact | |
Danish_GC_8 | Danish | Danish | da_DK | 8 kHz | General Conversation | 372 | Danish General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 311, Male: 417, Unknown: 0 | Contact | |
Chinese Traditional_SM_48 | Chinese Traditional | Chinese Traditional | zh-TW | 48 kHz | Scripted Monologue | 1,028 | Chinese Traditional | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1069 Male 262 Unknown 3 | Contact | |
Chinese Simplified_SM_48 | Chinese Simplified | Chinese Simplified | zh-CN | 48 kHz | Scripted Monologue | 2,762 | Chinese Simplified | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1920 Male 1535 Unknown 270 | Contact | |
Chinese_MA_16 | Chinese English | Chinese English | en_US | 16 kHz | Media Audio | 249 | Chinese Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 126, Male: 346 and Unknown: 6 | Contact | |
Chinese_CC_8 | Chinese English | Chinese English | en_US | 8 kHz | Call-Center | 169 | Chinese Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 1790, Male: 523 and Unknown: 13 | Contact | |
Canadian_SM_48 | Canadian French | Canadian French | fr-CA | 48 kHz | Scripted Monologue | 1,222 | Canadian French | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 974 Male 631 Unknown 1 | Contact | |
Boston_MA_16 | Boston English | Boston English | en_US | 16 kHz | Media Audio | 93 | Boston Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 43, Male: 181, and Unknown: 2 | Contact | |
Boston_GC_8 | Boston English | Boston English | en_US | 8 kHz | General Conversation | 32 | Boston General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 53, Male: 83, and Unknown: 0 | Contact | |
English Deep South_GC_8 | English Deep South | English Deep South | en_US | 8 kHz | General Conversation | 56 | English Deep South General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 99, Male 31, Unknown 0 | Contact | |
Bengali_MA | Bengali | Bengali (In Pipeline) | bn_IN | | Media Audio | 40 | Bengali (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Bengali_GC | Bengali | Bengali (In Pipeline) | bn_IN | | General Conversation | 100 | Bengali (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Bengali_CC_8 | Bengali | Bengali (In Pipeline) | bn_IN | | Call-Center | 60 | Bengali (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Assamese_MA | Assamese | Assamese (In Pipeline) | as_IN | | Media Audio | 40 | Assamese (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Assamese_GC | Assamese | Assamese (In Pipeline) | as_IN | | General Conversation | 100 | Assamese (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Assamese_CC_8 | Assamese | Assamese (In Pipeline) | as_IN | | Call-Center | 60 | Assamese (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | | Contact | |
Arabic_SM_48 | Arabic | Arabic | ar-SA | 48 kHz | Scripted Monologue | 1,947 | Arabic Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 838 Male 1209 Unknown 78 | Contact | |
Arabic_GC_8 | Arabic | Arabic | ar_AE | 8 kHz | General Conversation | 292 | Arabic General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Arabic from Gulf countries | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 171, Male: 534, and Unknown: 1 | Contact | |
Afrikaans_MA_16 | Afrikaans | Afrikaans | af_ZA | 16 kHz | Media Audio | 658 | Afrikaans Media Files | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 750, Male: 1278, and Unknown: 52 | Contact | |
Afrikaans_GC_8 | Afrikaans | Afrikaans | af_ZA | 8 kHz | General Conversation | 368 | Afrikaans General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Afrikaans spoken in Africa | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 502, Male: 390, and Unknown: 2 | Contact | |
en_US_MA_16 | African American Vernacular | African American Vernacular | en_US | 16 kHz | Media Audio | 154 | African American Vernacular Media data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 151, Male: 150, and Unknown: 10 | Contact | |
HINGLISH_MA_16 | Hinglish | Hinglish | hg_IN | 16 kHz | Media Audio | 216 | HINGLISH Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 75, Male 380, Unknown 0 | Contact | |
Korean_MA_16 | Korean | Korean | ko_KR | 16 kHz | Media Audio | 204 | Korean media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 70 Male 303, Unknown 25 | Contact | |
Korean_CC_8 | Korean | Korean | ko_KR | 8 kHz | Call-Center | 107 | Korean Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1086, Male 210 , Unknown 4 | Contact | |
Kannada_MA | Kannada | Kannada (In Pipeline) | kn_IN | Media Audio | 40 | Kannada (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Kannada_GC | Kannada | Kannada (In Pipeline) | kn_IN | General Conversation | 100 | Kannada (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Kannada_CC_8 | Kannada | Kannada (In Pipeline) | kn_IN | Call-Center | 60 | Kannada (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Japanese_SM_48 | Japanese | Japanese | ja-JP | 48 kHz | Scripted Monologue | 2,335 | Japanese Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1460 Male 1221 Unknown 194 | Contact | |
Irish_GC_8 | Irish | Irish | en_IE | 8 kHz | General Conversation | 192 | Irish General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 213 , Male 153 , Unknown 0 | Contact | |
Indonesian_MA_16 | Indonesian | Indonesian | id_ID | 16 kHz | Media Audio | 643 | Indonesian Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 746, Male 1507, Unknown 129 | Contact | |
Indonesian_GC_8 | Indonesian | Indonesian | id_ID | 8 kHz | General Conversation | 496 | Indonesian General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Bahasa Indonesian | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 524, Male 454, Unknown 2 | Contact | |
Hispanic_MA_16 | Hispanic English | Hispanic English | en_US | 16 kHz | Media Audio | 155 | Hispanic Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 140, Male 219, Unknown 5 | Contact | |
Hispanic_CC_8 | Hispanic English | Hispanic English | en_US | 8 kHz | Call-Center | 212 | Hispanic Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 822, Male 1262, Unknown 0 | Contact | |
en_US_CC_8 | African American Vernacular | African American Vernacular | en_US | 8 kHz | Call-center | 211 | African American Vernacular Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female: 612, Male: 1242, and Unknown: 12 | Contact | |
HINGLISH_CC_8 | Hinglish | Hinglish | hg_IN | 8 kHz | Call-Center | 208 | HINGLISH Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 822, Male 1262 , Unknown 0 | Contact | |
Hindi_SM_48 | Hindi | Hindi | hi-IN | 48 kHz | Scripted Monologue | 2,867 | Hindi Scripted Monologue | Single-utterance recordings, which tend to fall in the 5 to 30 second range | Mono | Mobile App | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 1977 Male 1864 Unknown 147 | Contact | |
Hindi_MA_16 | Hindi | Hindi | hi_IN | 16 kHz | Media Audio | 219 | Hindi Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 83 , Male 309, Unknown 0 | Contact | |
Hebrew_MA_16 | Hebrew | Hebrew | he_IL | 16 kHz | Media Audio | 427 | Hebrew Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 361 , Male 513, Unknown 13 | Contact | |
Hebrew_General Conversation_8 | Hebrew | Hebrew | he_IL | 8 kHz | General Conversation | 399 | Hebrew General Conversation data | Unscripted telephonic conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Hebrew in Israel | Dual | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 414 , Male 399 , Unknown 1 | Contact | |
Gujarati_MA | Gujarati | Gujarati (In Pipeline) | gu_IN | Media Audio | 40 | Gujarati (In Pipeline) Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Gujarati_GC | Gujarati | Gujarati (In Pipeline) | gu_IN | General Conversation | 100 | Gujarati (In Pipeline) General Conversation data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
Gujarati_CC_8 | Gujarati | Gujarati (In Pipeline) | gu_IN | Call-Center | 60 | Gujarati (In Pipeline) Call-Center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Desktop | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Contact | ||||
German_IVR_8 | German | German | de-DE | 8 kHz | IVR | 200 | German IVR data | Human to Machine. An IVR type of flow where there is a TTS prompt (e.g. “How may I help you”) followed by a spontaneous human response | Mono | Desktop | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 10115 Male 8750 Unknown 0 | Contact | ||
German_CC_8 | German | German | de-DE | 8 kHz | Call-Center | 64 | German Call-center data | Unscripted, synthetic telephonic conversation between "agent" and "customer", Approx. Audio Duration (Range) 5-15 Minutes, | Mono | Desktop | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 478 Male 1440 Unknown 0 | Contact | ||
English Deep South_MA_16 | English Deep South | English Deep South | en_US | 16 kHz | Media Audio | 266 | English Deep South Media audio data | Licensable Public domain audio/video files such as interviews, podcasts etc - 1 to 5 people. Approx. Audio Duration (Range) 15-60 minutes | Mono | Web Sourcing | 5.0 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling | Female 204, Male 356, Unknown 21 | Contact |
Services Offered
Audio data is only one part of a comprehensive AI setup. At Shaip, you can also consider the following data collection services to broaden what your models can handle:
Text Data Collection Services
The true value of Shaip's cognitive data collection services is that they give organizations the key to unlocking critical information found within unstructured data.
Image Data Collection Services
Help your computer vision model identify images accurately with training data built for next-generation AI models.
Video Data Collection Services
Combine computer vision with NLP to train your models to identify objects, individuals, deterrents, and other visual elements accurately.
Recommended Resources
Offering
Audio Annotation for Intelligent AIs
Audio annotation services have been a forte of Shaip since the beginning. Develop, train & improve conversational AI, chatbots & speech recognition engines with our state-of-the-art audio annotation services.
Buyer’s Guide
Buyer’s Guide: Complete Guide to Conversational AI
The chatbot you conversed with runs on an advanced conversational AI system that is trained, tested, and built using tons of speech recognition datasets.
Data Catalog
Off-the-Shelf Speech Data Catalog & Licensing
There is a wide variety of common applications for speech data in AI projects. We offer large volumes of high-quality data ready for your voice recognition projects.
Want to build your own audio dataset?
Connect with our in-house speech data collection expert to set up an audio repository that best fits your requirements.
Frequently Asked Questions (FAQ)
Speech Data Collection for an ML Model refers to the process of gathering audio recordings of spoken language. This collection aids in training and refining machine learning algorithms, particularly those centered on understanding and processing human voices.
When aiming to collect audio data for Automatic Speech Recognition (ASR), you should start by defining your project’s specific needs, including the desired language, accent, and type of speech. After setting these parameters, ensure you obtain all necessary permissions to respect user privacy. Then, use appropriate recording devices or software to capture clear audio samples. Each recording should be meticulously annotated with its transcription or other pertinent metadata and stored systematically for effortless access.
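The annotation and storage step described above can be sketched as a small metadata record saved next to each recording. This is only an illustration, not Shaip's actual schema: the field names, the `RecordingMetadata` class, and the `write_sidecar` helper are all assumptions made for the example, loosely following the .wav and .json pairing listed in the catalog.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class RecordingMetadata:
    """Hypothetical metadata schema for one audio recording."""
    audio_file: str        # e.g. "utt_0001.wav"
    language: str          # locale code such as "hi-IN"
    accent: str
    speech_type: str       # e.g. "Scripted Monologue"
    sample_rate_hz: int
    transcription: str
    consent_obtained: bool


def write_sidecar(meta: RecordingMetadata, directory: Path) -> Path:
    """Write the metadata as a .json file named after the audio file."""
    # Permissions come first: do not store anything recorded without consent.
    if not meta.consent_obtained:
        raise ValueError("refusing to store a recording without consent")
    sidecar = directory / (Path(meta.audio_file).stem + ".json")
    sidecar.write_text(
        json.dumps(asdict(meta), ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return sidecar
```

Keeping one sidecar file per recording means the transcription and metadata travel with the audio, so a dataset can be filtered by language or accent without opening the audio itself.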
A speech dataset in machine learning is pivotal for training, testing, and validating models tailored to recognize, transcribe, or interpret spoken language. Such datasets pave the way for a myriad of applications, from voice assistants and transcription services to voice biometrics.
For collecting precise data from diverse languages and accents, collaboration with native speakers of the desired linguistic backgrounds is vital. Aim for a varied and representative sample to cover a broad spectrum of demographic nuances. Employ standardized recording equipment in uniform environments to ensure audio consistency. And importantly, annotate each data piece with detailed transcriptions and metadata, denoting the specific language and accent.
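One way to check whether a collection is varied and representative is to tally recordings per language and accent and flag groups that fall below a chosen share. The sketch below assumes each recording's metadata is a dictionary with `language` and `accent` keys. The function names and the threshold are hypothetical and would need tuning for a real project.

```python
from collections import Counter


def coverage_by_group(records, keys=("language", "accent")):
    """Count recordings for each (language, accent) combination."""
    return Counter(tuple(record[k] for k in keys) for record in records)


def underrepresented(counts, min_share):
    """Return groups whose share of all recordings is below min_share."""
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(group for group, n in counts.items() if n / total < min_share)


# Illustrative, made-up metadata: 8 Boston, 1 Irish, 1 Deep South recording.
records = (
    [{"language": "en_US", "accent": "Boston"}] * 8
    + [{"language": "en_IE", "accent": "Irish"}]
    + [{"language": "en_US", "accent": "Deep South"}]
)
counts = coverage_by_group(records)
gaps = underrepresented(counts, min_share=0.2)
```

Groups returned by `underrepresented` are candidates for additional recording sessions with native speakers of that language or accent.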