Case Study: Utterance Collection

Delivered 7M+ utterances to build multilingual digital assistants in 13 languages

Real World Solution

Data that powers global conversations

Utterance training is needed because customers do not stick to the same scripted words or phrases when they talk to their voice assistants or ask them questions. Voice applications must therefore be trained on spontaneous speech data. For example, “Where is the closest hospital located?”, “Find a hospital near me”, and “Is there a hospital nearby?” all express the same search intent but are phrased differently.

Problem

To execute the client’s digital assistant speech roadmap for languages worldwide, the team needed to acquire large volumes of training data for the speech recognition AI model. The client’s critical requirements were:

Acquire large volumes of training data (single-speaker utterance prompts, each 3-30 seconds long) for speech recognition services in 13 global languages.
For each language, the supplier will generate text prompts for speakers to record (unless the client supplies them) and transcribe the resulting audio.
Provide audio data and transcriptions of the recorded utterances, with corresponding JSON files containing the metadata for all recordings.
Ensure a diverse mix of speakers by age, gender, education, and dialect.
Ensure a diverse mix of recording environments, as per the specifications.
Record all audio at a sample rate of at least 16 kHz, preferably 44 kHz.
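As an illustration of how the duration and sample-rate requirements above could be checked before delivery, here is a minimal Python sketch using only the standard library. It is not Shaip's or the client's actual tooling. The JSON field names (`language`, `transcript`, `speaker`, and so on), the `fr-CA` language tag, and the helper names are assumptions made for this example, since the case study does not specify the metadata schema.

```python
import io
import json
import wave

# Limits taken from the requirements listed above.
MIN_SECONDS = 3.0
MAX_SECONDS = 30.0
MIN_SAMPLE_RATE_HZ = 16_000


def check_recording(wav_file):
    """Return (duration_seconds, sample_rate_hz, problems) for a PCM WAV file."""
    with wave.open(wav_file, "rb") as wav:
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate
    problems = []
    if sample_rate < MIN_SAMPLE_RATE_HZ:
        problems.append("sample rate below 16 kHz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append("duration outside 3-30 seconds")
    return duration, sample_rate, problems


def build_metadata(wav_file, transcript, language, speaker):
    """Build one metadata record; the schema here is hypothetical."""
    duration, sample_rate, problems = check_recording(wav_file)
    return {
        "language": language,
        "transcript": transcript,
        "duration_seconds": round(duration, 2),
        "sample_rate_hz": sample_rate,
        "speaker": speaker,
        "problems": problems,
    }


def make_silent_wav(seconds, sample_rate):
    """Create an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


# Usage: a 5-second, 16 kHz stand-in recording with made-up speaker attributes.
demo = make_silent_wav(seconds=5, sample_rate=16_000)
record = build_metadata(
    demo,
    transcript="Y a-t-il un hôpital près d'ici ?",
    language="fr-CA",
    speaker={"age_band": "25-34", "gender": "female", "dialect": "Quebec"},
)
print(json.dumps(record, indent=2, ensure_ascii=False))
```

A recording that passes both checks comes back with an empty `problems` list. In a real pipeline the same record would presumably also carry the recording-environment details the specification asks for.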

“After evaluating many vendors, the client chose Shaip because of their expertise in conversational AI projects. We were impressed with Shaip’s project execution competence, their expertise to source, transcribe and deliver the required utterances from expert linguists in 13 languages within stringent timelines and with the required quality.”

Solution

With our deep understanding of conversational AI, we helped the client collect, transcribe and annotate the data with a team of expert linguists and annotators to train their AI-powered Speech Processing multilingual Voice Suite.

Shaip’s scope of work included, but was not limited to, acquiring large volumes of audio training data for speech recognition, transcribing the recordings for every language on the client’s Tier 1 and Tier 2 language roadmap, and delivering corresponding JSON files containing the metadata. Shaip collected utterances of 3-30 seconds at scale while maintaining the quality required to train ML models for complex projects.

Audio Collected, Transcribed & Annotated: 22,250 hours
Languages Supported: 13 (Danish, Korean, Saudi Arabian Arabic, Dutch, Mainland & Taiwan Chinese, French Canadian, Mexican Spanish, Turkish, Hindi, Polish, Japanese, Russian)
No. of Utterances: 7M+
Timeline: 7-8 months

While collecting audio utterances at 16 kHz, we ensured a healthy mix of speakers by age, gender, education, and dialects in diverse recording environments.

Result

The high-quality utterance audio data from expert linguists empowered the client to accurately train their multilingual Speech Recognition model in 13 Global Tier 1 & 2 languages. With gold-standard training datasets, the client can offer intelligent and robust digital assistance to solve future real-world problems.

Our Expertise

Hours of Speech Collected
Team of Voice Data Collectors
PII Compliant
Data Acceptance & Accuracy
Fortune 500 Clientele
