Key Phrase/Prompts Audio Collection

Case Study: Key Phrase Collection for in-car voice-activated systems

Key Phrase Collection

Demand for in-car voice-activated systems is growing across the automotive industry, redefining how drivers interact with their vehicles.

The automotive industry has rapidly adopted voice-activated systems, with major players like Ford, Tesla, and BMW integrating advanced voice recognition in their vehicles. By 2022, it was estimated that over 50% of new cars featured voice recognition capabilities. These integrations aim to enhance safety, allowing drivers to operate navigation, entertainment, and communication functions without distractions.

The market value for voice recognition in autos was projected to surpass $1 billion by 2023, indicating a growing demand for hands-free, intelligent in-car interactions.


Research suggests that by 2022, 73% of drivers will use an in-car voice assistant.

The Automotive Voice Recognition System Market was valued at USD 2.01 Bn in 2021, and is expected to reach USD 3.51 Bn by 2027, registering a CAGR of around 8.07%.

Real World Solution

Data that powers voice-activated systems

Voice-activated systems in cars enhance safety and convenience. They allow drivers to access navigation, make calls, send texts, and control music without taking hands off the wheel or eyes off the road. By responding to verbal commands, these systems reduce distraction, promote multitasking, and ensure continuous focus on driving. 

The client is a global leader in conversational intelligence who offers voice AI solutions that let businesses offer incredible conversational experiences to their customers. They were working with leading automotive companies to train their voice-activated systems with branded key phrases and needed Shaip’s expertise in audio data collection.



  • Crowd Sourcing: Recruit 2800+ native speakers per language globally.
  • Data Collection: Secure 200k+ prompts in 12 languages within set timeframe.
  • Context & Intent Recognition: To understand user requests correctly, systems needed to be trained on different variations for the same key phrase.
  • Background Noise Handling: Address real-world background noise for ML model accuracy.
  • Reducing Bias: Acquire voice samples from diverse demographics to ensure inclusivity.
  • Audio Specs: 16 kHz, 16-bit PCM, mono (single channel), WAV; no processing.
  • Recording Environment: Recordings should have clean audio without background noise or disturbance. Key Phrases to be recorded using normal speech.
  • Quality Check: All speech recordings will undergo quality assessment and validation, and only validated recordings will be delivered. If Shaip does not meet the agreed quality standards, Shaip will redeliver the data at no additional cost.
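The audio specification above lends itself to an automated header check before any human review. The following is a minimal sketch, assuming Python's standard `wave` module; the function names `check_wav_spec` and `write_pcm_wav` are hypothetical and are not part of the project's actual tooling. It only inspects the file header, so it cannot confirm that a recording is unprocessed.

```python
import os
import tempfile
import wave

# Target format taken from the "Audio Specs" requirement above.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM
EXPECTED_CHANNELS = 1            # mono, single channel


def check_wav_spec(path):
    """Return a list of spec violations for the file (empty if compliant)."""
    try:
        wav = wave.open(path, "rb")
    except (wave.Error, EOFError) as exc:
        # The wave module only reads uncompressed PCM WAV files.
        return [f"not a readable PCM WAV file: {exc}"]
    problems = []
    with wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bits")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems


def write_pcm_wav(path, rate, width, channels, seconds):
    """Hypothetical helper: write a zero-filled PCM WAV for trying the check."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(bytes(int(rate * seconds) * width * channels))


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    sample = os.path.join(folder, "sample.wav")
    write_pcm_wav(sample, 44_100, 2, 2, 0.5)
    # Prints the sample-rate and channel-count violations for this file.
    print(check_wav_spec(sample))
```

A check like this is cheap enough to run on every delivered file, leaving listening-based review for the things a header cannot show, such as background noise or applied effects.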


Drawing on its expertise in Conversational AI, Shaip delivered the following for the client:

  • Data Collection: 208k key phrases/brand prompts collected in 12 global languages from 2800 speakers within the stipulated time frame.
  • Diverse Accents & Dialects: Recruited specialists from around the world, proficient in the desired accents and dialects.
  • Context & Intent Recognition: Every speaker was tasked with recording the key phrases in 20 distinct variations, enabling the ML models to accurately grasp user requests in terms of context and intent.
  • Background Noise Handling: To ensure pristine audio quality, we made certain that the key phrases were captured in a serene environment with noise levels below 40dB, devoid of background disturbances like TV, radio, music, speech, or street sounds.
  • Reducing Bias: To minimize bias, we engaged individuals from diverse regions and maintained a balanced demographic representation with 50% males and 50% females, spanning age groups from 18 to 60 years.
  • Recording Guidelines: The key phrases were captured in a consistent, normal speech pattern, without variations such as fast or slow pacing. Each recording included 2 seconds of silence at both the beginning and the end so that no part of the speech was inadvertently clipped.
  • Recording Format: The audio was recorded at 16 kHz, 16-bit PCM in mono (single channel) and saved in the WAV file format. The audio remains unprocessed, with no compression, reverb, or EQ applied.
  • Quality: Every speech recording was subjected to rigorous quality checks and validation. Only recordings that passed this assessment were delivered. Any files that fell short of the agreed-upon quality standards were re-recorded and provided at no extra charge.
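The 2-second silence padding described in the recording guidelines can be screened automatically. The sketch below is illustrative only: it assumes 16-bit mono WAV input, and the `SILENCE_DBFS` threshold is an assumed value chosen for the example, not a figure from this case study. Note that the 40 dB limit mentioned above is an acoustic level of the recording environment, which cannot be derived from the digital file without a calibrated microphone, so this check measures level relative to digital full scale instead. The names `has_silence_padding` and `write_mono_16bit` are hypothetical.

```python
import array
import math
import os
import sys
import tempfile
import wave

PAD_SECONDS = 2.0     # from the recording guidelines above
SILENCE_DBFS = -50.0  # assumed threshold, not specified in the case study


def rms_dbfs(samples):
    """RMS level of 16-bit samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / 32768)


def read_mono_16bit(path):
    """Return (sample_rate, samples) for a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wav.getframerate()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rate, samples


def has_silence_padding(path, pad_seconds=PAD_SECONDS,
                        threshold_dbfs=SILENCE_DBFS):
    """True if the first and last pad_seconds are below the threshold."""
    rate, samples = read_mono_16bit(path)
    pad = int(rate * pad_seconds)
    if len(samples) < 2 * pad:
        return False  # too short to contain both pads
    head, tail = samples[:pad], samples[-pad:]
    return (rms_dbfs(head) <= threshold_dbfs
            and rms_dbfs(tail) <= threshold_dbfs)


def write_mono_16bit(path, rate, samples):
    """Hypothetical helper: write integer samples as a 16-bit mono WAV."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(data.tobytes())


if __name__ == "__main__":
    rate = 16_000
    # One second of a 440 Hz tone stands in for the spoken key phrase.
    tone = [int(10000 * math.sin(2 * math.pi * 440 * n / rate))
            for n in range(rate)]
    silence = [0] * (2 * rate)
    path = os.path.join(tempfile.mkdtemp(), "padded.wav")
    write_mono_16bit(path, rate, silence + tone + silence)
    print(has_silence_padding(path))
```

Files that fail a screen like this would be candidates for re-recording; the threshold would need tuning against real recordings, since even a quiet room produces a nonzero noise floor.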


The high-quality brand key phrase audio data, or voice prompts, will provide automotive companies and their customers with:

  1. Branding and Identity: Voice prompts built around specific brand phrases help companies create a direct & memorable connection between the user and the brand, enhancing brand recall.
  2. Ease of Use: Voice commands make it easier for drivers to interact with their vehicles without taking their hands off the wheel or their eyes off the road, thereby enhancing road safety.
  3. Functionality: Voice commands make accessing and controlling car features more intuitive, whether it's navigation, media playback, or climate control.
  4. Integration with Other Systems: Many voice-activated systems are integrated with smartphones, smart home devices, and other IoT devices. For example, a user might be able to ask their car to turn on the lights at home as they approach home.
  5. Competitive Advantage: Offering advanced voice-activated systems can be a selling point & a differentiator. Buyers look for the latest tech when considering a new car purchase.
  6. Future-Proofing: As tech evolves & IoT becomes more integrated into everyday life, having a robust voice-activated system positions automotive companies to be more adaptive to future tech.
  7. Revenue Opportunities: Voice systems open additional monetization opportunities, e.g., recommendations or integrated e-commerce experiences (like ordering food or finding nearby services) that could provide affiliate revenue.

When we began sourcing voice prompts for the automotive sector, the challenges were numerous. Capturing the diversity in speech, accents, and tones was vital to represent our client’s global clientele. Shaip stood out not just as a vendor, but as a true partner. Their commitment to securing a diverse range of voices from different regions was commendable. They went beyond merely gathering voices; they grasped the nuances of our project needs, guaranteeing top-notch recordings. Their flawless adherence to audio collection standards showcased their professionalism and dedication to the project.
