Over 22k hours of audio data were collected & transcribed to train a multi-lingual digital assistant.
LOUISVILLE, KENTUCKY, UNITED STATES, Aug 1, 2022: Shaip has supplied an American multinational computer technology corporation with over 22k hours of audio data to train its multi-lingual digital assistant in over 13 languages from across the globe.
Over 7M utterances of 30 seconds or less were collected, transcribed, and delivered in less than eight months, with a balanced mix of speakers by age, gender, education, and dialect, recorded in a diverse range of environments at 16 kHz.
Vatsal Ghiya, CEO of Shaip, said, “Shaip is a leader in Conversational AI Projects. We have enabled multiple Fortune 500 companies with their NLP data requirements. We shared the same vision with the client and enabled them to improve solutions with gold-standard data that solves future problems that matter.”
He further added, “The need for utterance training arises from the fact that not all customers use the same words or phrases while interacting or asking questions to their voice assistants in a scripted format. That’s why specific voice applications need to be trained on spontaneous speech data. E.g., ‘Where is the closest hospital located?’, ‘Find a hospital near me’, or ‘Is there a hospital nearby?’ all indicate the same search intent but are phrased differently. Shaip can help you identify and articulate utterances in ways people would interact with a voice assistant in a real-world scenario.”
The scope of work for Shaip included, but was not limited to, acquiring large volumes of audio training data for speech recognition, transcribing audio recordings in multiple languages, and delivering corresponding JSON files containing the metadata. Shaip can collect utterances at scale while maintaining the level of quality required to train ML models for complex projects.
Headquartered in Louisville, Kentucky, Shaip is a fully managed data platform designed for companies looking to solve their most demanding AI challenges, enabling smarter, faster, and better results. Shaip supports all aspects of AI training data, from data collection and licensing to labeling, transcribing, and de-identifying, by seamlessly scaling its people, platform, and processes to help companies develop their AI and ML models. To learn how to make life more manageable for your data science team and leaders, visit www.shaip.com.
Senior Manager – Marketing
12806 Townepark Way, Louisville, KY 40243-2311