Boston English Dataset
High-Quality Boston English Call-Center, General Conversation, and Podcast Dataset for AI & Speech Models
Overview
Title (Language)
Boston English Language Dataset
Dataset Types
Call Center, General Conversation, Media Data (Podcast Data)
Country
United States
Description
Unscripted, synthetic telephone conversations between an agent and a customer, each approximately 5 to 15 minutes long. Also available are licensable public-domain audio and video files, such as interviews, podcasts, and similar content, with 1 to 5 participants and approximate durations of 15 to 60 minutes.
Use Case
ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
Dataset Details
| Dataset Type | Sampling Rate | Speakers per Recording | Channel | Total Duration (HH:MM:SS) | Total Number of Speakers |
|---|---|---|---|---|---|
| Call Center | 8 kHz | 2 Speakers | Dual | 22:16:10 | 228 |
| General Conversation | 8 kHz | 2 Speakers | Dual | 162:51:50 | 994 |
| Media Data | 16 kHz | Multiple Speakers | Mono | 85:51:52 | 206 |
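The durations in the table above are given as hours:minutes:seconds, so totalling them takes a little arithmetic. The following is a minimal sketch of that calculation; the dictionary and the helper names `to_seconds` and `to_hms` are illustrative assumptions, not part of any tooling supplied with the dataset.

```python
# Durations copied from the table above (hours:minutes:seconds).
# The dictionary layout and helper names are illustrative assumptions.
DURATIONS = {
    "Call Center": "22:16:10",
    "General Conversation": "162:51:50",
    "Media Data": "85:51:52",
}


def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to an 'H:MM:SS' string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_seconds = sum(to_seconds(value) for value in DURATIONS.values())
print(to_hms(total_seconds))  # 270:59:52
```

Across the three dataset types this comes to just under 271 hours of audio.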
Can’t find what you are looking for?
New off-the-shelf datasets are being collected across all data types.
Contact us to discuss your audio and speech training data requirements.