Bengali Dataset
বাংলা ডেটাসেট
High-Quality Bengali Call-Center, General Conversation, and Podcast Dataset for AI & Speech Models
Overview
Title (Language)
Bengali Language Dataset
Dataset Types
Call Center, General Conversation, Media Data (Podcast Data), Scripted Monologue
Country
India
Description
Unscripted, synthetic telephonic conversations between an agent and a customer are available, each approximately 5 to 15 minutes long. Licensable public domain audio and video files, such as interviews, podcasts, and similar content with 1 to 5 participants, are also available, each approximately 15 to 60 minutes long.
Use Case
ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
Dataset Details
Dataset Type | Sampling Rate | Speakers per Recording | Channel | Total Duration (HH:MM:SS) | Total Number of Speakers |
---|---|---|---|---|---|
Call Center | 8 kHz | 2 Speakers | Dual | 117:03:45 | 498 |
General Conversation | 8 kHz | 2 Speakers | Dual | 168:13:39 | 458 |
Media Data | 16 kHz | Multiple Speakers | Mono | 24:58:58 | 90 |
Scripted Monologue | 24 kHz | Single Speaker | Mono | 2,300:00:00 | On Request |
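
The durations in the table can be totalled with a few lines of arithmetic. The sketch below is illustrative only: it assumes the duration column is in hours:minutes:seconds form (with a thousands separator in the hours for Scripted Monologue), and the names `parse_duration`, `format_duration`, and `DURATIONS` are hypothetical helpers, not part of any published tooling for this dataset.

```python
# Durations copied from the table above.
# Assumption: values are H:MM:SS, and hours may contain a thousands comma.
DURATIONS = {
    "Call Center": "117:03:45",
    "General Conversation": "168:13:39",
    "Media Data": "24:58:58",
    "Scripted Monologue": "2,300:00:00",
}


def parse_duration(text: str) -> int:
    """Convert an 'H:MM:SS' string to a whole number of seconds."""
    hours, minutes, seconds = text.replace(",", "").split(":")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds)


def format_duration(total_seconds: int) -> str:
    """Convert a number of seconds back to 'H:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Sum the three conversational/media rows, then all four rows.
spontaneous_seconds = sum(
    parse_duration(DURATIONS[name])
    for name in ("Call Center", "General Conversation", "Media Data")
)
all_seconds = sum(parse_duration(value) for value in DURATIONS.values())

print("Call center + conversation + media:", format_duration(spontaneous_seconds))
print("All dataset types:", format_duration(all_seconds))
```

Under these assumptions, the call center, general conversation, and media rows add up to 310:16:22, and all four rows together to 2610:16:22.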
Can’t find what you are looking for?
New off-the-shelf datasets are being collected across all data types.
Contact us to discuss your audio and speech training data requirements.