Assamese Dataset
অসমীয়া ডাটাছেট
High-Quality Assamese Call-Center, General Conversation, and Podcast Dataset for AI & Speech Models
Overview
Field | Details |
---|---|
Title (Language) | Assamese Language Dataset |
Dataset Types | Call Center, General Conversation, Media Data (Podcast Data) |
Country | India |
Description | Unscripted, synthetic telephone conversations between an agent and a customer, each approximately 5 to 15 minutes long. Also available are licensable public-domain audio and video files, such as interviews and podcasts, involving 1 to 5 participants, each approximately 15 to 60 minutes long. |
Use Cases | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling |
Dataset Details
Dataset Type | Sampling Rate | Speakers per Recording | Channels | Total Duration (hh:mm:ss) | Total Number of Speakers |
---|---|---|---|---|---|
Call Center | 44 kHz | 2 | Dual | 35:41:55 | 420 |
General Conversation | 8 kHz | 2 | Dual | 96:24:41 | 252 |
Media Data | 16 kHz | Multiple | Mono | 28:41:59 | 122 |
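To see the combined size of the three subsets, the durations in the table above can be summed. The following is a minimal Python sketch, not part of the dataset itself: the `DATASETS` dictionary and the helper names are illustrative, with figures copied by hand from the table. The speaker total assumes that no speaker appears in more than one subset, which the listing does not state.

```python
from datetime import timedelta

# Figures copied from the table above: (duration as hh:mm:ss, total speakers).
# The dictionary and helper names are illustrative, not part of the dataset.
DATASETS = {
    "Call Center": ("35:41:55", 420),
    "General Conversation": ("96:24:41", 252),
    "Media Data": ("28:41:59", 122),
}


def parse_duration(text: str) -> timedelta:
    """Convert an 'hh:mm:ss' string (hours may exceed 24) to a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_duration(duration: timedelta) -> str:
    """Render a timedelta as 'hh:mm:ss' without rolling hours into days."""
    total_seconds = int(duration.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_duration = sum(
    (parse_duration(duration) for duration, _ in DATASETS.values()),
    timedelta(),
)
# Assumption: speakers are distinct across subsets (not confirmed by the listing).
total_speakers = sum(speakers for _, speakers in DATASETS.values())

print(format_duration(total_duration))  # 160:48:35
print(total_speakers)                   # 794
```

Under these assumptions, the three subsets together come to roughly 160 hours and 48 minutes of audio.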