Malayalam Dataset (മലയാളം ഡാറ്റാസെറ്റ്)
High-Quality Malayalam General Conversation and Podcast Dataset for AI and Speech Models
Overview
Title (Language)
Malayalam Language Dataset
Dataset Types
General Conversation, Media (Podcast) Data
Country
India
Description
This dataset includes unscripted, synthetic telephone conversations between an agent and a customer (5–15 minutes each), as well as licensable public-domain audio and video recordings, such as interviews and podcasts with 1 to 5 participants (15–60 minutes each).
Use Case
ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
Dataset Details
| Dataset Type | Sampling Rate | Speakers per Recording | Channels | Total Duration (hh:mm:ss) | Total Speakers |
|---|---|---|---|---|---|
| General Conversation | 8 kHz | 2 | Dual | 70:46:30 | 576 |
| General Conversation | 8 kHz | 2 | Dual | 149:39:33 | 296 |
| Media Data | 16 kHz | Multiple | Mono | 12:39:24 | 81 |
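As a quick sanity check on the table above, the per-subset durations can be totalled and the raw storage footprint estimated. This is a minimal sketch, not delivery tooling: the table does not state a bit depth, so 16-bit linear PCM is assumed here, and the subset labels `conversation_a`, `conversation_b`, and `media` are hypothetical names for the table rows.

```python
# Minimal sketch: total the durations listed in the table and estimate
# uncompressed storage. ASSUMPTION: 16-bit linear PCM (bit depth is not
# stated in the table). Subset labels are hypothetical, for illustration only.

# (hypothetical label, sample rate in Hz, channels, duration "hh:mm:ss")
SUBSETS = [
    ("conversation_a", 8_000, 2, "70:46:30"),
    ("conversation_b", 8_000, 2, "149:39:33"),
    ("media", 16_000, 1, "12:39:24"),
]

ASSUMED_BYTES_PER_SAMPLE = 2  # assumption: 16-bit samples


def to_seconds(hms: str) -> int:
    """Convert an 'hours:minutes:seconds' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Format a number of seconds back into 'h:mm:ss'."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def pcm_bytes(sample_rate: int, channels: int, seconds: int) -> int:
    """Uncompressed size = sample rate x channels x bytes per sample x duration."""
    return sample_rate * channels * ASSUMED_BYTES_PER_SAMPLE * seconds


total_seconds = sum(to_seconds(duration) for _, _, _, duration in SUBSETS)
total_bytes = sum(
    pcm_bytes(rate, channels, to_seconds(duration))
    for _, rate, channels, duration in SUBSETS
)

print("Total duration:", to_hms(total_seconds))  # Total duration: 233:05:27
print(f"Estimated uncompressed size: {total_bytes / 1e9:.1f} GB")  # 26.9 GB
```

Under these assumptions the three subsets total 233:05:27 of audio, roughly 26.9 GB as uncompressed PCM. Every subset works out to 32,000 bytes per second, because 8 kHz dual-channel audio and 16 kHz mono audio carry the same number of samples per second.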
Can’t find what you are looking for?
New off-the-shelf datasets are being collected across all data types.
Contact us to discuss your audio and speech training data collection needs.