Shaip


Multilingual Human Conversation Dataset

Accelerate your ASR, NLP, and conversational AI development with high-quality multilingual human conversation data.

Speech Datasets
Afrikaans Dataset
Type: General Conversation, Podcast
No. Hours: 1,026

Arabic Dataset
Type: General Conversation, TTS
No. Hours: 2,239

Arabic English Dataset
Type: General Conversation
No. Hours: 100

Assamese Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 200

Bengali Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 200

Boston English Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 301

Brazilian Portuguese Dataset
Type: General Conversation
No. Hours: 200

Bulgarian Dataset
Type: General Conversation
No. Hours: 200

Burmese Dataset
Type: General Conversation, TTS
No. Hours: 1,000

Cantonese Dataset
Type: General Conversation, Spontaneous Dialogue
No. Hours: 1,250

Chittagonian Dataset
Type: General Conversation, TTS
No. Hours: 900

Danish Dataset
Type: General Conversation, Podcast, TTS
No. Hours: 3,615

Dari Dataset
Type: General Conversation, TTS
No. Hours: 700

Dogri Dataset
Type: General Conversation, TTS
No. Hours: 250

English Deep South Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 473

Gojri Dataset
Type: General Conversation, TTS
No. Hours: 250

Gujarati Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 200

Hebrew Dataset
Type: General Conversation, Podcast
No. Hours: 826

Hindi Dataset
Type: General Conversation, Podcast, TTS
No. Hours: 3,126

Indonesian Dataset
Type: General Conversation, Podcast
No. Hours: 1,139

Irish Dataset
Type: General Conversation
No. Hours: 192

Kannada Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 200

Kashmiri Dataset
Type: General Conversation, TTS
No. Hours: 1,000

Malay Dataset
Type: General Conversation, Podcast
No. Hours: 610

Malayalam Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 200

Marathi Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 200

Nagamese Dataset
Type: General Conversation, TTS
No. Hours: 850

New York English Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 350

New Zealand English Dataset
Type: General Conversation, Podcast
No. Hours: 548

Norwegian Dataset
Type: Call-Center, General Conversation, Scripted Monologue, Spontaneous Dialogue
No. Hours: 950

Oriya Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 200

Punjabi Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 200

Scottish (English Accent) Dataset
Type: General Conversation
No. Hours: 292

Sinhalese Dataset
Type: General Conversation, TTS
No. Hours: 1,000

Swedish Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 528

Tamil Dataset
Type: Call-Center, General Conversation, Podcast
No. Hours: 200

Telugu Dataset
Type: General Conversation, Podcast
No. Hours: 1,201

Thai Dataset
Type: General Conversation, Podcast
No. Hours: 356

Vietnamese Dataset
Type: General Conversation, Podcast
No. Hours: 552

Welsh (English Accent) Dataset
Type: General Conversation
No. Hours: 278

Comprehensive Speech Data Solutions: Fast, Flexible, and Best-in-Class Quality


End-to-end service: Complete service with expert domain knowledge and fast delivery.

Flexible: Choose custom, semi-custom, or off-the-shelf voice datasets with flexible ownership.

Domain expertise: Work with specialized domain experts for fast, high-quality AI datasets.

Quality: Get quality checks from industry experts.

Licensing: Get a license tailored to your needs.

Ethical Data: We ensure contributors are informed and consent to data use.

Contact Us

(US): (866) 473-5655

marketing@shaip.com
vendorcolab@shaip.com
career@shaip.com


© 2018 – 2025 Shaip | All Rights Reserved
