Open Datasets

Discover open-source datasets to get you started training ML models

Open Source Datasets To Get You Started with AI/ML Models

The output of your AI and ML models is only as good as the data you use to train them, so the care you apply to aggregating, tagging, and identifying that data is important.

If you are starting a new AI/ML initiative, you may quickly realize that finding high-quality training data is one of the more challenging aspects of the project, because high-quality datasets are the fuel that keeps the AI/ML engine running. We have compiled a list of open datasets that are free to use for training your future AI/ML models.

| Specialization | Data Type | Dataset Name | Industry / Dept. | Annotation / Use Case | Link |
|---|---|---|---|---|---|
| NLP | Text | Amazon Reviews | E-commerce | Sentiment Analysis | Link |
| NLP | Text | Wikipedia Links Data | General | | Link |
| NLP | Text | Stanford Sentiment Treebank | Entertainment | Sentiment Analysis | Link |
| NLP | Text | Twitter US Airline Sentiment | Airline | Sentiment Analysis | Link |
| CV | Image | ImageNet | General | | Link |
| CV | Image | Google’s Open Images | General | | Link |
| NLP | Text | Cornell Movie Dialogs | Entertainment | Dialogs | Link |
| NLP | Text | Yahoo Answers | General | Question Answering | Link |
| NLP | Text | MS MARCO | General | Question Answering | Link |
| NLP | Text | Natural Questions Dataset | General | Question Answering | Link |
| NLP | Text | DBpedia | General | Knowledge Graph | Link |
| NLP | Text | YAGO | General | Knowledge Graph | Link |
| NLP | Text | Freebase | General | Knowledge Graph | Link |
| NLP | Text | OntoNotes | General | Semantic Role Labeling | Link |
| NLP | Text | CoNLL 2003 | General | Named Entity Recognition | Link |
| CV | Image | COCO | General | Object Detection | Link |
| CV | Image | PASCAL VOC | General | Object Detection | Link |
| CV | Image | Cityscapes | Autonomous Driving | Semantic Segmentation | Link |
| CV | Image | MNIST | General | Digit Classification | Link |
| CV | Image | Fashion-MNIST | Retail | Image Classification | Link |
| NLP | Audio | LibriSpeech | General | ASR | Link |
| NLP | Audio | TED-LIUM | General | ASR | Link |
| NLP | Audio | TIMIT | General | Phoneme Recognition | Link |
| NLP | Audio | Common Voice | General | ASR | Link |
| NLP | Audio | VoxCeleb | General | Speaker Recognition | Link |
| NLP | Text | Wikipedia Dump | General | Language Modeling | Link |
| NLP | Text | Gigaword | News | Language Modeling | Link |
| NLP | Text | IMDB Reviews | Entertainment | Sentiment Analysis | Link |
| CV | Video | Kinetics-700 | General | Action Recognition | Link |
| CV | Video | UCF101 | General | Action Recognition | Link |
| CV | Video | HMDB51 | General | Action Recognition | Link |
| CV | Image | LFW (Labeled Faces in the Wild) | General | Face Recognition | Link |
| CV | Image | CASIA-WebFace | General | Face Recognition | Link |
| NLP | Text | SQuAD | General | Reading Comprehension | Link |
| NLP | Text | NewsQA | News | Reading Comprehension | Link |
| NLP | Text | MultiNLI | General | Natural Language Inference | Link |
| NLP | Text | SNLI | General | Natural Language Inference | Link |
| NLP | Text | WikiText | General | Language Modeling | Link |
| CV | Image | Stanford Cars | Automotive | Fine-grained Classification | Link |
| CV | Image | Oxford Flowers 102 | Botany | Fine-grained Classification | Link |
| CV | Image | CIFAR-10 | General | Image Classification | Link |
| CV | Image | CIFAR-100 | General | Image Classification | Link |
| CV | Image | VOC Person Layout | General | Pose Estimation | Link |
| CV | Image | MPII Human Pose | General | Pose Estimation | Link |
| NLP | Text | Reuters-21578 | Finance | Text Classification | Link |
| NLP | Text | 20 Newsgroups | General | Text Classification | Link |
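
If you want to shortlist candidates from the list above programmatically, one option is to hold the rows as records and filter them by specialization, data type, or use case. The following is a minimal sketch, not part of any library: the `DatasetEntry` class, the `CATALOG` list, and the `find_datasets` helper are hypothetical names introduced here for illustration, and only a handful of rows from the list are copied in.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetEntry:
    """One row of the dataset list (hypothetical structure for illustration)."""
    specialization: str  # "NLP" or "CV"
    data_type: str       # "Text", "Image", "Audio", or "Video"
    name: str
    industry: str
    use_case: str        # empty string where the list gives no use case


# A few rows copied from the list above; add the remaining rows as needed.
CATALOG = [
    DatasetEntry("NLP", "Text", "Amazon Reviews", "E-commerce", "Sentiment Analysis"),
    DatasetEntry("NLP", "Text", "IMDB Reviews", "Entertainment", "Sentiment Analysis"),
    DatasetEntry("NLP", "Text", "SQuAD", "General", "Reading Comprehension"),
    DatasetEntry("CV", "Image", "COCO", "General", "Object Detection"),
    DatasetEntry("CV", "Image", "PASCAL VOC", "General", "Object Detection"),
    DatasetEntry("CV", "Video", "UCF101", "General", "Action Recognition"),
    DatasetEntry("NLP", "Audio", "LibriSpeech", "General", "ASR"),
]


def find_datasets(
    catalog: List[DatasetEntry],
    specialization: Optional[str] = None,
    data_type: Optional[str] = None,
    use_case: Optional[str] = None,
) -> List[str]:
    """Return names of entries matching every filter that is not None.

    Matching is exact but case-insensitive; catalog order is preserved.
    """
    def matches(value: str, wanted: Optional[str]) -> bool:
        return wanted is None or value.lower() == wanted.lower()

    return [
        entry.name
        for entry in catalog
        if matches(entry.specialization, specialization)
        and matches(entry.data_type, data_type)
        and matches(entry.use_case, use_case)
    ]


if __name__ == "__main__":
    # Example: shortlist object-detection datasets from the sample rows.
    print(find_datasets(CATALOG, use_case="Object Detection"))
```

Filters combine with a logical AND, so passing both `specialization="NLP"` and `data_type="Audio"` narrows the sample rows to the speech datasets only.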