Below is a list of data sets that can be used to train and test artificial intelligence and machine learning algorithms, either to benchmark different algorithms and techniques or to gain experience with simulation software. Most simulation software packages come with some basic examples and data sets, but this list provides a more extensive range of data that you can apply to your own learning algorithm experiments.
- Machine learning databases at the University of California, Irvine (Readme file)
- Experimental stock market data [from stock master].
- MNIST Handwritten digits
- Street View House Numbers: house numbers from Google Street View
- CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset
- ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images
- Tiny Images: 80 million tiny images
- Flickr Data: 100 Million Yahoo dataset
- Berkeley Segmentation Dataset 500: a collection of 12,000 hand-labeled segmentations of 1,000 Corel dataset images from 30 human subjects
- Vision data sets: a collection of test images
- Machine Learning Datasets via MLN.io
- Machine Learning Datasets for Research via Wikipedia
- R10: Yahoo News Feed dataset, version 1.0 (1.5TB), via Yahoo. It contains approximately 110 billion rows of data on user-news interactions
- The Yahoo Webscope Program is a reference library of interesting and scientifically useful datasets for non-commercial use by academics and other scientists
If you would like us to add a specific item to this list, please let us know via our add a link page.