MNIST database

== USPS database ==
In 1988, a dataset of digits from the US Postal Service was constructed. It contained 16×16 grayscale images digitized from handwritten zip codes on U.S. mail passing through the Buffalo, New York post office. The training set had 7291 images and the test set had 2007, for a total of 9298. Both the training and test sets contained ambiguous, unclassifiable, and misclassified data. The dataset was used to train and benchmark the 1989 LeNet. The task is rather difficult: on the test set, two humans made errors at an average rate of 2.5%.

NIST's work over several years resulted in several "Special Databases" and benchmarks. Of particular importance to MNIST are Special Database 1 (SD-1), released in May 1990; Special Database 3 (SD-3), released in February 1992; and Special Database 7 (SD-7), or NIST Test Data 1 (TD-1), released in April 1992. They were released on ISO-9660 CD-ROMs.

Images in SD-3 were much cleaner and easier to recognize than those in SD-7. It was suspected that the people who produced SD-3 were more motivated than those who produced SD-7. In addition, the character segmenter used for SD-3 was an older design than the one used for SD-7 and failed more often. It was therefore suspected that harder instances were effectively filtered out of SD-3, since they failed to pass the segmenter at all.

SD-19 was published in 1995 as a compilation of SD-1, SD-3, SD-7, and some further data. It contained 814,255 binary images of alphanumeric characters and binary images of 4169 handwriting sample forms (HSFs), including the 500 HSFs that were used to generate SD-7. It was updated in 2016.

The MNIST training set and test set both originally had 60k samples, but 50k of the test-set samples were discarded: only the samples indexed 24476 to 34475 were kept, giving just 10k samples in the test set.

== Further versions ==
In 2019, the full 60k test set of MNIST was restored to construct QMNIST, which has 60k images in the training set and 60k in the test set.
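QMNIST's 60k test set therefore restores the 50k samples that MNIST discarded when it kept only the samples indexed 24476 to 34475. Treating both bounds as inclusive, that range holds 34475 - 24476 + 1 = 10,000 samples, matching the stated test-set size. A minimal Python sketch of the selection follows. The 60,000-element list is a hypothetical stand-in for the original test pool, and zero-based indexing is an assumption, since the text does not say how the samples were numbered.

```python
# Hypothetical stand-in for the original 60,000-sample MNIST test pool;
# each element is simply its own index, so the selection is easy to inspect.
ORIGINAL_TEST_POOL_SIZE = 60_000
original_test_pool = list(range(ORIGINAL_TEST_POOL_SIZE))

# Bounds as given in the text; treating both as inclusive is an assumption
# that yields exactly the stated 10k test samples.
FIRST_KEPT = 24476
LAST_KEPT = 34475


def kept_test_indices(first: int, last: int) -> range:
    """Return the inclusive index range [first, last]."""
    return range(first, last + 1)


kept = kept_test_indices(FIRST_KEPT, LAST_KEPT)
mnist_test = [original_test_pool[i] for i in kept]
discarded = ORIGINAL_TEST_POOL_SIZE - len(mnist_test)

print(len(mnist_test))  # 10000
print(discarded)        # 50000
```

Using `last + 1` as the exclusive stop of Python's `range` is what makes the upper bound inclusive; dropping it would silently lose the final sample.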
MNIST included images only of handwritten digits. EMNIST was constructed from all the images in SD-19, converted into the same 28×28 pixel format by the same process used for the MNIST images. Accordingly, tools that work with MNIST would likely work unmodified with EMNIST.

Fashion MNIST was created in 2017 as a more challenging alternative to MNIST. The dataset consists of 70,000 28×28 grayscale images of fashion products from 10 categories.

== Performance ==