stable_datasets.images package

Submodules

stable_datasets.images.arabic_characters module

class ArabicCharacters(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Arabic Handwritten Characters Dataset

Abstract Handwritten Arabic character recognition systems face several challenges, including the unlimited variation in human handwriting and the lack of large public databases. In this work, we model a deep learning architecture that can be effectively applied to recognizing Arabic handwritten characters. A Convolutional Neural Network (CNN) is a special type of feed-forward multilayer network trained in supervised mode. The CNN was trained and tested on our database, which contains 16,800 handwritten Arabic characters. In this paper, optimization methods are implemented to increase the performance of the CNN. Common machine learning methods usually apply a combination of a feature extractor and a trainable classifier. The use of a CNN leads to significant improvements across different machine-learning classification algorithms. Our proposed CNN gives an average 5.1% misclassification error on testing data.

Context The motivation of this study is to use cross knowledge learned from multiple works to enhance the performance of Arabic handwritten character recognition. In recent years, Arabic handwritten character recognition has had to cope with different handwriting styles as well, making it important to find and work on a new and advanced solution for handwriting recognition. A deep learning system needs a huge amount of data (images) to be able to make good decisions.

Content The dataset is composed of 16,800 characters written by 60 participants; the age range is 19 to 40 years, and 90% of participants are right-handed. Each participant wrote each character (from ’alef’ to ’yeh’) ten times on two forms, as shown in Fig. 7(a) & 7(b). The forms were scanned at a resolution of 300 dpi. Each block is segmented automatically using Matlab 2016a to determine the coordinates of each block. The database is partitioned into two sets: a training set (13,440 characters, 480 images per class) and a test set (3,360 characters, 120 images per class). The writers of the training set and the test set are disjoint. The order in which writers are assigned to the test set is randomized to make sure that the test-set writers are not all from a single institution (to ensure variability of the test set).
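The counts quoted in the description are mutually consistent, and a quick arithmetic check makes that explicit. The sketch below is only an illustration: the class count of 28 is not stated in the docstring and is inferred here from 13,440 / 480, and the per-split writer counts are likewise derived rather than quoted.

```python
# Figures quoted in the dataset description.
TOTAL = 16_800
PARTICIPANTS = 60
REPEATS_PER_CHARACTER = 10
TRAIN_TOTAL, TRAIN_PER_CLASS = 13_440, 480
TEST_TOTAL, TEST_PER_CLASS = 3_360, 120


def check_counts():
    """Verify the quoted totals and return (classes, train writers, test writers)."""
    # Assumption: the number of classes is derived, not quoted.
    num_classes, remainder = divmod(TRAIN_TOTAL, TRAIN_PER_CLASS)
    assert remainder == 0
    assert TEST_TOTAL == num_classes * TEST_PER_CLASS
    assert TRAIN_TOTAL + TEST_TOTAL == TOTAL
    assert PARTICIPANTS * num_classes * REPEATS_PER_CHARACTER == TOTAL

    # Each writer contributes REPEATS_PER_CHARACTER images per class, and
    # writers do not overlap between the two sets.
    train_writers = TRAIN_PER_CLASS // REPEATS_PER_CHARACTER
    test_writers = TEST_PER_CLASS // REPEATS_PER_CHARACTER
    assert train_writers + test_writers == PARTICIPANTS
    return num_classes, train_writers, test_writers


print(check_counts())
```

Under these figures the split works out to 48 training writers and 12 test writers.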

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/mloey/Arabic-Handwritten-Characters-Dataset', assets=mappingproxy({'train': DownloadInfo(url='https://github.com/mloey/Arabic-Handwritten-Characters-Dataset/raw/master/Train%20Images%2013440x32x32.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://github.com/mloey/Arabic-Handwritten-Characters-Dataset/raw/master/Test%20Images%203360x32x32.zip', fallbacks=[], checksum=None, filename=None)}), citation='@article{el2017arabic,\n                        title={Arabic handwritten characters recognition using convolutional neural network},\n                        author={El-Sawy, Ahmed and Loey, Mohamed and El-Bakry, Hazem},\n                        journal={WSEAS Transactions on Computer Research},\n                        volume={5},\n                        pages={11--19},\n                        year={2017}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
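The asset URLs above are percent-encoded and every `DownloadInfo` shows `filename=None`, so a reader may wonder what the archive would be called on disk. The sketch below shows one plausible way to derive a readable name from a URL using only the standard library. `default_filename` is a hypothetical helper for illustration; it is not part of `stable_datasets`, and the library may name downloaded files differently.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def default_filename(url):
    """Return the decoded last path component of a URL (hypothetical helper)."""
    path = urlsplit(url).path          # drops any query string such as ?download=1
    return unquote(PurePosixPath(path).name)


# URL copied from the 'train' asset above.
TRAIN_URL = (
    "https://github.com/mloey/Arabic-Handwritten-Characters-Dataset"
    "/raw/master/Train%20Images%2013440x32x32.zip"
)

print(default_filename(TRAIN_URL))
```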

stable_datasets.images.arabic_digits module

class ArabicDigits(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Arabic Handwritten Digits Dataset.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/mloey/Arabic-Handwritten-Digits-Dataset', assets=mappingproxy({'train': DownloadInfo(url='https://raw.githubusercontent.com/mloey/Arabic-Handwritten-Digits-Dataset/master/Arabic%20Handwritten%20Digits%20Dataset%20CSV.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://raw.githubusercontent.com/mloey/Arabic-Handwritten-Digits-Dataset/master/Arabic%20Handwritten%20Digits%20Dataset%20CSV.zip', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{el2016cnn,\n                        title={CNN for handwritten arabic digits recognition based on LeNet-5},\n                        author={El-Sawy, Ahmed and Hazem, EL-Bakry and Loey, Mohamed},\n                        booktitle={International conference on advanced intelligent systems and informatics},\n                        pages={566--575},\n                        year={2016},\n                        organization={Springer}\n                        }', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.awa2 module

class AWA2(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

The Animals with Attributes 2 (AwA2) dataset provides images across 50 animal classes, useful for attribute-based classification and zero-shot learning research. See https://cvml.ista.ac.at/AwA2/ for more information.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://cvml.ista.ac.at/AwA2/', assets=mappingproxy({'train': DownloadInfo(url='https://cvml.ista.ac.at/AwA2/AwA2-data.zip', fallbacks=[], checksum=None, filename=None)}), citation='@ARTICLE{8413121,\n                         author={Xian, Yongqin and Lampert, Christoph H. and Schiele, Bernt and Akata, Zeynep},\n                         journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},\n                         title={Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly},\n                         year={2019},\n                         volume={41},\n                         number={9},\n                         pages={2251-2265},\n                         keywords={Semantics;Visualization;Task analysis;Training;Fish;Protocols;Learning systems;Generalized zero-shot learning;transductive learning;image classification;weakly-supervised learning},\n                         doi={10.1109/TPAMI.2018.2857768}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.beans module

class Beans(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Bean disease dataset for classification of three classes: Angular Leaf Spot, Bean Rust, and Healthy leaves.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/AI-Lab-Makerere/ibean/', assets=mappingproxy({'train': DownloadInfo(url='https://storage.googleapis.com/ibeans/train.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://storage.googleapis.com/ibeans/test.zip', fallbacks=[], checksum=None, filename=None), 'validation': DownloadInfo(url='https://storage.googleapis.com/ibeans/validation.zip', fallbacks=[], checksum=None, filename=None)}), citation='@misc{makerere2020beans,\n                         author = "{Makerere AI Lab}",\n                         title = "{Bean Disease Dataset}",\n                         year = "2020",\n                         month = "January",\n                         url = "https://github.com/AI-Lab-Makerere/ibean/"}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.cars196 module

class Cars196(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Cars-196 Dataset

The Cars-196 dataset, also known as the Stanford Cars dataset, is a benchmark dataset for fine-grained visual classification of automobiles. It contains 16,185 color images covering 196 car categories, where each category is defined by a specific combination of make, model, and year. The dataset is split into 8,144 training images and 8,041 test images, with the first 98 classes used exclusively for training and the remaining 98 classes reserved for testing, ensuring that training and test classes are disjoint. Images are collected from real-world scenes and exhibit significant variation in viewpoint, background, and lighting conditions. Each image is annotated with a class label and a tight bounding box around the car, making the dataset suitable for fine-grained recognition tasks that require precise object localization and strong generalization to unseen categories.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://ai.stanford.edu/~jkrause/cars/car_dataset.html', assets=mappingproxy({'train': DownloadInfo(url='https://huggingface.co/datasets/haodoz0118/cars196-img/resolve/main/cars196_train.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://huggingface.co/datasets/haodoz0118/cars196-img/resolve/main/cars196_test.zip', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{krause20133d,\n            title={3d object representations for fine-grained categorization},\n            author={Krause, Jonathan and Stark, Michael and Deng, Jia and Fei-Fei, Li},\n            booktitle={Proceedings of the IEEE international conference on computer vision workshops},\n            pages={554--561},\n            year={2013}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.cars3d module

class Cars3D[source]

Bases: BaseDatasetBuilder

183 car types x 24 azimuth angles x 4 elevation angles.
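The three factor sizes multiply out to the total number of images, and a flat image index can be decomposed back into one index per factor. The sketch below assumes row-major ordering over (car type, azimuth, elevation), following the order in which the docstring lists the factors; the actual storage order used by the loader is not documented here and may differ. `unravel` is a hypothetical helper.

```python
from math import prod

# Factor sizes in the order the docstring lists them.
FACTOR_SIZES = {"car_type": 183, "azimuth": 24, "elevation": 4}


def unravel(flat_index, sizes=tuple(FACTOR_SIZES.values())):
    """Convert a flat index into one index per factor (row-major, an assumption)."""
    if not 0 <= flat_index < prod(sizes):
        raise IndexError(flat_index)
    coords = []
    # Peel off the fastest-varying (last) factor first.
    for size in reversed(sizes):
        flat_index, coord = divmod(flat_index, size)
        coords.append(coord)
    return tuple(reversed(coords))


print(prod(FACTOR_SIZES.values()))  # total number of factor combinations
print(unravel(100))
```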

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/google-research/disentanglement_lib/tree/master', assets=mappingproxy({'train': DownloadInfo(url='http://www.scottreed.info/files/nips2015-analogy-data.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{locatello2019challenging,\n  title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n  author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n  booktitle={International Conference on Machine Learning},\n  pages={4114--4124},\n  year={2019}\n}', license='Apache-2.0', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.cassava module

Cassava leaf disease image classification loader.

class cassava[source]

Bases: object

Plant images classification.

The data consists of two folders: a training folder with 5 subfolders, one per class, each holding that class's images, and a test folder containing test images.

classes = ['cbb', 'cmd', 'cbsd', 'cgm', 'healthy']
static download(path)[source]
static load(path=None)[source]
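Classifiers usually need integer labels rather than the folder names in `classes`. The sketch below builds a name-to-index mapping from that list. It assumes integer labels follow the order of the `classes` attribute, which the documentation above does not confirm; `encode` is a hypothetical helper, not a method of the `cassava` class.

```python
# Copied from the `classes` attribute above.
classes = ['cbb', 'cmd', 'cbsd', 'cgm', 'healthy']

# Assumption: label i corresponds to classes[i].
class_to_index = {name: index for index, name in enumerate(classes)}


def encode(names):
    """Map class names to integer labels, rejecting unknown names."""
    try:
        return [class_to_index[name] for name in names]
    except KeyError as err:
        raise ValueError(f"unknown class: {err.args[0]}") from None


print(encode(["healthy", "cbb"]))
```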

stable_datasets.images.celeb_a module

class CelebA(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

The CelebA dataset is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html', assets=mappingproxy({'archive': DownloadInfo(url='https://drive.google.com/uc?export=download&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM', fallbacks=[], checksum=None, filename=None), 'attributes': DownloadInfo(url='https://drive.google.com/uc?export=download&id=0B7EVK8r0v71pblRyaVFSWGxPY0U', fallbacks=[], checksum=None, filename=None), 'partition': DownloadInfo(url='https://drive.google.com/uc?export=download&id=0B7EVK8r0v71pY0NSMzRuSXJEVkk', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{liu2015faceattributes,\n                         title = {Deep Learning Face Attributes in the Wild},\n                         author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},\n                         booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},\n                         month = {December},\n                         year = {2015}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.cifar10 module

class CIFAR10(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Image classification. The CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html) was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
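A 32x32 colour image holds 32 * 32 * 3 = 3072 values. The sketch below shows how a single pixel can be read from one flat 3072-value record. The channel-planar layout (all red values, then green, then blue, each plane in row-major order) is taken from the CIFAR-10 homepage's description of the Python archive, not from this docstring, and says nothing about how `CIFAR10` stores images after processing. `pixel` is a hypothetical helper.

```python
IMAGE_SIZE = 32
CHANNELS = 3
PLANE = IMAGE_SIZE * IMAGE_SIZE       # 1024 values per colour plane
RECORD_LENGTH = CHANNELS * PLANE      # 3072 values per image


def pixel(record, y, x):
    """Return the (R, G, B) values at row y, column x of one flat record."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} values, got {len(record)}")
    offset = y * IMAGE_SIZE + x
    return tuple(record[c * PLANE + offset] for c in range(CHANNELS))


# Synthetic record: red plane all 0, green plane all 1, blue plane all 2.
record = [0] * PLANE + [1] * PLANE + [2] * PLANE
print(pixel(record, 5, 7))
```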

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.cs.toronto.edu/~kriz/cifar.html', assets=mappingproxy({'train': DownloadInfo(url='https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@article{krizhevsky2009learning,\n                         title={Learning multiple layers of features from tiny images},\n                         author={Krizhevsky, Alex and Hinton, Geoffrey and others},\n                         year={2009},\n                         publisher={Toronto, ON, Canada}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.cifar100 module

class CIFAR100(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CIFAR-100 dataset, a variant of CIFAR-10 with 100 classes.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.cs.toronto.edu/~kriz/cifar.html', assets=mappingproxy({'train': DownloadInfo(url='https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@article{krizhevsky2009learning,\n                         title={Learning multiple layers of features from tiny images},\n                         author={Krizhevsky, Alex and Hinton, Geoffrey and others},\n                         year={2009},\n                         publisher={Toronto, ON, Canada}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.cifar100_c module

class CIFAR100C(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CIFAR-100-C dataset with corrupted CIFAR-100 images.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://zenodo.org/records/3555552', assets=mappingproxy({'test': DownloadInfo(url='https://zenodo.org/records/3555552/files/CIFAR-100-C.tar?download=1', fallbacks=[], checksum=None, filename=None)}), citation='@article{hendrycks2019robustness,\n                        title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n                        author={Dan Hendrycks and Thomas Dietterich},\n                        journal={Proceedings of the International Conference on Learning Representations},\n                        year={2019}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.cifar10_c module

class CIFAR10C(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CIFAR-10-C dataset with corrupted CIFAR-10 images.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://zenodo.org/records/2535967', assets=mappingproxy({'test': DownloadInfo(url='https://zenodo.org/records/2535967/files/CIFAR-10-C.tar?download=1', fallbacks=[], checksum=None, filename=None)}), citation='@article{hendrycks2019robustness,\n                        title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n                        author={Dan Hendrycks and Thomas Dietterich},\n                        journal={Proceedings of the International Conference on Learning Representations},\n                        year={2019}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.clevrer module

class CLEVRER(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CLEVRER: CoLlision Events for Video REpresentation and Reasoning.

A diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. The dataset includes four types of questions: descriptive (e.g., “what color”), explanatory (“what’s responsible for”), predictive (“what will happen next”), and counterfactual (“what if”).

The dataset contains 20,000 synthetic videos of moving and colliding objects. Each video is 5 seconds long and contains 128 frames with resolution 480 x 320.

Splits:
  • train: 10,000 videos (index 0 - 9999)

  • validation: 5,000 videos (index 10000 - 14999)

  • test: 5,000 videos (index 15000 - 19999)
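The index ranges listed above can be turned into a small lookup that reports which split a given video index belongs to. This is a minimal sketch based only on the ranges in the docstring; `split_for_video` is a hypothetical helper and not part of the `CLEVRER` class.

```python
# Index ranges from the split list above (upper bounds are exclusive here).
SPLIT_RANGES = {
    "train": range(0, 10_000),
    "validation": range(10_000, 15_000),
    "test": range(15_000, 20_000),
}


def split_for_video(index):
    """Return the split name for a video index in 0..19999."""
    for name, indices in SPLIT_RANGES.items():
        if index in indices:
            return name
    raise ValueError(f"video index out of range: {index}")


print(split_for_video(12_345))
print(sum(len(r) for r in SPLIT_RANGES.values()))  # total number of videos
```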

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://clevrer.csail.mit.edu/', assets=mappingproxy({'train_videos': DownloadInfo(url='http://data.csail.mit.edu/clevrer/videos/train/video_train.zip', fallbacks=[], checksum=None, filename=None), 'train_annotations': DownloadInfo(url='http://data.csail.mit.edu/clevrer/annotations/train/annotation_train.zip', fallbacks=[], checksum=None, filename=None), 'train_questions': DownloadInfo(url='http://data.csail.mit.edu/clevrer/questions/train.json', fallbacks=[], checksum=None, filename=None), 'validation_videos': DownloadInfo(url='http://data.csail.mit.edu/clevrer/videos/validation/video_validation.zip', fallbacks=[], checksum=None, filename=None), 'validation_annotations': DownloadInfo(url='http://data.csail.mit.edu/clevrer/annotations/validation/annotation_validation.zip', fallbacks=[], checksum=None, filename=None), 'validation_questions': DownloadInfo(url='http://data.csail.mit.edu/clevrer/questions/validation.json', fallbacks=[], checksum=None, filename=None), 'test_videos': DownloadInfo(url='http://data.csail.mit.edu/clevrer/videos/test/video_test.zip', fallbacks=[], checksum=None, filename=None), 'test_questions': DownloadInfo(url='http://data.csail.mit.edu/clevrer/questions/test.json', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{yi2020clevrer,\n            title={CLEVRER: CoLlision Events for Video REpresentation and Reasoning},\n            author={Yi, Kexin and Gan, Chuang and Li, Yunzhu and Kohli, Pushmeet and Wu, Jiajun and Torralba, Antonio and Tenenbaum, Joshua B},\n            booktitle={International Conference on Learning Representations},\n            year={2020}\n        }', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.country211 module

class Country211(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Country211: Image Classification Dataset for Geolocation. This dataset uses a subset of the YFCC100M dataset, filtered by GPS coordinates to include images labeled with ISO-3166 country codes. Each country has a balanced sample of images for training, validation, and testing.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/openai/CLIP/blob/main/data/country211.md', assets=mappingproxy({'train': DownloadInfo(url='https://openaipublic.azureedge.net/clip/data/country211.tgz', fallbacks=[], checksum=None, filename=None), 'valid': DownloadInfo(url='https://openaipublic.azureedge.net/clip/data/country211.tgz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://openaipublic.azureedge.net/clip/data/country211.tgz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{radford2021learning,\n                title     = {Learning transferable visual models from natural language supervision},\n                author    = {Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},\n                booktitle = {International conference on machine learning},\n                pages     = {8748--8763},\n                year      = {2021},\n                organization = {PmLR} }\n        ', license='', checksums=None)
VERSION: Version = Version('1.0.0')
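Several entries in this package, Country211 among them, list the same archive URL for every split. The sketch below groups split names by URL, which is one way a caller could see that a single download covers all three splits. It is an illustration only: `splits_by_url` is a hypothetical helper, and whether `stable_datasets` deduplicates downloads this way is not documented here.

```python
from collections import defaultdict

# URL copied from the Country211 SOURCE entry above; all three splits share it.
ARCHIVE = "https://openaipublic.azureedge.net/clip/data/country211.tgz"
assets = {"train": ARCHIVE, "valid": ARCHIVE, "test": ARCHIVE}


def splits_by_url(split_urls):
    """Group split names by the URL they are served from."""
    grouped = defaultdict(list)
    for split, url in split_urls.items():
        grouped[url].append(split)
    return dict(grouped)


for url, splits in splits_by_url(assets).items():
    print(url, splits)
```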

stable_datasets.images.cub200 module

class CUB200(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Caltech-UCSD Birds-200-2011 (CUB-200-2011) Dataset

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.vision.caltech.edu/datasets/cub_200_2011/', assets=mappingproxy({'train': DownloadInfo(url='https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz?download=1', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz?download=1', fallbacks=[], checksum=None, filename=None)}), citation='@techreport{WahCUB_200_2011,\n                        Title = {The Caltech-UCSD Birds-200-2011 Dataset},\n                        Author = {Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S.},\n                        Year = {2011},\n                        Institution = {California Institute of Technology},\n                        Number = {CNS-TR-2011-001}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.dsprites module

class DSprites[source]

Bases: BaseDatasetBuilder

dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/deepmind/dsprites-dataset', assets=mappingproxy({'train': DownloadInfo(url='https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{higgins2017beta,\n                    title={beta-vae: Learning basic visual concepts with a constrained variational framework},\n                    author={Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander},\n                    booktitle={International conference on learning representations},\n                    year={2017}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
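The asset file name above appears to encode the size of each of the six latent factors. The sketch below parses those sizes out of the name and multiplies them to get the number of factor combinations. Reading `co`, `sh`, `sc`, `or`, `x`, and `y` as colour, shape, scale, orientation (rotation), and the two positions is an interpretation that matches the six factors named in the docstring; it is not stated explicitly here. `factor_sizes` is a hypothetical helper.

```python
import re
from math import prod

# File name taken from the 'train' asset URL above.
ASSET_NAME = "dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz"


def factor_sizes(name):
    """Extract (abbreviation, size) pairs such as ('co', 1) from the file name."""
    spec = name.split("_")[2]  # 'co1sh3sc6or40x32y32'
    return [(key, int(size)) for key, size in re.findall(r"([a-z]+)(\d+)", spec)]


sizes = factor_sizes(ASSET_NAME)
print(sizes)
print(prod(size for _, size in sizes))  # number of factor combinations
```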

stable_datasets.images.dsprites_color module

class DSpritesColor(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

DSprites

dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/deepmind/dsprites-dataset', assets=mappingproxy({'train': DownloadInfo(url='https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{locatello2019challenging,\n                    title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n                    author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n                    booktitle={International Conference on Machine Learning},\n                    pages={4114--4124},\n                    year={2019}\n                    }', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.dsprites_noise module

class DSpritesNoise(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

DSprites

dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/deepmind/dsprites-dataset', assets=mappingproxy({'train': DownloadInfo(url='https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{locatello2019challenging,\n                    title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n                    author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n                    booktitle={International Conference on Machine Learning},\n                    pages={4114--4124},\n                    year={2019}\n                    }', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.dsprites_scream module

class DSpritesScream(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

DSprites

dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/deepmind/dsprites-dataset', assets=mappingproxy({'train': DownloadInfo(url='https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{higgins2017beta,\n                    title={beta-vae: Learning basic visual concepts with a constrained variational framework},\n                    author={Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander},\n                    booktitle={International conference on learning representations},\n                    year={2017}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.dtd module

class DTD(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Describable Textures Dataset (DTD)

DTD is a texture database, consisting of 5640 images, organized according to a list of 47 terms (categories) inspired by human perception. There are 120 images for each category. Image sizes range between 300x300 and 640x640, and the images contain at least 90% of the surface representing the category attribute. The images were collected from Google and Flickr by entering our proposed attributes and related terms as search queries. The images were annotated using Amazon Mechanical Turk in several iterations. For each image we provide a key attribute (main category) and a list of joint attributes.

The data is split into three equal parts (train, validation, and test), with 40 images per class in each split. We provide the ground truth annotation for both key and joint attributes, as well as the 10 splits of the data we used for evaluation.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.robots.ox.ac.uk/~vgg/data/dtd/', assets=mappingproxy({'train': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', fallbacks=[], checksum=None, filename=None), 'val': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@InProceedings{cimpoi14describing,\n                    Author    = {M. Cimpoi and S. Maji and I. Kokkinos and S. Mohamed and and A. Vedaldi},\n                    Title     = {Describing Textures in the Wild},\n                    Booktitle = {Proceedings of the {IEEE} Conf. on Computer Vision and Pattern Recognition ({CVPR})},\n                    Year      = {2014}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.e_mnist module

class EMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

EMNIST (Extended MNIST) Dataset

Abstract EMNIST is a set of handwritten characters derived from the NIST Special Database 19 and converted to a 28x28 pixel format that directly matches the MNIST dataset. It serves as a challenging “drop-in” replacement for MNIST, introducing handwritten letters and a larger variety of writing styles while preserving the original file structure and pixel density.

Context While the original MNIST dataset is considered “solved” by modern architectures, EMNIST restores the challenge by providing a larger, more diverse benchmark. It bridges the gap between simple digit recognition and complex handwriting tasks, offering up to 62 classes (digits + uppercase + lowercase) to test generalization and writer-independent recognition.

Content The dataset contains up to 814,255 grayscale images (28x28). It is provided in six split configurations to suit different needs:

  • ByClass & ByMerge: Full unbalanced sets (up to 62 classes).

  • Balanced: 131,600 images across 47 classes (ideal for benchmarking).

  • Letters: 145,600 images across 26 classes (A-Z).

  • Digits & MNIST: 280,000+ images across 10 classes (0-9).
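The class counts quoted for the configurations can be sanity-checked with a few lines. The sketch below builds the 62-symbol set (digits, uppercase, lowercase) and computes images per class for the two configurations whose exact totals are quoted. The symbol ordering is an assumption made for illustration, and the even per-class division is inferred from the word "Balanced" and from the totals dividing exactly; neither is confirmed by this documentation. `images_per_class` is a hypothetical helper.

```python
import string

# Assumption: digits first, then uppercase, then lowercase.
BYCLASS_SYMBOLS = string.digits + string.ascii_uppercase + string.ascii_lowercase

# (total images, number of classes) as quoted in the description.
QUOTED = {"balanced": (131_600, 47), "letters": (145_600, 26)}


def images_per_class(config):
    """Return images per class, assuming classes are evenly populated."""
    total, num_classes = QUOTED[config]
    per_class, remainder = divmod(total, num_classes)
    if remainder:
        raise ValueError(f"{config}: total is not divisible by class count")
    return per_class


print(len(BYCLASS_SYMBOLS))
for name in QUOTED:
    print(name, images_per_class(name))
```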

BUILDER_CONFIGS: list = [EMNISTConfig(name='byclass', version=Version('1.0.0'), description=''), EMNISTConfig(name='bymerge', version=Version('1.0.0'), description=''), EMNISTConfig(name='balanced', version=Version('1.0.0'), description=''), EMNISTConfig(name='letters', version=Version('1.0.0'), description=''), EMNISTConfig(name='digits', version=Version('1.0.0'), description=''), EMNISTConfig(name='mnist', version=Version('1.0.0'), description='')]
SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.nist.gov/itl/iad/image-group/emnist-dataset', assets=mappingproxy({'train': DownloadInfo(url='https://biometrics.nist.gov/cs_links/EMNIST/matlab.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://biometrics.nist.gov/cs_links/EMNIST/matlab.zip', fallbacks=[], checksum=None, filename=None)}), citation='@misc{cohen2017emnistextensionmnisthandwritten,\n                        title={EMNIST: an extension of MNIST to handwritten letters},\n                        author={Gregory Cohen and Saeed Afshar and Jonathan Tapson and André van Schaik},\n                        year={2017},\n                        eprint={1702.05373},\n                        archivePrefix={arXiv},\n                        primaryClass={cs.CV},\n                        url={https://arxiv.org/abs/1702.05373},\n            }', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class EMNISTConfig(variant, **kwargs)[source]

Bases: BuilderConfig

stable_datasets.images.face_pointing module

class FacePointing(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Head angle classification dataset.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://crowley-coutaz.fr/HeadPoseDataSet/', assets=mappingproxy({'train': DownloadInfo(url='http://crowley-coutaz.fr/HeadPoseDataSet/HeadPoseImageDatabase.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{gourier2004estimating,\n                         title={Estimating face orientation from robust detection of salient facial features},\n                         author={Gourier, Nicolas and Hall, Daniela and Crowley, James L},\n                         booktitle={ICPR International Workshop on Visual Observation of Deictic Gestures},\n                         year={2004},\n                         organization={Citeseer}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.fashion_mnist module

class FashionMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Grayscale image classification.

Fashion-MNIST is a dataset of Zalando’s article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/zalandoresearch/fashion-mnist', assets=mappingproxy({'train': DownloadInfo(url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz', fallbacks=[], checksum=None, filename=None)}), citation='@article{xiao2017fashion,\n                         title={Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms},\n                         author={Xiao, Han and Rasul, Kashif and Vollgraf, Roland},\n                         journal={arXiv preprint arXiv:1708.07747},\n                         year={2017}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.fgvc_aircraft module

class FGVCAircraft(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Fine-Grained Visual Classification of Aircraft (FGVC-Aircraft) Dataset.

FGVC-Aircraft is a benchmark dataset for fine-grained visual categorization of aircraft. The dataset contains 10,000 images of aircraft with 100 different aircraft model variants. Aircraft models are organized in a hierarchical structure with three levels: variant (finest), family, and manufacturer (coarsest).

The dataset is divided into training (3,334 images), validation (3,333 images), and test (3,333 images) subsets. Images are about 1-2MP resolution with a 20-pixel copyright banner at the bottom that is automatically removed during loading.
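The banner removal described above amounts to cropping the bottom 20 rows of each image. A minimal sketch follows, assuming a crop box in the (left, upper, right, lower) convention used by Pillow's `Image.crop`; the helper name is hypothetical, and whether the builder itself uses Pillow is not stated here.

```python
BANNER_HEIGHT_PX = 20  # banner height quoted in the paragraph above


def banner_crop_box(width, height, banner_px=BANNER_HEIGHT_PX):
    """Return a (left, upper, right, lower) box that drops the bottom banner."""
    if height <= banner_px:
        raise ValueError("image is not taller than the banner")
    return (0, 0, width, height - banner_px)


# Example with a hypothetical 1024x695 image: only the height changes.
box = banner_crop_box(1024, 695)
print(box)
```

With Pillow installed, the box could be passed directly as `image.crop(box)`.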

Usage:

    dataset = FGVCAircraft(config_name="variant", split="train")
    dataset = FGVCAircraft(config_name="family", split="train")
    dataset = FGVCAircraft(config_name="manufacturer", split="train")

BUILDER_CONFIGS: list = [BuilderConfig(name='variant', version=None, description='100 aircraft model variants (finest granularity)'), BuilderConfig(name='family', version=None, description='70 aircraft families (medium granularity)'), BuilderConfig(name='manufacturer', version=None, description='30 aircraft manufacturers (coarsest granularity)')]
DEFAULT_CONFIG_NAME: str | None = 'variant'
SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/', assets=mappingproxy({'train': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/archives/fgvc-aircraft-2013b.tar.gz', fallbacks=[], checksum=None, filename=None), 'validation': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/archives/fgvc-aircraft-2013b.tar.gz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/archives/fgvc-aircraft-2013b.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@techreport{maji13fine-grained,\n                        title         = {Fine-Grained Visual Classification of Aircraft},\n                        author        = {S. Maji and J. Kannala and E. Rahtu and M. Blaschko and A. Vedaldi},\n                        year          = {2013},\n                        archivePrefix = {arXiv},\n                        eprint        = {1306.5151},\n                        primaryClass  = "cs.CV",\n                    }', license='Unknown', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.flowers102 module

class Flowers102(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Flowers102 Dataset

Abstract The Flowers102 dataset is a fine-grained image classification benchmark consisting of 102 flower categories commonly found in the United Kingdom. It was created to address the challenge of classifying objects with large intra-class variability and small inter-class differences. Each category contains between 40 and 258 images, totaling 8,189 images.

Context Fine-grained visual categorization (FGVC) focuses on differentiating between similar sub-categories of objects (e.g., different species of flowers or birds). Flowers102 serves as a standard benchmark in this domain. Unlike general object recognition (e.g., CIFAR-10), where classes are visually distinct (car vs. dog), Flowers102 requires models to learn subtle features like petal shape, texture, and color patterns.

Content The dataset consists of:

- Images: 8,189 images stored in a single archive.
- Labels: A MATLAB file mapping each image to one of 102 classes (0-101).
- Splits: A predefined split ID file dividing the data into Training (1,020 images), Validation (1,020 images), and Test (6,149 images).
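The three pieces listed above (images, labels, split IDs) fit together roughly as sketched below. This is an illustration under stated assumptions, not the builder's code: it assumes the MATLAB files use 1-based labels and image IDs, and all function names are hypothetical.

```python
# Split sizes quoted in the Content list above.
SPLIT_SIZES = {"train": 1_020, "validation": 1_020, "test": 6_149}
TOTAL_IMAGES = 8_189
NUM_CLASSES = 102


def to_zero_based(matlab_label):
    """Convert a 1-based label (assumed MATLAB convention) to the 0-101 range."""
    if not 1 <= matlab_label <= NUM_CLASSES:
        raise ValueError(f"label out of range: {matlab_label}")
    return matlab_label - 1


def assign_splits(split_ids):
    """Map every image ID to its split name, rejecting overlapping splits."""
    assignment = {}
    for split, ids in split_ids.items():
        for image_id in ids:
            if image_id in assignment:
                raise ValueError(f"image {image_id} appears in two splits")
            assignment[image_id] = split
    return assignment


# Toy example with six image IDs instead of 8,189.
toy = assign_splits({"train": [1, 2], "validation": [3], "test": [4, 5, 6]})
print(toy[4], sum(SPLIT_SIZES.values()) == TOTAL_IMAGES)
```

Note that the test split is roughly six times larger than the training split, which is unusual and worth keeping in mind when comparing results.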

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.robots.ox.ac.uk/~vgg/data/flowers/102/', assets=mappingproxy({'images': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz', fallbacks=[], checksum=None, filename=None), 'labels': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat', fallbacks=[], checksum=None, filename=None), 'setid': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{nilsback2008flowers102,\n                         title={Automated flower classification over a large number of classes},\n                         author={Nilsback, Maria-Elena and Zisserman, Andrew},\n                         booktitle={2008 Sixth Indian conference on computer vision, graphics \\& image processing},\n                         pages={722--729},\n                         year={2008},\n                         organization={IEEE}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.food101 module

class Food101(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Food-101 Dataset.

The Food-101 dataset consists of 101 food categories with 101,000 images. For each class, 250 manually reviewed test images are provided as well as 750 training images.

Split sizes:

- train: 75,750 images (750 images × 101 classes)
- test: 25,250 images (250 images × 101 classes)

All images are automatically rescaled to have a maximum side length of 512 pixels.
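A rescale to a maximum side length of 512 pixels can be sketched as below. This is only an illustration of the size computation: the helper name is hypothetical, and it assumes the aspect ratio is preserved and that images already within the limit are left untouched.

```python
def fit_max_side(width, height, max_side=512):
    """Return the (width, height) after shrinking so the longest side is max_side."""
    longest = max(width, height)
    if longest <= max_side:
        # Assumption: smaller images are not upscaled.
        return (width, height)
    scale = max_side / longest
    return (max(1, round(width * scale)), max(1, round(height * scale)))


print(fit_max_side(1024, 768))
print(fit_max_side(600, 1200))
```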

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/', assets=mappingproxy({'train': DownloadInfo(url='https://huggingface.co/datasets/haodoz0118/food101-img/resolve/main/food101_train.tar', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://huggingface.co/datasets/haodoz0118/food101-img/resolve/main/food101_test.tar', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{bossard14,\n            title = {Food-101 -- Mining Discriminative Components with Random Forests},\n            author = {Bossard, Lukas and Guillaumin, Matthieu and Van Gool, Luc},\n            booktitle = {European Conference on Computer Vision},\n            year = {2014}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.galaxy10 module

class Galaxy10Decal(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Galaxy10 DECaLS Dataset: Galaxy morphology classification with DECaLS images.

Galaxy10 DECaLS is a much improved version of the original Galaxy10 dataset. It contains 17,736 256x256 pixel colored galaxy images (g, r and z band) separated into 10 classes. The images come from DESI Legacy Imaging Surveys (DECaLS) and labels come from Galaxy Zoo.

The original Galaxy10 dataset was created with Galaxy Zoo (GZ) Data Release 2, in which volunteers classified ~270k SDSS galaxy images. GZ later used images from the DESI Legacy Imaging Surveys (DECaLS), which offer much better resolution and image quality. Galaxy10 DECaLS combines all three campaigns (GZ DR2 with DECaLS images instead of SDSS images, and DECaLS campaigns a/b and c), covering ~441k unique galaxies, of which ~18k were selected into 10 broad classes using volunteer votes with more rigorous filtering.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://astronn.readthedocs.io/en/latest/galaxy10.html', assets=mappingproxy({'train': DownloadInfo(url='https://zenodo.org/records/10845026/files/Galaxy10_DECals.h5', fallbacks=[], checksum=None, filename=None)}), citation='@article{walmsley2020galaxy,\n                        title={Galaxy Zoo: probabilistic morphology through Bayesian CNNs and active learning},\n                        author={Walmsley, Mike and Smith, Lewis and Lintott, Chris and Gal, Yarin and Bamford, Steven and Dickinson, Hugh and Fortson, Lucy and Kruk, Sandor and Masters, Karen and Scarlata, Claudia and others},\n                        journal={Monthly Notices of the Royal Astronomical Society},\n                        volume={491},\n                        number={2},\n                        pages={1554--1574},\n                        year={2020},\n                        publisher={Oxford University Press}\n                    }', license='MIT', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.hasy_v2 module

class HASYv2(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

HASYv2 Dataset

Abstract The HASYv2 dataset contains handwritten symbol images of 369 classes. It includes over 168,000 samples categorized into various classes like Latin characters, numerals, and symbols. Each image is 32x32 pixels in size. The dataset was created to benchmark the classification of mathematical symbols and handwritten characters.

Context Recognizing handwritten mathematical symbols is a challenging task due to the similarity between classes (e.g., ‘1’, ‘l’, ‘|’) and the large number of unique symbols used in scientific notation. HASYv2 serves as a standard benchmark for testing classifiers on a large number of classes (369) with low resolution (32x32).

Content The dataset consists of:

- Images: 168,236 black-and-white images (32x32 pixels).
- Labels: 369 distinct classes.
- Splits: The dataset includes 10 pre-defined folds. This implementation exposes each fold as a configuration ('fold-1' to 'fold-10') and uses 'fold-1' as the default train/test split.
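The fold-based splitting can be pictured as follows. This is a rough sketch rather than the builder's logic: it assumes each sample carries a fold number and that the chosen fold becomes the test set while the remaining folds form the training set. Both helper names are hypothetical.

```python
NUM_FOLDS = 10


def fold_config_names(num_folds=NUM_FOLDS):
    """Return configuration names in the 'fold-<k>' pattern, numbered from 1."""
    return [f"fold-{k}" for k in range(1, num_folds + 1)]


def split_by_fold(sample_folds, test_fold):
    """Split sample indices into (train, test) given each sample's fold number."""
    train, test = [], []
    for index, fold in enumerate(sample_folds):
        (test if fold == test_fold else train).append(index)
    return train, test


# Toy example: four samples assigned to folds 1, 2, 1 and 3.
train_idx, test_idx = split_by_fold([1, 2, 1, 3], test_fold=1)
print(fold_config_names()[0], train_idx, test_idx)
```

Evaluating on all ten folds and averaging the results gives a cross-validated score, whereas a single fold gives one fixed train/test split.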

BUILDER_CONFIGS: list = [BuilderConfig(name='fold-1', version=Version('1.0.0'), description='HASYv2 dataset using fold 1 as the test set.'), BuilderConfig(name='fold-2', version=Version('1.0.0'), description='HASYv2 dataset using fold 2 as the test set.'), BuilderConfig(name='fold-3', version=Version('1.0.0'), description='HASYv2 dataset using fold 3 as the test set.'), BuilderConfig(name='fold-4', version=Version('1.0.0'), description='HASYv2 dataset using fold 4 as the test set.'), BuilderConfig(name='fold-5', version=Version('1.0.0'), description='HASYv2 dataset using fold 5 as the test set.'), BuilderConfig(name='fold-6', version=Version('1.0.0'), description='HASYv2 dataset using fold 6 as the test set.'), BuilderConfig(name='fold-7', version=Version('1.0.0'), description='HASYv2 dataset using fold 7 as the test set.'), BuilderConfig(name='fold-8', version=Version('1.0.0'), description='HASYv2 dataset using fold 8 as the test set.'), BuilderConfig(name='fold-9', version=Version('1.0.0'), description='HASYv2 dataset using fold 9 as the test set.'), BuilderConfig(name='fold-10', version=Version('1.0.0'), description='HASYv2 dataset using fold 10 as the test set.')]
DEFAULT_CONFIG_NAME: str | None = 'fold-1'
SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/MartinThoma/HASY', assets=mappingproxy({'train': DownloadInfo(url='https://zenodo.org/record/259444/files/HASYv2.tar.bz2?download=1', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://zenodo.org/record/259444/files/HASYv2.tar.bz2?download=1', fallbacks=[], checksum=None, filename=None)}), citation='@article{thoma2017hasyv2,\n                         title={The hasyv2 dataset},\n                         author={Thoma, Martin},\n                         journal={arXiv preprint arXiv:1701.08380},\n                         year={2017}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.imagenet_10 module

class Imagenette(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Imagenette: 10 easily classified classes from ImageNet.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/fastai/imagenette', assets=mappingproxy({'archive': DownloadInfo(url='https://s3.amazonaws.com/fast-ai-imageclas/imagenette2.tgz', fallbacks=[], checksum=None, filename=None)}), citation='@misc{howard2019imagenette,\n            author={Jeremy Howard},\n            title={Imagenette: A smaller subset of 10 easily classified classes from ImageNet},\n            year={2019},\n            url={https://github.com/fastai/imagenette}\n        }', license='', checksums=None)
VERSION: Version = Version('2.0.0')

stable_datasets.images.imagenet_100 module

class ImageNet100(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: _ImageNetArchiveMixin, BaseDatasetBuilder

ImageNet-100 built by taking the first 100 class TARs from ImageNet-1K train archive.
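The ImageNet-1K train archive is a TAR of per-class TAR files named by WordNet ID. Selecting the first 100 of them could look like the sketch below, which uses only the standard `tarfile` module on a small in-memory archive. The helper names are hypothetical, and picking the first 100 in sorted name order (rather than archive order) is an assumption.

```python
import io
import tarfile


def first_class_archives(outer_tar, limit=100):
    """Return the names of the first `limit` per-class TARs, in sorted order."""
    names = [
        member.name
        for member in outer_tar.getmembers()
        if member.isfile() and member.name.endswith(".tar")
    ]
    return sorted(names)[:limit]


def make_demo_archive(num_classes):
    """Build an in-memory TAR holding empty stand-ins for per-class TARs."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for i in range(num_classes):
            info = tarfile.TarInfo(name=f"n{i:08d}.tar")
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    buffer.seek(0)
    return buffer


with tarfile.open(fileobj=make_demo_archive(120), mode="r") as outer:
    selected = first_class_archives(outer)
print(len(selected), selected[0], selected[-1])
```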

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.image-net.org/challenges/LSVRC/2012/', assets=mappingproxy({'train': DownloadInfo(url='https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar', fallbacks=[], checksum=None, filename=None)}), citation='@article{deng2009imagenet,\n        title={ImageNet: A large-scale hierarchical image database},\n        author={Deng, Jia and others},\n        journal={CVPR},\n        year={2009}\n    }', license='', checksums=None)
VERSION: Version = Version('2.0.0')

stable_datasets.images.imagenet_1k module

class ImageNet1K(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: _ImageNetArchiveMixin, BaseDatasetBuilder

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.image-net.org/challenges/LSVRC/2012/', assets=mappingproxy({'train': DownloadInfo(url='https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar', fallbacks=[], checksum=None, filename=None)}), citation='@article{deng2009imagenet,\n        title={ImageNet: A large-scale hierarchical image database},\n        author={Deng, Jia and others},\n        journal={CVPR},\n        year={2009}\n    }', license='', checksums=None)
VERSION: Version = Version('2.0.0')

stable_datasets.images.imagenette module

class Imagenette(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Imagenette (ImageNet-10) from FastAI’s public tarball.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/fastai/imagenette', assets=mappingproxy({'archive': DownloadInfo(url='https://s3.amazonaws.com/fast-ai-imageclas/imagenette2.tgz', fallbacks=[], checksum=None, filename=None)}), citation='@misc{howard2019imagenette,\n            author={Jeremy Howard},\n            title={Imagenette: A smaller subset of 10 easily classified classes from ImageNet},\n            year={2019},\n            url={https://github.com/fastai/imagenette}\n        }', license='', checksums=None)
VERSION: Version = Version('2.0.0')

stable_datasets.images.k_mnist module

class KMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Image classification. The Kuzushiji-MNIST dataset consists of 70,000 28x28 grayscale images of 10 classes of Kuzushiji (cursive Japanese) characters, with 7,000 images per class. There are 60,000 training images and 10,000 test images. Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset, providing a more challenging alternative for benchmarking machine learning algorithms.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://codh.rois.ac.jp/kmnist/', assets=mappingproxy({'train': DownloadInfo(url='https://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz', fallbacks=[], checksum=None, filename=None)}), citation='@online{clanuwat2018deep,\n                         author       = {Tarin Clanuwat and Mikel Bober-Irizar and Asanobu Kitamoto and Alex Lamb and Kazuaki Yamamoto and David Ha},\n                         title        = {Deep Learning for Classical Japanese Literature},\n                         date         = {2018-12-03},\n                         year         = {2018},\n                         eprintclass  = {cs.CV},\n                         eprinttype   = {arXiv},\n                         eprint       = {cs.CV/1812.01718}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.linnaeus5 module

class Linnaeus5(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Linnaeus 5 Dataset

Abstract The Linnaeus 5 dataset contains 8,000 RGB images sized 256x256 pixels (1,600 per class), categorized into 5 classes: berry, bird, dog, flower, and other (negative set). It was created to benchmark fine-grained classification and object recognition tasks.

Context While many datasets focus on broad object categories (like CIFAR-10), Linnaeus 5 offers a focused challenge on specific natural objects plus a “negative” class (‘other’). It serves as a good middle-ground benchmark between simple digit recognition (MNIST) and large-scale natural image classification (ImageNet).

Content The dataset consists of:

- Images: 8,000 color images (256x256 pixels).
- Classes: 5 categories (berry, bird, dog, flower, other).
- Splits: Pre-split into Training (1,200 images per class) and Test (400 images per class).
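The per-class counts in the Content list determine the split sizes and the overall total. A short check, with illustrative variable names:

```python
# Class names and per-class counts quoted in the Content list above.
CLASSES = ("berry", "bird", "dog", "flower", "other")
TRAIN_PER_CLASS = 1_200
TEST_PER_CLASS = 400

split_sizes = {
    "train": TRAIN_PER_CLASS * len(CLASSES),
    "test": TEST_PER_CLASS * len(CLASSES),
}
total_images = sum(split_sizes.values())

print(split_sizes, total_images)
```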

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://chaladze.com/l5/', assets=mappingproxy({'train': DownloadInfo(url='http://chaladze.com/l5/img/Linnaeus%205%20256X256.rar', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='http://chaladze.com/l5/img/Linnaeus%205%20256X256.rar', fallbacks=[], checksum=None, filename=None)}), citation='@article{chaladze2017linnaeus,\n                      title={Linnaeus 5 dataset for machine learning},\n                      author={Chaladze, G and Kalatozishvili, L},\n                      journal={chaladze.com},\n                      year={2017}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.med_mnist module

class MedMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

MedMNIST, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D.

BUILDER_CONFIGS: list = [MedMNISTConfig(name='pathmnist', version=Version('1.0.0'), description='MedMNIST PathMNIST (2D)'), MedMNISTConfig(name='chestmnist', version=Version('1.0.0'), description='MedMNIST ChestMNIST (2D, multi-label)'), MedMNISTConfig(name='dermamnist', version=Version('1.0.0'), description='MedMNIST DermaMNIST (2D)'), MedMNISTConfig(name='octmnist', version=Version('1.0.0'), description='MedMNIST OCTMNIST (2D)'), MedMNISTConfig(name='pneumoniamnist', version=Version('1.0.0'), description='MedMNIST PneumoniaMNIST (2D)'), MedMNISTConfig(name='retinamnist', version=Version('1.0.0'), description='MedMNIST RetinaMNIST (2D)'), MedMNISTConfig(name='breastmnist', version=Version('1.0.0'), description='MedMNIST BreastMNIST (2D)'), MedMNISTConfig(name='bloodmnist', version=Version('1.0.0'), description='MedMNIST BloodMNIST (2D)'), MedMNISTConfig(name='tissuemnist', version=Version('1.0.0'), description='MedMNIST TissueMNIST (2D)'), MedMNISTConfig(name='organamnist', version=Version('1.0.0'), description='MedMNIST OrganAMNIST (2D)'), MedMNISTConfig(name='organcmnist', version=Version('1.0.0'), description='MedMNIST OrganCMNIST (2D)'), MedMNISTConfig(name='organsmnist', version=Version('1.0.0'), description='MedMNIST OrganSMNIST (2D)'), MedMNISTConfig(name='organmnist3d', version=Version('1.0.0'), description='MedMNIST OrganMNIST3D (3D)'), MedMNISTConfig(name='nodulemnist3d', version=Version('1.0.0'), description='MedMNIST NoduleMNIST3D (3D)'), MedMNISTConfig(name='adrenalmnist3d', version=Version('1.0.0'), description='MedMNIST AdrenalMNIST3D (3D)'), MedMNISTConfig(name='fracturemnist3d', version=Version('1.0.0'), description='MedMNIST FractureMNIST3D (3D)'), MedMNISTConfig(name='vesselmnist3d', version=Version('1.0.0'), description='MedMNIST VesselMNIST3D (3D)'), MedMNISTConfig(name='synapsemnist3d', version=Version('1.0.0'), description='MedMNIST SynapseMNIST3D (3D)')]
VERSION: Version = Version('1.0.0')
class MedMNISTConfig(*, num_classes: int, is_3d: bool = False, multi_label: bool = False, **kwargs)[source]

Bases: BuilderConfig

BuilderConfig with per-variant metadata used by MedMNIST._info().
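To illustrate what "per-variant metadata" means, the sketch below models the three keyword fields from the signature above with a plain dataclass. It is a hypothetical stand-in, not the real `MedMNISTConfig`, and the class counts shown are recalled from the MedMNIST v2 description, so they should be checked against the official source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantMetadata:
    """Hypothetical stand-in for the fields named in the MedMNISTConfig signature."""

    name: str
    num_classes: int
    is_3d: bool = False
    multi_label: bool = False

    @property
    def spatial_dims(self):
        """Number of spatial dimensions: 3 for volumes, 2 for flat images."""
        return 3 if self.is_3d else 2


# Class counts are assumptions to verify, not values taken from this page.
VARIANTS = {
    variant.name: variant
    for variant in (
        VariantMetadata("pathmnist", num_classes=9),
        VariantMetadata("chestmnist", num_classes=14, multi_label=True),
        VariantMetadata("organmnist3d", num_classes=11, is_3d=True),
    )
}

for variant in VARIANTS.values():
    print(variant.name, variant.num_classes, variant.spatial_dims, variant.multi_label)
```

Keeping these flags on the configuration lets one builder describe its features differently per variant, for example a multi-label target for ChestMNIST or a volumetric input for the 3D variants.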

stable_datasets.images.mnist module

class MNIST(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

MNIST Dataset using raw IDX files for digit classification.
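An IDX file starts with a four-byte magic number (two zero bytes, a data-type code, and the number of dimensions), followed by one big-endian 32-bit size per dimension and then the raw data. The sketch below parses that layout for unsigned-byte data using only the standard library. It is an illustration, not the builder's reader, and the function name is hypothetical. The downloaded files are gzip-compressed, so their bytes would need `gzip.decompress` first.

```python
import struct

UNSIGNED_BYTE = 0x08  # IDX type code for unsigned 8-bit data


def parse_idx(data):
    """Parse IDX bytes and return (dims, payload) for unsigned-byte data."""
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype_code != UNSIGNED_BYTE:
        raise ValueError("not an unsigned-byte IDX file")
    header_end = 4 + 4 * ndim
    dims = struct.unpack(f">{ndim}I", data[4:header_end])
    payload = data[header_end:]
    expected = 1
    for size in dims:
        expected *= size
    if len(payload) != expected:
        raise ValueError("payload length does not match the header")
    return dims, payload


# Synthetic label file: one dimension holding three labels.
labels_raw = struct.pack(">HBBI", 0, UNSIGNED_BYTE, 1, 3) + bytes([7, 2, 1])
label_dims, label_bytes = parse_idx(labels_raw)
print(label_dims, list(label_bytes))
```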

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://yann.lecun.com/exdb/mnist/', assets=mappingproxy({'train_images': DownloadInfo(url='https://storage.googleapis.com/cvdf-datasets/mnist/train-images-idx3-ubyte.gz', fallbacks=[], checksum=None, filename=None), 'train_labels': DownloadInfo(url='https://storage.googleapis.com/cvdf-datasets/mnist/train-labels-idx1-ubyte.gz', fallbacks=[], checksum=None, filename=None), 'test_images': DownloadInfo(url='https://storage.googleapis.com/cvdf-datasets/mnist/t10k-images-idx3-ubyte.gz', fallbacks=[], checksum=None, filename=None), 'test_labels': DownloadInfo(url='https://storage.googleapis.com/cvdf-datasets/mnist/t10k-labels-idx1-ubyte.gz', fallbacks=[], checksum=None, filename=None)}), citation='@misc{lecun1998mnist,\n                          author={Yann LeCun and Corinna Cortes and Christopher J.C. Burges},\n                          title={The MNIST database of handwritten digits},\n                          year={1998},\n                          url={http://yann.lecun.com/exdb/mnist/}\n                        }', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.not_mnist module

class NotMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

NotMNIST Dataset that contains images of letters A-J.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html', assets=mappingproxy({'train_images': DownloadInfo(url='https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/train-images-idx3-ubyte.gz', fallbacks=[], checksum=None, filename=None), 'train_labels': DownloadInfo(url='https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/train-labels-idx1-ubyte.gz', fallbacks=[], checksum=None, filename=None), 'test_images': DownloadInfo(url='https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/t10k-images-idx3-ubyte.gz', fallbacks=[], checksum=None, filename=None), 'test_labels': DownloadInfo(url='https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/t10k-labels-idx1-ubyte.gz', fallbacks=[], checksum=None, filename=None)}), citation='@misc{bulatov2011notmnist,\n                          author={Yaroslav Bulatov},\n                          title={notMNIST dataset},\n                          year={2011},\n                          url={http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html}\n                        }', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.patch_camelyon module

PatchCamelyon dataset builder stub.

class PatchCamelyon(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/basveeling/pcam', assets=mappingproxy({}), citation='TBD', license='', checksums=None)
VERSION: Version = Version('0.0.0')

stable_datasets.images.places365_small module

class Places365Small(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

The Places365-Standard dataset (small version) for image classification.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://places2.csail.mit.edu/', assets=mappingproxy({'train': DownloadInfo(url='http://data.csail.mit.edu/places/places365/train_256_places365standard.tar', fallbacks=[], checksum=None, filename=None), 'val': DownloadInfo(url='http://data.csail.mit.edu/places/places365/val_256.tar', fallbacks=[], checksum=None, filename=None), 'devkit': DownloadInfo(url='http://data.csail.mit.edu/places/places365/filelist_places365-standard.tar', fallbacks=[], checksum=None, filename=None)}), citation='@article{zhou2017places,\n                         title={Places: A 10 million Image Database for Scene Recognition},\n                         author={Zhou, Bolei and Lapedriza, Agata and Khosla, Aditya and Oliva, Aude and Torralba, Antonio},\n                         year={2017}}\n            ', license='', checksums=None)
VERSION: Version = Version('1.0.0')
static extract_train_class(input_string)[source]

stable_datasets.images.rock_paper_scissor module

class RockPaperScissor(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Rock Paper Scissors dataset.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://laurencemoroney.com/datasets.html', assets=mappingproxy({'train': DownloadInfo(url='https://storage.googleapis.com/download.tensorflow.org/data/rps.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://storage.googleapis.com/download.tensorflow.org/data/rps-test-set.zip', fallbacks=[], checksum=None, filename=None)}), citation='@misc{laurence2019rock,\n                         title={Rock Paper Scissors Dataset},\n                         author={Laurence Moroney},\n                         year={2019},\n                         url={https://laurencemoroney.com/datasets.html}}', license='CC By 2.0', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.shapes3d module

class Shapes3D(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Shapes3D dataset: 10x10x10x8x4x15 factor combinations, 64x64 RGB images.
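The factor sizes multiply out to the total number of images, and a flat image index can be decoded into one value per factor. The sketch below shows both. The factor names follow the upstream 3dshapes description, and the ordering in which the last factor varies fastest is an assumption about the array layout. The helper name is hypothetical.

```python
from math import prod

# Factor sizes in the order given by 10x10x10x8x4x15.
FACTOR_SIZES = {
    "floor_hue": 10,
    "wall_hue": 10,
    "object_hue": 10,
    "scale": 8,
    "shape": 4,
    "orientation": 15,
}
TOTAL_IMAGES = prod(FACTOR_SIZES.values())


def index_to_factors(index):
    """Decode a flat index into per-factor indices (last factor varies fastest)."""
    if not 0 <= index < TOTAL_IMAGES:
        raise IndexError(f"index out of range: {index}")
    values = {}
    for name in reversed(FACTOR_SIZES):
        index, values[name] = divmod(index, FACTOR_SIZES[name])
    # Return the factors in their original order.
    return {name: values[name] for name in FACTOR_SIZES}


print(TOTAL_IMAGES)
print(index_to_factors(15))
```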

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/google-deepmind/3dshapes-dataset/', assets=mappingproxy({'train': DownloadInfo(url='https://huggingface.co/datasets/randall-lab/shapes3d/resolve/main/shapes3d.npz', fallbacks=[], checksum=None, filename=None)}), citation='@InProceedings{pmlr-v80-kim18b,\n  title = {Disentangling by Factorising},\n  author = {Kim, Hyunjik and Mnih, Andriy},\n  booktitle = {Proceedings of the 35th International Conference on Machine Learning},\n  pages = {2649--2658},\n  year = {2018},\n  editor = {Dy, Jennifer and Krause, Andreas},\n  volume = {80},\n  series = {Proceedings of Machine Learning Research},\n  month = {10--15 Jul},\n  publisher = {PMLR},\n  pdf = {http://proceedings.mlr.press/v80/kim18b/kim18b.pdf},\n  url = {https://proceedings.mlr.press/v80/kim18b.html}\n}', license='apache-2.0', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.small_norb module

class SmallNORB(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

SmallNORB dataset: 96x96 stereo images with 5 known factors.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://cs.nyu.edu/~ylclab/data/norb-v1.0-small/', assets=mappingproxy({'train': DownloadInfo(url='https://huggingface.co/datasets/randall-lab/small-norb/resolve/main/smallnorb-train.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://huggingface.co/datasets/randall-lab/small-norb/resolve/main/smallnorb-test.zip', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{lecun2004learning,\n  title={Learning methods for generic object recognition with invariance to pose and lighting},\n  author={LeCun, Yann and Huang, Fu Jie and Bottou, Leon},\n  booktitle={Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.},\n  volume={2},\n  pages={II--104},\n  year={2004},\n  organization={IEEE}\n}', license='Apache-2.0', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.stl10 module

class STL10(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

STL-10 Dataset

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://cs.stanford.edu/~acoates/stl10/', assets=mappingproxy({'train': DownloadInfo(url='https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', fallbacks=[], checksum=None, filename=None), 'unlabeled': DownloadInfo(url='https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@article{coates2011analysis,\n                        title={An analysis of single-layer networks in unsupervised feature learning},\n                        author={Coates, Adam and Ng, Andrew Y},\n                        journal={AISTATS},\n                        year={2011}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
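
The `train`, `test`, and `unlabeled` assets above all point at the same archive. The sketch below shows one way a caller might group splits by URL so that each archive is fetched only once. The `group_splits_by_url` helper and the plain-dict stand-in for `assets` are illustrative assumptions, not part of stable_datasets.

```python
from collections import defaultdict

# Plain-dict stand-in for the STL10 `assets` mapping shown above (split -> URL).
STL10_URL = "https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz"
STL10_ASSETS = {
    "train": STL10_URL,
    "test": STL10_URL,
    "unlabeled": STL10_URL,
}


def group_splits_by_url(assets):
    """Hypothetical helper: map each distinct URL to the splits that use it."""
    grouped = defaultdict(list)
    for split, url in assets.items():
        grouped[url].append(split)
    return dict(grouped)


shared = group_splits_by_url(STL10_ASSETS)
for url, splits in shared.items():
    # One line per archive that actually needs downloading.
    print(f"{url} -> {', '.join(splits)}")
```

The same grouping would apply to any builder whose splits share one archive, such as the CIFAR or DTD entries in this listing.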

stable_datasets.images.svhn module

class SVHN(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

SVHN (Street View House Numbers) Dataset for image classification.

SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor to MNIST, but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://ufldl.stanford.edu/housenumbers/', assets=mappingproxy({'train': DownloadInfo(url='http://ufldl.stanford.edu/housenumbers/train_32x32.mat', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='http://ufldl.stanford.edu/housenumbers/test_32x32.mat', fallbacks=[], checksum=None, filename=None), 'extra': DownloadInfo(url='http://ufldl.stanford.edu/housenumbers/extra_32x32.mat', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{netzer2011reading,\n                          title={Reading digits in natural images with unsupervised feature learning},\n                          author={Netzer, Yuval and Wang, Tao and Coates, Adam and Bissacco, Alessandro and Wu, Baolin and Ng, Andrew Y and others},\n                          booktitle={NIPS workshop on deep learning and unsupervised feature learning},\n                          volume={2011},\n                          number={2},\n                          pages={4},\n                          year={2011},\n                          organization={Granada}\n                        }', license='', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.tiny_imagenet module

class TinyImagenet(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Tiny ImageNet dataset for image classification tasks. It contains 200 classes with 500 training images, 50 validation images, and 50 test images per class.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.kaggle.com/c/tiny-imagenet', assets=mappingproxy({'train': DownloadInfo(url='http://cs231n.stanford.edu/tiny-imagenet-200.zip', fallbacks=[], checksum=None, filename=None), 'validation': DownloadInfo(url='http://cs231n.stanford.edu/tiny-imagenet-200.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='http://cs231n.stanford.edu/tiny-imagenet-200.zip', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{Le2015TinyIV,\n                          title={Tiny ImageNet Visual Recognition Challenge},\n                          author={Ya Le and Xuan S. Yang},\n                          year={2015}\n                        }', license='MIT License', checksums=None)
VERSION: Version = Version('1.0.0')

stable_datasets.images.tiny_imagenet_c module

class TinyImagenetC(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Tiny ImageNet-C dataset for image classification tasks with corruptions applied.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://zenodo.org/records/2536630', assets=mappingproxy({'test': DownloadInfo(url='https://zenodo.org/records/2536630/files/Tiny-ImageNet-C.tar?download=1', fallbacks=[], checksum=None, filename=None)}), citation='@article{hendrycks2019robustness,\n                        title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n                        author={Dan Hendrycks and Thomas Dietterich},\n                        journal={Proceedings of the International Conference on Learning Representations},\n                        year={2019}}', license='CC BY 4.0', checksums=None)
VERSION: Version = Version('1.0.0')

Module contents

class AWA2(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

The Animals with Attributes 2 (AwA2) dataset provides images across 50 animal classes, useful for attribute-based classification and zero-shot learning research. See https://cvml.ista.ac.at/AwA2/ for more information.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://cvml.ista.ac.at/AwA2/', assets=mappingproxy({'train': DownloadInfo(url='https://cvml.ista.ac.at/AwA2/AwA2-data.zip', fallbacks=[], checksum=None, filename=None)}), citation='@ARTICLE{8413121,\n                         author={Xian, Yongqin and Lampert, Christoph H. and Schiele, Bernt and Akata, Zeynep},\n                         journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},\n                         title={Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly},\n                         year={2019},\n                         volume={41},\n                         number={9},\n                         pages={2251-2265},\n                         keywords={Semantics;Visualization;Task analysis;Training;Fish;Protocols;Learning systems;Generalized zero-shot learning;transductive learning;image classification;weakly-supervised learning},\n                         doi={10.1109/TPAMI.2018.2857768}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class ArabicCharacters(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Arabic Handwritten Characters Dataset

Abstract Handwritten Arabic character recognition systems face several challenges, including the unlimited variation in human handwriting and large public databases. In this work, we model a deep learning architecture that can be effectively applied to recognizing Arabic handwritten characters. A Convolutional Neural Network (CNN) is a special type of feed-forward multilayer network trained in supervised mode. The CNN is trained and tested on our database, which contains 16,800 handwritten Arabic characters. In this paper, optimization methods are implemented to increase the performance of the CNN. Common machine learning methods usually apply a combination of a feature extractor and a trainable classifier. The use of a CNN leads to significant improvements across different machine-learning classification algorithms. Our proposed CNN gives an average 5.1% misclassification error on testing data.

Context The motivation of this study is to use cross knowledge learned from multiple works to enhance the performance of Arabic handwritten character recognition. In recent years, Arabic handwritten character recognition has also had to cope with different handwriting styles, making it important to find and work on a new and advanced solution for handwriting recognition. A deep learning system needs a large amount of data (images) to make good decisions.

Content The dataset is composed of 16,800 characters written by 60 participants; the age range is 19 to 40 years, and 90% of participants are right-handed. Each participant wrote each character (from ’alef’ to ’yeh’) ten times on two forms, as shown in Fig. 7(a) & 7(b). The forms were scanned at a resolution of 300 dpi. Each block is segmented automatically using Matlab 2016a to determine the coordinates of each block. The database is partitioned into two sets: a training set (13,440 characters, 480 images per class) and a test set (3,360 characters, 120 images per class). Writers of the training set and test set are exclusive. The order in which writers were assigned to the test set is randomized to make sure that writers of the test set are not from a single institution (to ensure variability of the test set).
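
The figures quoted above are mutually consistent, which a few lines of arithmetic can confirm. The class count and the number of writers per split are not stated in the description. They are inferred here from the quoted totals and should be read as assumptions of this sketch.

```python
# Figures quoted in the description above.
TOTAL = 16_800
PARTICIPANTS = 60
REPETITIONS = 10  # each participant wrote each character ten times
TRAIN_TOTAL, TRAIN_PER_CLASS = 13_440, 480
TEST_TOTAL, TEST_PER_CLASS = 3_360, 120

# Inferred: number of character classes (not stated explicitly above).
num_classes = TRAIN_TOTAL // TRAIN_PER_CLASS
assert TEST_TOTAL // TEST_PER_CLASS == num_classes

# The two splits add up to the full dataset.
assert TRAIN_TOTAL + TEST_TOTAL == TOTAL
# participants x repetitions x classes reproduces the total.
assert PARTICIPANTS * REPETITIONS * num_classes == TOTAL

# Inferred: writers per split, since each writer contributes
# REPETITIONS images to every class.
train_writers = TRAIN_PER_CLASS // REPETITIONS
test_writers = TEST_PER_CLASS // REPETITIONS
assert train_writers + test_writers == PARTICIPANTS

print(num_classes, train_writers, test_writers)
```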

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/mloey/Arabic-Handwritten-Characters-Dataset', assets=mappingproxy({'train': DownloadInfo(url='https://github.com/mloey/Arabic-Handwritten-Characters-Dataset/raw/master/Train%20Images%2013440x32x32.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://github.com/mloey/Arabic-Handwritten-Characters-Dataset/raw/master/Test%20Images%203360x32x32.zip', fallbacks=[], checksum=None, filename=None)}), citation='@article{el2017arabic,\n                        title={Arabic handwritten characters recognition using convolutional neural network},\n                        author={El-Sawy, Ahmed and Loey, Mohamed and El-Bakry, Hazem},\n                        journal={WSEAS Transactions on Computer Research},\n                        volume={5},\n                        pages={11--19},\n                        year={2017}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class ArabicDigits(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Arabic Handwritten Digits Dataset.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/mloey/Arabic-Handwritten-Digits-Dataset', assets=mappingproxy({'train': DownloadInfo(url='https://raw.githubusercontent.com/mloey/Arabic-Handwritten-Digits-Dataset/master/Arabic%20Handwritten%20Digits%20Dataset%20CSV.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://raw.githubusercontent.com/mloey/Arabic-Handwritten-Digits-Dataset/master/Arabic%20Handwritten%20Digits%20Dataset%20CSV.zip', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{el2016cnn,\n                        title={CNN for handwritten arabic digits recognition based on LeNet-5},\n                        author={El-Sawy, Ahmed and Hazem, EL-Bakry and Loey, Mohamed},\n                        booktitle={International conference on advanced intelligent systems and informatics},\n                        pages={566--575},\n                        year={2016},\n                        organization={Springer}\n                        }', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class Beans(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Bean disease dataset for classification of three classes: Angular Leaf Spot, Bean Rust, and Healthy leaves.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/AI-Lab-Makerere/ibean/', assets=mappingproxy({'train': DownloadInfo(url='https://storage.googleapis.com/ibeans/train.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://storage.googleapis.com/ibeans/test.zip', fallbacks=[], checksum=None, filename=None), 'validation': DownloadInfo(url='https://storage.googleapis.com/ibeans/validation.zip', fallbacks=[], checksum=None, filename=None)}), citation='@misc{makerere2020beans,\n                         author = "{Makerere AI Lab}",\n                         title = "{Bean Disease Dataset}",\n                         year = "2020",\n                         month = "January",\n                         url = "https://github.com/AI-Lab-Makerere/ibean/"}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class CIFAR10(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Image classification. The CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html) was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.cs.toronto.edu/~kriz/cifar.html', assets=mappingproxy({'train': DownloadInfo(url='https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@article{krizhevsky2009learning,\n                         title={Learning multiple layers of features from tiny images},\n                         author={Krizhevsky, Alex and Hinton, Geoffrey and others},\n                         year={2009},\n                         publisher={Toronto, ON, Canada}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class CIFAR100(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CIFAR-100 dataset, a variant of CIFAR-10 with 100 classes.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.cs.toronto.edu/~kriz/cifar.html', assets=mappingproxy({'train': DownloadInfo(url='https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@article{krizhevsky2009learning,\n                         title={Learning multiple layers of features from tiny images},\n                         author={Krizhevsky, Alex and Hinton, Geoffrey and others},\n                         year={2009},\n                         publisher={Toronto, ON, Canada}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class CIFAR100C(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CIFAR-100-C dataset with corrupted CIFAR-100 images.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://zenodo.org/records/3555552', assets=mappingproxy({'test': DownloadInfo(url='https://zenodo.org/records/3555552/files/CIFAR-100-C.tar?download=1', fallbacks=[], checksum=None, filename=None)}), citation='@article{hendrycks2019robustness,\n                        title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n                        author={Dan Hendrycks and Thomas Dietterich},\n                        journal={Proceedings of the International Conference on Learning Representations},\n                        year={2019}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class CIFAR10C(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CIFAR-10-C dataset with corrupted CIFAR-10 images.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://zenodo.org/records/2535967', assets=mappingproxy({'test': DownloadInfo(url='https://zenodo.org/records/2535967/files/CIFAR-10-C.tar?download=1', fallbacks=[], checksum=None, filename=None)}), citation='@article{hendrycks2019robustness,\n                        title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n                        author={Dan Hendrycks and Thomas Dietterich},\n                        journal={Proceedings of the International Conference on Learning Representations},\n                        year={2019}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class CLEVRER(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CLEVRER: CoLlision Events for Video REpresentation and Reasoning.

A diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. The dataset includes four types of questions: descriptive (e.g., “what color”), explanatory (“what’s responsible for”), predictive (“what will happen next”), and counterfactual (“what if”).

The dataset contains 20,000 synthetic videos of moving and colliding objects. Each video is 5 seconds long and contains 128 frames with resolution 480 x 320.

Splits:
  • train: 10,000 videos (index 0 - 9999)

  • validation: 5,000 videos (index 10000 - 14999)

  • test: 5,000 videos (index 15000 - 19999)
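
The index ranges in the split list above can be turned into a small lookup. This is a minimal sketch: `CLEVRER_SPLITS` and `split_for_video` are hypothetical names introduced for illustration and are not part of the CLEVRER builder.

```python
# Index ranges copied from the split list above. Python ranges exclude
# their end, so "0 - 9999" becomes range(0, 10_000), and so on.
CLEVRER_SPLITS = {
    "train": range(0, 10_000),
    "validation": range(10_000, 15_000),
    "test": range(15_000, 20_000),
}


def split_for_video(index: int) -> str:
    """Hypothetical helper: return the split that contains a video index."""
    for name, indices in CLEVRER_SPLITS.items():
        if index in indices:
            return name
    raise ValueError(f"video index {index} is outside 0-19999")


print(split_for_video(12_345))
```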

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://clevrer.csail.mit.edu/', assets=mappingproxy({'train_videos': DownloadInfo(url='http://data.csail.mit.edu/clevrer/videos/train/video_train.zip', fallbacks=[], checksum=None, filename=None), 'train_annotations': DownloadInfo(url='http://data.csail.mit.edu/clevrer/annotations/train/annotation_train.zip', fallbacks=[], checksum=None, filename=None), 'train_questions': DownloadInfo(url='http://data.csail.mit.edu/clevrer/questions/train.json', fallbacks=[], checksum=None, filename=None), 'validation_videos': DownloadInfo(url='http://data.csail.mit.edu/clevrer/videos/validation/video_validation.zip', fallbacks=[], checksum=None, filename=None), 'validation_annotations': DownloadInfo(url='http://data.csail.mit.edu/clevrer/annotations/validation/annotation_validation.zip', fallbacks=[], checksum=None, filename=None), 'validation_questions': DownloadInfo(url='http://data.csail.mit.edu/clevrer/questions/validation.json', fallbacks=[], checksum=None, filename=None), 'test_videos': DownloadInfo(url='http://data.csail.mit.edu/clevrer/videos/test/video_test.zip', fallbacks=[], checksum=None, filename=None), 'test_questions': DownloadInfo(url='http://data.csail.mit.edu/clevrer/questions/test.json', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{yi2020clevrer,\n            title={CLEVRER: CoLlision Events for Video REpresentation and Reasoning},\n            author={Yi, Kexin and Gan, Chuang and Li, Yunzhu and Kohli, Pushmeet and Wu, Jiajun and Torralba, Antonio and Tenenbaum, Joshua B},\n            booktitle={International Conference on Learning Representations},\n            year={2020}\n        }', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class CUB200(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Caltech-UCSD Birds-200-2011 (CUB-200-2011) Dataset

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.vision.caltech.edu/datasets/cub_200_2011/', assets=mappingproxy({'train': DownloadInfo(url='https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz?download=1', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz?download=1', fallbacks=[], checksum=None, filename=None)}), citation='@techreport{WahCUB_200_2011,\n                        Title = {The Caltech-UCSD Birds-200-2011 Dataset},\n                        Author = {Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S.},\n                        Year = {2011},\n                        Institution = {California Institute of Technology},\n                        Number = {CNS-TR-2011-001}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class Cars196(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Cars-196 Dataset

The Cars-196 dataset, also known as the Stanford Cars dataset, is a benchmark dataset for fine-grained visual classification of automobiles. It contains 16,185 color images covering 196 car categories, where each category is defined by a specific combination of make, model, and year. The dataset is split into 8,144 training images and 8,041 test images, with the first 98 classes used exclusively for training and the remaining 98 classes reserved for testing, ensuring that training and test classes are disjoint. Images are collected from real-world scenes and exhibit significant variation in viewpoint, background, and lighting conditions. Each image is annotated with a class label and a tight bounding box around the car, making the dataset suitable for fine-grained recognition tasks that require precise object localization and strong generalization to unseen categories.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://ai.stanford.edu/~jkrause/cars/car_dataset.html', assets=mappingproxy({'train': DownloadInfo(url='https://huggingface.co/datasets/haodoz0118/cars196-img/resolve/main/cars196_train.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://huggingface.co/datasets/haodoz0118/cars196-img/resolve/main/cars196_test.zip', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{krause20133d,\n            title={3d object representations for fine-grained categorization},\n            author={Krause, Jonathan and Stark, Michael and Deng, Jia and Fei-Fei, Li},\n            booktitle={Proceedings of the IEEE international conference on computer vision workshops},\n            pages={554--561},\n            year={2013}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class Cars3D[source]

Bases: BaseDatasetBuilder

183 car types x 24 azimuth angles x 4 elevation angles.
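
The factor grid above implies 183 x 24 x 4 images in total. The sketch below computes that count and decomposes a flat image index into factor coordinates. It assumes a row-major layout (car type varies slowest, elevation fastest), which this page does not state; the actual storage order may differ. `FACTOR_SIZES` and `unravel` are hypothetical names.

```python
import math

# Factor sizes from the description above: car type, azimuth, elevation.
FACTOR_SIZES = (183, 24, 4)
TOTAL_IMAGES = math.prod(FACTOR_SIZES)


def unravel(index, sizes=FACTOR_SIZES):
    """Hypothetical helper: flat index -> (car, azimuth, elevation).

    Assumes row-major ordering, i.e. the last factor varies fastest.
    """
    if not 0 <= index < math.prod(sizes):
        raise IndexError(f"index {index} out of range")
    coords = []
    # Peel off the fastest-varying factor first.
    for size in reversed(sizes):
        index, coord = divmod(index, size)
        coords.append(coord)
    return tuple(reversed(coords))


print(TOTAL_IMAGES, unravel(96))
```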

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/google-research/disentanglement_lib/tree/master', assets=mappingproxy({'train': DownloadInfo(url='http://www.scottreed.info/files/nips2015-analogy-data.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{locatello2019challenging,\n  title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n  author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n  booktitle={International Conference on Machine Learning},\n  pages={4114--4124},\n  year={2019}\n}', license='Apache-2.0', checksums=None)
VERSION: Version = Version('1.0.0')
class Country211(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Country211: Image Classification Dataset for Geolocation. This dataset uses a subset of the YFCC100M dataset, filtered by GPS coordinates to include images labeled with ISO-3166 country codes. Each country has a balanced sample of images for training, validation, and testing.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/openai/CLIP/blob/main/data/country211.md', assets=mappingproxy({'train': DownloadInfo(url='https://openaipublic.azureedge.net/clip/data/country211.tgz', fallbacks=[], checksum=None, filename=None), 'valid': DownloadInfo(url='https://openaipublic.azureedge.net/clip/data/country211.tgz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://openaipublic.azureedge.net/clip/data/country211.tgz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{radford2021learning,\n                title     = {Learning transferable visual models from natural language supervision},\n                author    = {Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},\n                booktitle = {International conference on machine learning},\n                pages     = {8748--8763},\n                year      = {2021},\n                organization = {PmLR} }\n        ', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class DSprites[source]

Bases: BaseDatasetBuilder

dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.
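
The asset filename used by this builder, `dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz`, appears to encode the number of values per latent factor. The sketch below parses that segment and multiplies the counts to get the size of the full factor grid. Reading the abbreviations as color, shape, scale, orientation, x, and y is an assumption based on the factor list above, and `factor_sizes` is a hypothetical helper.

```python
import math
import re

# Filename taken from the download URL in the SOURCE attribute of this class.
FILENAME = "dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz"


def factor_sizes(filename):
    """Hypothetical helper: parse '<abbrev><count>' pairs from the filename."""
    # Third underscore-separated segment, e.g. 'co1sh3sc6or40x32y32'.
    segment = filename.split("_")[2]
    pairs = re.findall(r"([a-z]+)(\d+)", segment)
    return {name: int(count) for name, count in pairs}


sizes = factor_sizes(FILENAME)
# Every combination of factor values yields one image.
num_images = math.prod(sizes.values())
print(sizes, num_images)
```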

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/deepmind/dsprites-dataset', assets=mappingproxy({'train': DownloadInfo(url='https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{higgins2017beta,\n                    title={beta-vae: Learning basic visual concepts with a constrained variational framework},\n                    author={Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander},\n                    booktitle={International conference on learning representations},\n                    year={2017}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class DSpritesColor(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

DSprites

dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/deepmind/dsprites-dataset', assets=mappingproxy({'train': DownloadInfo(url='https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{locatello2019challenging,\n                    title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n                    author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n                    booktitle={International Conference on Machine Learning},\n                    pages={4114--4124},\n                    year={2019}\n                    }', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class DSpritesNoise(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

DSprites

dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/deepmind/dsprites-dataset', assets=mappingproxy({'train': DownloadInfo(url='https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{locatello2019challenging,\n                    title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n                    author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n                    booktitle={International Conference on Machine Learning},\n                    pages={4114--4124},\n                    year={2019}\n                    }', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class DSpritesScream(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

DSprites

dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/deepmind/dsprites-dataset', assets=mappingproxy({'train': DownloadInfo(url='https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{higgins2017beta,\n                    title={beta-vae: Learning basic visual concepts with a constrained variational framework},\n                    author={Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander},\n                    booktitle={International conference on learning representations},\n                    year={2017}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class DTD(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Describable Textures Dataset (DTD)

DTD is a texture database, consisting of 5640 images, organized according to a list of 47 terms (categories) inspired by human perception. There are 120 images for each category. Image sizes range between 300x300 and 640x640, and the images contain at least 90% of the surface representing the category attribute. The images were collected from Google and Flickr by entering our proposed attributes and related terms as search queries. The images were annotated using Amazon Mechanical Turk in several iterations. For each image we provide a key attribute (main category) and a list of joint attributes.

The data is split into three equal parts (train, validation, and test), with 40 images per class in each split. We provide the ground truth annotation for both key and joint attributes, as well as the 10 splits of the data we used for evaluation.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.robots.ox.ac.uk/~vgg/data/dtd/', assets=mappingproxy({'train': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', fallbacks=[], checksum=None, filename=None), 'val': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@InProceedings{cimpoi14describing,\n                    Author    = {M. Cimpoi and S. Maji and I. Kokkinos and S. Mohamed and and A. Vedaldi},\n                    Title     = {Describing Textures in the Wild},\n                    Booktitle = {Proceedings of the {IEEE} Conf. on Computer Vision and Pattern Recognition ({CVPR})},\n                    Year      = {2014}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class EMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

EMNIST (Extended MNIST) Dataset

Abstract EMNIST is a set of handwritten characters derived from the NIST Special Database 19 and converted to a 28x28 pixel format that directly matches the MNIST dataset. It serves as a challenging “drop-in” replacement for MNIST, introducing handwritten letters and a larger variety of writing styles while preserving the original file structure and pixel density.

Context While the original MNIST dataset is considered “solved” by modern architectures, EMNIST restores the challenge by providing a larger, more diverse benchmark. It bridges the gap between simple digit recognition and complex handwriting tasks, offering up to 62 classes (digits + uppercase + lowercase) to test generalization and writer-independent recognition.

Content The dataset contains up to 814,255 grayscale images (28x28). It is provided in six split configurations to suit different needs:
* ByClass & ByMerge: Full unbalanced sets (up to 62 classes).
* Balanced: 131,600 images across 47 classes (ideal for benchmarking).
* Letters: 145,600 images across 26 classes (A-Z).
* Digits & MNIST: 280,000+ images across 10 classes (0-9).
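For the two configurations whose exact sizes are stated, the per-class image count follows directly from the quoted totals. The sketch below only encodes those stated figures; the dictionary and helper names are illustrative and not part of the library.

```python
# (images, classes) as quoted in the description; other configurations
# are omitted because the text gives no exact figures for them.
EMNIST_STATED_SIZES = {
    "balanced": (131_600, 47),
    "letters": (145_600, 26),
}


def images_per_class(config_name):
    """Return the number of images per class, assuming an even distribution."""
    images, classes = EMNIST_STATED_SIZES[config_name]
    if images % classes:
        raise ValueError(f"{config_name}: {images} is not divisible by {classes}")
    return images // classes


for name in EMNIST_STATED_SIZES:
    print(name, images_per_class(name))
```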

BUILDER_CONFIGS: list = [EMNISTConfig(name='byclass', version=Version('1.0.0'), description=''), EMNISTConfig(name='bymerge', version=Version('1.0.0'), description=''), EMNISTConfig(name='balanced', version=Version('1.0.0'), description=''), EMNISTConfig(name='letters', version=Version('1.0.0'), description=''), EMNISTConfig(name='digits', version=Version('1.0.0'), description=''), EMNISTConfig(name='mnist', version=Version('1.0.0'), description='')]
SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.nist.gov/itl/iad/image-group/emnist-dataset', assets=mappingproxy({'train': DownloadInfo(url='https://biometrics.nist.gov/cs_links/EMNIST/matlab.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://biometrics.nist.gov/cs_links/EMNIST/matlab.zip', fallbacks=[], checksum=None, filename=None)}), citation='@misc{cohen2017emnistextensionmnisthandwritten,\n                        title={EMNIST: an extension of MNIST to handwritten letters},\n                        author={Gregory Cohen and Saeed Afshar and Jonathan Tapson and André van Schaik},\n                        year={2017},\n                        eprint={1702.05373},\n                        archivePrefix={arXiv},\n                        primaryClass={cs.CV},\n                        url={https://arxiv.org/abs/1702.05373},\n            }', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class FGVCAircraft(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Fine-Grained Visual Classification of Aircraft (FGVC-Aircraft) Dataset.

FGVC-Aircraft is a benchmark dataset for fine-grained visual categorization of aircraft. The dataset contains 10,000 images of aircraft with 100 different aircraft model variants. Aircraft models are organized in a hierarchical structure with three levels: variant (finest), family, and manufacturer (coarsest).

The dataset is divided into training (3,334 images), validation (3,333 images), and test (3,333 images) subsets. Images are about 1-2MP resolution with a 20-pixel copyright banner at the bottom that is automatically removed during loading.
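The banner removal described above amounts to dropping the bottom 20 pixel rows of each image. The following is a minimal sketch of that idea using a list of pixel rows. `crop_banner` is a hypothetical helper; the builder's actual implementation, and the image type it operates on, are not shown in this reference.

```python
BANNER_HEIGHT = 20  # height of the copyright banner, per the description


def crop_banner(rows, banner_height=BANNER_HEIGHT):
    """Return the image rows with the bottom `banner_height` rows removed."""
    if banner_height < 0:
        raise ValueError("banner_height must be non-negative")
    if banner_height >= len(rows):
        raise ValueError("image is not taller than the banner")
    return rows[: len(rows) - banner_height]


# A dummy 100-row, 4-column image stands in for a real photo.
image = [[0] * 4 for _ in range(100)]
cropped = crop_banner(image)
print(len(image), "->", len(cropped))
```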

Usage:

dataset = FGVCAircraft(config_name="variant", split="train")
dataset = FGVCAircraft(config_name="family", split="train")
dataset = FGVCAircraft(config_name="manufacturer", split="train")

BUILDER_CONFIGS: list = [BuilderConfig(name='variant', version=None, description='100 aircraft model variants (finest granularity)'), BuilderConfig(name='family', version=None, description='70 aircraft families (medium granularity)'), BuilderConfig(name='manufacturer', version=None, description='30 aircraft manufacturers (coarsest granularity)')]
DEFAULT_CONFIG_NAME: str | None = 'variant'
SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/', assets=mappingproxy({'train': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/archives/fgvc-aircraft-2013b.tar.gz', fallbacks=[], checksum=None, filename=None), 'validation': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/archives/fgvc-aircraft-2013b.tar.gz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/archives/fgvc-aircraft-2013b.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@techreport{maji13fine-grained,\n                        title         = {Fine-Grained Visual Classification of Aircraft},\n                        author        = {S. Maji and J. Kannala and E. Rahtu and M. Blaschko and A. Vedaldi},\n                        year          = {2013},\n                        archivePrefix = {arXiv},\n                        eprint        = {1306.5151},\n                        primaryClass  = "cs.CV",\n                    }', license='Unknown', checksums=None)
VERSION: Version = Version('1.0.0')
class FacePointing(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Head angle classification dataset.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://crowley-coutaz.fr/HeadPoseDataSet/', assets=mappingproxy({'train': DownloadInfo(url='http://crowley-coutaz.fr/HeadPoseDataSet/HeadPoseImageDatabase.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{gourier2004estimating,\n                         title={Estimating face orientation from robust detection of salient facial features},\n                         author={Gourier, Nicolas and Hall, Daniela and Crowley, James L},\n                         booktitle={ICPR International Workshop on Visual Observation of Deictic Gestures},\n                         year={2004},\n                         organization={Citeseer}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class FashionMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Grayscale image classification.

Fashion-MNIST is a dataset of Zalando’s article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/zalandoresearch/fashion-mnist', assets=mappingproxy({'train': DownloadInfo(url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz', fallbacks=[], checksum=None, filename=None)}), citation='@article{xiao2017fashion,\n                         title={Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms},\n                         author={Xiao, Han and Rasul, Kashif and Vollgraf, Roland},\n                         journal={arXiv preprint arXiv:1708.07747},\n                         year={2017}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class Flowers102(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Flowers102 Dataset

Abstract The Flowers102 dataset is a fine-grained image classification benchmark consisting of 102 flower categories commonly found in the United Kingdom. It was created to address the challenge of classifying objects with large intra-class variability and small inter-class differences. Each category contains between 40 and 258 images, totaling 8,189 images.

Context Fine-grained visual categorization (FGVC) focuses on differentiating between similar sub-categories of objects (e.g., different species of flowers or birds). Flowers102 serves as a standard benchmark in this domain. Unlike general object recognition (e.g., CIFAR-10), where classes are visually distinct (car vs. dog), Flowers102 requires models to learn subtle features like petal shape, texture, and color patterns.

Content The dataset consists of:
- Images: 8,189 images stored in a single archive.
- Labels: A MATLAB file mapping each image to one of 102 classes (0-101).
- Splits: A predefined split ID file dividing the data into Training (1,020 images), Validation (1,020 images), and Test (6,149 images).
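The split sizes add up to the stated total, and the class range 0-101 implies a shift if the label file follows the usual MATLAB 1-based convention. The sketch below checks the total and shows that shift. The assumption that the raw file stores labels 1..102 is ours, and `to_zero_based` is a hypothetical helper rather than library code.

```python
SPLIT_SIZES = {"train": 1_020, "validation": 1_020, "test": 6_149}
NUM_CLASSES = 102


def to_zero_based(matlab_labels):
    """Shift 1-based labels (1..102, assumed) to 0-based class ids (0..101)."""
    shifted = [label - 1 for label in matlab_labels]
    if any(not 0 <= value < NUM_CLASSES for value in shifted):
        raise ValueError("label outside the expected 1..102 range")
    return shifted


total_images = sum(SPLIT_SIZES.values())
print(total_images)
print(to_zero_based([1, 77, 102]))
```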

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.robots.ox.ac.uk/~vgg/data/flowers/102/', assets=mappingproxy({'images': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz', fallbacks=[], checksum=None, filename=None), 'labels': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat', fallbacks=[], checksum=None, filename=None), 'setid': DownloadInfo(url='https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{nilsback2008flowers102,\n                         title={Automated flower classification over a large number of classes},\n                         author={Nilsback, Maria-Elena and Zisserman, Andrew},\n                         booktitle={2008 Sixth Indian conference on computer vision, graphics \\& image processing},\n                         pages={722--729},\n                         year={2008},\n                         organization={IEEE}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class Food101(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Food-101 Dataset.

The Food-101 dataset consists of 101 food categories with 101,000 images. For each class, 250 manually reviewed test images are provided as well as 750 training images.

Split sizes:
- train: 75,750 images (750 images × 101 classes)
- test: 25,250 images (250 images × 101 classes)

All images are automatically rescaled to have a maximum side length of 512 pixels.
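Rescaling to a maximum side length while keeping the aspect ratio can be sketched as below. `fit_max_side` is a hypothetical helper. It only shrinks images that exceed the limit; whether smaller images are also enlarged is not stated here, so treat that as an assumption.

```python
def fit_max_side(width, height, max_side=512):
    """Return (width, height) scaled so the longer side is at most `max_side`."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; left unchanged (assumption)
    scale = max_side / longest
    # Round to whole pixels and never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_max_side(1024, 768))
print(fit_max_side(300, 200))
```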

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/', assets=mappingproxy({'train': DownloadInfo(url='https://huggingface.co/datasets/haodoz0118/food101-img/resolve/main/food101_train.tar', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://huggingface.co/datasets/haodoz0118/food101-img/resolve/main/food101_test.tar', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{bossard14,\n            title = {Food-101 -- Mining Discriminative Components with Random Forests},\n            author = {Bossard, Lukas and Guillaumin, Matthieu and Van Gool, Luc},\n            booktitle = {European Conference on Computer Vision},\n            year = {2014}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class Galaxy10Decal(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Galaxy10 DECaLS Dataset: Galaxy morphology classification with DECaLS images.

Galaxy10 DECaLS is a much improved version of the original Galaxy10 dataset. It contains 17,736 256x256 pixel colored galaxy images (g, r and z band) separated into 10 classes. The images come from DESI Legacy Imaging Surveys (DECaLS) and labels come from Galaxy Zoo.

The original Galaxy10 dataset was created with Galaxy Zoo (GZ) Data Release 2, in which volunteers classified ~270k SDSS galaxy images. GZ later used images from the DESI Legacy Imaging Surveys (DECaLS), which have much better resolution and image quality. Galaxy10 DECaLS combines all three sources (GZ DR2 with DECaLS images instead of SDSS images, and DECaLS campaigns a/b and c), resulting in ~441k unique galaxies covered by DECaLS, of which ~18k were selected into 10 broad classes using volunteer votes with more rigorous filtering.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://astronn.readthedocs.io/en/latest/galaxy10.html', assets=mappingproxy({'train': DownloadInfo(url='https://zenodo.org/records/10845026/files/Galaxy10_DECals.h5', fallbacks=[], checksum=None, filename=None)}), citation='@article{walmsley2020galaxy,\n                        title={Galaxy Zoo: probabilistic morphology through Bayesian CNNs and active learning},\n                        author={Walmsley, Mike and Smith, Lewis and Lintott, Chris and Gal, Yarin and Bamford, Steven and Dickinson, Hugh and Fortson, Lucy and Kruk, Sandor and Masters, Karen and Scarlata, Claudia and others},\n                        journal={Monthly Notices of the Royal Astronomical Society},\n                        volume={491},\n                        number={2},\n                        pages={1554--1574},\n                        year={2020},\n                        publisher={Oxford University Press}\n                    }', license='MIT', checksums=None)
VERSION: Version = Version('1.0.0')
class HASYv2(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

HASYv2 Dataset

Abstract The HASYv2 dataset contains handwritten symbol images of 369 classes. It includes over 168,000 samples categorized into various classes like Latin characters, numerals, and symbols. Each image is 32x32 pixels in size. The dataset was created to benchmark the classification of mathematical symbols and handwritten characters.

Context Recognizing handwritten mathematical symbols is a challenging task due to the similarity between classes (e.g., ‘1’, ‘l’, ‘|’) and the large number of unique symbols used in scientific notation. HASYv2 serves as a standard benchmark for testing classifiers on a large number of classes (369) with low resolution (32x32).

Content The dataset consists of:
- Images: 168,236 black-and-white images (32x32 pixels).
- Labels: 369 distinct classes.
- Splits: The dataset includes 10 pre-defined folds. Each fold is exposed as a configuration ('fold-1' through 'fold-10') that uses that fold as the test set; 'fold-1' is the default.
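The fold configurations can be read as standard 10-fold cross-validation: the named fold is held out for testing and the remaining nine are used for training. The sketch below expresses that reading. It is an assumption about the semantics (the archive may instead ship explicit train/test lists per fold), and `split_folds` is a hypothetical helper, not a library function.

```python
NUM_FOLDS = 10
CONFIG_NAMES = [f"fold-{k}" for k in range(1, NUM_FOLDS + 1)]


def split_folds(config_name):
    """Return (train_folds, test_fold) for a config name such as 'fold-3'."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown config: {config_name!r}")
    test_fold = int(config_name.split("-")[1])
    train_folds = [k for k in range(1, NUM_FOLDS + 1) if k != test_fold]
    return train_folds, test_fold


train_folds, test_fold = split_folds("fold-1")
print(train_folds, test_fold)
```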

BUILDER_CONFIGS: list = [BuilderConfig(name='fold-1', version=Version('1.0.0'), description='HASYv2 dataset using fold 1 as the test set.'), BuilderConfig(name='fold-2', version=Version('1.0.0'), description='HASYv2 dataset using fold 2 as the test set.'), BuilderConfig(name='fold-3', version=Version('1.0.0'), description='HASYv2 dataset using fold 3 as the test set.'), BuilderConfig(name='fold-4', version=Version('1.0.0'), description='HASYv2 dataset using fold 4 as the test set.'), BuilderConfig(name='fold-5', version=Version('1.0.0'), description='HASYv2 dataset using fold 5 as the test set.'), BuilderConfig(name='fold-6', version=Version('1.0.0'), description='HASYv2 dataset using fold 6 as the test set.'), BuilderConfig(name='fold-7', version=Version('1.0.0'), description='HASYv2 dataset using fold 7 as the test set.'), BuilderConfig(name='fold-8', version=Version('1.0.0'), description='HASYv2 dataset using fold 8 as the test set.'), BuilderConfig(name='fold-9', version=Version('1.0.0'), description='HASYv2 dataset using fold 9 as the test set.'), BuilderConfig(name='fold-10', version=Version('1.0.0'), description='HASYv2 dataset using fold 10 as the test set.')]
DEFAULT_CONFIG_NAME: str | None = 'fold-1'
SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/MartinThoma/HASY', assets=mappingproxy({'train': DownloadInfo(url='https://zenodo.org/record/259444/files/HASYv2.tar.bz2?download=1', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://zenodo.org/record/259444/files/HASYv2.tar.bz2?download=1', fallbacks=[], checksum=None, filename=None)}), citation='@article{thoma2017hasyv2,\n                         title={The hasyv2 dataset},\n                         author={Thoma, Martin},\n                         journal={arXiv preprint arXiv:1701.08380},\n                         year={2017}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class ImageNet100(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: _ImageNetArchiveMixin, BaseDatasetBuilder

ImageNet-100, built by taking the first 100 class TARs from the ImageNet-1K train archive.
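Selecting "the first 100 class TARs" can be sketched as below. The ordering is an assumption: this sketch sorts member names alphabetically, whereas the builder may rely on the order in which members appear in the archive. `first_n_class_archives` and the synthetic member names are hypothetical.

```python
def first_n_class_archives(member_names, n=100):
    """Return the first `n` per-class .tar names in sorted order (assumed ordering)."""
    class_tars = sorted(name for name in member_names if name.endswith(".tar"))
    if len(class_tars) < n:
        raise ValueError(f"expected at least {n} class archives, got {len(class_tars)}")
    return class_tars[:n]


# Synthetic member names stand in for the real per-class archives.
members = [f"n{idx:08d}.tar" for idx in range(150, 0, -1)]
selected = first_n_class_archives(members)
print(len(selected), selected[0], selected[-1])
```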

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.image-net.org/challenges/LSVRC/2012/', assets=mappingproxy({'train': DownloadInfo(url='https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar', fallbacks=[], checksum=None, filename=None)}), citation='@article{deng2009imagenet,\n        title={ImageNet: A large-scale hierarchical image database},\n        author={Deng, Jia and others},\n        journal={CVPR},\n        year={2009}\n    }', license='', checksums=None)
VERSION: Version = Version('2.0.0')
class ImageNet1K(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: _ImageNetArchiveMixin, BaseDatasetBuilder

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.image-net.org/challenges/LSVRC/2012/', assets=mappingproxy({'train': DownloadInfo(url='https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar', fallbacks=[], checksum=None, filename=None)}), citation='@article{deng2009imagenet,\n        title={ImageNet: A large-scale hierarchical image database},\n        author={Deng, Jia and others},\n        journal={CVPR},\n        year={2009}\n    }', license='', checksums=None)
VERSION: Version = Version('2.0.0')
class Imagenette(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Imagenette: 10 easily classified classes from ImageNet.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/fastai/imagenette', assets=mappingproxy({'archive': DownloadInfo(url='https://s3.amazonaws.com/fast-ai-imageclas/imagenette2.tgz', fallbacks=[], checksum=None, filename=None)}), citation='@misc{howard2019imagenette,\n            author={Jeremy Howard},\n            title={Imagenette: A smaller subset of 10 easily classified classes from ImageNet},\n            year={2019},\n            url={https://github.com/fastai/imagenette}\n        }', license='', checksums=None)
VERSION: Version = Version('2.0.0')
class KMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Image classification. The Kuzushiji-MNIST dataset consists of 70,000 28x28 grayscale images of 10 classes of Kuzushiji (cursive Japanese) characters, with 7,000 images per class. There are 60,000 training images and 10,000 test images. Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset, providing a more challenging alternative for benchmarking machine learning algorithms.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://codh.rois.ac.jp/kmnist/', assets=mappingproxy({'train': DownloadInfo(url='https://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz', fallbacks=[], checksum=None, filename=None)}), citation='@online{clanuwat2018deep,\n                         author       = {Tarin Clanuwat and Mikel Bober-Irizar and Asanobu Kitamoto and Alex Lamb and Kazuaki Yamamoto and David Ha},\n                         title        = {Deep Learning for Classical Japanese Literature},\n                         date         = {2018-12-03},\n                         year         = {2018},\n                         eprintclass  = {cs.CV},\n                         eprinttype   = {arXiv},\n                         eprint       = {cs.CV/1812.01718}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class Linnaeus5(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Linnaeus 5 Dataset

Abstract The Linnaeus 5 dataset contains 8,000 RGB images (1,600 per class) sized 256x256 pixels, categorized into 5 classes: berry, bird, dog, flower, and other (negative set). It was created to benchmark fine-grained classification and object recognition tasks.

Context While many datasets focus on broad object categories (like CIFAR-10), Linnaeus 5 offers a focused challenge on specific natural objects plus a “negative” class (‘other’). It serves as a good middle-ground benchmark between simple digit recognition (MNIST) and large-scale natural image classification (ImageNet).

Content The dataset consists of:
- Images: 8,000 color images (256x256 pixels).
- Classes: 5 categories (berry, bird, dog, flower, other).
- Splits: Pre-split into Training (1,200 images per class) and Test (400 images per class).

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://chaladze.com/l5/', assets=mappingproxy({'train': DownloadInfo(url='http://chaladze.com/l5/img/Linnaeus%205%20256X256.rar', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='http://chaladze.com/l5/img/Linnaeus%205%20256X256.rar', fallbacks=[], checksum=None, filename=None)}), citation='@article{chaladze2017linnaeus,\n                      title={Linnaeus 5 dataset for machine learning},\n                      author={Chaladze, G and Kalatozishvili, L},\n                      journal={chaladze.com},\n                      year={2017}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class MedMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

MedMNIST, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D.

BUILDER_CONFIGS: list = [MedMNISTConfig(name='pathmnist', version=Version('1.0.0'), description='MedMNIST PathMNIST (2D)'), MedMNISTConfig(name='chestmnist', version=Version('1.0.0'), description='MedMNIST ChestMNIST (2D, multi-label)'), MedMNISTConfig(name='dermamnist', version=Version('1.0.0'), description='MedMNIST DermaMNIST (2D)'), MedMNISTConfig(name='octmnist', version=Version('1.0.0'), description='MedMNIST OCTMNIST (2D)'), MedMNISTConfig(name='pneumoniamnist', version=Version('1.0.0'), description='MedMNIST PneumoniaMNIST (2D)'), MedMNISTConfig(name='retinamnist', version=Version('1.0.0'), description='MedMNIST RetinaMNIST (2D)'), MedMNISTConfig(name='breastmnist', version=Version('1.0.0'), description='MedMNIST BreastMNIST (2D)'), MedMNISTConfig(name='bloodmnist', version=Version('1.0.0'), description='MedMNIST BloodMNIST (2D)'), MedMNISTConfig(name='tissuemnist', version=Version('1.0.0'), description='MedMNIST TissueMNIST (2D)'), MedMNISTConfig(name='organamnist', version=Version('1.0.0'), description='MedMNIST OrganAMNIST (2D)'), MedMNISTConfig(name='organcmnist', version=Version('1.0.0'), description='MedMNIST OrganCMNIST (2D)'), MedMNISTConfig(name='organsmnist', version=Version('1.0.0'), description='MedMNIST OrganSMNIST (2D)'), MedMNISTConfig(name='organmnist3d', version=Version('1.0.0'), description='MedMNIST OrganMNIST3D (3D)'), MedMNISTConfig(name='nodulemnist3d', version=Version('1.0.0'), description='MedMNIST NoduleMNIST3D (3D)'), MedMNISTConfig(name='adrenalmnist3d', version=Version('1.0.0'), description='MedMNIST AdrenalMNIST3D (3D)'), MedMNISTConfig(name='fracturemnist3d', version=Version('1.0.0'), description='MedMNIST FractureMNIST3D (3D)'), MedMNISTConfig(name='vesselmnist3d', version=Version('1.0.0'), description='MedMNIST VesselMNIST3D (3D)'), MedMNISTConfig(name='synapsemnist3d', version=Version('1.0.0'), description='MedMNIST SynapseMNIST3D (3D)')]
VERSION: Version = Version('1.0.0')
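The configuration names in BUILDER_CONFIGS above follow a naming pattern: the 3D subsets end in "3d". A small sketch that partitions the names on that suffix and confirms the 12 + 6 breakdown from the description follows. The `is_3d` helper is illustrative and not part of the library.

```python
# Names copied from the BUILDER_CONFIGS attribute.
CONFIG_NAMES = [
    "pathmnist", "chestmnist", "dermamnist", "octmnist", "pneumoniamnist",
    "retinamnist", "breastmnist", "bloodmnist", "tissuemnist",
    "organamnist", "organcmnist", "organsmnist",
    "organmnist3d", "nodulemnist3d", "adrenalmnist3d",
    "fracturemnist3d", "vesselmnist3d", "synapsemnist3d",
]


def is_3d(config_name):
    """Return True for the volumetric subsets, identified by their name suffix."""
    return config_name.endswith("3d")


configs_2d = [name for name in CONFIG_NAMES if not is_3d(name)]
configs_3d = [name for name in CONFIG_NAMES if is_3d(name)]
print(len(configs_2d), len(configs_3d))
```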
class NotMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

NotMNIST Dataset that contains images of letters A-J.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html', assets=mappingproxy({'train_images': DownloadInfo(url='https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/train-images-idx3-ubyte.gz', fallbacks=[], checksum=None, filename=None), 'train_labels': DownloadInfo(url='https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/train-labels-idx1-ubyte.gz', fallbacks=[], checksum=None, filename=None), 'test_images': DownloadInfo(url='https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/t10k-images-idx3-ubyte.gz', fallbacks=[], checksum=None, filename=None), 'test_labels': DownloadInfo(url='https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/t10k-labels-idx1-ubyte.gz', fallbacks=[], checksum=None, filename=None)}), citation='@misc{bulatov2011notmnist,\n                          author={Yaroslav Bulatov},\n                          title={notMNIST dataset},\n                          year={2011},\n                          url={http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html}\n                        }', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class RockPaperScissor(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Rock Paper Scissors dataset.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://laurencemoroney.com/datasets.html', assets=mappingproxy({'train': DownloadInfo(url='https://storage.googleapis.com/download.tensorflow.org/data/rps.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://storage.googleapis.com/download.tensorflow.org/data/rps-test-set.zip', fallbacks=[], checksum=None, filename=None)}), citation='@misc{laurence2019rock,\n                         title={Rock Paper Scissors Dataset},\n                         author={Laurence Moroney},\n                         year={2019},\n                         url={https://laurencemoroney.com/datasets.html}}', license='CC By 2.0', checksums=None)
VERSION: Version = Version('1.0.0')
class STL10(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

STL-10 Dataset

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://cs.stanford.edu/~acoates/stl10/', assets=mappingproxy({'train': DownloadInfo(url='https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', fallbacks=[], checksum=None, filename=None), 'unlabeled': DownloadInfo(url='https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', fallbacks=[], checksum=None, filename=None)}), citation='@article{coates2011analysis,\n                        title={An analysis of single-layer networks in unsupervised feature learning},\n                        author={Coates, Adam and Ng, Andrew Y},\n                        journal={AISTATS},\n                        year={2011}}', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class SVHN(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

SVHN (Street View House Numbers) Dataset for image classification.

SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor to MNIST, but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='http://ufldl.stanford.edu/housenumbers/', assets=mappingproxy({'train': DownloadInfo(url='http://ufldl.stanford.edu/housenumbers/train_32x32.mat', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='http://ufldl.stanford.edu/housenumbers/test_32x32.mat', fallbacks=[], checksum=None, filename=None), 'extra': DownloadInfo(url='http://ufldl.stanford.edu/housenumbers/extra_32x32.mat', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{netzer2011reading,\n                          title={Reading digits in natural images with unsupervised feature learning},\n                          author={Netzer, Yuval and Wang, Tao and Coates, Adam and Bissacco, Alessandro and Wu, Baolin and Ng, Andrew Y and others},\n                          booktitle={NIPS workshop on deep learning and unsupervised feature learning},\n                          volume={2011},\n                          number={2},\n                          pages={4},\n                          year={2011},\n                          organization={Granada}\n                        }', license='', checksums=None)
VERSION: Version = Version('1.0.0')
class Shapes3D(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Shapes3D dataset: 10x10x10x8x4x15 factor combinations, 64x64 RGB images.
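The factor sizes multiply out to 480,000 combinations, and a flat image index can be converted to and from factor values with mixed-radix arithmetic. The sketch below assumes row-major order with the last factor varying fastest, and takes the factor names from the upstream 3dshapes description; neither the order nor the names are confirmed by this reference. Both helpers are hypothetical.

```python
from math import prod

FACTOR_SIZES = (10, 10, 10, 8, 4, 15)
# Names taken from the upstream 3dshapes description (assumed order).
FACTOR_NAMES = ("floor_hue", "wall_hue", "object_hue", "scale", "shape", "orientation")


def index_to_factors(index):
    """Convert a flat index to per-factor values (last factor varies fastest)."""
    if not 0 <= index < prod(FACTOR_SIZES):
        raise IndexError("index out of range")
    values = []
    for size in reversed(FACTOR_SIZES):
        index, value = divmod(index, size)
        values.append(value)
    return tuple(reversed(values))


def factors_to_index(values):
    """Inverse of index_to_factors."""
    index = 0
    for value, size in zip(values, FACTOR_SIZES):
        if not 0 <= value < size:
            raise ValueError("factor value out of range")
        index = index * size + value
    return index


print(prod(FACTOR_SIZES))
print(dict(zip(FACTOR_NAMES, index_to_factors(15))))
```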

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://github.com/google-deepmind/3dshapes-dataset/', assets=mappingproxy({'train': DownloadInfo(url='https://huggingface.co/datasets/randall-lab/shapes3d/resolve/main/shapes3d.npz', fallbacks=[], checksum=None, filename=None)}), citation='@InProceedings{pmlr-v80-kim18b,\n  title = {Disentangling by Factorising},\n  author = {Kim, Hyunjik and Mnih, Andriy},\n  booktitle = {Proceedings of the 35th International Conference on Machine Learning},\n  pages = {2649--2658},\n  year = {2018},\n  editor = {Dy, Jennifer and Krause, Andreas},\n  volume = {80},\n  series = {Proceedings of Machine Learning Research},\n  month = {10--15 Jul},\n  publisher = {PMLR},\n  pdf = {http://proceedings.mlr.press/v80/kim18b/kim18b.pdf},\n  url = {https://proceedings.mlr.press/v80/kim18b.html}\n}', license='apache-2.0', checksums=None)
VERSION: Version = Version('1.0.0')
class SmallNORB(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

SmallNORB dataset: 96x96 stereo images with 5 known factors.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://cs.nyu.edu/~ylclab/data/norb-v1.0-small/', assets=mappingproxy({'train': DownloadInfo(url='https://huggingface.co/datasets/randall-lab/small-norb/resolve/main/smallnorb-train.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='https://huggingface.co/datasets/randall-lab/small-norb/resolve/main/smallnorb-test.zip', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{lecun2004learning,\n  title={Learning methods for generic object recognition with invariance to pose and lighting},\n  author={LeCun, Yann and Huang, Fu Jie and Bottou, Leon},\n  booktitle={Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.},\n  volume={2},\n  pages={II--104},\n  year={2004},\n  organization={IEEE}\n}', license='Apache-2.0', checksums=None)
VERSION: Version = Version('1.0.0')
class TinyImagenet(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Tiny ImageNet dataset for image classification tasks. It contains 200 classes with 500 training images, 50 validation images, and 50 test images per class.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://www.kaggle.com/c/tiny-imagenet', assets=mappingproxy({'train': DownloadInfo(url='http://cs231n.stanford.edu/tiny-imagenet-200.zip', fallbacks=[], checksum=None, filename=None), 'validation': DownloadInfo(url='http://cs231n.stanford.edu/tiny-imagenet-200.zip', fallbacks=[], checksum=None, filename=None), 'test': DownloadInfo(url='http://cs231n.stanford.edu/tiny-imagenet-200.zip', fallbacks=[], checksum=None, filename=None)}), citation='@inproceedings{Le2015TinyIV,\n                          title={Tiny ImageNet Visual Recognition Challenge},\n                          author={Ya Le and Xuan S. Yang},\n                          year={2015}\n                        }', license='MIT License', checksums=None)
VERSION: Version = Version('1.0.0')
class TinyImagenetC(*args, split=None, processed_cache_dir=None, download_dir=None, storage_format=None, backend_kwargs=None, decode_video=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Tiny ImageNet-C dataset for image classification tasks with corruptions applied.

SOURCE: DatasetSource | Mapping = DatasetSource(homepage='https://zenodo.org/records/2536630', assets=mappingproxy({'test': DownloadInfo(url='https://zenodo.org/records/2536630/files/Tiny-ImageNet-C.tar?download=1', fallbacks=[], checksum=None, filename=None)}), citation='@article{hendrycks2019robustness,\n                        title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n                        author={Dan Hendrycks and Thomas Dietterich},\n                        journal={Proceedings of the International Conference on Learning Representations},\n                        year={2019}}', license='CC BY 4.0', checksums=None)
VERSION: Version = Version('1.0.0')