stable_datasets.images package

Submodules

stable_datasets.images.arabic_characters module

class ArabicCharacters(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Arabic Handwritten Characters Dataset

Abstract Handwritten Arabic character recognition systems face several challenges, including the unlimited variation in human handwriting and large public databases. In this work, we model a deep learning architecture that can be effectively applied to recognizing Arabic handwritten characters. A Convolutional Neural Network (CNN) is a special type of feed-forward multilayer network trained in supervised mode. The CNN was trained and tested on our database, which contains 16,800 handwritten Arabic characters. In this paper, optimization methods are implemented to increase the performance of the CNN. Common machine learning methods usually apply a combination of a feature extractor and a trainable classifier. The use of a CNN leads to significant improvements across different machine-learning classification algorithms. Our proposed CNN achieves an average 5.1% misclassification error on the test data.

Context The motivation of this study is to use cross-knowledge learned from multiple works to enhance the performance of Arabic handwritten character recognition. In recent years, interest in recognizing Arabic handwritten characters across different handwriting styles has grown, making it important to find and work on new and advanced solutions for handwriting recognition. Deep learning systems need a large amount of data (images) to be able to make good decisions.

Content The dataset is composed of 16,800 characters written by 60 participants aged between 19 and 40 years; 90% of the participants are right-handed. Each participant wrote each character (from ’alef’ to ’yeh’) ten times on two forms (Fig. 7(a) and 7(b) of the original paper). The forms were scanned at a resolution of 300 dpi, and each block was segmented automatically using Matlab 2016a to determine its coordinates. The database is partitioned into two sets: a training set (13,440 characters, 480 images per class) and a test set (3,360 characters, 120 images per class). The writers of the training set and test set are disjoint, and the assignment of writers to the test set was randomized so that the test-set writers do not all come from a single institution (to ensure variability of the test set).

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/mloey/Arabic-Handwritten-Characters-Dataset', 'assets': mappingproxy({'train': 'https://github.com/mloey/Arabic-Handwritten-Characters-Dataset/raw/master/Train%20Images%2013440x32x32.zip', 'test': 'https://github.com/mloey/Arabic-Handwritten-Characters-Dataset/raw/master/Test%20Images%203360x32x32.zip'}), 'citation': '@article{el2017arabic,\n                        title={Arabic handwritten characters recognition using convolutional neural network},\n                        author={El-Sawy, Ahmed and Loey, Mohamed and El-Bakry, Hazem},\n                        journal={WSEAS Transactions on Computer Research},\n                        volume={5},\n                        pages={11--19},\n                        year={2017}}'})
VERSION: Version = 1.0.0
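
A minimal usage sketch follows. It assumes the builder follows the standard Hugging Face datasets builder flow (download_and_prepare / as_dataset); the split, download_dir, and processed_cache_dir arguments are the ones documented in the signature above, while the paths and the printed feature names are placeholders.

    from stable_datasets.images.arabic_characters import ArabicCharacters

    # Documented constructor arguments; the prepare/as_dataset calls below assume
    # the Hugging Face datasets builder protocol.
    builder = ArabicCharacters(
        split="train",
        download_dir="/tmp/stable_datasets/downloads",        # placeholder paths
        processed_cache_dir="/tmp/stable_datasets/processed",
    )
    builder.download_and_prepare()            # fetches the train/test zips listed in SOURCE
    train = builder.as_dataset(split="train")
    print(len(train), train[0].keys())        # exact feature names depend on the builder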

stable_datasets.images.arabic_digits module

class ArabicDigits(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Arabic Handwritten Digits Dataset.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/mloey/Arabic-Handwritten-Digits-Dataset', 'assets': mappingproxy({'train': 'https://raw.githubusercontent.com/mloey/Arabic-Handwritten-Digits-Dataset/master/Arabic%20Handwritten%20Digits%20Dataset%20CSV.zip', 'test': 'https://raw.githubusercontent.com/mloey/Arabic-Handwritten-Digits-Dataset/master/Arabic%20Handwritten%20Digits%20Dataset%20CSV.zip'}), 'citation': '@inproceedings{el2016cnn,\n                        title={CNN for handwritten arabic digits recognition based on LeNet-5},\n                        author={El-Sawy, Ahmed and Hazem, EL-Bakry and Loey, Mohamed},\n                        booktitle={International conference on advanced intelligent systems and informatics},\n                        pages={566--575},\n                        year={2016},\n                        organization={Springer}\n                        }'})
VERSION: Version = 1.0.0

stable_datasets.images.awa2 module

class AWA2(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]

Bases: GeneratorBasedBuilder

The Animals with Attributes 2 (AwA2) dataset provides images across 50 animal classes, useful for attribute-based classification and zero-shot learning research. See https://cvml.ista.ac.at/AwA2/ for more information.

VERSION = 1.0.0

stable_datasets.images.beans module

class Beans(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]

Bases: GeneratorBasedBuilder

Bean disease dataset for classification of three classes: Angular Leaf Spot, Bean Rust, and Healthy leaves.

VERSION = 1.0.0

stable_datasets.images.cars196 module

class Cars196(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Cars-196 Dataset The Cars-196 dataset, also known as the Stanford Cars dataset, is a benchmark dataset for fine-grained visual classification of automobiles. It contains 16,185 color images covering 196 car categories, where each category is defined by a specific combination of make, model, and year. The dataset is split into 8,144 training images and 8,041 test images, with the first 98 classes used exclusively for training and the remaining 98 classes reserved for testing, ensuring that training and test classes are disjoint. Images are collected from real-world scenes and exhibit significant variation in viewpoint, background, and lighting conditions. Each image is annotated with a class label and a tight bounding box around the car, making the dataset suitable for fine-grained recognition tasks that require precise object localization and strong generalization to unseen categories.

SOURCE: Mapping = mappingproxy({'homepage': 'https://ai.stanford.edu/~jkrause/cars/car_dataset.html', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/haodoz0118/cars196-img/resolve/main/cars196_train.zip', 'test': 'https://huggingface.co/datasets/haodoz0118/cars196-img/resolve/main/cars196_test.zip'}), 'citation': '@inproceedings{krause20133d,\n            title={3d object representations for fine-grained categorization},\n            author={Krause, Jonathan and Stark, Michael and Deng, Jia and Fei-Fei, Li},\n            booktitle={Proceedings of the IEEE international conference on computer vision workshops},\n            pages={554--561},\n            year={2013}}'})
VERSION: Version = 1.0.0

stable_datasets.images.cars3d module

class CARS3D[source]

Bases: BaseDatasetBuilder

183 car types x 24 azimuth angles x 4 elevation angles.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/google-research/disentanglement_lib/tree/master', 'assets': mappingproxy({'train': 'http://www.scottreed.info/files/nips2015-analogy-data.tar.gz'}), 'license': 'Apache-2.0', 'citation': '@inproceedings{locatello2019challenging,\n  title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n  author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n  booktitle={International Conference on Machine Learning},\n  pages={4114--4124},\n  year={2019}\n}'})
VERSION: Version = 1.0.0

stable_datasets.images.cassava module

Legacy Cassava loader (to be refactored into a BaseDatasetBuilder).

This module was moved under stable_datasets.images to align the repository layout. It still exposes the original imperative cassava.load(…) API for now.

class cassava[source]

Bases: object

Plant images classification.

The data consists of two folders: a training folder with 5 subfolders containing the images for the 5 respective classes, and a test folder containing the test images.

classes = ['cbb', 'cmd', 'cbsd', 'cgm', 'healthy']
static download(path)[source]
static load(path=None)[source]
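
A brief sketch of the legacy imperative API exposed by this module. The path below is a placeholder, and the structure of the object returned by load is not documented here, so treat that part as an assumption.

    from stable_datasets.images.cassava import cassava

    data_dir = "/tmp/cassava"        # placeholder download location
    cassava.download(data_dir)       # fetch the raw data into data_dir
    data = cassava.load(data_dir)    # return structure is not documented here
    print(cassava.classes)           # ['cbb', 'cmd', 'cbsd', 'cgm', 'healthy']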

stable_datasets.images.celeb_a module

class CelebA(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]

Bases: GeneratorBasedBuilder

The CelebA dataset is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations.

VERSION = 1.0.0

stable_datasets.images.cifar10 module

class CIFAR10(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Image classification. The CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html) was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

SOURCE: Mapping = mappingproxy({'homepage': 'https://www.cs.toronto.edu/~kriz/cifar.html', 'assets': mappingproxy({'train': 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz', 'test': 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'}), 'citation': '@article{krizhevsky2009learning,\n                         title={Learning multiple layers of features from tiny images},\n                         author={Krizhevsky, Alex and Hinton, Geoffrey and others},\n                         year={2009},\n                         publisher={Toronto, ON, Canada}}'})
VERSION: Version = 1.0.0
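
The docstring above describes the original python-version batch layout. As a point of reference, each batch file inside cifar-10-python.tar.gz is a pickled dict whose b'data' entry holds 10000 rows of 3072 uint8 values (3x32x32, channel-first). The sketch below shows how such a batch is conventionally unpacked and is independent of the builder itself.

    import pickle
    import numpy as np

    def load_cifar_batch(path):
        """Unpack one CIFAR-10 python-version batch into images and labels."""
        with open(path, "rb") as f:
            batch = pickle.load(f, encoding="bytes")   # keys are bytes: b'data', b'labels'
        images = np.asarray(batch[b"data"], dtype=np.uint8).reshape(-1, 3, 32, 32)
        labels = np.asarray(batch[b"labels"], dtype=np.int64)
        return images, labels

    # e.g. images, labels = load_cifar_batch("cifar-10-batches-py/data_batch_1")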

stable_datasets.images.cifar100 module

class CIFAR100(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CIFAR-100 dataset, a variant of CIFAR-10 with 100 classes.

SOURCE: Mapping = mappingproxy({'homepage': 'https://www.cs.toronto.edu/~kriz/cifar.html', 'assets': mappingproxy({'train': 'https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz', 'test': 'https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz'}), 'citation': '@article{krizhevsky2009learning,\n                         title={Learning multiple layers of features from tiny images},\n                         author={Krizhevsky, Alex and Hinton, Geoffrey and others},\n                         year={2009},\n                         publisher={Toronto, ON, Canada}}'})
VERSION: Version = 1.0.0

stable_datasets.images.cifar100_c module

class CIFAR100C(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CIFAR-100-C dataset with corrupted CIFAR-100 images.

SOURCE: Mapping = mappingproxy({'homepage': 'https://zenodo.org/records/3555552', 'assets': mappingproxy({'test': 'https://zenodo.org/records/3555552/files/CIFAR-100-C.tar?download=1'}), 'citation': '@article{hendrycks2019robustness,\n                        title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n                        author={Dan Hendrycks and Thomas Dietterich},\n                        journal={Proceedings of the International Conference on Learning Representations},\n                        year={2019}}'})
VERSION: Version = 1.0.0

stable_datasets.images.cifar10_c module

class CIFAR10C(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CIFAR-10-C dataset with corrupted CIFAR-10 images.

SOURCE: Mapping = mappingproxy({'homepage': 'https://zenodo.org/records/2535967', 'assets': mappingproxy({'test': 'https://zenodo.org/records/2535967/files/CIFAR-10-C.tar?download=1'}), 'citation': '@article{hendrycks2019robustness,\n                        title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n                        author={Dan Hendrycks and Thomas Dietterich},\n                        journal={Proceedings of the International Conference on Learning Representations},\n                        year={2019}}'})
VERSION: Version = 1.0.0

stable_datasets.images.clevrer module

class CLEVRER(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CLEVRER: CoLlision Events for Video REpresentation and Reasoning.

A diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. The dataset includes four types of questions: descriptive (e.g., “what color”), explanatory (“what’s responsible for”), predictive (“what will happen next”), and counterfactual (“what if”).

The dataset contains 20,000 synthetic videos of moving and colliding objects. Each video is 5 seconds long and contains 128 frames with resolution 480 x 320.

Splits:
  • train: 10,000 videos (index 0 - 9999)

  • validation: 5,000 videos (index 10000 - 14999)

  • test: 5,000 videos (index 15000 - 19999)

SOURCE: Mapping = mappingproxy({'homepage': 'http://clevrer.csail.mit.edu/', 'assets': mappingproxy({'train_videos': 'http://data.csail.mit.edu/clevrer/videos/train/video_train.zip', 'train_annotations': 'http://data.csail.mit.edu/clevrer/annotations/train/annotation_train.zip', 'train_questions': 'http://data.csail.mit.edu/clevrer/questions/train.json', 'validation_videos': 'http://data.csail.mit.edu/clevrer/videos/validation/video_validation.zip', 'validation_annotations': 'http://data.csail.mit.edu/clevrer/annotations/validation/annotation_validation.zip', 'validation_questions': 'http://data.csail.mit.edu/clevrer/questions/validation.json', 'test_videos': 'http://data.csail.mit.edu/clevrer/videos/test/video_test.zip', 'test_questions': 'http://data.csail.mit.edu/clevrer/questions/test.json'}), 'citation': '@inproceedings{yi2020clevrer,\n            title={CLEVRER: CoLlision Events for Video REpresentation and Reasoning},\n            author={Yi, Kexin and Gan, Chuang and Li, Yunzhu and Kohli, Pushmeet and Wu, Jiajun and Torralba, Antonio and Tenenbaum, Joshua B},\n            booktitle={International Conference on Learning Representations},\n            year={2020}\n        }'})
VERSION: Version = 1.0.0
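
The split boundaries listed above are index-based; the helper below simply encodes those ranges as an illustrative convenience and is not part of the builder's API.

    def clevrer_split(video_index: int) -> str:
        """Map a CLEVRER video index (0-19999) to its split, per the ranges above."""
        if 0 <= video_index <= 9999:
            return "train"
        if 10000 <= video_index <= 14999:
            return "validation"
        if 15000 <= video_index <= 19999:
            return "test"
        raise ValueError(f"index {video_index} is outside the 20,000-video range")

    assert clevrer_split(0) == "train"
    assert clevrer_split(12345) == "validation"
    assert clevrer_split(19999) == "test"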

stable_datasets.images.country211 module

class Country211(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Country211: Image Classification Dataset for Geolocation. This dataset uses a subset of the YFCC100M dataset, filtered by GPS coordinates to include images labeled with ISO-3166 country codes. Each country has a balanced sample of images for training, validation, and testing.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/openai/CLIP/blob/main/data/country211.md', 'assets': mappingproxy({'train': 'https://openaipublic.azureedge.net/clip/data/country211.tgz', 'valid': 'https://openaipublic.azureedge.net/clip/data/country211.tgz', 'test': 'https://openaipublic.azureedge.net/clip/data/country211.tgz'}), 'citation': '@inproceedings{radford2021learning,\n                title     = {Learning transferable visual models from natural language supervision},\n                author    = {Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},\n                booktitle = {International conference on machine learning},\n                pages     = {8748--8763},\n                year      = {2021},\n                organization = {PmLR} }\n        '})
VERSION: Version = 1.0.0

stable_datasets.images.cub200 module

class CUB200(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Caltech-UCSD Birds-200-2011 (CUB-200-2011) Dataset

SOURCE: Mapping = mappingproxy({'homepage': 'https://www.vision.caltech.edu/datasets/cub_200_2011/', 'assets': mappingproxy({'train': 'https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz?download=1', 'test': 'https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz?download=1'}), 'citation': '@techreport{WahCUB_200_2011,\n                        Title = {The Caltech-UCSD Birds-200-2011 Dataset},\n                        Author = {Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S.},\n                        Year = {2011},\n                        Institution = {California Institute of Technology},\n                        Number = {CNS-TR-2011-001}}'})
VERSION: Version = 1.0.0

stable_datasets.images.dsprites module

class DSprites[source]

Bases: BaseDatasetBuilder

dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{higgins2017beta,\n                    title={beta-vae: Learning basic visual concepts with a constrained variational framework},\n                    author={Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander},\n                    booktitle={International conference on learning representations},\n                    year={2017}'})
VERSION: Version = 1.0.0
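
Since the single train asset is a .npz archive, the sketch below shows how its arrays are conventionally read with NumPy. The key names ('imgs', 'latents_values', 'latents_classes') follow the published dsprites archive and should be treated as an assumption here.

    import numpy as np

    # Assumed key layout of dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz
    archive = np.load(
        "dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz",
        allow_pickle=True,
        encoding="latin1",
    )
    imgs = archive["imgs"]                        # (737280, 64, 64) binary sprites
    latents_values = archive["latents_values"]    # (737280, 6) factor values
    latents_classes = archive["latents_classes"]  # (737280, 6) factor indices
    print(imgs.shape, latents_values.shape)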

stable_datasets.images.dsprites_color module

class DSpritesColor(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Color variant of dSprites, a dataset of 2D shapes procedurally generated from 6 ground-truth independent latent factors: the color, shape, scale, rotation, and x and y positions of a sprite.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{locatello2019challenging,\n                    title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n                    author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n                    booktitle={International Conference on Machine Learning},\n                    pages={4114--4124},\n                    year={2019}\n                    }'})
VERSION: Version = 1.0.0

stable_datasets.images.dsprites_noise module

class DSpritesNoise(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Noisy variant of dSprites, a dataset of 2D shapes procedurally generated from 6 ground-truth independent latent factors: the color, shape, scale, rotation, and x and y positions of a sprite.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{locatello2019challenging,\n                    title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n                    author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n                    booktitle={International Conference on Machine Learning},\n                    pages={4114--4124},\n                    year={2019}\n                    }'})
VERSION: Version = 1.0.0

stable_datasets.images.dsprites_scream module

class DSpritesScream(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Scream variant of dSprites, a dataset of 2D shapes procedurally generated from 6 ground-truth independent latent factors: the color, shape, scale, rotation, and x and y positions of a sprite.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{higgins2017beta,\n                    title={beta-vae: Learning basic visual concepts with a constrained variational framework},\n                    author={Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander},\n                    booktitle={International conference on learning representations},\n                    year={2017}'})
VERSION: Version = 1.0.0

stable_datasets.images.dtd module

class DTD(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Describable Textures Dataset (DTD)

DTD is a texture database consisting of 5640 images, organized according to a list of 47 terms (categories) inspired by human perception. There are 120 images for each category. Image sizes range between 300x300 and 640x640, and at least 90% of each image's surface represents the category attribute. The images were collected from Google and Flickr by entering the proposed attributes and related terms as search queries, and were annotated using Amazon Mechanical Turk in several iterations. For each image, a key attribute (main category) and a list of joint attributes are provided.

The data is split into three equal parts (train, validation, and test), with 40 images per class in each split. Ground-truth annotations are provided for both key and joint attributes, as well as the 10 splits of the data used for evaluation.

SOURCE: Mapping = mappingproxy({'homepage': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/', 'assets': mappingproxy({'train': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', 'test': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', 'val': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz'}), 'citation': '@InProceedings{cimpoi14describing,\n                    Author    = {M. Cimpoi and S. Maji and I. Kokkinos and S. Mohamed and and A. Vedaldi},\n                    Title     = {Describing Textures in the Wild},\n                    Booktitle = {Proceedings of the {IEEE} Conf. on Computer Vision and Pattern Recognition ({CVPR})},\n                    Year      = {2014}}'})
VERSION: Version = 1.0.0

stable_datasets.images.e_mnist module

class EMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

EMNIST (Extended MNIST) Dataset

Abstract EMNIST is a set of handwritten characters derived from the NIST Special Database 19 and converted to a 28x28 pixel format that directly matches the MNIST dataset. It serves as a challenging “drop-in” replacement for MNIST, introducing handwritten letters and a larger variety of writing styles while preserving the original file structure and pixel density.

Context While the original MNIST dataset is considered “solved” by modern architectures, EMNIST restores the challenge by providing a larger, more diverse benchmark. It bridges the gap between simple digit recognition and complex handwriting tasks, offering up to 62 classes (digits + uppercase + lowercase) to test generalization and writer-independent recognition.

Content The dataset contains up to 814,255 grayscale images (28x28). It is provided in six split configurations to suit different needs: - ByClass & ByMerge: full unbalanced sets (up to 62 classes). - Balanced: 131,600 images across 47 classes (ideal for benchmarking). - Letters: 145,600 images across 26 classes (A-Z). - Digits & MNIST: 280,000+ images across 10 classes (0-9).

BUILDER_CONFIGS = [EMNISTConfig(name='byclass', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='bymerge', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='balanced', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='letters', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='digits', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='mnist', version=1.0.0, data_dir=None, data_files=None, description=None)]
SOURCE: Mapping = mappingproxy({'homepage': 'https://www.nist.gov/itl/iad/image-group/emnist-dataset', 'citation': '@misc{cohen2017emnistextensionmnisthandwritten,\n                        title={EMNIST: an extension of MNIST to handwritten letters},\n                        author={Gregory Cohen and Saeed Afshar and Jonathan Tapson and André van Schaik},\n                        year={2017},\n                        eprint={1702.05373},\n                        archivePrefix={arXiv},\n                        primaryClass={cs.CV},\n                        url={https://arxiv.org/abs/1702.05373},\n            }', 'assets': mappingproxy({'train': 'https://biometrics.nist.gov/cs_links/EMNIST/matlab.zip', 'test': 'https://biometrics.nist.gov/cs_links/EMNIST/matlab.zip'})})
VERSION: Version = 1.0.0
class EMNISTConfig(variant, **kwargs)[source]

Bases: BuilderConfig
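
A sketch of selecting one of the six configurations listed in BUILDER_CONFIGS. Passing the variant via config_name and using download_and_prepare / as_dataset assumes the usual Hugging Face datasets builder convention and is not confirmed by the signature above.

    from stable_datasets.images.e_mnist import EMNIST

    # 'balanced' is one of the six configs listed above (47 classes, 131,600 images).
    builder = EMNIST(config_name="balanced")
    builder.download_and_prepare()             # assumes the HF datasets builder flow
    train = builder.as_dataset(split="train")
    test = builder.as_dataset(split="test")
    print(train.features)                      # label space depends on the chosen config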

stable_datasets.images.face_pointing module

class FacePointing(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Head angle classification dataset.

SOURCE: Mapping = mappingproxy({'homepage': 'http://crowley-coutaz.fr/HeadPoseDataSet/', 'assets': mappingproxy({'train': 'http://crowley-coutaz.fr/HeadPoseDataSet/HeadPoseImageDatabase.tar.gz'}), 'citation': '@inproceedings{gourier2004estimating,\n                         title={Estimating face orientation from robust detection of salient facial features},\n                         author={Gourier, Nicolas and Hall, Daniela and Crowley, James L},\n                         booktitle={ICPR International Workshop on Visual Observation of Deictic Gestures},\n                         year={2004},\n                         organization={Citeseer}}'})
VERSION: Version = 1.0.0

stable_datasets.images.fashion_mnist module

class FashionMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Grayscale image classification.

Fashion-MNIST is a dataset of Zalando’s article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/zalandoresearch/fashion-mnist', 'assets': mappingproxy({'train': 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz', 'test': 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz'}), 'citation': '@article{xiao2017fashion,\n                         title={Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms},\n                         author={Xiao, Han and Rasul, Kashif and Vollgraf, Roland},\n                         journal={arXiv preprint arXiv:1708.07747},\n                         year={2017}}'})
VERSION: Version = 1.0.0

stable_datasets.images.fgvc_aircraft module

class FGVCAircraft(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]

Bases: GeneratorBasedBuilder

FGVC Aircraft Dataset.

VERSION = 1.0.0

stable_datasets.images.flowers102 module

class Flowers102(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Flowers102 Dataset

Abstract The Flowers102 dataset is a fine-grained image classification benchmark consisting of 102 flower categories commonly found in the United Kingdom. It was created to address the challenge of classifying objects with large intra-class variability and small inter-class differences. Each category contains between 40 and 258 images, totaling 8,189 images.

Context Fine-grained visual categorization (FGVC) focuses on differentiating between similar sub-categories of objects (e.g., different species of flowers or birds). Flowers102 serves as a standard benchmark in this domain. Unlike general object recognition (e.g., CIFAR-10), where classes are visually distinct (car vs. dog), Flowers102 requires models to learn subtle features like petal shape, texture, and color patterns.

Content The dataset consists of: - Images: 8,189 images stored in a single archive. - Labels: A MATLAB file mapping each image to one of 102 classes (0-101). - Splits: A predefined split ID file dividing the data into Training (1,020 images), Validation (1,020 images), and Test (6,149 images).

SOURCE: Mapping = mappingproxy({'homepage': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/', 'citation': '@inproceedings{nilsback2008flowers102,\n                         title={Automated flower classification over a large number of classes},\n                         author={Nilsback, Maria-Elena and Zisserman, Andrew},\n                         booktitle={2008 Sixth Indian conference on computer vision, graphics \\& image processing},\n                         pages={722--729},\n                         year={2008},\n                         organization={IEEE}}', 'assets': mappingproxy({'images': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz', 'labels': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat', 'setid': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat'})})
VERSION: Version = 1.0.0
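
The Content note above mentions a MATLAB label file and a split-ID file; the sketch below shows how such files are typically read with scipy.io.loadmat. The key names ('labels', 'trnid', 'valid', 'tstid') match the published Oxford Flowers-102 files but are an assumption as far as this builder is concerned.

    from scipy.io import loadmat

    labels = loadmat("imagelabels.mat")["labels"].ravel()   # 1-indexed class per image
    setid = loadmat("setid.mat")
    train_ids = setid["trnid"].ravel()   # 1,020 image ids
    val_ids = setid["valid"].ravel()     # 1,020 image ids
    test_ids = setid["tstid"].ravel()    # 6,149 image ids
    print(len(labels), len(train_ids), len(val_ids), len(test_ids))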

stable_datasets.images.food101 module

class Food101(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

SOURCE: Mapping = mappingproxy({'homepage': 'https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/haodoz0118/food101-img/resolve/main/food101_train.zip', 'test': 'https://huggingface.co/datasets/haodoz0118/food101-img/resolve/main/food101_test.zip'}), 'citation': '@inproceedings{bossard14,\n            title = {Food-101 -- Mining Discriminative Components with Random Forests},\n            author = {Bossard, Lukas and Guillaumin, Matthieu and Van Gool, Luc},\n            booktitle = {European Conference on Computer Vision},\n            year = {2014}}'})
VERSION: Version = 1.0.0

stable_datasets.images.hasy_v2 module

class HASYv2(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

HASYv2 Dataset

Abstract The HASYv2 dataset contains handwritten symbol images of 369 classes. It includes over 168,000 samples categorized into various classes like Latin characters, numerals, and symbols. Each image is 32x32 pixels in size. The dataset was created to benchmark the classification of mathematical symbols and handwritten characters.

Context Recognizing handwritten mathematical symbols is a challenging task due to the similarity between classes (e.g., ‘1’, ‘l’, ‘|’) and the large number of unique symbols used in scientific notation. HASYv2 serves as a standard benchmark for testing classifiers on a large number of classes (369) with low resolution (32x32).

Content The dataset consists of: - Images: 168,236 black-and-white images (32x32 pixels). - Labels: 369 distinct classes. - Splits: The dataset includes 10 pre-defined folds. This implementation uses ‘Fold 1’ as the standard train/test split.

BUILDER_CONFIGS = [BuilderConfig(name='fold-1', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 1 as the test set.'), BuilderConfig(name='fold-2', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 2 as the test set.'), BuilderConfig(name='fold-3', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 3 as the test set.'), BuilderConfig(name='fold-4', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 4 as the test set.'), BuilderConfig(name='fold-5', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 5 as the test set.'), BuilderConfig(name='fold-6', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 6 as the test set.'), BuilderConfig(name='fold-7', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 7 as the test set.'), BuilderConfig(name='fold-8', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 8 as the test set.'), BuilderConfig(name='fold-9', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 9 as the test set.'), BuilderConfig(name='fold-10', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 10 as the test set.')]
DEFAULT_CONFIG_NAME = 'fold-1'
SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/MartinThoma/HASY', 'citation': '@article{thoma2017hasyv2,\n                         title={The hasyv2 dataset},\n                         author={Thoma, Martin},\n                         journal={arXiv preprint arXiv:1701.08380},\n                         year={2017}}', 'assets': mappingproxy({'train': 'https://zenodo.org/record/259444/files/HASYv2.tar.bz2?download=1', 'test': 'https://zenodo.org/record/259444/files/HASYv2.tar.bz2?download=1'})})
VERSION: Version = 1.0.0
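
Because the builder exposes ten fold configurations, a simple cross-validation loop can iterate over them. The sketch below assumes the standard Hugging Face builder flow and that the fold is selected via config_name.

    from stable_datasets.images.hasy_v2 import HASYv2

    # 10-fold evaluation: each config uses a different fold as its test set.
    for fold in range(1, 11):
        builder = HASYv2(config_name=f"fold-{fold}")
        builder.download_and_prepare()            # assumes the HF datasets builder flow
        train = builder.as_dataset(split="train")
        test = builder.as_dataset(split="test")
        print(f"fold-{fold}: {len(train)} train / {len(test)} test samples")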

stable_datasets.images.imagenet module

exception DownloadError(message='')[source]

Bases: Exception

Base class for exceptions in this module.

download(n_images, min_size, n_threads, wnids_list, out_dir)[source]
download_images(dir_path, image_url_list, n_images, min_size)[source]
get_full_subtree_wnid(wnid, timeout=5, retry=3)
get_image_urls(wnid, timeout=5, retry=3)
get_subtree_wnid(wnid, timeout=5, retry=3)
get_url_request_list_function(request_url)[source]
get_words_wnid(wnid)[source]
main(wnid, out_dir, n_threads, n_images, fullsubtree, noroot, nosubtree, min_size)[source]
mkdir(path)[source]
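
This module exposes imperative download helpers rather than a dataset builder; the sketch below wires the documented main() signature together. The WordNet id is only an example, and the semantics of the boolean flags are inferred from their parameter names, so treat the whole call as an assumption.

    from stable_datasets.images.imagenet import main

    # 'n02084071' (dog) is an illustrative WordNet id; flag semantics are inferred
    # from the parameter names shown in the signatures above.
    main(
        wnid="n02084071",
        out_dir="/tmp/imagenet_subset",
        n_threads=8,
        n_images=100,
        fullsubtree=False,
        noroot=False,
        nosubtree=True,
        min_size=64,
    )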

stable_datasets.images.imagenette module

class Imagenette(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]

Bases: GeneratorBasedBuilder

ImageNet-style image classification builders with three configurations: imagenet (1000 classes), imagenette (10 classes, default), and imagenet100 (100 classes).

BUILDER_CONFIGS = [BuilderConfig(name='imagenet', version=1.1.0, data_dir=None, data_files=None, description='1000-class version'), BuilderConfig(name='imagenette', version=1.1.0, data_dir=None, data_files=None, description='10-class version'), BuilderConfig(name='imagenet100', version=1.1.0, data_dir=None, data_files=None, description='100-class version')]
DEFAULT_CONFIG_NAME = 'imagenette'
VERSION = 1.1.0

stable_datasets.images.k_mnist module

class KMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Image classification. The Kuzushiji-MNIST dataset consists of 70,000 28x28 grayscale images of 10 classes of Kuzushiji (cursive Japanese) characters, with 7,000 images per class. There are 60,000 training images and 10,000 test images. Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset, providing a more challenging alternative for benchmarking machine learning algorithms.

SOURCE: Mapping = mappingproxy({'homepage': 'http://codh.rois.ac.jp/kmnist/', 'assets': mappingproxy({'train': 'https://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz', 'test': 'https://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz'}), 'citation': '@online{clanuwat2018deep,\n                         author       = {Tarin Clanuwat and Mikel Bober-Irizar and Asanobu Kitamoto and Alex Lamb and Kazuaki Yamamoto and David Ha},\n                         title        = {Deep Learning for Classical Japanese Literature},\n                         date         = {2018-12-03},\n                         year         = {2018},\n                         eprintclass  = {cs.CV},\n                         eprinttype   = {arXiv},\n                         eprint       = {cs.CV/1812.01718}}'})
VERSION: Version = 1.0.0

stable_datasets.images.linnaeus5 module

class Linnaeus5(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Linnaeus 5 Dataset

Abstract The Linnaeus 5 dataset contains 1,600 RGB images per class (8,000 in total), sized 256x256 pixels, categorized into 5 classes: berry, bird, dog, flower, and other (negative set). It was created to benchmark fine-grained classification and object recognition tasks.

Context While many datasets focus on broad object categories (like CIFAR-10), Linnaeus 5 offers a focused challenge on specific natural objects plus a “negative” class (‘other’). It serves as a good middle-ground benchmark between simple digit recognition (MNIST) and large-scale natural image classification (ImageNet).

Content The dataset consists of: - Images: 8,000 color images (256x256 pixels). - Classes: 5 categories (berry, bird, dog, flower, other). - Splits: Pre-split into Training (1,200 images per class) and Test (400 images per class).

SOURCE: Mapping = mappingproxy({'homepage': 'http://chaladze.com/l5/', 'citation': '@article{chaladze2017linnaeus,\n                      title={Linnaeus 5 dataset for machine learning},\n                      author={Chaladze, G and Kalatozishvili, L},\n                      journal={chaladze.com},\n                      year={2017}}', 'assets': mappingproxy({'data': 'http://chaladze.com/l5/img/Linnaeus%205%20256X256.rar'})})
VERSION: Version = 1.0.0

stable_datasets.images.med_mnist module

class MedMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

MedMNIST, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D.

BUILDER_CONFIGS = [MedMNISTConfig(name='pathmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST PathMNIST (2D)'), MedMNISTConfig(name='chestmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST ChestMNIST (2D, multi-label)'), MedMNISTConfig(name='dermamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST DermaMNIST (2D)'), MedMNISTConfig(name='octmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OCTMNIST (2D)'), MedMNISTConfig(name='pneumoniamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST PneumoniaMNIST (2D)'), MedMNISTConfig(name='retinamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST RetinaMNIST (2D)'), MedMNISTConfig(name='breastmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST BreastMNIST (2D)'), MedMNISTConfig(name='bloodmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST BloodMNIST (2D)'), MedMNISTConfig(name='tissuemnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST TissueMNIST (2D)'), MedMNISTConfig(name='organamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganAMNIST (2D)'), MedMNISTConfig(name='organcmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganCMNIST (2D)'), MedMNISTConfig(name='organsmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganSMNIST (2D)'), MedMNISTConfig(name='organmnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganMNIST3D (3D)'), MedMNISTConfig(name='nodulemnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST NoduleMNIST3D (3D)'), MedMNISTConfig(name='adrenalmnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST AdrenalMNIST3D (3D)'), MedMNISTConfig(name='fracturemnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST FractureMNIST3D (3D)'), MedMNISTConfig(name='vesselmnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST VesselMNIST3D (3D)'), MedMNISTConfig(name='synapsemnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST SynapseMNIST3D (3D)')]
VERSION: Version = 1.0.0
class MedMNISTConfig(*, num_classes: int, is_3d: bool = False, multi_label: bool = False, **kwargs)[source]

Bases: BuilderConfig

BuilderConfig with per-variant metadata used by MedMNIST._info().
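
MedMNISTConfig carries per-variant metadata (num_classes, is_3d, multi_label). The sketch below selects one 2D and one 3D variant from BUILDER_CONFIGS, again assuming the standard Hugging Face builder flow and selection via config_name.

    from stable_datasets.images.med_mnist import MedMNIST

    for name in ("pathmnist", "organmnist3d"):    # one 2D and one 3D variant
        builder = MedMNIST(config_name=name)
        builder.download_and_prepare()            # assumes the HF datasets builder flow
        cfg = builder.config                      # a MedMNISTConfig instance
        print(name,
              "3D" if cfg.is_3d else "2D",
              "multi-label" if cfg.multi_label else "single-label")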

stable_datasets.images.mnist module

class MNIST(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]

Bases: GeneratorBasedBuilder

MNIST Dataset using raw IDX files for digit classification.

VERSION = 1.0.0

stable_datasets.images.not_mnist module

class NotMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

NotMNIST Dataset that contains images of letters A-J.

SOURCE: Mapping = mappingproxy({'homepage': 'https://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html', 'assets': mappingproxy({'train_images': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/train-images-idx3-ubyte.gz', 'train_labels': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/train-labels-idx1-ubyte.gz', 'test_images': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/t10k-images-idx3-ubyte.gz', 'test_labels': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/t10k-labels-idx1-ubyte.gz'}), 'citation': '@misc{bulatov2011notmnist,\n                          author={Yaroslav Bulatov},\n                          title={notMNIST dataset},\n                          year={2011},\n                          url={http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html}\n                        }'})
VERSION: Version = 1.0.0

stable_datasets.images.patch_camelyon module

PatchCamelyon dataset (stub).

This file was previously a broken legacy loader at the top-level package. It was moved under stable_datasets.images to match the repository layout.

TODO: Implement as a HuggingFace-compatible builder using BaseDatasetBuilder and the local download helpers in stable_datasets.utils.

class PatchCamelyon(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/basveeling/pcam', 'citation': 'TBD', 'assets': mappingproxy({})})
VERSION: Version = 0.0.0

stable_datasets.images.places365_small module

class Places365Small(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]

Bases: GeneratorBasedBuilder

The Places365-Standard dataset (small version) for image classification.

VERSION = 1.0.0
static extract_train_class(input_string)[source]

stable_datasets.images.rock_paper_scissor module

class RockPaperScissor(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Rock Paper Scissors dataset.

SOURCE: Mapping = mappingproxy({'homepage': 'https://laurencemoroney.com/datasets.html', 'assets': mappingproxy({'train': 'https://storage.googleapis.com/download.tensorflow.org/data/rps.zip', 'test': 'https://storage.googleapis.com/download.tensorflow.org/data/rps-test-set.zip'}), 'citation': '@misc{laurence2019rock,\n                         title={Rock Paper Scissors Dataset},\n                         author={Laurence Moroney},\n                         year={2019},\n                         url={https://laurencemoroney.com/datasets.html}}', 'license': 'CC By 2.0'})
VERSION: Version = 1.0.0

stable_datasets.images.shapes3d module

class Shapes3D(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Shapes3D dataset: 10x10x10x8x4x15 factor combinations, 64x64 RGB images.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/google-deepmind/3dshapes-dataset/', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/randall-lab/shapes3d/resolve/main/shapes3d.npz'}), 'license': 'apache-2.0', 'citation': '@InProceedings{pmlr-v80-kim18b,\n  title = {Disentangling by Factorising},\n  author = {Kim, Hyunjik and Mnih, Andriy},\n  booktitle = {Proceedings of the 35th International Conference on Machine Learning},\n  pages = {2649--2658},\n  year = {2018},\n  editor = {Dy, Jennifer and Krause, Andreas},\n  volume = {80},\n  series = {Proceedings of Machine Learning Research},\n  month = {10--15 Jul},\n  publisher = {PMLR},\n  pdf = {http://proceedings.mlr.press/v80/kim18b/kim18b.pdf},\n  url = {https://proceedings.mlr.press/v80/kim18b.html}\n}'})
VERSION: Version = 1.0.0

stable_datasets.images.small_norb module

class SmallNORB(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

SmallNORB dataset: 96x96 stereo images with 5 known factors.

SOURCE: Mapping = mappingproxy({'homepage': 'https://cs.nyu.edu/~ylclab/data/norb-v1.0-small/', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/randall-lab/small-norb/resolve/main/smallnorb-train.zip', 'test': 'https://huggingface.co/datasets/randall-lab/small-norb/resolve/main/smallnorb-test.zip'}), 'license': 'Apache-2.0', 'citation': '@inproceedings{lecun2004learning,\n  title={Learning methods for generic object recognition with invariance to pose and lighting},\n  author={LeCun, Yann and Huang, Fu Jie and Bottou, Leon},\n  booktitle={Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.},\n  volume={2},\n  pages={II--104},\n  year={2004},\n  organization={IEEE}\n}'})
VERSION: Version = 1.0.0

stable_datasets.images.stl10 module

class STL10(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

STL-10 Dataset

SOURCE: Mapping = mappingproxy({'homepage': 'https://cs.stanford.edu/~acoates/stl10/', 'assets': mappingproxy({'train': 'https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', 'test': 'https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', 'unlabeled': 'https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz'}), 'citation': '@article{coates2011analysis,\n                        title={An analysis of single-layer networks in unsupervised feature learning},\n                        author={Coates, Adam and Ng, Andrew Y},\n                        journal={AISTATS},\n                        year={2011}}'})
VERSION: Version = 1.0.0

stable_datasets.images.svhn module

class SVHN(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

SVHN (Street View House Numbers) Dataset for image classification.

SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor to MNIST, but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images.

SOURCE: Mapping = mappingproxy({'homepage': 'http://ufldl.stanford.edu/housenumbers/', 'assets': mappingproxy({'train': 'http://ufldl.stanford.edu/housenumbers/train_32x32.mat', 'test': 'http://ufldl.stanford.edu/housenumbers/test_32x32.mat', 'extra': 'http://ufldl.stanford.edu/housenumbers/extra_32x32.mat'}), 'citation': '@inproceedings{netzer2011reading,\n                          title={Reading digits in natural images with unsupervised feature learning},\n                          author={Netzer, Yuval and Wang, Tao and Coates, Adam and Bissacco, Alessandro and Wu, Baolin and Ng, Andrew Y and others},\n                          booktitle={NIPS workshop on deep learning and unsupervised feature learning},\n                          volume={2011},\n                          number={2},\n                          pages={4},\n                          year={2011},\n                          organization={Granada}\n                        }'})
VERSION: Version = 1.0.0
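
The train/test/extra assets are MATLAB .mat files; the sketch below shows the conventional way to read them with scipy. The 'X'/'y' keys and the convention that label 10 denotes digit 0 follow the published SVHN cropped-digits format and are assumptions as far as this builder is concerned.

    import numpy as np
    from scipy.io import loadmat

    mat = loadmat("train_32x32.mat")
    images = np.transpose(mat["X"], (3, 0, 1, 2))   # (N, 32, 32, 3) from (32, 32, 3, N)
    labels = mat["y"].ravel().astype(np.int64)
    labels[labels == 10] = 0                        # SVHN stores digit 0 as label 10
    print(images.shape, np.unique(labels))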

stable_datasets.images.tiny_imagenet module

class TinyImagenet(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]

Bases: GeneratorBasedBuilder

Tiny ImageNet dataset for image classification tasks. It contains 200 classes with 500 training images, 50 validation images, and 50 test images per class.

VERSION = 1.0.0

stable_datasets.images.tiny_imagenet_c module

class TinyImagenetC(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]

Bases: GeneratorBasedBuilder

Tiny ImageNet-C dataset for image classification tasks with corruptions applied.

VERSION = 1.0.0

Module contents

class ArabicCharacters(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Arabic Handwritten Characters Dataset

Abstract Handwritten Arabic character recognition systems face several challenges, including the unlimited variation in human handwriting and large public databases. In this work, we model a deep learning architecture that can be effectively applied to recognizing Arabic handwritten characters. A Convolutional Neural Network (CNN) is a special type of feed-forward multilayer network trained in supervised mode. The CNN was trained and tested on our database, which contains 16,800 handwritten Arabic characters. In this paper, optimization methods are implemented to increase the performance of the CNN. Common machine learning methods usually apply a combination of a feature extractor and a trainable classifier. The use of a CNN leads to significant improvements across different machine-learning classification algorithms. Our proposed CNN achieves an average 5.1% misclassification error on the test data.

Context The motivation of this study is to use cross-knowledge learned from multiple works to enhance the performance of Arabic handwritten character recognition. In recent years, interest in recognizing Arabic handwritten characters across different handwriting styles has grown, making it important to find and work on new and advanced solutions for handwriting recognition. Deep learning systems need a large amount of data (images) to be able to make good decisions.

Content The dataset is composed of 16,800 characters written by 60 participants aged between 19 and 40 years; 90% of the participants are right-handed. Each participant wrote each character (from ’alef’ to ’yeh’) ten times on two forms (Fig. 7(a) and 7(b) of the original paper). The forms were scanned at a resolution of 300 dpi, and each block was segmented automatically using Matlab 2016a to determine its coordinates. The database is partitioned into two sets: a training set (13,440 characters, 480 images per class) and a test set (3,360 characters, 120 images per class). The writers of the training set and test set are disjoint, and the assignment of writers to the test set was randomized so that the test-set writers do not all come from a single institution (to ensure variability of the test set).

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/mloey/Arabic-Handwritten-Characters-Dataset', 'assets': mappingproxy({'train': 'https://github.com/mloey/Arabic-Handwritten-Characters-Dataset/raw/master/Train%20Images%2013440x32x32.zip', 'test': 'https://github.com/mloey/Arabic-Handwritten-Characters-Dataset/raw/master/Test%20Images%203360x32x32.zip'}), 'citation': '@article{el2017arabic,\n                        title={Arabic handwritten characters recognition using convolutional neural network},\n                        author={El-Sawy, Ahmed and Loey, Mohamed and El-Bakry, Hazem},\n                        journal={WSEAS Transactions on Computer Research},\n                        volume={5},\n                        pages={11--19},\n                        year={2017}}'})
VERSION: Version = 1.0.0
class ArabicDigits(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Arabic Handwritten Digits Dataset.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/mloey/Arabic-Handwritten-Digits-Dataset', 'assets': mappingproxy({'train': 'https://raw.githubusercontent.com/mloey/Arabic-Handwritten-Digits-Dataset/master/Arabic%20Handwritten%20Digits%20Dataset%20CSV.zip', 'test': 'https://raw.githubusercontent.com/mloey/Arabic-Handwritten-Digits-Dataset/master/Arabic%20Handwritten%20Digits%20Dataset%20CSV.zip'}), 'citation': '@inproceedings{el2016cnn,\n                        title={CNN for handwritten arabic digits recognition based on LeNet-5},\n                        author={El-Sawy, Ahmed and Hazem, EL-Bakry and Loey, Mohamed},\n                        booktitle={International conference on advanced intelligent systems and informatics},\n                        pages={566--575},\n                        year={2016},\n                        organization={Springer}\n                        }'})
VERSION: Version = 1.0.0
class CARS3D[source]

Bases: BaseDatasetBuilder

183 car types x 24 azimuth angles x 4 elevation angles.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/google-research/disentanglement_lib/tree/master', 'assets': mappingproxy({'train': 'http://www.scottreed.info/files/nips2015-analogy-data.tar.gz'}), 'license': 'Apache-2.0', 'citation': '@inproceedings{locatello2019challenging,\n  title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n  author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n  booktitle={International Conference on Machine Learning},\n  pages={4114--4124},\n  year={2019}\n}'})
VERSION: Version = 1.0.0
class CIFAR10(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Image classification. The CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html) was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. The dataset is divided into five training batches and one test batch, each with 10,000 images. The test batch contains exactly 1,000 randomly selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5,000 images from each class.
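
A hedged usage sketch for the builder; the split keyword is documented in the signature above, while the import path and any container behaviour of the returned object are assumptions:

    from stable_datasets.images import CIFAR10  # assumed import path

    train = CIFAR10(split="train")  # 50,000 images, 5,000 per class
    test = CIFAR10(split="test")    # 10,000 images, exactly 1,000 per class

    # If the builder behaves as a sized container (not confirmed by this page),
    # the split sizes can be sanity-checked:
    # assert len(train) == 50_000 and len(test) == 10_000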

SOURCE: Mapping = mappingproxy({'homepage': 'https://www.cs.toronto.edu/~kriz/cifar.html', 'assets': mappingproxy({'train': 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz', 'test': 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'}), 'citation': '@article{krizhevsky2009learning,\n                         title={Learning multiple layers of features from tiny images},\n                         author={Krizhevsky, Alex and Hinton, Geoffrey and others},\n                         year={2009},\n                         publisher={Toronto, ON, Canada}}'})
VERSION: Version = 1.0.0
class CIFAR100(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CIFAR-100 dataset, a variant of CIFAR-10 with 100 classes.

SOURCE: Mapping = mappingproxy({'homepage': 'https://www.cs.toronto.edu/~kriz/cifar.html', 'assets': mappingproxy({'train': 'https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz', 'test': 'https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz'}), 'citation': '@article{krizhevsky2009learning,\n                         title={Learning multiple layers of features from tiny images},\n                         author={Krizhevsky, Alex and Hinton, Geoffrey and others},\n                         year={2009},\n                         publisher={Toronto, ON, Canada}}'})
VERSION: Version = 1.0.0
class CIFAR100C(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CIFAR-100-C dataset with corrupted CIFAR-100 images.

SOURCE: Mapping = mappingproxy({'homepage': 'https://zenodo.org/records/3555552', 'assets': mappingproxy({'test': 'https://zenodo.org/records/3555552/files/CIFAR-100-C.tar?download=1'}), 'citation': '@article{hendrycks2019robustness,\n                        title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n                        author={Dan Hendrycks and Thomas Dietterich},\n                        journal={Proceedings of the International Conference on Learning Representations},\n                        year={2019}}'})
VERSION: Version = 1.0.0
class CIFAR10C(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CIFAR-10-C dataset with corrupted CIFAR-10 images.

SOURCE: Mapping = mappingproxy({'homepage': 'https://zenodo.org/records/2535967', 'assets': mappingproxy({'test': 'https://zenodo.org/records/2535967/files/CIFAR-10-C.tar?download=1'}), 'citation': '@article{hendrycks2019robustness,\n                        title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n                        author={Dan Hendrycks and Thomas Dietterich},\n                        journal={Proceedings of the International Conference on Learning Representations},\n                        year={2019}}'})
VERSION: Version = 1.0.0
class CLEVRER(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

CLEVRER: CoLlision Events for Video REpresentation and Reasoning.

A diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. The dataset includes four types of questions: descriptive (e.g., “what color”), explanatory (“what’s responsible for”), predictive (“what will happen next”), and counterfactual (“what if”).

The dataset contains 20,000 synthetic videos of moving and colliding objects. Each video is 5 seconds long and contains 128 frames with resolution 480 x 320.

Splits (an index-to-split helper is sketched after this list):
  • train: 10,000 videos (index 0 - 9999)

  • validation: 5,000 videos (index 10000 - 14999)

  • test: 5,000 videos (index 15000 - 19999)
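
Because the three splits occupy contiguous index ranges, a video's split can be recovered directly from its index; a small, purely illustrative helper:

    def clevrer_split(video_index: int) -> str:
        """Map a CLEVRER video index (0-19999) to its split, per the ranges above."""
        if 0 <= video_index <= 9999:
            return "train"
        if 10000 <= video_index <= 14999:
            return "validation"
        if 15000 <= video_index <= 19999:
            return "test"
        raise ValueError(f"index {video_index} is outside the 20,000-video range")

    print(clevrer_split(12345))  # -> "validation"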

SOURCE: Mapping = mappingproxy({'homepage': 'http://clevrer.csail.mit.edu/', 'assets': mappingproxy({'train_videos': 'http://data.csail.mit.edu/clevrer/videos/train/video_train.zip', 'train_annotations': 'http://data.csail.mit.edu/clevrer/annotations/train/annotation_train.zip', 'train_questions': 'http://data.csail.mit.edu/clevrer/questions/train.json', 'validation_videos': 'http://data.csail.mit.edu/clevrer/videos/validation/video_validation.zip', 'validation_annotations': 'http://data.csail.mit.edu/clevrer/annotations/validation/annotation_validation.zip', 'validation_questions': 'http://data.csail.mit.edu/clevrer/questions/validation.json', 'test_videos': 'http://data.csail.mit.edu/clevrer/videos/test/video_test.zip', 'test_questions': 'http://data.csail.mit.edu/clevrer/questions/test.json'}), 'citation': '@inproceedings{yi2020clevrer,\n            title={CLEVRER: CoLlision Events for Video REpresentation and Reasoning},\n            author={Yi, Kexin and Gan, Chuang and Li, Yunzhu and Kohli, Pushmeet and Wu, Jiajun and Torralba, Antonio and Tenenbaum, Joshua B},\n            booktitle={International Conference on Learning Representations},\n            year={2020}\n        }'})
VERSION: Version = 1.0.0
class CUB200(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Caltech-UCSD Birds-200-2011 (CUB-200-2011) Dataset

SOURCE: Mapping = mappingproxy({'homepage': 'https://www.vision.caltech.edu/datasets/cub_200_2011/', 'assets': mappingproxy({'train': 'https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz?download=1', 'test': 'https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz?download=1'}), 'citation': '@techreport{WahCUB_200_2011,\n                        Title = {The Caltech-UCSD Birds-200-2011 Dataset},\n                        Author = {Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S.},\n                        Year = {2011},\n                        Institution = {California Institute of Technology},\n                        Number = {CNS-TR-2011-001}}'})
VERSION: Version = 1.0.0
class Cars196(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Cars-196 Dataset The Cars-196 dataset, also known as the Stanford Cars dataset, is a benchmark dataset for fine-grained visual classification of automobiles. It contains 16,185 color images covering 196 car categories, where each category is defined by a specific combination of make, model, and year. The dataset is split into 8,144 training images and 8,041 test images, with the first 98 classes used exclusively for training and the remaining 98 classes reserved for testing, ensuring that training and test classes are disjoint. Images are collected from real-world scenes and exhibit significant variation in viewpoint, background, and lighting conditions. Each image is annotated with a class label and a tight bounding box around the car, making the dataset suitable for fine-grained recognition tasks that require precise object localization and strong generalization to unseen categories.

SOURCE: Mapping = mappingproxy({'homepage': 'https://ai.stanford.edu/~jkrause/cars/car_dataset.html', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/haodoz0118/cars196-img/resolve/main/cars196_train.zip', 'test': 'https://huggingface.co/datasets/haodoz0118/cars196-img/resolve/main/cars196_test.zip'}), 'citation': '@inproceedings{krause20133d,\n            title={3d object representations for fine-grained categorization},\n            author={Krause, Jonathan and Stark, Michael and Deng, Jia and Fei-Fei, Li},\n            booktitle={Proceedings of the IEEE international conference on computer vision workshops},\n            pages={554--561},\n            year={2013}}'})
VERSION: Version = 1.0.0
class Country211(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Country211: Image Classification Dataset for Geolocation. This dataset uses a subset of the YFCC100M dataset, filtered by GPS coordinates to include images labeled with ISO-3166 country codes. Each country has a balanced sample of images for training, validation, and testing.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/openai/CLIP/blob/main/data/country211.md', 'assets': mappingproxy({'train': 'https://openaipublic.azureedge.net/clip/data/country211.tgz', 'valid': 'https://openaipublic.azureedge.net/clip/data/country211.tgz', 'test': 'https://openaipublic.azureedge.net/clip/data/country211.tgz'}), 'citation': '@inproceedings{radford2021learning,\n                title     = {Learning transferable visual models from natural language supervision},\n                author    = {Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},\n                booktitle = {International conference on machine learning},\n                pages     = {8748--8763},\n                year      = {2021},\n                organization = {PmLR} }\n        '})
VERSION: Version = 1.0.0
class DSprites[source]

Bases: BaseDatasetBuilder

dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{higgins2017beta,\n                    title={beta-vae: Learning basic visual concepts with a constrained variational framework},\n                    author={Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander},\n                    booktitle={International conference on learning representations},\n                    year={2017}'})
VERSION: Version = 1.0.0
class DSpritesColor(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

A color variant of dSprites: 2D shapes procedurally generated from 6 ground truth independent latent factors (color, shape, scale, rotation, and the x and y positions of a sprite).

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{locatello2019challenging,\n                    title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n                    author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n                    booktitle={International Conference on Machine Learning},\n                    pages={4114--4124},\n                    year={2019}\n                    }'})
VERSION: Version = 1.0.0
class DSpritesNoise(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

A noise variant of dSprites: 2D shapes procedurally generated from 6 ground truth independent latent factors (color, shape, scale, rotation, and the x and y positions of a sprite).

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{locatello2019challenging,\n                    title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n                    author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n                    booktitle={International Conference on Machine Learning},\n                    pages={4114--4124},\n                    year={2019}\n                    }'})
VERSION: Version = 1.0.0
class DSpritesScream(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

A "Scream" variant of dSprites: 2D shapes procedurally generated from 6 ground truth independent latent factors (color, shape, scale, rotation, and the x and y positions of a sprite).

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{higgins2017beta,\n                    title={beta-vae: Learning basic visual concepts with a constrained variational framework},\n                    author={Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander},\n                    booktitle={International conference on learning representations},\n                    year={2017}'})
VERSION: Version = 1.0.0
class DTD(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Describable Textures Dataset (DTD)

DTD is a texture database consisting of 5,640 images, organized according to a list of 47 terms (categories) inspired by human perception. There are 120 images for each category. Image sizes range between 300x300 and 640x640, and the images contain at least 90% of the surface representing the category attribute. The images were collected from Google and Flickr by entering the proposed attributes and related terms as search queries, and were annotated using Amazon Mechanical Turk in several iterations. For each image, a key attribute (main category) and a list of joint attributes are provided.

The data is split into three equal parts (train, validation, and test), with 40 images per class in each split. Ground-truth annotations are provided for both key and joint attributes, together with the 10 splits of the data used for evaluation.

SOURCE: Mapping = mappingproxy({'homepage': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/', 'assets': mappingproxy({'train': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', 'test': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', 'val': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz'}), 'citation': '@InProceedings{cimpoi14describing,\n                    Author    = {M. Cimpoi and S. Maji and I. Kokkinos and S. Mohamed and and A. Vedaldi},\n                    Title     = {Describing Textures in the Wild},\n                    Booktitle = {Proceedings of the {IEEE} Conf. on Computer Vision and Pattern Recognition ({CVPR})},\n                    Year      = {2014}}'})
VERSION: Version = 1.0.0
class EMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

EMNIST (Extended MNIST) Dataset

Abstract EMNIST is a set of handwritten characters derived from the NIST Special Database 19 and converted to a 28x28 pixel format that directly matches the MNIST dataset. It serves as a challenging “drop-in” replacement for MNIST, introducing handwritten letters and a larger variety of writing styles while preserving the original file structure and pixel density.

Context While the original MNIST dataset is considered “solved” by modern architectures, EMNIST restores the challenge by providing a larger, more diverse benchmark. It bridges the gap between simple digit recognition and complex handwriting tasks, offering up to 62 classes (digits + uppercase + lowercase) to test generalization and writer-independent recognition.

Content The dataset contains up to 814,255 grayscale images (28x28). It is provided in six split configurations to suit different needs: - ByClass & ByMerge: full unbalanced sets (up to 62 classes). - Balanced: 131,600 images across 47 classes (ideal for benchmarking). - Letters: 145,600 images across 26 classes (A-Z). - Digits & MNIST: 280,000+ images across 10 classes (0-9).
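
The six configurations are selected by name (see BUILDER_CONFIGS below). This page documents only *args in the constructor, so passing the name as the first positional argument in the sketch, like the import path, is an assumption:

    from stable_datasets.images import EMNIST  # assumed import path

    # 'balanced' and 'letters' are configuration names from BUILDER_CONFIGS;
    # how the name reaches the builder (positional here) is an assumption.
    balanced = EMNIST("balanced", split="train")  # 47 classes, 131,600 images total
    letters = EMNIST("letters", split="train")    # 26 classes (A-Z)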

BUILDER_CONFIGS = [EMNISTConfig(name='byclass', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='bymerge', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='balanced', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='letters', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='digits', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='mnist', version=1.0.0, data_dir=None, data_files=None, description=None)]
SOURCE: Mapping = mappingproxy({'homepage': 'https://www.nist.gov/itl/iad/image-group/emnist-dataset', 'citation': '@misc{cohen2017emnistextensionmnisthandwritten,\n                        title={EMNIST: an extension of MNIST to handwritten letters},\n                        author={Gregory Cohen and Saeed Afshar and Jonathan Tapson and André van Schaik},\n                        year={2017},\n                        eprint={1702.05373},\n                        archivePrefix={arXiv},\n                        primaryClass={cs.CV},\n                        url={https://arxiv.org/abs/1702.05373},\n            }', 'assets': mappingproxy({'train': 'https://biometrics.nist.gov/cs_links/EMNIST/matlab.zip', 'test': 'https://biometrics.nist.gov/cs_links/EMNIST/matlab.zip'})})
VERSION: Version = 1.0.0
class FacePointing(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Head angle classification dataset.

SOURCE: Mapping = mappingproxy({'homepage': 'http://crowley-coutaz.fr/HeadPoseDataSet/', 'assets': mappingproxy({'train': 'http://crowley-coutaz.fr/HeadPoseDataSet/HeadPoseImageDatabase.tar.gz'}), 'citation': '@inproceedings{gourier2004estimating,\n                         title={Estimating face orientation from robust detection of salient facial features},\n                         author={Gourier, Nicolas and Hall, Daniela and Crowley, James L},\n                         booktitle={ICPR International Workshop on Visual Observation of Deictic Gestures},\n                         year={2004},\n                         organization={Citeseer}}'})
VERSION: Version = 1.0.0
class FashionMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Grayscale image classification.

Fashion-MNIST is a dataset of Zalando’s article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/zalandoresearch/fashion-mnist', 'assets': mappingproxy({'train': 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz', 'test': 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz'}), 'citation': '@article{xiao2017fashion,\n                         title={Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms},\n                         author={Xiao, Han and Rasul, Kashif and Vollgraf, Roland},\n                         journal={arXiv preprint arXiv:1708.07747},\n                         year={2017}}'})
VERSION: Version = 1.0.0
class Flowers102(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Flowers102 Dataset

Abstract The Flowers102 dataset is a fine-grained image classification benchmark consisting of 102 flower categories commonly found in the United Kingdom. It was created to address the challenge of classifying objects with large intra-class variability and small inter-class differences. Each category contains between 40 and 258 images, totaling 8,189 images.

Context Fine-grained visual categorization (FGVC) focuses on differentiating between similar sub-categories of objects (e.g., different species of flowers or birds). Flowers102 serves as a standard benchmark in this domain. Unlike general object recognition (e.g., CIFAR-10), where classes are visually distinct (car vs. dog), Flowers102 requires models to learn subtle features like petal shape, texture, and color patterns.

Content The dataset consists of: - Images: 8,189 images stored in a single archive. - Labels: A MATLAB file mapping each image to one of 102 classes (0-101). - Splits: A predefined split ID file dividing the data into Training (1,020 images), Validation (1,020 images), and Test (6,149 images).

SOURCE: Mapping = mappingproxy({'homepage': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/', 'citation': '@inproceedings{nilsback2008flowers102,\n                         title={Automated flower classification over a large number of classes},\n                         author={Nilsback, Maria-Elena and Zisserman, Andrew},\n                         booktitle={2008 Sixth Indian conference on computer vision, graphics \\& image processing},\n                         pages={722--729},\n                         year={2008},\n                         organization={IEEE}}', 'assets': mappingproxy({'images': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz', 'labels': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat', 'setid': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat'})})
VERSION: Version = 1.0.0
class Food101(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

SOURCE: Mapping = mappingproxy({'homepage': 'https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/haodoz0118/food101-img/resolve/main/food101_train.zip', 'test': 'https://huggingface.co/datasets/haodoz0118/food101-img/resolve/main/food101_test.zip'}), 'citation': '@inproceedings{bossard14,\n            title = {Food-101 -- Mining Discriminative Components with Random Forests},\n            author = {Bossard, Lukas and Guillaumin, Matthieu and Van Gool, Luc},\n            booktitle = {European Conference on Computer Vision},\n            year = {2014}}'})
VERSION: Version = 1.0.0
class HASYv2(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

HASYv2 Dataset

Abstract The HASYv2 dataset contains handwritten symbol images of 369 classes. It includes over 168,000 samples categorized into various classes like Latin characters, numerals, and symbols. Each image is 32x32 pixels in size. The dataset was created to benchmark the classification of mathematical symbols and handwritten characters.

Context Recognizing handwritten mathematical symbols is a challenging task due to the similarity between classes (e.g., ‘1’, ‘l’, ‘|’) and the large number of unique symbols used in scientific notation. HASYv2 serves as a standard benchmark for testing classifiers on a large number of classes (369) with low resolution (32x32).

Content The dataset consists of: - Images: 168,236 black-and-white images (32x32 pixels). - Labels: 369 distinct classes. - Splits: The dataset includes 10 pre-defined folds. This implementation uses ‘Fold 1’ as the standard train/test split.
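
Each of the ten folds is exposed as a named configuration below (with 'fold-1' as the default), so a cross-validation loop can simply rotate the configuration name. As with other configurable builders on this page, the import path and passing the name positionally are assumptions:

    from stable_datasets.images import HASYv2  # assumed import path

    for k in range(1, 11):
        # Each 'fold-k' configuration uses fold k as the test set.
        train = HASYv2(f"fold-{k}", split="train")
        test = HASYv2(f"fold-{k}", split="test")
        # ... fit and evaluate a classifier on this fold here ...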

BUILDER_CONFIGS = [BuilderConfig(name='fold-1', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 1 as the test set.'), BuilderConfig(name='fold-2', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 2 as the test set.'), BuilderConfig(name='fold-3', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 3 as the test set.'), BuilderConfig(name='fold-4', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 4 as the test set.'), BuilderConfig(name='fold-5', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 5 as the test set.'), BuilderConfig(name='fold-6', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 6 as the test set.'), BuilderConfig(name='fold-7', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 7 as the test set.'), BuilderConfig(name='fold-8', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 8 as the test set.'), BuilderConfig(name='fold-9', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 9 as the test set.'), BuilderConfig(name='fold-10', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 10 as the test set.')]
DEFAULT_CONFIG_NAME = 'fold-1'
SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/MartinThoma/HASY', 'citation': '@article{thoma2017hasyv2,\n                         title={The hasyv2 dataset},\n                         author={Thoma, Martin},\n                         journal={arXiv preprint arXiv:1701.08380},\n                         year={2017}}', 'assets': mappingproxy({'train': 'https://zenodo.org/record/259444/files/HASYv2.tar.bz2?download=1', 'test': 'https://zenodo.org/record/259444/files/HASYv2.tar.bz2?download=1'})})
VERSION: Version = 1.0.0
class KMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Image classification. The Kuzushiji-MNIST dataset consists of 70,000 28x28 grayscale images of 10 classes of Kuzushiji (cursive Japanese) characters, with 7,000 images per class. There are 60,000 training images and 10,000 test images. Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset, providing a more challenging alternative for benchmarking machine learning algorithms.

SOURCE: Mapping = mappingproxy({'homepage': 'http://codh.rois.ac.jp/kmnist/', 'assets': mappingproxy({'train': 'https://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz', 'test': 'https://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz'}), 'citation': '@online{clanuwat2018deep,\n                         author       = {Tarin Clanuwat and Mikel Bober-Irizar and Asanobu Kitamoto and Alex Lamb and Kazuaki Yamamoto and David Ha},\n                         title        = {Deep Learning for Classical Japanese Literature},\n                         date         = {2018-12-03},\n                         year         = {2018},\n                         eprintclass  = {cs.CV},\n                         eprinttype   = {arXiv},\n                         eprint       = {cs.CV/1812.01718}}'})
VERSION: Version = 1.0.0
class Linnaeus5(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Linnaeus 5 Dataset

Abstract The Linnaeus 5 dataset contains 1,600 RGB images per class (8,000 in total), sized 256x256 pixels, categorized into 5 classes: berry, bird, dog, flower, and other (negative set). It was created to benchmark fine-grained classification and object recognition tasks.

Context While many datasets focus on broad object categories (like CIFAR-10), Linnaeus 5 offers a focused challenge on specific natural objects plus a “negative” class (‘other’). It serves as a good middle-ground benchmark between simple digit recognition (MNIST) and large-scale natural image classification (ImageNet).

Content The dataset consists of: - Images: 8,000 color images (256x256 pixels). - Classes: 5 categories (berry, bird, dog, flower, other). - Splits: Pre-split into Training (1,200 images per class) and Test (400 images per class).

SOURCE: Mapping = mappingproxy({'homepage': 'http://chaladze.com/l5/', 'citation': '@article{chaladze2017linnaeus,\n                      title={Linnaeus 5 dataset for machine learning},\n                      author={Chaladze, G and Kalatozishvili, L},\n                      journal={chaladze.com},\n                      year={2017}}', 'assets': mappingproxy({'data': 'http://chaladze.com/l5/img/Linnaeus%205%20256X256.rar'})})
VERSION: Version = 1.0.0
class MedMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

MedMNIST, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D.
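
A sketch of selecting one of the 18 subsets by configuration name; the names appear in BUILDER_CONFIGS below, while the import path and the positional configuration argument are assumptions:

    from stable_datasets.images import MedMNIST  # assumed import path

    path2d = MedMNIST("pathmnist", split="train")      # a 2D subset
    organ3d = MedMNIST("organmnist3d", split="train")  # a 3D subset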

BUILDER_CONFIGS = [MedMNISTConfig(name='pathmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST PathMNIST (2D)'), MedMNISTConfig(name='chestmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST ChestMNIST (2D, multi-label)'), MedMNISTConfig(name='dermamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST DermaMNIST (2D)'), MedMNISTConfig(name='octmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OCTMNIST (2D)'), MedMNISTConfig(name='pneumoniamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST PneumoniaMNIST (2D)'), MedMNISTConfig(name='retinamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST RetinaMNIST (2D)'), MedMNISTConfig(name='breastmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST BreastMNIST (2D)'), MedMNISTConfig(name='bloodmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST BloodMNIST (2D)'), MedMNISTConfig(name='tissuemnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST TissueMNIST (2D)'), MedMNISTConfig(name='organamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganAMNIST (2D)'), MedMNISTConfig(name='organcmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganCMNIST (2D)'), MedMNISTConfig(name='organsmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganSMNIST (2D)'), MedMNISTConfig(name='organmnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganMNIST3D (3D)'), MedMNISTConfig(name='nodulemnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST NoduleMNIST3D (3D)'), MedMNISTConfig(name='adrenalmnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST AdrenalMNIST3D (3D)'), MedMNISTConfig(name='fracturemnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST FractureMNIST3D (3D)'), MedMNISTConfig(name='vesselmnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST VesselMNIST3D (3D)'), MedMNISTConfig(name='synapsemnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST SynapseMNIST3D (3D)')]
VERSION: Version = 1.0.0
class NotMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

NotMNIST Dataset that contains images of letters A-J.

SOURCE: Mapping = mappingproxy({'homepage': 'https://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html', 'assets': mappingproxy({'train_images': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/train-images-idx3-ubyte.gz', 'train_labels': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/train-labels-idx1-ubyte.gz', 'test_images': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/t10k-images-idx3-ubyte.gz', 'test_labels': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/t10k-labels-idx1-ubyte.gz'}), 'citation': '@misc{bulatov2011notmnist,\n                          author={Yaroslav Bulatov},\n                          title={notMNIST dataset},\n                          year={2011},\n                          url={http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html}\n                        }'})
VERSION: Version = 1.0.0
class RockPaperScissor(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Rock Paper Scissors dataset.

SOURCE: Mapping = mappingproxy({'homepage': 'https://laurencemoroney.com/datasets.html', 'assets': mappingproxy({'train': 'https://storage.googleapis.com/download.tensorflow.org/data/rps.zip', 'test': 'https://storage.googleapis.com/download.tensorflow.org/data/rps-test-set.zip'}), 'citation': '@misc{laurence2019rock,\n                         title={Rock Paper Scissors Dataset},\n                         author={Laurence Moroney},\n                         year={2019},\n                         url={https://laurencemoroney.com/datasets.html}}', 'license': 'CC By 2.0'})
VERSION: Version = 1.0.0
class STL10(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

STL-10 Dataset

SOURCE: Mapping = mappingproxy({'homepage': 'https://cs.stanford.edu/~acoates/stl10/', 'assets': mappingproxy({'train': 'https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', 'test': 'https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', 'unlabeled': 'https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz'}), 'citation': '@article{coates2011analysis,\n                        title={An analysis of single-layer networks in unsupervised feature learning},\n                        author={Coates, Adam and Ng, Andrew Y},\n                        journal={AISTATS},\n                        year={2011}}'})
VERSION: Version = 1.0.0
class SVHN(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

SVHN (Street View House Numbers) Dataset for image classification.

SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor to MNIST, but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images.
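
The assets below include an additional 'extra' archive alongside train and test. Whether the builder exposes it under an 'extra' split name is an assumption, but the sketch shows the common pattern of using it as supplementary training data:

    from stable_datasets.images import SVHN  # assumed import path

    train = SVHN(split="train")
    test = SVHN(split="test")
    # 'extra' mirrors the third asset under SOURCE; the split name is assumed.
    extra = SVHN(split="extra")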

SOURCE: Mapping = mappingproxy({'homepage': 'http://ufldl.stanford.edu/housenumbers/', 'assets': mappingproxy({'train': 'http://ufldl.stanford.edu/housenumbers/train_32x32.mat', 'test': 'http://ufldl.stanford.edu/housenumbers/test_32x32.mat', 'extra': 'http://ufldl.stanford.edu/housenumbers/extra_32x32.mat'}), 'citation': '@inproceedings{netzer2011reading,\n                          title={Reading digits in natural images with unsupervised feature learning},\n                          author={Netzer, Yuval and Wang, Tao and Coates, Adam and Bissacco, Alessandro and Wu, Baolin and Ng, Andrew Y and others},\n                          booktitle={NIPS workshop on deep learning and unsupervised feature learning},\n                          volume={2011},\n                          number={2},\n                          pages={4},\n                          year={2011},\n                          organization={Granada}\n                        }'})
VERSION: Version = 1.0.0
class Shapes3D(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

Shapes3D dataset: 10x10x10x8x4x15 factor combinations, 64x64 RGB images.
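
The factor grid fully determines the dataset size; a one-line check of the arithmetic (the factor names are those of the upstream 3dshapes release and are not listed on this page):

    import math

    # Factor sizes as listed above: 10 x 10 x 10 x 8 x 4 x 15.
    factor_sizes = {
        "floor_hue": 10, "wall_hue": 10, "object_hue": 10,
        "scale": 8, "shape": 4, "orientation": 15,
    }
    n_images = math.prod(factor_sizes.values())
    print(n_images)  # 480000: one 64x64 RGB image per factor combination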

SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/google-deepmind/3dshapes-dataset/', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/randall-lab/shapes3d/resolve/main/shapes3d.npz'}), 'license': 'apache-2.0', 'citation': '@InProceedings{pmlr-v80-kim18b,\n  title = {Disentangling by Factorising},\n  author = {Kim, Hyunjik and Mnih, Andriy},\n  booktitle = {Proceedings of the 35th International Conference on Machine Learning},\n  pages = {2649--2658},\n  year = {2018},\n  editor = {Dy, Jennifer and Krause, Andreas},\n  volume = {80},\n  series = {Proceedings of Machine Learning Research},\n  month = {10--15 Jul},\n  publisher = {PMLR},\n  pdf = {http://proceedings.mlr.press/v80/kim18b/kim18b.pdf},\n  url = {https://proceedings.mlr.press/v80/kim18b.html}\n}'})
VERSION: Version = 1.0.0
class SmallNORB(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]

Bases: BaseDatasetBuilder

SmallNORB dataset: 96x96 stereo images with 5 known factors.

SOURCE: Mapping = mappingproxy({'homepage': 'https://cs.nyu.edu/~ylclab/data/norb-v1.0-small/', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/randall-lab/small-norb/resolve/main/smallnorb-train.zip', 'test': 'https://huggingface.co/datasets/randall-lab/small-norb/resolve/main/smallnorb-test.zip'}), 'license': 'Apache-2.0', 'citation': '@inproceedings{lecun2004learning,\n  title={Learning methods for generic object recognition with invariance to pose and lighting},\n  author={LeCun, Yann and Huang, Fu Jie and Bottou, Leon},\n  booktitle={Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.},\n  volume={2},\n  pages={II--104},\n  year={2004},\n  organization={IEEE}\n}'})
VERSION: Version = 1.0.0