stable_datasets.images package
Submodules
stable_datasets.images.arabic_characters module
- class ArabicCharacters(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Arabic Handwritten Characters Dataset
Abstract Handwritten Arabic character recognition systems face several challenges, including the unlimited variation in human handwriting and large public databases. In this work, we model a deep learning architecture that can be effectively applied to recognizing Arabic handwritten characters. A Convolutional Neural Network (CNN) is a special type of feed-forward multilayer network trained in supervised mode. The CNN was trained and tested on our database, which contains 16,800 handwritten Arabic characters. In this paper, optimization methods are implemented to increase the performance of the CNN. Common machine learning methods usually apply a combination of a feature extractor and a trainable classifier. The use of a CNN leads to significant improvements over different machine-learning classification algorithms. Our proposed CNN achieves an average 5.1% misclassification error on the test data.
Context The motivation of this study is to use cross-knowledge learned from multiple works to enhance the performance of Arabic handwritten character recognition. In recent years, Arabic handwritten character recognition has had to contend with a wide variety of handwriting styles, making it important to find and work on new and advanced solutions for handwriting recognition. Deep learning systems need a large amount of data (images) to be able to make good decisions.
Content The dataset is composed of 16,800 characters written by 60 participants; the age range is between 19 and 40 years, and 90% of participants are right-handed. Each participant wrote each character (from 'alef' to 'yeh') ten times on two forms as shown in Fig. 7(a) & 7(b). The forms were scanned at a resolution of 300 dpi. Each block is segmented automatically using Matlab 2016a to determine the coordinates of each block. The database is partitioned into two sets: a training set (13,440 characters, 480 images per class) and a test set (3,360 characters, 120 images per class). The writers of the training set and the test set are disjoint. The assignment of writers to the test set was randomized to make sure that the writers of the test set are not from a single institution (to ensure variability of the test set).
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/mloey/Arabic-Handwritten-Characters-Dataset', 'assets': mappingproxy({'train': 'https://github.com/mloey/Arabic-Handwritten-Characters-Dataset/raw/master/Train%20Images%2013440x32x32.zip', 'test': 'https://github.com/mloey/Arabic-Handwritten-Characters-Dataset/raw/master/Test%20Images%203360x32x32.zip'}), 'citation': '@article{el2017arabic,\n title={Arabic handwritten characters recognition using convolutional neural network},\n author={El-Sawy, Ahmed and Loey, Mohamed and El-Bakry, Hazem},\n journal={WSEAS Transactions on Computer Research},\n volume={5},\n pages={11--19},\n year={2017}}'})
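A minimal usage sketch; the import path follows the module heading above, but the effect of the split keyword is an assumption inferred from the constructor signature, not a guarantee of this reference:

    # Hypothetical usage; adjust to your installation of stable_datasets.
    from stable_datasets.images.arabic_characters import ArabicCharacters

    train = ArabicCharacters(split="train")  # 13,440 characters, 480 images per class
    test = ArabicCharacters(split="test")    # 3,360 characters, 120 images per class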
stable_datasets.images.arabic_digits module
- class ArabicDigits(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Arabic Handwritten Digits Dataset.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/mloey/Arabic-Handwritten-Digits-Dataset', 'assets': mappingproxy({'train': 'https://raw.githubusercontent.com/mloey/Arabic-Handwritten-Digits-Dataset/master/Arabic%20Handwritten%20Digits%20Dataset%20CSV.zip', 'test': 'https://raw.githubusercontent.com/mloey/Arabic-Handwritten-Digits-Dataset/master/Arabic%20Handwritten%20Digits%20Dataset%20CSV.zip'}), 'citation': '@inproceedings{el2016cnn,\n title={CNN for handwritten arabic digits recognition based on LeNet-5},\n author={El-Sawy, Ahmed and Hazem, EL-Bakry and Loey, Mohamed},\n booktitle={International conference on advanced intelligent systems and informatics},\n pages={566--575},\n year={2016},\n organization={Springer}\n }'})
stable_datasets.images.awa2 module
- class AWA2(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]
Bases: GeneratorBasedBuilder
The Animals with Attributes 2 (AwA2) dataset provides images across 50 animal classes, useful for attribute-based classification and zero-shot learning research. See https://cvml.ista.ac.at/AwA2/ for more information.
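Since AWA2 subclasses GeneratorBasedBuilder, it should follow the standard Hugging Face datasets builder workflow; a minimal sketch (the split name is an assumption):

    # Hypothetical usage assuming the standard datasets.GeneratorBasedBuilder API.
    from stable_datasets.images.awa2 import AWA2

    builder = AWA2()
    builder.download_and_prepare()          # fetch and process the raw assets
    ds = builder.as_dataset(split="train")  # assumed split name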
stable_datasets.images.beans module
- class Beans(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]
Bases: GeneratorBasedBuilder
Bean disease dataset for classification of three classes: Angular Leaf Spot, Bean Rust, and Healthy leaves.
stable_datasets.images.cars196 module
- class Cars196(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Cars-196 Dataset
The Cars-196 dataset, also known as the Stanford Cars dataset, is a benchmark dataset for fine-grained visual classification of automobiles. It contains 16,185 color images covering 196 car categories, where each category is defined by a specific combination of make, model, and year. The dataset is split into 8,144 training images and 8,041 test images, with the first 98 classes used exclusively for training and the remaining 98 classes reserved for testing, ensuring that training and test classes are disjoint. Images are collected from real-world scenes and exhibit significant variation in viewpoint, background, and lighting conditions. Each image is annotated with a class label and a tight bounding box around the car, making the dataset suitable for fine-grained recognition tasks that require precise object localization and strong generalization to unseen categories.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://ai.stanford.edu/~jkrause/cars/car_dataset.html', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/haodoz0118/cars196-img/resolve/main/cars196_train.zip', 'test': 'https://huggingface.co/datasets/haodoz0118/cars196-img/resolve/main/cars196_test.zip'}), 'citation': '@inproceedings{krause20133d,\n title={3d object representations for fine-grained categorization},\n author={Krause, Jonathan and Stark, Michael and Deng, Jia and Fei-Fei, Li},\n booktitle={Proceedings of the IEEE international conference on computer vision workshops},\n pages={554--561},\n year={2013}}'})
stable_datasets.images.cars3d module
- class CARS3D[source]
Bases: BaseDatasetBuilder
183 car types x 24 azimuth angles x 4 elevation angles.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/google-research/disentanglement_lib/tree/master', 'assets': mappingproxy({'train': 'http://www.scottreed.info/files/nips2015-analogy-data.tar.gz'}), 'license': 'Apache-2.0', 'citation': '@inproceedings{locatello2019challenging,\n title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n booktitle={International Conference on Machine Learning},\n pages={4114--4124},\n year={2019}\n}'})
stable_datasets.images.cassava module
Legacy Cassava loader (to be refactored into a BaseDatasetBuilder).
This module was moved under stable_datasets.images to align the repository layout. It still exposes the original imperative cassava.load(…) API for now.
- class cassava[source]
Bases: object
Plant image classification.
The data consists of two folders: a training folder with 5 subfolders containing the images for the 5 respective classes, and a test folder containing test images.
- static download(path)[source]
- static load(path=None)[source]
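The legacy imperative API can be called as sketched below; the destination path is illustrative and the structure of the returned value is not documented in this reference:

    # Hypothetical usage of the legacy loader.
    from stable_datasets.images.cassava import cassava

    cassava.download("/tmp/cassava")     # fetch the archives into a local folder
    data = cassava.load("/tmp/cassava")  # load from that folder; path=None presumably falls back to a default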
stable_datasets.images.celeb_a module
- class CelebA(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]
Bases: GeneratorBasedBuilder
The CelebA dataset is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations.
stable_datasets.images.cifar10 module
- class CIFAR10(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Image classification. The `CIFAR-10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ dataset was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://www.cs.toronto.edu/~kriz/cifar.html', 'assets': mappingproxy({'train': 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz', 'test': 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'}), 'citation': '@article{krizhevsky2009learning,\n title={Learning multiple layers of features from tiny images},\n author={Krizhevsky, Alex and Hinton, Geoffrey and others},\n year={2009},\n publisher={Toronto, ON, Canada}}'})
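For reference, each batch inside the cifar-10-python archive listed under assets is a Python pickle with byte-string keys; a sketch of reading one training batch directly, independent of the builder (the local path is assumed):

    import pickle

    import numpy as np

    # Path inside the extracted cifar-10-python.tar.gz archive.
    with open("cifar-10-batches-py/data_batch_1", "rb") as f:
        batch = pickle.load(f, encoding="bytes")

    images = np.asarray(batch[b"data"], dtype=np.uint8).reshape(-1, 3, 32, 32)  # (10000, 3, 32, 32)
    labels = np.asarray(batch[b"labels"])  # (10000,) integer labels 0-9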
stable_datasets.images.cifar100 module
- class CIFAR100(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
CIFAR-100 dataset, a variant of CIFAR-10 with 100 classes.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://www.cs.toronto.edu/~kriz/cifar.html', 'assets': mappingproxy({'train': 'https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz', 'test': 'https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz'}), 'citation': '@article{krizhevsky2009learning,\n title={Learning multiple layers of features from tiny images},\n author={Krizhevsky, Alex and Hinton, Geoffrey and others},\n year={2009},\n publisher={Toronto, ON, Canada}}'})
stable_datasets.images.cifar100_c module
- class CIFAR100C(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
CIFAR-100-C dataset with corrupted CIFAR-100 images.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://zenodo.org/records/3555552', 'assets': mappingproxy({'test': 'https://zenodo.org/records/3555552/files/CIFAR-100-C.tar?download=1'}), 'citation': '@article{hendrycks2019robustness,\n title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n author={Dan Hendrycks and Thomas Dietterich},\n journal={Proceedings of the International Conference on Learning Representations},\n year={2019}}'})
stable_datasets.images.cifar10_c module
- class CIFAR10C(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
CIFAR-10-C dataset with corrupted CIFAR-10 images.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://zenodo.org/records/2535967', 'assets': mappingproxy({'test': 'https://zenodo.org/records/2535967/files/CIFAR-10-C.tar?download=1'}), 'citation': '@article{hendrycks2019robustness,\n title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n author={Dan Hendrycks and Thomas Dietterich},\n journal={Proceedings of the International Conference on Learning Representations},\n year={2019}}'})
stable_datasets.images.clevrer module
- class CLEVRER(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
CLEVRER: CoLlision Events for Video REpresentation and Reasoning.
A diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. The dataset includes four types of questions: descriptive (e.g., “what color”), explanatory (“what’s responsible for”), predictive (“what will happen next”), and counterfactual (“what if”).
The dataset contains 20,000 synthetic videos of moving and colliding objects. Each video is 5 seconds long and contains 128 frames with resolution 480 x 320.
- Splits:
train: 10,000 videos (index 0 - 9999)
validation: 5,000 videos (index 10000 - 14999)
test: 5,000 videos (index 15000 - 19999)
- SOURCE: Mapping = mappingproxy({'homepage': 'http://clevrer.csail.mit.edu/', 'assets': mappingproxy({'train_videos': 'http://data.csail.mit.edu/clevrer/videos/train/video_train.zip', 'train_annotations': 'http://data.csail.mit.edu/clevrer/annotations/train/annotation_train.zip', 'train_questions': 'http://data.csail.mit.edu/clevrer/questions/train.json', 'validation_videos': 'http://data.csail.mit.edu/clevrer/videos/validation/video_validation.zip', 'validation_annotations': 'http://data.csail.mit.edu/clevrer/annotations/validation/annotation_validation.zip', 'validation_questions': 'http://data.csail.mit.edu/clevrer/questions/validation.json', 'test_videos': 'http://data.csail.mit.edu/clevrer/videos/test/video_test.zip', 'test_questions': 'http://data.csail.mit.edu/clevrer/questions/test.json'}), 'citation': '@inproceedings{yi2020clevrer,\n title={CLEVRER: CoLlision Events for Video REpresentation and Reasoning},\n author={Yi, Kexin and Gan, Chuang and Li, Yunzhu and Kohli, Pushmeet and Wu, Jiajun and Torralba, Antonio and Tenenbaum, Joshua B},\n booktitle={International Conference on Learning Representations},\n year={2020}\n }'})
stable_datasets.images.country211 module
- class Country211(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Country211: Image Classification Dataset for Geolocation. This dataset uses a subset of the YFCC100M dataset, filtered by GPS coordinates to include images labeled with ISO-3166 country codes. Each country has a balanced sample of images for training, validation, and testing.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/openai/CLIP/blob/main/data/country211.md', 'assets': mappingproxy({'train': 'https://openaipublic.azureedge.net/clip/data/country211.tgz', 'valid': 'https://openaipublic.azureedge.net/clip/data/country211.tgz', 'test': 'https://openaipublic.azureedge.net/clip/data/country211.tgz'}), 'citation': '@inproceedings{radford2021learning,\n title = {Learning transferable visual models from natural language supervision},\n author = {Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},\n booktitle = {International conference on machine learning},\n pages = {8748--8763},\n year = {2021},\n organization = {PmLR} }\n '})
stable_datasets.images.cub200 module
- class CUB200(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Caltech-UCSD Birds-200-2011 (CUB-200-2011) Dataset
- SOURCE: Mapping = mappingproxy({'homepage': 'https://www.vision.caltech.edu/datasets/cub_200_2011/', 'assets': mappingproxy({'train': 'https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz?download=1', 'test': 'https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz?download=1'}), 'citation': '@techreport{WahCUB_200_2011,\n Title = {The Caltech-UCSD Birds-200-2011 Dataset},\n Author = {Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S.},\n Year = {2011},\n Institution = {California Institute of Technology},\n Number = {CNS-TR-2011-001}}'})
stable_datasets.images.dsprites module
- class DSprites[source]
Bases: BaseDatasetBuilder
dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{higgins2017beta,\n title={beta-vae: Learning basic visual concepts with a constrained variational framework},\n author={Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander},\n booktitle={International conference on learning representations},\n year={2017}'})
stable_datasets.images.dsprites_color module
- class DSpritesColor(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
DSprites
dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{locatello2019challenging,\n title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n booktitle={International Conference on Machine Learning},\n pages={4114--4124},\n year={2019}\n }'})
stable_datasets.images.dsprites_noise module
- class DSpritesNoise(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
DSprites
dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{locatello2019challenging,\n title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n booktitle={International Conference on Machine Learning},\n pages={4114--4124},\n year={2019}\n }'})
stable_datasets.images.dsprites_scream module
- class DSpritesScream(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
DSprites
dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{higgins2017beta,\n title={beta-vae: Learning basic visual concepts with a constrained variational framework},\n author={Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander},\n booktitle={International conference on learning representations},\n year={2017}'})
stable_datasets.images.dtd module
- class DTD(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Describable Textures Dataset (DTD)
DTD is a texture database consisting of 5640 images, organized according to a list of 47 terms (categories) inspired by human perception. There are 120 images for each category. Image sizes range between 300x300 and 640x640, and the images contain at least 90% of the surface representing the category attribute. The images were collected from Google and Flickr by entering our proposed attributes and related terms as search queries. The images were annotated using Amazon Mechanical Turk in several iterations. For each image we provide the key attribute (main category) and a list of joint attributes.
The data is split into three equal parts (train, validation, and test), with 40 images per class in each split. We provide the ground truth annotation for both key and joint attributes, as well as the 10 splits of the data we used for evaluation.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/', 'assets': mappingproxy({'train': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', 'test': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', 'val': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz'}), 'citation': '@InProceedings{cimpoi14describing,\n Author = {M. Cimpoi and S. Maji and I. Kokkinos and S. Mohamed and and A. Vedaldi},\n Title = {Describing Textures in the Wild},\n Booktitle = {Proceedings of the {IEEE} Conf. on Computer Vision and Pattern Recognition ({CVPR})},\n Year = {2014}}'})
stable_datasets.images.e_mnist module
- class EMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
EMNIST (Extended MNIST) Dataset
Abstract EMNIST is a set of handwritten characters derived from the NIST Special Database 19 and converted to a 28x28 pixel format that directly matches the MNIST dataset. It serves as a challenging “drop-in” replacement for MNIST, introducing handwritten letters and a larger variety of writing styles while preserving the original file structure and pixel density.
Context While the original MNIST dataset is considered “solved” by modern architectures, EMNIST restores the challenge by providing a larger, more diverse benchmark. It bridges the gap between simple digit recognition and complex handwriting tasks, offering up to 62 classes (digits + uppercase + lowercase) to test generalization and writer-independent recognition.
Content The dataset contains up to 814,255 grayscale images (28x28). It is provided in six split configurations to suit different needs:
- ByClass & ByMerge: Full unbalanced sets (up to 62 classes).
- Balanced: 131,600 images across 47 classes (ideal for benchmarking).
- Letters: 145,600 images across 26 classes (A-Z).
- Digits & MNIST: 280,000+ images across 10 classes (0-9).
- BUILDER_CONFIGS = [EMNISTConfig(name='byclass', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='bymerge', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='balanced', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='letters', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='digits', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='mnist', version=1.0.0, data_dir=None, data_files=None, description=None)]
- SOURCE: Mapping = mappingproxy({'homepage': 'https://www.nist.gov/itl/iad/image-group/emnist-dataset', 'citation': '@misc{cohen2017emnistextensionmnisthandwritten,\n title={EMNIST: an extension of MNIST to handwritten letters},\n author={Gregory Cohen and Saeed Afshar and Jonathan Tapson and André van Schaik},\n year={2017},\n eprint={1702.05373},\n archivePrefix={arXiv},\n primaryClass={cs.CV},\n url={https://arxiv.org/abs/1702.05373},\n }', 'assets': mappingproxy({'train': 'https://biometrics.nist.gov/cs_links/EMNIST/matlab.zip', 'test': 'https://biometrics.nist.gov/cs_links/EMNIST/matlab.zip'})})
- class EMNISTConfig(variant, **kwargs)[source]
Bases: BuilderConfig
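The available EMNIST variants can be enumerated from the BUILDER_CONFIGS attribute shown above; how a variant is selected at construction time is not documented in this reference, so the config_name keyword below is an assumption borrowed from the Hugging Face builder convention:

    from stable_datasets.images.e_mnist import EMNIST

    print([cfg.name for cfg in EMNIST.BUILDER_CONFIGS])
    # ['byclass', 'bymerge', 'balanced', 'letters', 'digits', 'mnist']

    # Assumed keyword; the balanced variant has 131,600 images across 47 classes.
    balanced_train = EMNIST(config_name="balanced", split="train")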
stable_datasets.images.face_pointing module
- class FacePointing(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Head angle classification dataset.
- SOURCE: Mapping = mappingproxy({'homepage': 'http://crowley-coutaz.fr/HeadPoseDataSet/', 'assets': mappingproxy({'train': 'http://crowley-coutaz.fr/HeadPoseDataSet/HeadPoseImageDatabase.tar.gz'}), 'citation': '@inproceedings{gourier2004estimating,\n title={Estimating face orientation from robust detection of salient facial features},\n author={Gourier, Nicolas and Hall, Daniela and Crowley, James L},\n booktitle={ICPR International Workshop on Visual Observation of Deictic Gestures},\n year={2004},\n organization={Citeseer}}'})
stable_datasets.images.fashion_mnist module
- class FashionMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Grayscale image classification.
Fashion-MNIST is a dataset of Zalando’s article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/zalandoresearch/fashion-mnist', 'assets': mappingproxy({'train': 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz', 'test': 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz'}), 'citation': '@article{xiao2017fashion,\n title={Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms},\n author={Xiao, Han and Rasul, Kashif and Vollgraf, Roland},\n journal={arXiv preprint arXiv:1708.07747},\n year={2017}}'})
stable_datasets.images.fgvc_aircraft module
- class FGVCAircraft(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]
Bases: GeneratorBasedBuilder
FGVC Aircraft Dataset.
stable_datasets.images.flowers102 module
- class Flowers102(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Flowers102 Dataset
Abstract The Flowers102 dataset is a fine-grained image classification benchmark consisting of 102 flower categories commonly found in the United Kingdom. It was created to address the challenge of classifying objects with large intra-class variability and small inter-class differences. Each category contains between 40 and 258 images, totaling 8,189 images.
Context Fine-grained visual categorization (FGVC) focuses on differentiating between similar sub-categories of objects (e.g., different species of flowers or birds). Flowers102 serves as a standard benchmark in this domain. Unlike general object recognition (e.g., CIFAR-10), where classes are visually distinct (car vs. dog), Flowers102 requires models to learn subtle features like petal shape, texture, and color patterns.
Content The dataset consists of:
- Images: 8,189 images stored in a single archive.
- Labels: A MATLAB file mapping each image to one of 102 classes (0-101).
- Splits: A predefined split ID file dividing the data into Training (1,020 images), Validation (1,020 images), and Test (6,149 images).
- SOURCE: Mapping = mappingproxy({'homepage': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/', 'citation': '@inproceedings{nilsback2008flowers102,\n title={Automated flower classification over a large number of classes},\n author={Nilsback, Maria-Elena and Zisserman, Andrew},\n booktitle={2008 Sixth Indian conference on computer vision, graphics \\& image processing},\n pages={722--729},\n year={2008},\n organization={IEEE}}', 'assets': mappingproxy({'images': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz', 'labels': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat', 'setid': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat'})})
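The labels and setid assets are MATLAB files; a sketch of inspecting them directly with SciPy (field names follow the original Oxford release; local paths are assumed):

    from scipy.io import loadmat

    labels = loadmat("imagelabels.mat")["labels"].ravel()  # 8,189 class labels in 1..102
    setid = loadmat("setid.mat")
    train_ids = setid["trnid"].ravel()  # 1,020 one-based image indices
    val_ids = setid["valid"].ravel()    # 1,020
    test_ids = setid["tstid"].ravel()   # 6,149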
stable_datasets.images.food101 module
- class Food101(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
- SOURCE: Mapping = mappingproxy({'homepage': 'https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/haodoz0118/food101-img/resolve/main/food101_train.zip', 'test': 'https://huggingface.co/datasets/haodoz0118/food101-img/resolve/main/food101_test.zip'}), 'citation': '@inproceedings{bossard14,\n title = {Food-101 -- Mining Discriminative Components with Random Forests},\n author = {Bossard, Lukas and Guillaumin, Matthieu and Van Gool, Luc},\n booktitle = {European Conference on Computer Vision},\n year = {2014}}'})
stable_datasets.images.hasy_v2 module
- class HASYv2(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
HASYv2 Dataset
Abstract The HASYv2 dataset contains handwritten symbol images of 369 classes. It includes over 168,000 samples categorized into various classes like Latin characters, numerals, and symbols. Each image is 32x32 pixels in size. The dataset was created to benchmark the classification of mathematical symbols and handwritten characters.
Context Recognizing handwritten mathematical symbols is a challenging task due to the similarity between classes (e.g., ‘1’, ‘l’, ‘|’) and the large number of unique symbols used in scientific notation. HASYv2 serves as a standard benchmark for testing classifiers on a large number of classes (369) with low resolution (32x32).
Content The dataset consists of:
- Images: 168,236 black-and-white images (32x32 pixels).
- Labels: 369 distinct classes.
- Splits: The dataset includes 10 pre-defined folds. This implementation uses ‘Fold 1’ as the standard train/test split.
- BUILDER_CONFIGS = [BuilderConfig(name='fold-1', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 1 as the test set.'), BuilderConfig(name='fold-2', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 2 as the test set.'), BuilderConfig(name='fold-3', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 3 as the test set.'), BuilderConfig(name='fold-4', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 4 as the test set.'), BuilderConfig(name='fold-5', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 5 as the test set.'), BuilderConfig(name='fold-6', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 6 as the test set.'), BuilderConfig(name='fold-7', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 7 as the test set.'), BuilderConfig(name='fold-8', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 8 as the test set.'), BuilderConfig(name='fold-9', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 9 as the test set.'), BuilderConfig(name='fold-10', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 10 as the test set.')]
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/MartinThoma/HASY', 'citation': '@article{thoma2017hasyv2,\n title={The hasyv2 dataset},\n author={Thoma, Martin},\n journal={arXiv preprint arXiv:1701.08380},\n year={2017}}', 'assets': mappingproxy({'train': 'https://zenodo.org/record/259444/files/HASYv2.tar.bz2?download=1', 'test': 'https://zenodo.org/record/259444/files/HASYv2.tar.bz2?download=1'})})
stable_datasets.images.imagenet module
- exception DownloadError(message='')[source]
Bases: Exception
Base class for exceptions in this module.
- download(n_images, min_size, n_threads, wnids_list, out_dir)[source]
- download_images(dir_path, image_url_list, n_images, min_size)[source]
- get_url_request_list_function(request_url)[source]
- get_words_wnid(wnid)[source]
- main(wnid, out_dir, n_threads, n_images, fullsubtree, noroot, nosubtree, min_size)[source]
- mkdir(path)[source]
stable_datasets.images.imagenette module
- class Imagenette(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]
Bases: GeneratorBasedBuilder
ImageNet-style image classification builder with 1000-class (imagenet), 100-class (imagenet100), and 10-class (imagenette) configurations.
- BUILDER_CONFIGS = [BuilderConfig(name='imagenet', version=1.1.0, data_dir=None, data_files=None, description='1000-class version'), BuilderConfig(name='imagenette', version=1.1.0, data_dir=None, data_files=None, description='10-class version'), BuilderConfig(name='imagenet100', version=1.1.0, data_dir=None, data_files=None, description='100-class version')]
stable_datasets.images.k_mnist module
- class KMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Image classification. The Kuzushiji-MNIST dataset consists of 70,000 28x28 grayscale images of 10 classes of Kuzushiji (cursive Japanese) characters, with 7,000 images per class. There are 60,000 training images and 10,000 test images. Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset, providing a more challenging alternative for benchmarking machine learning algorithms.
- SOURCE: Mapping = mappingproxy({'homepage': 'http://codh.rois.ac.jp/kmnist/', 'assets': mappingproxy({'train': 'https://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz', 'test': 'https://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz'}), 'citation': '@online{clanuwat2018deep,\n author = {Tarin Clanuwat and Mikel Bober-Irizar and Asanobu Kitamoto and Alex Lamb and Kazuaki Yamamoto and David Ha},\n title = {Deep Learning for Classical Japanese Literature},\n date = {2018-12-03},\n year = {2018},\n eprintclass = {cs.CV},\n eprinttype = {arXiv},\n eprint = {cs.CV/1812.01718}}'})
stable_datasets.images.linnaeus5 module
- class Linnaeus5(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Linnaeus 5 Dataset
Abstract The Linnaeus 5 dataset contains 1,600 RGB images per class, sized 256x256 pixels, categorized into 5 classes: berry, bird, dog, flower, and other (negative set). It was created to benchmark fine-grained classification and object recognition tasks.
Context While many datasets focus on broad object categories (like CIFAR-10), Linnaeus 5 offers a focused challenge on specific natural objects plus a “negative” class (‘other’). It serves as a good middle-ground benchmark between simple digit recognition (MNIST) and large-scale natural image classification (ImageNet).
Content The dataset consists of:
- Images: 8,000 color images (256x256 pixels).
- Classes: 5 categories (berry, bird, dog, flower, other).
- Splits: Pre-split into Training (1,200 images per class) and Test (400 images per class).
- SOURCE: Mapping = mappingproxy({'homepage': 'http://chaladze.com/l5/', 'citation': '@article{chaladze2017linnaeus,\n title={Linnaeus 5 dataset for machine learning},\n author={Chaladze, G and Kalatozishvili, L},\n journal={chaladze.com},\n year={2017}}', 'assets': mappingproxy({'data': 'http://chaladze.com/l5/img/Linnaeus%205%20256X256.rar'})})
stable_datasets.images.med_mnist module
- class MedMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
MedMNIST, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D.
- BUILDER_CONFIGS = [MedMNISTConfig(name='pathmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST PathMNIST (2D)'), MedMNISTConfig(name='chestmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST ChestMNIST (2D, multi-label)'), MedMNISTConfig(name='dermamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST DermaMNIST (2D)'), MedMNISTConfig(name='octmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OCTMNIST (2D)'), MedMNISTConfig(name='pneumoniamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST PneumoniaMNIST (2D)'), MedMNISTConfig(name='retinamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST RetinaMNIST (2D)'), MedMNISTConfig(name='breastmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST BreastMNIST (2D)'), MedMNISTConfig(name='bloodmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST BloodMNIST (2D)'), MedMNISTConfig(name='tissuemnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST TissueMNIST (2D)'), MedMNISTConfig(name='organamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganAMNIST (2D)'), MedMNISTConfig(name='organcmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganCMNIST (2D)'), MedMNISTConfig(name='organsmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganSMNIST (2D)'), MedMNISTConfig(name='organmnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganMNIST3D (3D)'), MedMNISTConfig(name='nodulemnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST NoduleMNIST3D (3D)'), MedMNISTConfig(name='adrenalmnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST AdrenalMNIST3D (3D)'), MedMNISTConfig(name='fracturemnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST FractureMNIST3D (3D)'), MedMNISTConfig(name='vesselmnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST VesselMNIST3D (3D)'), MedMNISTConfig(name='synapsemnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST SynapseMNIST3D (3D)')]
- class MedMNISTConfig(*, num_classes: int, is_3d: bool = False, multi_label: bool = False, **kwargs)[source]
Bases: BuilderConfig
BuilderConfig with per-variant metadata used by MedMNIST._info().
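Because each MedMNISTConfig carries per-variant metadata (num_classes, is_3d, multi_label), the configured variants can be grouped programmatically; a sketch (attribute access on the configs is assumed to mirror the constructor arguments):

    from stable_datasets.images.med_mnist import MedMNIST

    two_d = [cfg.name for cfg in MedMNIST.BUILDER_CONFIGS if not cfg.is_3d]
    three_d = [cfg.name for cfg in MedMNIST.BUILDER_CONFIGS if cfg.is_3d]
    multi_label = [cfg.name for cfg in MedMNIST.BUILDER_CONFIGS if cfg.multi_label]  # e.g. ['chestmnist']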
stable_datasets.images.mnist module
- class MNIST(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]
Bases: GeneratorBasedBuilder
MNIST Dataset using raw IDX files for digit classification.
stable_datasets.images.not_mnist module
- class NotMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
NotMNIST Dataset that contains images of letters A-J.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html', 'assets': mappingproxy({'train_images': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/train-images-idx3-ubyte.gz', 'train_labels': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/train-labels-idx1-ubyte.gz', 'test_images': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/t10k-images-idx3-ubyte.gz', 'test_labels': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/t10k-labels-idx1-ubyte.gz'}), 'citation': '@misc{bulatov2011notmnist,\n author={Yaroslav Bulatov},\n title={notMNIST dataset},\n year={2011},\n url={http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html}\n }'})
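The assets are gzip-compressed files in the MNIST IDX format, so they can also be parsed directly if needed; a sketch under that assumption (file names copied from the asset URLs, local paths assumed):

    import gzip

    import numpy as np

    def read_idx_images(path):
        # IDX3 layout: magic (2051), count, rows, cols as big-endian int32, then uint8 pixels.
        with gzip.open(path, "rb") as f:
            magic, n, rows, cols = np.frombuffer(f.read(16), dtype=">i4")
            assert magic == 2051
            return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows, cols)

    def read_idx_labels(path):
        # IDX1 layout: magic (2049) and count as big-endian int32, then uint8 labels.
        with gzip.open(path, "rb") as f:
            magic, n = np.frombuffer(f.read(8), dtype=">i4")
            assert magic == 2049
            return np.frombuffer(f.read(), dtype=np.uint8)

    images = read_idx_images("train-images-idx3-ubyte.gz")  # (N, 28, 28)
    labels = read_idx_labels("train-labels-idx1-ubyte.gz")  # (N,), 0-9 mapping to letters A-J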
stable_datasets.images.patch_camelyon module
PatchCamelyon dataset (stub).
This file was previously a broken legacy loader at the top-level package. It was moved under stable_datasets.images to match the repository layout.
TODO: Implement as a HuggingFace-compatible builder using BaseDatasetBuilder and the local download helpers in stable_datasets.utils.
- class PatchCamelyon(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
stable_datasets.images.places365_small module
- class Places365Small(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]
Bases: GeneratorBasedBuilder
The Places365-Standard dataset (small version) for image classification.
- static extract_train_class(input_string)[source]
stable_datasets.images.rock_paper_scissor module
- class RockPaperScissor(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Rock Paper Scissors dataset.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://laurencemoroney.com/datasets.html', 'assets': mappingproxy({'train': 'https://storage.googleapis.com/download.tensorflow.org/data/rps.zip', 'test': 'https://storage.googleapis.com/download.tensorflow.org/data/rps-test-set.zip'}), 'citation': '@misc{laurence2019rock,\n title={Rock Paper Scissors Dataset},\n author={Laurence Moroney},\n year={2019},\n url={https://laurencemoroney.com/datasets.html}}', 'license': 'CC By 2.0'})
stable_datasets.images.shapes3d module
- class Shapes3D(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Shapes3D dataset: 10x10x10x8x4x15 factor combinations, 64x64 RGB images.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/google-deepmind/3dshapes-dataset/', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/randall-lab/shapes3d/resolve/main/shapes3d.npz'}), 'license': 'apache-2.0', 'citation': '@InProceedings{pmlr-v80-kim18b,\n title = {Disentangling by Factorising},\n author = {Kim, Hyunjik and Mnih, Andriy},\n booktitle = {Proceedings of the 35th International Conference on Machine Learning},\n pages = {2649--2658},\n year = {2018},\n editor = {Dy, Jennifer and Krause, Andreas},\n volume = {80},\n series = {Proceedings of Machine Learning Research},\n month = {10--15 Jul},\n publisher = {PMLR},\n pdf = {http://proceedings.mlr.press/v80/kim18b/kim18b.pdf},\n url = {https://proceedings.mlr.press/v80/kim18b.html}\n}'})
stable_datasets.images.small_norb module
- class SmallNORB(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
SmallNORB dataset: 96x96 stereo images with 5 known factors.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://cs.nyu.edu/~ylclab/data/norb-v1.0-small/', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/randall-lab/small-norb/resolve/main/smallnorb-train.zip', 'test': 'https://huggingface.co/datasets/randall-lab/small-norb/resolve/main/smallnorb-test.zip'}), 'license': 'Apache-2.0', 'citation': '@inproceedings{lecun2004learning,\n title={Learning methods for generic object recognition with invariance to pose and lighting},\n author={LeCun, Yann and Huang, Fu Jie and Bottou, Leon},\n booktitle={Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.},\n volume={2},\n pages={II--104},\n year={2004},\n organization={IEEE}\n}'})
stable_datasets.images.stl10 module
- class STL10(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
STL-10 Dataset
- SOURCE: Mapping = mappingproxy({'homepage': 'https://cs.stanford.edu/~acoates/stl10/', 'assets': mappingproxy({'train': 'https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', 'test': 'https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', 'unlabeled': 'https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz'}), 'citation': '@article{coates2011analysis,\n title={An analysis of single-layer networks in unsupervised feature learning},\n author={Coates, Adam and Ng, Andrew Y},\n journal={AISTATS},\n year={2011}}'})
stable_datasets.images.svhn module
- class SVHN(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
SVHN (Street View House Numbers) Dataset for image classification.
SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor to MNIST, but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images.
- SOURCE: Mapping = mappingproxy({'homepage': 'http://ufldl.stanford.edu/housenumbers/', 'assets': mappingproxy({'train': 'http://ufldl.stanford.edu/housenumbers/train_32x32.mat', 'test': 'http://ufldl.stanford.edu/housenumbers/test_32x32.mat', 'extra': 'http://ufldl.stanford.edu/housenumbers/extra_32x32.mat'}), 'citation': '@inproceedings{netzer2011reading,\n title={Reading digits in natural images with unsupervised feature learning},\n author={Netzer, Yuval and Wang, Tao and Coates, Adam and Bissacco, Alessandro and Wu, Baolin and Ng, Andrew Y and others},\n booktitle={NIPS workshop on deep learning and unsupervised feature learning},\n volume={2011},\n number={2},\n pages={4},\n year={2011},\n organization={Granada}\n }'})
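The SVHN assets are MATLAB files in the cropped 32x32 format; a sketch of reading one split directly with SciPy (field names follow the Stanford release; the local path is assumed):

    from scipy.io import loadmat

    mat = loadmat("train_32x32.mat")
    images = mat["X"].transpose(3, 0, 1, 2)  # (N, 32, 32, 3) uint8; samples are stored in the last axis
    labels = mat["y"].ravel()                # (N,); digits 1-9, with 10 denoting the digit 0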
stable_datasets.images.tiny_imagenet module
- class TinyImagenet(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]
Bases: GeneratorBasedBuilder
Tiny ImageNet dataset for image classification tasks. It contains 200 classes with 500 training images, 50 validation images, and 50 test images per class.
stable_datasets.images.tiny_imagenet_c module
- class TinyImagenetC(cache_dir: str | None = None, dataset_name: str | None = None, config_name: str | None = None, hash: str | None = None, base_path: str | None = None, info: DatasetInfo | None = None, features: Features | None = None, token: bool | str | None = None, repo_id: str | None = None, data_files: str | list | dict | DataFilesDict | None = None, data_dir: str | None = None, storage_options: dict | None = None, writer_batch_size: int | None = None, config_id: str | None = None, **config_kwargs)[source]
Bases: GeneratorBasedBuilder
Tiny ImageNet-C dataset for image classification tasks with corruptions applied.
Module contents
- class ArabicCharacters(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Arabic Handwritten Characters Dataset
Abstract Handwritten Arabic character recognition systems face several challenges, including the unlimited variation in human handwriting and large public databases. In this work, we model a deep learning architecture that can be effectively applied to recognizing Arabic handwritten characters. A Convolutional Neural Network (CNN) is a special type of feed-forward multilayer network trained in supervised mode. The CNN was trained and tested on our database, which contains 16,800 handwritten Arabic characters. In this paper, optimization methods are implemented to increase the performance of the CNN. Common machine learning methods usually apply a combination of a feature extractor and a trainable classifier. The use of a CNN leads to significant improvements over different machine-learning classification algorithms. Our proposed CNN achieves an average 5.1% misclassification error on the test data.
Context The motivation of this study is to use cross-knowledge learned from multiple works to enhance the performance of Arabic handwritten character recognition. In recent years, Arabic handwritten character recognition has had to contend with a wide variety of handwriting styles, making it important to find and work on new and advanced solutions for handwriting recognition. Deep learning systems need a large amount of data (images) to be able to make good decisions.
Content The dataset is composed of 16,800 characters written by 60 participants; the age range is between 19 and 40 years, and 90% of participants are right-handed. Each participant wrote each character (from 'alef' to 'yeh') ten times on two forms as shown in Fig. 7(a) & 7(b). The forms were scanned at a resolution of 300 dpi. Each block is segmented automatically using Matlab 2016a to determine the coordinates of each block. The database is partitioned into two sets: a training set (13,440 characters, 480 images per class) and a test set (3,360 characters, 120 images per class). The writers of the training set and the test set are disjoint. The assignment of writers to the test set was randomized to make sure that the writers of the test set are not from a single institution (to ensure variability of the test set).
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/mloey/Arabic-Handwritten-Characters-Dataset', 'assets': mappingproxy({'train': 'https://github.com/mloey/Arabic-Handwritten-Characters-Dataset/raw/master/Train%20Images%2013440x32x32.zip', 'test': 'https://github.com/mloey/Arabic-Handwritten-Characters-Dataset/raw/master/Test%20Images%203360x32x32.zip'}), 'citation': '@article{el2017arabic,\n title={Arabic handwritten characters recognition using convolutional neural network},\n author={El-Sawy, Ahmed and Loey, Mohamed and El-Bakry, Hazem},\n journal={WSEAS Transactions on Computer Research},\n volume={5},\n pages={11--19},\n year={2017}}'})
- class ArabicDigits(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Arabic Handwritten Digits Dataset.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/mloey/Arabic-Handwritten-Digits-Dataset', 'assets': mappingproxy({'train': 'https://raw.githubusercontent.com/mloey/Arabic-Handwritten-Digits-Dataset/master/Arabic%20Handwritten%20Digits%20Dataset%20CSV.zip', 'test': 'https://raw.githubusercontent.com/mloey/Arabic-Handwritten-Digits-Dataset/master/Arabic%20Handwritten%20Digits%20Dataset%20CSV.zip'}), 'citation': '@inproceedings{el2016cnn,\n title={CNN for handwritten arabic digits recognition based on LeNet-5},\n author={El-Sawy, Ahmed and Hazem, EL-Bakry and Loey, Mohamed},\n booktitle={International conference on advanced intelligent systems and informatics},\n pages={566--575},\n year={2016},\n organization={Springer}\n }'})
- class CARS3D[source]
Bases: BaseDatasetBuilder
183 car types x 24 azimuth angles x 4 elevation angles.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/google-research/disentanglement_lib/tree/master', 'assets': mappingproxy({'train': 'http://www.scottreed.info/files/nips2015-analogy-data.tar.gz'}), 'license': 'Apache-2.0', 'citation': '@inproceedings{locatello2019challenging,\n title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n booktitle={International Conference on Machine Learning},\n pages={4114--4124},\n year={2019}\n}'})
- class CIFAR10(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
Image classification. The `CIFAR-10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ dataset was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://www.cs.toronto.edu/~kriz/cifar.html', 'assets': mappingproxy({'train': 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz', 'test': 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'}), 'citation': '@article{krizhevsky2009learning,\n title={Learning multiple layers of features from tiny images},\n author={Krizhevsky, Alex and Hinton, Geoffrey and others},\n year={2009},\n publisher={Toronto, ON, Canada}}'})
- class CIFAR100(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
CIFAR-100 dataset, a variant of CIFAR-10 with 100 classes.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://www.cs.toronto.edu/~kriz/cifar.html', 'assets': mappingproxy({'train': 'https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz', 'test': 'https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz'}), 'citation': '@article{krizhevsky2009learning,\n title={Learning multiple layers of features from tiny images},\n author={Krizhevsky, Alex and Hinton, Geoffrey and others},\n year={2009},\n publisher={Toronto, ON, Canada}}'})
- class CIFAR100C(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
CIFAR-100-C dataset with corrupted CIFAR-100 images.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://zenodo.org/records/3555552', 'assets': mappingproxy({'test': 'https://zenodo.org/records/3555552/files/CIFAR-100-C.tar?download=1'}), 'citation': '@article{hendrycks2019robustness,\n title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n author={Dan Hendrycks and Thomas Dietterich},\n journal={Proceedings of the International Conference on Learning Representations},\n year={2019}}'})
- class CIFAR10C(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases: BaseDatasetBuilder
CIFAR-10-C dataset with corrupted CIFAR-10 images.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://zenodo.org/records/2535967', 'assets': mappingproxy({'test': 'https://zenodo.org/records/2535967/files/CIFAR-10-C.tar?download=1'}), 'citation': '@article{hendrycks2019robustness,\n title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},\n author={Dan Hendrycks and Thomas Dietterich},\n journal={Proceedings of the International Conference on Learning Representations},\n year={2019}}'})
- class CLEVRER(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderCLEVRER: CoLlision Events for Video REpresentation and Reasoning.
A diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. The dataset includes four types of questions: descriptive (e.g., “what color”), explanatory (“what’s responsible for”), predictive (“what will happen next”), and counterfactual (“what if”).
The dataset contains 20,000 synthetic videos of moving and colliding objects. Each video is 5 seconds long and contains 128 frames with resolution 480 x 320.
- Splits:
train: 10,000 videos (index 0 - 9999)
validation: 5,000 videos (index 10000 - 14999)
test: 5,000 videos (index 15000 - 19999)
- SOURCE: Mapping = mappingproxy({'homepage': 'http://clevrer.csail.mit.edu/', 'assets': mappingproxy({'train_videos': 'http://data.csail.mit.edu/clevrer/videos/train/video_train.zip', 'train_annotations': 'http://data.csail.mit.edu/clevrer/annotations/train/annotation_train.zip', 'train_questions': 'http://data.csail.mit.edu/clevrer/questions/train.json', 'validation_videos': 'http://data.csail.mit.edu/clevrer/videos/validation/video_validation.zip', 'validation_annotations': 'http://data.csail.mit.edu/clevrer/annotations/validation/annotation_validation.zip', 'validation_questions': 'http://data.csail.mit.edu/clevrer/questions/validation.json', 'test_videos': 'http://data.csail.mit.edu/clevrer/videos/test/video_test.zip', 'test_questions': 'http://data.csail.mit.edu/clevrer/questions/test.json'}), 'citation': '@inproceedings{yi2020clevrer,\n title={CLEVRER: CoLlision Events for Video REpresentation and Reasoning},\n author={Yi, Kexin and Gan, Chuang and Li, Yunzhu and Kohli, Pushmeet and Wu, Jiajun and Torralba, Antonio and Tenenbaum, Joshua B},\n booktitle={International Conference on Learning Representations},\n year={2020}\n }'})
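The split names and index ranges listed above can be used directly; a short sketch, assuming the builder accepts the documented split keyword and that video indices follow the ranges given:

    from stable_datasets.images import CLEVRER  # import path assumed

    # Index ranges per split, as documented above.
    SPLIT_RANGES = {
        "train": range(0, 10_000),
        "validation": range(10_000, 15_000),
        "test": range(15_000, 20_000),
    }

    def split_of_video(index: int) -> str:
        """Return the split a global CLEVRER video index belongs to."""
        for name, rng in SPLIT_RANGES.items():
            if index in rng:
                return name
        raise ValueError(f"index {index} is outside the 20,000-video range")

    val_ds = CLEVRER(split="validation")  # 5,000 videos, indices 10000-14999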
- class CUB200(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderCaltech-UCSD Birds-200-2011 (CUB-200-2011) Dataset
- SOURCE: Mapping = mappingproxy({'homepage': 'https://www.vision.caltech.edu/datasets/cub_200_2011/', 'assets': mappingproxy({'train': 'https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz?download=1', 'test': 'https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz?download=1'}), 'citation': '@techreport{WahCUB_200_2011,\n Title = {The Caltech-UCSD Birds-200-2011 Dataset},\n Author = {Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S.},\n Year = {2011},\n Institution = {California Institute of Technology},\n Number = {CNS-TR-2011-001}}'})
- class Cars196(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderCars-196 Dataset The Cars-196 dataset, also known as the Stanford Cars dataset, is a benchmark dataset for fine-grained visual classification of automobiles. It contains 16,185 color images covering 196 car categories, where each category is defined by a specific combination of make, model, and year. The dataset is split into 8,144 training images and 8,041 test images, with the first 98 classes used exclusively for training and the remaining 98 classes reserved for testing, ensuring that training and test classes are disjoint. Images are collected from real-world scenes and exhibit significant variation in viewpoint, background, and lighting conditions. Each image is annotated with a class label and a tight bounding box around the car, making the dataset suitable for fine-grained recognition tasks that require precise object localization and strong generalization to unseen categories.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://ai.stanford.edu/~jkrause/cars/car_dataset.html', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/haodoz0118/cars196-img/resolve/main/cars196_train.zip', 'test': 'https://huggingface.co/datasets/haodoz0118/cars196-img/resolve/main/cars196_test.zip'}), 'citation': '@inproceedings{krause20133d,\n title={3d object representations for fine-grained categorization},\n author={Krause, Jonathan and Stark, Michael and Deng, Jia and Fei-Fei, Li},\n booktitle={Proceedings of the IEEE international conference on computer vision workshops},\n pages={554--561},\n year={2013}}'})
- class Country211(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderCountry211: Image Classification Dataset for Geolocation. This dataset uses a subset of the YFCC100M dataset, filtered by GPS coordinates to include images labeled with ISO-3166 country codes. Each country has a balanced sample of images for training, validation, and testing.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/openai/CLIP/blob/main/data/country211.md', 'assets': mappingproxy({'train': 'https://openaipublic.azureedge.net/clip/data/country211.tgz', 'valid': 'https://openaipublic.azureedge.net/clip/data/country211.tgz', 'test': 'https://openaipublic.azureedge.net/clip/data/country211.tgz'}), 'citation': '@inproceedings{radford2021learning,\n title = {Learning transferable visual models from natural language supervision},\n author = {Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},\n booktitle = {International conference on machine learning},\n pages = {8748--8763},\n year = {2021},\n organization = {PmLR} }\n '})
- class DSprites[source]
Bases:
BaseDatasetBuilderdSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{higgins2017beta,\n title={beta-vae: Learning basic visual concepts with a constrained variational framework},\n author={Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander},\n booktitle={International conference on learning representations},\n year={2017}'})
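The factor sizes are encoded in the asset filename above (co1, sh3, sc6, or40, x32, y32, with 64x64 images); a small sketch computing the total number of images and mapping a flat index back to per-factor indices, assuming the standard row-major ordering of the released npz file:

    import math

    # Factor sizes read off the asset filename co1sh3sc6or40x32y32_64x64.npz.
    FACTOR_SIZES = {"color": 1, "shape": 3, "scale": 6, "orientation": 40, "pos_x": 32, "pos_y": 32}

    print(math.prod(FACTOR_SIZES.values()))  # 737280 images of 64x64 pixels

    def index_to_factors(index: int) -> dict:
        """Convert a flat image index into per-factor indices (row-major order assumed)."""
        factors = {}
        for name, size in reversed(FACTOR_SIZES.items()):
            factors[name] = index % size
            index //= size
        return factors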
- class DSpritesColor(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderColor variant of dSprites, a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{locatello2019challenging,\n title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n booktitle={International Conference on Machine Learning},\n pages={4114--4124},\n year={2019}\n }'})
- class DSpritesNoise(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderNoise variant of dSprites, a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{locatello2019challenging,\n title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},\n author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Raetsch, Gunnar and Gelly, Sylvain and Sch{"o}lkopf, Bernhard and Bachem, Olivier},\n booktitle={International Conference on Machine Learning},\n pages={4114--4124},\n year={2019}\n }'})
- class DSpritesScream(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderScream variant of dSprites, a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/deepmind/dsprites-dataset', 'assets': mappingproxy({'train': 'https://github.com/google-deepmind/dsprites-dataset/raw/refs/heads/master/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'}), 'citation': '@inproceedings{higgins2017beta,\n title={beta-vae: Learning basic visual concepts with a constrained variational framework},\n author={Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander},\n booktitle={International conference on learning representations},\n year={2017}'})
- class DTD(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderDescribable Textures Dataset (DTD)
DTD is a texture database consisting of 5,640 images, organized according to a list of 47 terms (categories) inspired by human perception. There are 120 images for each category. Image sizes range between 300x300 and 640x640, and at least 90% of each image's surface represents the category attribute. The images were collected from Google and Flickr by entering the proposed attributes and related terms as search queries, and were annotated using Amazon Mechanical Turk in several iterations. For each image, the key attribute (main category) and a list of joint attributes are provided.
The data is split into three equal parts (train, validation, and test) with 40 images per class in each split. Ground truth annotations are provided for both key and joint attributes, along with the 10 data splits used for evaluation.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/', 'assets': mappingproxy({'train': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', 'test': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz', 'val': 'https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz'}), 'citation': '@InProceedings{cimpoi14describing,\n Author = {M. Cimpoi and S. Maji and I. Kokkinos and S. Mohamed and and A. Vedaldi},\n Title = {Describing Textures in the Wild},\n Booktitle = {Proceedings of the {IEEE} Conf. on Computer Vision and Pattern Recognition ({CVPR})},\n Year = {2014}}'})
- class EMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderEMNIST (Extended MNIST) Dataset
Abstract EMNIST is a set of handwritten characters derived from the NIST Special Database 19 and converted to a 28x28 pixel format that directly matches the MNIST dataset. It serves as a challenging “drop-in” replacement for MNIST, introducing handwritten letters and a larger variety of writing styles while preserving the original file structure and pixel density.
Context While the original MNIST dataset is considered “solved” by modern architectures, EMNIST restores the challenge by providing a larger, more diverse benchmark. It bridges the gap between simple digit recognition and complex handwriting tasks, offering up to 62 classes (digits + uppercase + lowercase) to test generalization and writer-independent recognition.
Content The dataset contains up to 814,255 grayscale images (28x28). It is provided in six split configurations to suit different needs:
* ByClass & ByMerge: full unbalanced sets (up to 62 classes).
* Balanced: 131,600 images across 47 classes (ideal for benchmarking).
* Letters: 145,600 images across 26 classes (A-Z).
* Digits: 280,000 images across 10 classes (0-9).
* MNIST: 70,000 images across 10 classes (0-9), matching the original MNIST layout.
- BUILDER_CONFIGS = [EMNISTConfig(name='byclass', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='bymerge', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='balanced', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='letters', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='digits', version=1.0.0, data_dir=None, data_files=None, description=None), EMNISTConfig(name='mnist', version=1.0.0, data_dir=None, data_files=None, description=None)]
- SOURCE: Mapping = mappingproxy({'homepage': 'https://www.nist.gov/itl/iad/image-group/emnist-dataset', 'citation': '@misc{cohen2017emnistextensionmnisthandwritten,\n title={EMNIST: an extension of MNIST to handwritten letters},\n author={Gregory Cohen and Saeed Afshar and Jonathan Tapson and André van Schaik},\n year={2017},\n eprint={1702.05373},\n archivePrefix={arXiv},\n primaryClass={cs.CV},\n url={https://arxiv.org/abs/1702.05373},\n }', 'assets': mappingproxy({'train': 'https://biometrics.nist.gov/cs_links/EMNIST/matlab.zip', 'test': 'https://biometrics.nist.gov/cs_links/EMNIST/matlab.zip'})})
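The six configurations above are exposed through BUILDER_CONFIGS. The sketch below lists them and constructs the balanced split; passing the configuration name as the first positional argument is an assumption (the Hugging Face-style convention) and should be checked against BaseDatasetBuilder.

    from stable_datasets.images import EMNIST  # import path assumed

    # Configuration names as documented in BUILDER_CONFIGS.
    print([cfg.name for cfg in EMNIST.BUILDER_CONFIGS])
    # ['byclass', 'bymerge', 'balanced', 'letters', 'digits', 'mnist']

    # Assumption: the configuration name is passed positionally.
    balanced_train = EMNIST("balanced", split="train")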
- class FacePointing(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderHead angle classification dataset.
- SOURCE: Mapping = mappingproxy({'homepage': 'http://crowley-coutaz.fr/HeadPoseDataSet/', 'assets': mappingproxy({'train': 'http://crowley-coutaz.fr/HeadPoseDataSet/HeadPoseImageDatabase.tar.gz'}), 'citation': '@inproceedings{gourier2004estimating,\n title={Estimating face orientation from robust detection of salient facial features},\n author={Gourier, Nicolas and Hall, Daniela and Crowley, James L},\n booktitle={ICPR International Workshop on Visual Observation of Deictic Gestures},\n year={2004},\n organization={Citeseer}}'})
- class FashionMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderGrayscale image classification.
Fashion-MNIST is a dataset of Zalando’s article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/zalandoresearch/fashion-mnist', 'assets': mappingproxy({'train': 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz', 'test': 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz'}), 'citation': '@article{xiao2017fashion,\n title={Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms},\n author={Xiao, Han and Rasul, Kashif and Vollgraf, Roland},\n journal={arXiv preprint arXiv:1708.07747},\n year={2017}}'})
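The assets above are gzip-compressed IDX files (the MNIST binary format); a self-contained sketch that decodes the image file after it has been downloaded, independent of the builder class:

    import gzip
    import struct

    import numpy as np

    def read_idx_images(path: str) -> np.ndarray:
        """Decode an IDX3 image file such as train-images-idx3-ubyte.gz into (N, 28, 28) uint8."""
        with gzip.open(path, "rb") as f:
            magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
            assert magic == 2051, "not an IDX3 image file"
            data = np.frombuffer(f.read(), dtype=np.uint8)
        return data.reshape(n, rows, cols)

    # images = read_idx_images("train-images-idx3-ubyte.gz")  # shape (60000, 28, 28)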
- class Flowers102(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderFlowers102 Dataset
Abstract The Flowers102 dataset is a fine-grained image classification benchmark consisting of 102 flower categories commonly found in the United Kingdom. It was created to address the challenge of classifying objects with large intra-class variability and small inter-class differences. Each category contains between 40 and 258 images, totaling 8,189 images.
Context Fine-grained visual categorization (FGVC) focuses on differentiating between similar sub-categories of objects (e.g., different species of flowers or birds). Flowers102 serves as a standard benchmark in this domain. Unlike general object recognition (e.g., CIFAR-10), where classes are visually distinct (car vs. dog), Flowers102 requires models to learn subtle features like petal shape, texture, and color patterns.
Content The dataset consists of:
- Images: 8,189 images stored in a single archive.
- Labels: a MATLAB file mapping each image to one of 102 classes (0-101).
- Splits: a predefined split ID file dividing the data into Training (1,020 images), Validation (1,020 images), and Test (6,149 images).
- SOURCE: Mapping = mappingproxy({'homepage': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/', 'citation': '@inproceedings{nilsback2008flowers102,\n title={Automated flower classification over a large number of classes},\n author={Nilsback, Maria-Elena and Zisserman, Andrew},\n booktitle={2008 Sixth Indian conference on computer vision, graphics \\& image processing},\n pages={722--729},\n year={2008},\n organization={IEEE}}', 'assets': mappingproxy({'images': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz', 'labels': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat', 'setid': 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat'})})
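The labels and setid assets above are small MATLAB files; a sketch reading them directly with SciPy. The key names 'labels', 'trnid', 'valid', and 'tstid' are the ones conventionally found in these files and should be verified against the downloaded copies:

    from scipy.io import loadmat

    labels = loadmat("imagelabels.mat")["labels"].ravel()  # 8,189 labels, one per image
    setid = loadmat("setid.mat")
    train_ids = setid["trnid"].ravel()  # 1,020 image ids
    val_ids = setid["valid"].ravel()    # 1,020 image ids
    test_ids = setid["tstid"].ravel()   # 6,149 image ids

    # Image ids are 1-based and typically map to files named image_00001.jpg, image_00002.jpg, ...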
- class Food101(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderFood-101 dataset: 101 food categories with 101,000 color images (750 training and 250 test images per class).
- SOURCE: Mapping = mappingproxy({'homepage': 'https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/haodoz0118/food101-img/resolve/main/food101_train.zip', 'test': 'https://huggingface.co/datasets/haodoz0118/food101-img/resolve/main/food101_test.zip'}), 'citation': '@inproceedings{bossard14,\n title = {Food-101 -- Mining Discriminative Components with Random Forests},\n author = {Bossard, Lukas and Guillaumin, Matthieu and Van Gool, Luc},\n booktitle = {European Conference on Computer Vision},\n year = {2014}}'})
- class HASYv2(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderHASYv2 Dataset
Abstract The HASYv2 dataset contains handwritten symbol images of 369 classes. It includes over 168,000 samples categorized into various classes like Latin characters, numerals, and symbols. Each image is 32x32 pixels in size. The dataset was created to benchmark the classification of mathematical symbols and handwritten characters.
Context Recognizing handwritten mathematical symbols is a challenging task due to the similarity between classes (e.g., ‘1’, ‘l’, ‘|’) and the large number of unique symbols used in scientific notation. HASYv2 serves as a standard benchmark for testing classifiers on a large number of classes (369) with low resolution (32x32).
Content The dataset consists of:
- Images: 168,236 black-and-white images (32x32 pixels).
- Labels: 369 distinct classes.
- Splits: 10 pre-defined folds; this implementation uses ‘Fold 1’ as the standard train/test split.
- BUILDER_CONFIGS = [BuilderConfig(name='fold-1', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 1 as the test set.'), BuilderConfig(name='fold-2', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 2 as the test set.'), BuilderConfig(name='fold-3', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 3 as the test set.'), BuilderConfig(name='fold-4', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 4 as the test set.'), BuilderConfig(name='fold-5', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 5 as the test set.'), BuilderConfig(name='fold-6', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 6 as the test set.'), BuilderConfig(name='fold-7', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 7 as the test set.'), BuilderConfig(name='fold-8', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 8 as the test set.'), BuilderConfig(name='fold-9', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 9 as the test set.'), BuilderConfig(name='fold-10', version=1.0.0, data_dir=None, data_files=None, description='HASYv2 dataset using fold 10 as the test set.')]
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/MartinThoma/HASY', 'citation': '@article{thoma2017hasyv2,\n title={The hasyv2 dataset},\n author={Thoma, Martin},\n journal={arXiv preprint arXiv:1701.08380},\n year={2017}}', 'assets': mappingproxy({'train': 'https://zenodo.org/record/259444/files/HASYv2.tar.bz2?download=1', 'test': 'https://zenodo.org/record/259444/files/HASYv2.tar.bz2?download=1'})})
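Because each fold is exposed as its own configuration, a 10-fold evaluation can loop over BUILDER_CONFIGS. Passing the configuration name positionally is an assumption, and evaluate() is a placeholder for the user's own training/evaluation routine:

    from stable_datasets.images import HASYv2  # import path assumed

    scores = []
    for cfg in HASYv2.BUILDER_CONFIGS:           # fold-1 ... fold-10
        train = HASYv2(cfg.name, split="train")  # assumption: config name passed positionally
        test = HASYv2(cfg.name, split="test")
        scores.append(evaluate(train, test))     # evaluate() is user-supplied, not part of the API
    print(sum(scores) / len(scores))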
- class KMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderImage classification. The Kuzushiji-MNIST dataset consists of 70,000 28x28 grayscale images of 10 classes of Kuzushiji (cursive Japanese) characters, with 7,000 images per class. There are 60,000 training images and 10,000 test images. Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset, providing a more challenging alternative for benchmarking machine learning algorithms.
- SOURCE: Mapping = mappingproxy({'homepage': 'http://codh.rois.ac.jp/kmnist/', 'assets': mappingproxy({'train': 'https://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz', 'test': 'https://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz'}), 'citation': '@online{clanuwat2018deep,\n author = {Tarin Clanuwat and Mikel Bober-Irizar and Asanobu Kitamoto and Alex Lamb and Kazuaki Yamamoto and David Ha},\n title = {Deep Learning for Classical Japanese Literature},\n date = {2018-12-03},\n year = {2018},\n eprintclass = {cs.CV},\n eprinttype = {arXiv},\n eprint = {cs.CV/1812.01718}}'})
- class Linnaeus5(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderLinnaeus 5 Dataset
Abstract The Linnaeus 5 dataset contains 8,000 RGB images sized 256x256 pixels (1,600 per class), categorized into 5 classes: berry, bird, dog, flower, and other (negative set). It was created to benchmark fine-grained classification and object recognition tasks.
Context While many datasets focus on broad object categories (like CIFAR-10), Linnaeus 5 offers a focused challenge on specific natural objects plus a “negative” class (‘other’). It serves as a good middle-ground benchmark between simple digit recognition (MNIST) and large-scale natural image classification (ImageNet).
Content The dataset consists of:
- Images: 8,000 color images (256x256 pixels).
- Classes: 5 categories (berry, bird, dog, flower, other).
- Splits: pre-split into Training (1,200 images per class) and Test (400 images per class).
- SOURCE: Mapping = mappingproxy({'homepage': 'http://chaladze.com/l5/', 'citation': '@article{chaladze2017linnaeus,\n title={Linnaeus 5 dataset for machine learning},\n author={Chaladze, G and Kalatozishvili, L},\n journal={chaladze.com},\n year={2017}}', 'assets': mappingproxy({'data': 'http://chaladze.com/l5/img/Linnaeus%205%20256X256.rar'})})
- class MedMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderMedMNIST, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D.
- BUILDER_CONFIGS = [MedMNISTConfig(name='pathmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST PathMNIST (2D)'), MedMNISTConfig(name='chestmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST ChestMNIST (2D, multi-label)'), MedMNISTConfig(name='dermamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST DermaMNIST (2D)'), MedMNISTConfig(name='octmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OCTMNIST (2D)'), MedMNISTConfig(name='pneumoniamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST PneumoniaMNIST (2D)'), MedMNISTConfig(name='retinamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST RetinaMNIST (2D)'), MedMNISTConfig(name='breastmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST BreastMNIST (2D)'), MedMNISTConfig(name='bloodmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST BloodMNIST (2D)'), MedMNISTConfig(name='tissuemnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST TissueMNIST (2D)'), MedMNISTConfig(name='organamnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganAMNIST (2D)'), MedMNISTConfig(name='organcmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganCMNIST (2D)'), MedMNISTConfig(name='organsmnist', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganSMNIST (2D)'), MedMNISTConfig(name='organmnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST OrganMNIST3D (3D)'), MedMNISTConfig(name='nodulemnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST NoduleMNIST3D (3D)'), MedMNISTConfig(name='adrenalmnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST AdrenalMNIST3D (3D)'), MedMNISTConfig(name='fracturemnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST FractureMNIST3D (3D)'), MedMNISTConfig(name='vesselmnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST VesselMNIST3D (3D)'), MedMNISTConfig(name='synapsemnist3d', version=1.0.0, data_dir=None, data_files=None, description='MedMNIST SynapseMNIST3D (3D)')]
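All 18 sub-datasets are exposed as configurations; a small sketch that groups them by dimensionality using the config descriptions shown above (the package-level import path is assumed):

    from stable_datasets.images import MedMNIST  # import path assumed

    two_d = [c.name for c in MedMNIST.BUILDER_CONFIGS if "(2D" in c.description]
    three_d = [c.name for c in MedMNIST.BUILDER_CONFIGS if "(3D)" in c.description]
    print(len(two_d), len(three_d))  # 12 2D configs, 6 3D configs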
- class NotMNIST(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderNotMNIST Dataset that contains images of letters A-J.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html', 'assets': mappingproxy({'train_images': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/train-images-idx3-ubyte.gz', 'train_labels': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/train-labels-idx1-ubyte.gz', 'test_images': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/t10k-images-idx3-ubyte.gz', 'test_labels': 'https://github.com/davidflanagan/notMNIST-to-MNIST/raw/refs/heads/master/t10k-labels-idx1-ubyte.gz'}), 'citation': '@misc{bulatov2011notmnist,\n author={Yaroslav Bulatov},\n title={notMNIST dataset},\n year={2011},\n url={http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html}\n }'})
- class RockPaperScissor(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderRock Paper Scissors dataset.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://laurencemoroney.com/datasets.html', 'assets': mappingproxy({'train': 'https://storage.googleapis.com/download.tensorflow.org/data/rps.zip', 'test': 'https://storage.googleapis.com/download.tensorflow.org/data/rps-test-set.zip'}), 'citation': '@misc{laurence2019rock,\n title={Rock Paper Scissors Dataset},\n author={Laurence Moroney},\n year={2019},\n url={https://laurencemoroney.com/datasets.html}}', 'license': 'CC By 2.0'})
- class STL10(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderSTL-10 Dataset
- SOURCE: Mapping = mappingproxy({'homepage': 'https://cs.stanford.edu/~acoates/stl10/', 'assets': mappingproxy({'train': 'https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', 'test': 'https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz', 'unlabeled': 'https://cs.stanford.edu/~acoates/stl10/stl10_binary.tar.gz'}), 'citation': '@article{coates2011analysis,\n title={An analysis of single-layer networks in unsupervised feature learning},\n author={Coates, Adam and Ng, Andrew Y},\n journal={AISTATS},\n year={2011}}'})
- class SVHN(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderSVHN (Street View House Numbers) Dataset for image classification.
SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor to MNIST, but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images.
- SOURCE: Mapping = mappingproxy({'homepage': 'http://ufldl.stanford.edu/housenumbers/', 'assets': mappingproxy({'train': 'http://ufldl.stanford.edu/housenumbers/train_32x32.mat', 'test': 'http://ufldl.stanford.edu/housenumbers/test_32x32.mat', 'extra': 'http://ufldl.stanford.edu/housenumbers/extra_32x32.mat'}), 'citation': '@inproceedings{netzer2011reading,\n title={Reading digits in natural images with unsupervised feature learning},\n author={Netzer, Yuval and Wang, Tao and Coates, Adam and Bissacco, Alessandro and Wu, Baolin and Ng, Andrew Y and others},\n booktitle={NIPS workshop on deep learning and unsupervised feature learning},\n volume={2011},\n number={2},\n pages={4},\n year={2011},\n organization={Granada}\n }'})
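The assets are MATLAB files; a sketch decoding train_32x32.mat directly with SciPy. The 'X'/'y' key names and the convention that label 10 encodes the digit 0 follow the dataset's published format and are worth verifying on the downloaded file:

    import numpy as np
    from scipy.io import loadmat

    mat = loadmat("train_32x32.mat")
    images = np.transpose(mat["X"], (3, 0, 1, 2))  # (N, 32, 32, 3) uint8
    labels = mat["y"].ravel().copy()
    labels[labels == 10] = 0                       # label 10 encodes the digit 0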
- class Shapes3D(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderShapes3D dataset: 10x10x10x8x4x15 factor combinations, 64x64 RGB images.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://github.com/google-deepmind/3dshapes-dataset/', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/randall-lab/shapes3d/resolve/main/shapes3d.npz'}), 'license': 'apache-2.0', 'citation': '@InProceedings{pmlr-v80-kim18b,\n title = {Disentangling by Factorising},\n author = {Kim, Hyunjik and Mnih, Andriy},\n booktitle = {Proceedings of the 35th International Conference on Machine Learning},\n pages = {2649--2658},\n year = {2018},\n editor = {Dy, Jennifer and Krause, Andreas},\n volume = {80},\n series = {Proceedings of Machine Learning Research},\n month = {10--15 Jul},\n publisher = {PMLR},\n pdf = {http://proceedings.mlr.press/v80/kim18b/kim18b.pdf},\n url = {https://proceedings.mlr.press/v80/kim18b.html}\n}'})
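The 10x10x10x8x4x15 factor layout implies 480,000 images; a sketch converting between a flat image index and per-factor indices, assuming row-major ordering over (floor hue, wall hue, object hue, scale, shape, orientation) as in the original 3dshapes release:

    import numpy as np

    FACTOR_SIZES = (10, 10, 10, 8, 4, 15)  # documented factor layout
    print(int(np.prod(FACTOR_SIZES)))      # 480000 images of 64x64 RGB

    # Flat index <-> per-factor indices (row-major order assumed).
    factors = np.unravel_index(123456, FACTOR_SIZES)
    flat = np.ravel_multi_index(factors, FACTOR_SIZES)
    assert flat == 123456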
- class SmallNORB(*args, split=None, processed_cache_dir=None, download_dir=None, **kwargs)[source]
Bases:
BaseDatasetBuilderSmallNORB dataset: 96x96 stereo images with 5 known factors.
- SOURCE: Mapping = mappingproxy({'homepage': 'https://cs.nyu.edu/~ylclab/data/norb-v1.0-small/', 'assets': mappingproxy({'train': 'https://huggingface.co/datasets/randall-lab/small-norb/resolve/main/smallnorb-train.zip', 'test': 'https://huggingface.co/datasets/randall-lab/small-norb/resolve/main/smallnorb-test.zip'}), 'license': 'Apache-2.0', 'citation': '@inproceedings{lecun2004learning,\n title={Learning methods for generic object recognition with invariance to pose and lighting},\n author={LeCun, Yann and Huang, Fu Jie and Bottou, Leon},\n booktitle={Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.},\n volume={2},\n pages={II--104},\n year={2004},\n organization={IEEE}\n}'})