stable_datasets.features package

Submodules

stable_datasets.features.array module

Array-based feature codecs.

class Array3D(shape: tuple, dtype: str = 'uint8')

Bases: FeatureType

Fixed-shape 3D array stored as flat bytes.

encode(value, *, cache_dir: Path | None = None) -> bytes | None
format(value, *, format_type: str, decode_images: bool = True, cache_dir: Path | None = None)
to_arrow_type() -> DataType
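
As a rough illustration of what "fixed-shape 3D array stored as flat bytes" can mean, the sketch below flattens a nested uint8 array into a bytes object and rebuilds it from a known shape. The helper names flatten_to_bytes and unflatten_from_bytes are hypothetical and use only the standard library; the actual Array3D.encode and Array3D.format implementations may differ (for instance in dtype handling or in relying on NumPy).

```python
from itertools import chain


def flatten_to_bytes(array3d, shape):
    """Flatten a nested list of uint8 values into bytes, row-major (hypothetical helper)."""
    d0, d1, d2 = shape
    # Check the fixed shape before serializing.
    if len(array3d) != d0 or any(len(plane) != d1 for plane in array3d):
        raise ValueError(f"expected shape {shape}")
    if any(len(row) != d2 for plane in array3d for row in plane):
        raise ValueError(f"expected shape {shape}")
    # Two levels of chaining turn planes -> rows -> individual values.
    flat = chain.from_iterable(chain.from_iterable(array3d))
    return bytes(flat)


def unflatten_from_bytes(buf, shape):
    """Rebuild the nested list from flat bytes and the known shape (hypothetical helper)."""
    d0, d1, d2 = shape
    if len(buf) != d0 * d1 * d2:
        raise ValueError("buffer length does not match shape")
    values = list(buf)
    return [
        [values[(i * d1 + j) * d2:(i * d1 + j + 1) * d2] for j in range(d1)]
        for i in range(d0)
    ]
```

Because the shape is fixed and known up front, only the raw values need to be stored; the shape itself is what makes the round trip possible.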

stable_datasets.features.base module

Core feature descriptors shared across modalities.

class ClassLabel(names: list[str] | None = None, num_classes: int | None = None)

Bases: FeatureType

Categorical label with name-to-int mapping.

encode(value, *, cache_dir: Path | None = None)
format(value, *, format_type: str, decode_images: bool = True, cache_dir: Path | None = None)
int2str(idx: int) -> str
str2int(name: str) -> int
to_arrow_type() -> DataType
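
The name-to-int mapping described for ClassLabel can be pictured with a small stand-in class. LabelMap below is hypothetical and is not the library's implementation; in particular, how ClassLabel treats unknown names, out-of-range indices, or the num_classes argument is not stated here, so the error behavior shown is an assumption.

```python
class LabelMap:
    """Hypothetical stand-in for a name-to-int label mapping."""

    def __init__(self, names):
        self.names = list(names)
        # Position in the list is the integer id of each name.
        self._index = {name: i for i, name in enumerate(self.names)}

    def str2int(self, name):
        # Assumption: unknown names raise KeyError.
        return self._index[name]

    def int2str(self, idx):
        # Assumption: negative or too-large indices are rejected.
        if not 0 <= idx < len(self.names):
            raise IndexError(f"label index out of range: {idx}")
        return self.names[idx]


labels = LabelMap(["cat", "dog", "bird"])
dog_id = labels.str2int("dog")
dog_name = labels.int2str(dog_id)
```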

class FeatureType

Bases: object

Base class for feature type descriptors.

arrow_metadata() -> dict[bytes, bytes]
encode(value, *, cache_dir: Path | None = None)
fingerprint_data() -> str
format(value, *, format_type: str, decode_images: bool = True, cache_dir: Path | None = None)
to_arrow_type() -> DataType

class Sequence(feature: FeatureType)

Bases: FeatureType

Variable-length list of a sub-feature.

encode(value, *, cache_dir: Path | None = None)
to_arrow_type() -> DataType
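
One plausible reading of "variable-length list of a sub-feature" is that encoding a list delegates to the sub-feature for each element. The sketch below shows that pattern with two hypothetical stand-in classes, ScalarInt and ListOf; it is an assumption about the design, not a description of how Sequence.encode actually behaves.

```python
class ScalarInt:
    """Hypothetical element codec: coerces each value to int."""

    def encode(self, value, *, cache_dir=None):
        return int(value)


class ListOf:
    """Hypothetical list codec that delegates to a sub-feature per element."""

    def __init__(self, feature):
        self.feature = feature

    def encode(self, value, *, cache_dir=None):
        # Assumption: a missing value passes through as None.
        if value is None:
            return None
        # Lists may have any length; each item goes through the sub-feature.
        return [self.feature.encode(v, cache_dir=cache_dir) for v in value]


codec = ListOf(ScalarInt())
encoded = codec.encode(["1", 2, 3.0])
```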

class Value(dtype: str)

Bases: FeatureType

Scalar value type. Maps dtype strings to PyArrow types.

format(value, *, format_type: str, decode_images: bool = True, cache_dir: Path | None = None)
to_arrow_type() -> DataType

stable_datasets.features.image module

Image feature codec.

class Image(encode_format: str = 'PNG')

Bases: FeatureType

Image feature stored as raw bytes in Arrow.

encode(value, *, cache_dir: Path | None = None) -> bytes | None
format(value, *, format_type: str, decode_images: bool = True, cache_dir: Path | None = None)
to_arrow_type()

stable_datasets.features.video module

Video feature codec and lazy reference objects.

class Video(storage: str = 'path', allowed_extensions: tuple[str, ...] = ('.mp4', '.avi', '.mov', '.webm', '.mkv'))

Bases: FeatureType

Video feature with validated path, bytes, or specialized frame storage.

arrow_metadata() -> dict[bytes, bytes]
encode(value, *, cache_dir: Path | None = None)
fingerprint_data() -> str
format(value, *, format_type: str, decode_images: bool = True, cache_dir: Path | None = None)
to_arrow_type() -> DataType
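
To make the "validated path" idea concrete, the sketch below checks a file's extension against the default allowed_extensions tuple from the Video signature. The function check_video_extension is hypothetical, and the case-insensitive comparison is an assumption; the library's own validation may be stricter or may check more than the suffix.

```python
from pathlib import Path

# Default tuple copied from the Video signature above.
ALLOWED_EXTENSIONS = (".mp4", ".avi", ".mov", ".webm", ".mkv")


def check_video_extension(path, allowed_extensions=ALLOWED_EXTENSIONS):
    """Return the normalized suffix, or raise if it is not allowed (hypothetical helper)."""
    # Assumption: extensions are compared case-insensitively.
    suffix = Path(path).suffix.lower()
    if suffix not in allowed_extensions:
        raise ValueError(f"unsupported video extension: {suffix!r}")
    return suffix
```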

class VideoRef(cell: Mapping[str, Any], cache_dir: Path | None = None)

Bases: object

Lazy reference to a cached video asset.

property bytes: bytes
cache_dir: Path | None = None
cell: Mapping[str, Any]
property checksum: str | None
property extension: str
property media_type: str
property mode: str
property path: Path | None
property size: int
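
A lazy reference of this kind typically holds only a small metadata mapping and defers reading the file until a property is accessed. The sketch below shows that pattern with a hypothetical LazyFileRef class. The cell keys "path" and "bytes", the resolution of relative paths against cache_dir, and the property name data (used instead of bytes to avoid shadowing the builtin) are all assumptions, not the VideoRef implementation.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Mapping, Optional


@dataclass
class LazyFileRef:
    """Hypothetical lazy reference: stores metadata, reads the file on demand."""

    cell: Mapping[str, Any]
    cache_dir: Optional[Path] = None

    @property
    def path(self) -> Optional[Path]:
        rel = self.cell.get("path")  # assumed key name
        if rel is None:
            return None
        p = Path(rel)
        # Assumption: relative paths are resolved against cache_dir.
        if self.cache_dir is not None and not p.is_absolute():
            p = Path(self.cache_dir) / p
        return p

    @property
    def data(self) -> bytes:
        inline = self.cell.get("bytes")  # assumed key name
        if inline is not None:
            return inline
        if self.path is None:
            raise ValueError("cell has neither inline bytes nor a path")
        # The file is only read when this property is accessed.
        return self.path.read_bytes()

    @property
    def size(self) -> int:
        return len(self.data)
```

Constructing the reference is cheap in this sketch; the cost of reading the asset is paid only by callers that actually ask for its contents.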

Module contents

Feature codec modules.

The package namespace exposes the classes documented in the submodule sections above, with identical signatures: Array3D, ClassLabel, FeatureType, Image, Sequence, Value, Video, and VideoRef.