SimMIM

class stable_pretraining.methods.SimMIM(encoder_name: str | Module = 'vit_small_patch16_224', patch_size: int = 16, mask_ratio: float = 0.6, in_channels: int = 3, image_size: int = 224, pretrained: bool = False)

Bases: Module

SimMIM: a simple framework for masked image modeling.

Parameters:
  • encoder_name – timm model name, or an encoder Module (default "vit_small_patch16_224").

  • patch_size – Patch size in pixels; must match the encoder’s patch size (default 16).

  • mask_ratio – Fraction of patches to mask (default 0.6, the value used in the SimMIM paper).

  • in_channels – Image channels (default 3).

  • image_size – Input image size (default 224).

  • pretrained – Whether to load pretrained timm weights (default False).
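To make the relationship between image_size, patch_size and mask_ratio concrete, here is a minimal pure-Python sketch of uniform random patch masking. The helper name random_patch_mask, the truncating rounding rule, and the uniform sampling strategy are all assumptions made for illustration; the library's actual masking code may differ.

```python
import random


def random_patch_mask(image_size=224, patch_size=16, mask_ratio=0.6, seed=None):
    """Return a flat list of 0/1 flags, one per patch (1 = masked).

    Hypothetical helper, not part of stable_pretraining.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    grid = image_size // patch_size        # patches per side
    num_patches = grid * grid              # total patches in the image
    # Assumption: truncate toward zero; the library may round differently.
    num_masked = int(num_patches * mask_ratio)
    rng = random.Random(seed)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [1 if i in masked else 0 for i in range(num_patches)]


mask = random_patch_mask(seed=0)
print(len(mask), sum(mask))  # 196 patches, 117 of them masked
```

With the default arguments the image splits into a 14 × 14 grid of 196 patches, of which 117 are masked under this rounding rule.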

forward(images: Tensor) → SimMIMOutput

Forward pass.

Parameters:
  • images – Batch of input images with shape [B, C, H, W].

Returns:

SimMIMOutput.
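The [B, C, H, W] input presumably has to agree with the constructor arguments: C with in_channels, and H and W with image_size. The sketch below checks a shape tuple under that assumption and returns the resulting patch grid. check_input_shape is a hypothetical helper, not a library function, and the library itself may be more permissive about input sizes.

```python
def check_input_shape(shape, in_channels=3, image_size=224, patch_size=16):
    """Validate a [B, C, H, W] shape and return the patch grid (rows, cols).

    Hypothetical helper; the constraints are assumptions, not library behavior.
    """
    if len(shape) != 4:
        raise ValueError(f"expected 4 dims [B, C, H, W], got {len(shape)}")
    _, channels, height, width = shape
    if channels != in_channels:
        raise ValueError(f"expected {in_channels} channels, got {channels}")
    if (height, width) != (image_size, image_size):
        raise ValueError(f"expected {image_size}x{image_size} images")
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return height // patch_size, width // patch_size


print(check_input_shape((8, 3, 224, 224)))  # (14, 14)
```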