iGPT


class stable_pretraining.methods.iGPT(encoder_name: str | Module = 'vit_small_patch16_224', patch_size: int = 16, image_size: int = 224, in_channels: int = 3, pretrained: bool = False)

Bases: Module

Autoregressive image GPT (AIM-style next-patch regression).

A standard timm ViT encoder is used in causal mode: every patch can only attend to itself and earlier patches (raster order). At every position the model predicts the next patch’s pixel values via a linear head and minimises MSE.
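The description above can be illustrated with a minimal PyTorch sketch. It is not the library's implementation: the helper names `patchify`, `causal_mask`, and `next_patch_mse` are hypothetical, the hidden width of 384 is assumed to match ViT-S, and a random tensor stands in for the encoder output. Dropping the last position (which has no successor) is also an assumption about how the targets are aligned.

```python
import torch
import torch.nn.functional as F


def patchify(images: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Split [B, C, H, W] images into raster-ordered flat patches [B, N, C*p*p]."""
    # unfold extracts non-overlapping p x p blocks, scanning row by row
    patches = F.unfold(images, kernel_size=patch_size, stride=patch_size)
    return patches.transpose(1, 2)  # [B, C*p*p, N] -> [B, N, C*p*p]


def causal_mask(num_patches: int) -> torch.Tensor:
    """Boolean [N, N] mask: True where query i may attend to key j (j <= i)."""
    return torch.tril(torch.ones(num_patches, num_patches, dtype=torch.bool))


def next_patch_mse(pred: torch.Tensor, patches: torch.Tensor) -> torch.Tensor:
    """MSE between the prediction at position i and the pixels of patch i + 1."""
    # The last position has no next patch, so it is excluded (assumption)
    return F.mse_loss(pred[:, :-1], patches[:, 1:])


images = torch.randn(2, 3, 224, 224)
patches = patchify(images, patch_size=16)        # [2, 196, 768]
mask = causal_mask(patches.shape[1])             # [196, 196]

# Stand-in for the causal encoder output; 384 is an assumed hidden width
hidden = torch.randn(2, patches.shape[1], 384)
head = torch.nn.Linear(384, patches.shape[-1])   # linear pixel-prediction head
loss = next_patch_mse(head(hidden), patches)
```

With a 224 x 224 input and 16 x 16 patches there are 14 x 14 = 196 positions, each with a 3 x 16 x 16 = 768-dimensional pixel target.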

Parameters:
  • encoder_name – timm ViT model name (default "vit_small_patch16_224").

  • patch_size – Patch side length (default 16, must match encoder).

  • image_size – Input size (default 224).

  • in_channels – Image channels (default 3).

  • pretrained – Load pretrained timm weights (default False).

forward(images: Tensor) → iGPTOutput

Forward pass.

Parameters:
  • images – Input image batch of shape [B, C, H, W].