DINOv3

class stable_pretraining.methods.DINOv3(encoder_name: str | Module = 'vit_small_patch16_224', n_register_tokens: int = 4, koleo_weight: float = 0.1, projector_hidden_dim: int = 2048, projector_bottleneck_dim: int = 256, n_cls_prototypes: int = 65536, n_patch_prototypes: int = 8192, mask_ratio: float = 0.3, patch_loss_weight: float = 1.0, temperature_student: float = 0.1, temperature_teacher: float = 0.07, ema_decay_start: float = 0.996, ema_decay_end: float = 1.0, image_size: int = 224, pretrained: bool = False)

Bases: Module

DINOv3: DINOv2 extended with register tokens and a KoLeo penalty.
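
For readers unfamiliar with the KoLeo term, the sketch below computes a Kozachenko-Leonenko nearest-neighbour penalty on a small batch of embeddings in plain Python. It is an illustration under stated assumptions, not this class's implementation: the function names `l2_normalize` and `koleo_penalty`, the L2-normalization step, and the `eps` constant are all hypothetical choices made for the example.

```python
import math


def l2_normalize(vector, eps=1e-8):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / (norm + eps) for x in vector]


def koleo_penalty(embeddings, eps=1e-8):
    """Return -(1/n) * sum_i log(distance from x_i to its nearest neighbour).

    The penalty is large when embeddings crowd together and small when
    they are spread out. Normalizing first is an assumption of this sketch.
    """
    points = [l2_normalize(v) for v in embeddings]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two embeddings")
    total = 0.0
    for i, p in enumerate(points):
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += math.log(nearest + eps)
    return -total / n


# Hypothetical toy batches: one tightly clustered, one spread out.
clustered = [[1.0, 0.0], [0.99, 0.05], [0.98, 0.1]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

clustered_penalty = koleo_penalty(clustered)
spread_penalty = koleo_penalty(spread)
```

In this toy setup the clustered batch receives the larger penalty, so minimizing the term pushes embeddings apart. A weight such as `koleo_weight` would scale this value before it is added to the other loss terms.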

Parameters:

encoder_name – timm ViT model name (default 'vit_small_patch16_224'); the signature also accepts a Module. Register tokens are added on top of the timm model via a Parameter.
n_register_tokens – Number of register tokens (default 4).
koleo_weight – Weight on the KoLeo penalty (default 0.1).
projector_hidden_dim – Hidden dim for both heads (default 2048).
projector_bottleneck_dim – Bottleneck dim (default 256).
n_cls_prototypes – CLS prototypes (default 65536).
n_patch_prototypes – Patch prototypes (default 8192).
mask_ratio – Patch mask ratio for the student (default 0.3).
patch_loss_weight – Weight on the patch loss term (default 1.0).
temperature_student – Student softmax temperature (default 0.1).
temperature_teacher – Teacher softmax temperature (default 0.07).
ema_decay_start – Initial EMA decay (default 0.996).
ema_decay_end – Final EMA decay (default 1.0).
image_size – Input image size (default 224).
pretrained – Load pretrained timm weights (default False).
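
To make the temperature and EMA parameters concrete, the sketch below shows a temperature-scaled softmax evaluated at the default `temperature_student` and `temperature_teacher` values, and a decay schedule that moves from `ema_decay_start` to `ema_decay_end`. The cosine shape of the schedule is an assumption borrowed from common DINO-style recipes; this page does not state which schedule the class uses. The helper names `softmax_with_temperature`, `ema_decay_at`, and `ema_update` are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature; lower temperature gives a sharper output."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ema_decay_at(step, total_steps, start=0.996, end=1.0):
    """Assumed cosine schedule from `start` (step 0) to `end` (last step)."""
    progress = step / max(total_steps - 1, 1)
    return end - (end - start) * (math.cos(math.pi * progress) + 1) / 2


def ema_update(teacher, student, decay):
    """One EMA step: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]


# Hypothetical prototype logits for a single token.
logits = [2.0, 1.0, 0.5]
student_probs = softmax_with_temperature(logits, 0.1)
teacher_probs = softmax_with_temperature(logits, 0.07)

# Decay at the first, middle, and last of 101 hypothetical training steps.
decays = [ema_decay_at(step, 101) for step in (0, 50, 100)]

# With decay 1.0 the teacher no longer moves toward the student.
frozen_teacher = ema_update([1.0, 2.0], [0.0, 0.0], 1.0)
```

Because the teacher temperature is lower than the student temperature, the teacher distribution is the sharper of the two. With the default `ema_decay_end` of 1.0, the final update in this sketch leaves the teacher weights unchanged.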

forward(global_views: Sequence[Tensor] | None = None, local_views: Sequence[Tensor] | None = None, images: Tensor | None = None) → DINOv3Output

Same as torch.nn.Module.forward().

Parameters:

global_views – Optional sequence of tensors (default None).
local_views – Optional sequence of tensors (default None).
images – Optional tensor (default None).

Returns:

The model's output, a DINOv3Output.
