Manager

class stable_pretraining.manager.Manager(*args, **kwargs)

Bases: Checkpointable

Manages training with logging, scheduling, and checkpointing support.

Parameters:
  • trainer (Union[dict, DictConfig, pl.Trainer]) – PyTorch Lightning trainer configuration or instance.

  • module (Union[dict, DictConfig, pl.LightningModule]) – Lightning module configuration or instance.

  • data (Union[dict, DictConfig, pl.LightningDataModule]) – Data module configuration or instance.

  • seed (int, optional) – Random seed for reproducibility. Defaults to None.

  • ckpt_path (str, optional) – Absolute path to a checkpoint to load at the very start of a fresh run. It is loaded once, at step 0; from then on the run lives in its own freshly created run_dir and writes its own last.ckpt. Ignored on SLURM requeue (see weights_only below). The path must be absolute and must exist on disk; otherwise Manager raises before training starts. Defaults to None (train from scratch or from a pretrained backbone).

  • weights_only (bool, optional) –

    Controls how ckpt_path is loaded on a fresh run. Forwarded to Trainer.fit(weights_only=...) when the installed Lightning version supports it. True (the torch.load default in PyTorch ≥ 2.6) loads only the model weights; optimizer, scheduler, and RNG state are discarded, which is the usual “transfer-learning init” semantics. Set it to False to restore the full state from the checkpoint.

    Has no effect on SLURM requeue: when SLURM_RESTART_COUNT >= 1, the Manager always loads <run_dir>/checkpoints/last.ckpt with full state (weights_only=False) regardless of this flag, because the goal is to resume in-flight training exactly where preemption interrupted it.

  • compile (bool, optional) – Whether to compile the given module. Defaults to False.
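The checkpoint rules described for ckpt_path and weights_only can be sketched as a small pure function. This is an illustrative sketch only, not the Manager's actual code: resolve_checkpoint, its arguments, and the exception types are hypothetical names chosen here, and run_dir is passed in explicitly because this reference does not show how the Manager derives it.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_checkpoint(
    run_dir: str,
    ckpt_path: Optional[str],
    weights_only: bool,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], Optional[bool]]:
    """Return (checkpoint_to_load, weights_only) under the documented rules.

    Hypothetical helper for illustration; not part of stable_pretraining.
    """
    env = os.environ if env is None else env
    restart_count = int(env.get("SLURM_RESTART_COUNT", "0") or 0)

    if restart_count >= 1:
        # SLURM requeue: resume from the run's own last.ckpt with full state.
        # Both ckpt_path and the weights_only flag are ignored here.
        # (What happens if last.ckpt is missing is not modeled in this sketch.)
        return str(Path(run_dir) / "checkpoints" / "last.ckpt"), False

    if ckpt_path is None:
        # Fresh run with nothing to load.
        return None, None

    # Fresh run with an initial checkpoint: fail early on a bad path.
    # The exception types are an assumption of this sketch.
    if not os.path.isabs(ckpt_path):
        raise ValueError(f"ckpt_path must be absolute, got {ckpt_path!r}")
    if not os.path.exists(ckpt_path):
        raise FileNotFoundError(f"ckpt_path does not exist: {ckpt_path!r}")

    return ckpt_path, weights_only
```

Passing env as a plain mapping keeps the function easy to exercise without touching the real process environment.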

init_and_sync_wandb()

Handles WandB initialization and synchronization.

property instantiated_data

Lazily instantiate and return the LightningDataModule.

If data was supplied as a dict or DictConfig, it is instantiated via hydra.utils.instantiate on first access and the result is cached. If it was supplied as a pre-built pl.LightningDataModule instance, it is returned as-is.

Returns:

The instantiated data module ready for use.

Return type:

pl.LightningDataModule

property instantiated_module

Lazily instantiate and return the LightningModule.

If module was supplied as a dict or DictConfig, it is instantiated via hydra.utils.instantiate on first access and the result is cached. If it was supplied as a pre-built pl.LightningModule instance, it is returned as-is.

Returns:

The instantiated module ready for training.

Return type:

pl.LightningModule
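Both instantiated_data and instantiated_module follow the same instantiate-on-first-access-then-cache pattern. The sketch below shows that pattern in isolation, under stated assumptions: LazyHolder is a hypothetical class, and the instantiate function is a tiny stand-in written here that only mimics Hydra's `_target_` convention (import the dotted path, call it with the remaining keys). It is not hydra.utils.instantiate and does not handle DictConfig.

```python
import importlib
from collections.abc import Mapping
from typing import Any


def instantiate(config: Mapping) -> Any:
    """Stand-in for hydra.utils.instantiate: import `_target_`, call it with the rest."""
    kwargs = dict(config)
    module_name, _, attr = kwargs.pop("_target_").rpartition(".")
    factory = getattr(importlib.import_module(module_name), attr)
    return factory(**kwargs)


class LazyHolder:
    """Hypothetical holder showing the lazy, cached property."""

    def __init__(self, data: Any):
        self._data = data          # config mapping or an already-built object
        self._instantiated = None  # cache, filled on first access

    @property
    def instantiated_data(self) -> Any:
        if self._instantiated is None:
            if isinstance(self._data, Mapping):
                # Config form: build the object once and remember it.
                self._instantiated = instantiate(self._data)
            else:
                # Pre-built form: hand the object back unchanged.
                self._instantiated = self._data
        return self._instantiated


# A stdlib class is used as the target so the sketch runs without Lightning.
holder = LazyHolder(
    {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 3}
)
first = holder.instantiated_data   # instantiated here
second = holder.instantiated_data  # served from the cache, same object
```

Caching matters because repeated accesses must return the same object; rebuilding a module on each access would silently discard trained weights.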

predict()

Run inference using the configured module and data.

Calls Trainer.predict with the lazily-instantiated module and data module, then flushes any buffered wandb offline data.

test()

Run the test split using the configured module and data.

Calls Trainer.test with the lazily-instantiated module and data module, then flushes any buffered wandb offline data.

validate()

Run one validation pass using the configured module and data.

Calls Trainer.validate with the lazily-instantiated module and data module, then flushes any buffered wandb offline data. Use this after __call__ has already set up the trainer, or standalone when only evaluation is needed.
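predict, test, and validate share one shape: call the matching Trainer method with the module and data module, then flush buffered wandb offline data. A minimal sketch of that shape follows, with everything stubbed out. StubTrainer, EvalRunner, and the flush callback are hypothetical names; the argument layout passed to the trainer and the choice to return its result are assumptions of this sketch, not confirmed behavior of Manager.

```python
from typing import Any, Callable, List


class StubTrainer:
    """Stand-in for a Lightning Trainer that only records which stage ran."""

    def __init__(self, events: List[str]):
        self.events = events

    def _record(self, stage: str) -> str:
        self.events.append(stage)
        return f"{stage}-result"

    def validate(self, model: Any, datamodule: Any = None) -> str:
        return self._record("validate")

    def test(self, model: Any, datamodule: Any = None) -> str:
        return self._record("test")

    def predict(self, model: Any, datamodule: Any = None) -> str:
        return self._record("predict")


class EvalRunner:
    """Hypothetical wrapper: run one trainer stage, then flush logs."""

    def __init__(self, trainer: Any, module: Any, data: Any, flush: Callable[[], None]):
        self.trainer = trainer
        self.module = module
        self.data = data
        self.flush = flush

    def _run(self, stage: str) -> Any:
        # Dispatch to trainer.validate / trainer.test / trainer.predict.
        result = getattr(self.trainer, stage)(self.module, datamodule=self.data)
        # Flush after the stage completes; an exception in the stage skips it.
        self.flush()
        return result  # returning the result is an assumption of this sketch

    def validate(self) -> Any:
        return self._run("validate")

    def test(self) -> Any:
        return self._run("test")

    def predict(self) -> Any:
        return self._run("predict")
```

Routing all three methods through one helper keeps the flush step from being forgotten in any single entry point.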