Model-based planning solvers for action optimization

[ Base Class ]

Solver

Bases: Protocol

Protocol for model-based planning solvers.

configure

configure(
    *, action_space: Space, n_envs: int, config: Any
) -> None

Configure the solver with environment and planning specifications.

Parameters:

  • action_space (Space) –

    The action space of the environment.

  • n_envs (int) –

    Number of parallel environments.

  • config (Any) –

    Planning configuration object.

solve

solve(
    info_dict: dict, init_action: Tensor | None = None
) -> dict

Solve the planning optimization problem and return the optimized actions.

Parameters:

  • info_dict (dict) –

    Dictionary containing environment state information.

  • init_action (Tensor | None, default: None ) –

    Optional initial action sequence to warm-start the solver.

Returns:

  • dict

    Dictionary containing optimized actions and other solver-specific info.

action_dim property

action_dim: int

Flattened action dimension, accounting for action_block grouping.

n_envs property

n_envs: int

Number of parallel environments being planned for.

horizon property

horizon: int

Planning horizon length in timesteps.
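
Because Solver is a Protocol, any class that provides these members satisfies it structurally, without inheriting from it. The sketch below is a minimal illustration of that idea, not the library's code: SolverLike and ZeroSolver are hypothetical names, action_space is assumed to be a plain shape tuple, config is assumed to be a dict with "horizon" and "action_block" keys, and the "actions" key in the returned dict is also an assumption.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SolverLike(Protocol):
    """Hypothetical stand-in mirroring the Solver members listed above."""

    def configure(self, *, action_space: Any, n_envs: int, config: Any) -> None: ...

    def solve(self, info_dict: dict, init_action: Any = None) -> dict: ...

    @property
    def action_dim(self) -> int: ...

    @property
    def n_envs(self) -> int: ...

    @property
    def horizon(self) -> int: ...


class ZeroSolver:
    """Toy solver that always plans all-zero actions (illustration only)."""

    def __init__(self) -> None:
        self._action_dim = 0
        self._n_envs = 0
        self._horizon = 0

    def configure(self, *, action_space: Any, n_envs: int, config: Any) -> None:
        # Assumption: action_space is a shape tuple and config is a dict with
        # "horizon" and "action_block" keys; the real types may differ.
        base_dim = 1
        for size in action_space:
            base_dim *= size
        self._action_dim = base_dim * config["action_block"]
        self._n_envs = n_envs
        self._horizon = config["horizon"]

    def solve(self, info_dict: dict, init_action: Any = None) -> dict:
        if init_action is not None:
            # Warm start: a real solver would refine this; the toy returns it.
            return {"actions": init_action}
        zeros = [[[0.0] * self._action_dim for _ in range(self._horizon)]
                 for _ in range(self._n_envs)]
        return {"actions": zeros}  # the "actions" key is an assumption

    @property
    def action_dim(self) -> int:
        return self._action_dim

    @property
    def n_envs(self) -> int:
        return self._n_envs

    @property
    def horizon(self) -> int:
        return self._horizon


solver = ZeroSolver()
solver.configure(action_space=(2,), n_envs=3,
                 config={"horizon": 4, "action_block": 5})
plan = solver.solve({})
```

Here the plan is a nested list shaped (n_envs, horizon, action_dim). The toy treats action_dim as the per-step action size multiplied by action_block, which is one reading of "action_block grouping" rather than a confirmed definition.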

[ Implementations ]

CEMSolver

CEMSolver(
    model: Costable,
    batch_size: int = 1,
    num_samples: int = 300,
    var_scale: float = 1,
    n_steps: int = 30,
    topk: int = 30,
    device: str | device = 'cpu',
    seed: int = 1234,
)

Cross Entropy Method solver for action optimization.

Parameters:

  • model (Costable) –

    World model implementing the Costable protocol.

  • batch_size (int, default: 1 ) –

    Number of environments to process in parallel.

  • num_samples (int, default: 300 ) –

    Number of action candidates to sample per iteration.

  • var_scale (float, default: 1 ) –

    Initial variance scale for the action distribution.

  • n_steps (int, default: 30 ) –

    Number of CEM iterations.

  • topk (int, default: 30 ) –

    Number of elite samples to keep for distribution update.

  • device (str | device, default: 'cpu' ) –

    Device for tensor computations.

  • seed (int, default: 1234 ) –

    Random seed for reproducibility.
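
To show how num_samples, topk, n_steps, and var_scale typically interact in the Cross Entropy Method, here is a minimal pure-Python sketch over a flat action vector. It is a textbook version under stated assumptions, not CEMSolver's implementation: cem_minimize and cost_fn are hypothetical names, the sampling distribution is assumed to be a diagonal Gaussian, and var_scale is used as the initial standard deviation.

```python
import random


def cem_minimize(cost_fn, dim, num_samples=300, topk=30, n_steps=30,
                 var_scale=1.0, seed=1234):
    """Toy Cross Entropy Method over a flat action vector of length dim."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [var_scale] * dim  # assumption: var_scale sets the initial std
    for _ in range(n_steps):
        # Sample candidates from the current diagonal Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(num_samples)]
        # Keep the topk lowest-cost candidates (the elites).
        elites = sorted(samples, key=cost_fn)[:topk]
        # Refit the mean and standard deviation to the elites.
        columns = list(zip(*elites))
        mean = [sum(col) / topk for col in columns]
        std = [(sum((x - m) ** 2 for x in col) / topk) ** 0.5
               for col, m in zip(columns, mean)]
    return mean


# Usage: recover a hypothetical target action under a quadratic cost.
target = [0.5, -0.25]
best = cem_minimize(lambda a: sum((x - t) ** 2 for x, t in zip(a, target)),
                    dim=2)
```

A real solver would score candidates by rolling them out through the world model's cost function instead of this toy quadratic.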

configure

configure(
    *, action_space: Space, n_envs: int, config: Any
) -> None

Configure the solver with environment specifications.

solve

solve(
    info_dict: dict, init_action: Tensor | None = None
) -> dict

Solve the planning problem using Cross Entropy Method.

MPPISolver

MPPISolver(
    model: Costable,
    batch_size: int = 1,
    num_samples: int = 300,
    var_scale: float = 1.0,
    n_steps: int = 30,
    topk: int = 30,
    temperature: float = 0.5,
    device: str | device = 'cpu',
    seed: int = 1234,
)

Model Predictive Path Integral solver for action optimization.

Parameters:

  • model (Costable) –

    World model implementing the Costable protocol.

  • batch_size (int, default: 1 ) –

    Number of environments to process in parallel.

  • num_samples (int, default: 300 ) –

    Number of action candidates to sample per iteration.

  • var_scale (float, default: 1.0 ) –

    Initial variance scale for action noise.

  • n_steps (int, default: 30 ) –

    Number of MPPI iterations.

  • topk (int, default: 30 ) –

    Number of elite samples used in the weighted average.

  • temperature (float, default: 0.5 ) –

    Temperature parameter for softmax weighting.

  • device (str | device, default: 'cpu' ) –

    Device for tensor computations.

  • seed (int, default: 1234 ) –

    Random seed for reproducibility.
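
The role of temperature and topk can be sketched as a single MPPI-style update: perturb the current mean, keep the lowest-cost samples, and average them with softmax weights on negative cost. This is a minimal illustration, not MPPISolver's code: softmax_weights, mppi_update, and cost_fn are hypothetical names, and var_scale is assumed to be the noise standard deviation.

```python
import math
import random


def softmax_weights(costs, temperature):
    """Softmax over negative costs; lower cost gives a higher weight."""
    min_cost = min(costs)  # subtract the minimum for numerical stability
    unnorm = [math.exp(-(c - min_cost) / temperature) for c in costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def mppi_update(mean, cost_fn, num_samples=300, topk=30, temperature=0.5,
                var_scale=1.0, seed=1234):
    """One toy MPPI-style update of a flat action vector."""
    rng = random.Random(seed)
    # Perturb the current mean with Gaussian noise (assumed std: var_scale).
    samples = [[m + var_scale * rng.gauss(0.0, 1.0) for m in mean]
               for _ in range(num_samples)]
    # Keep the topk lowest-cost samples, then weight them by softmax.
    elites = sorted(samples, key=cost_fn)[:topk]
    weights = softmax_weights([cost_fn(s) for s in elites], temperature)
    return [sum(w * s[i] for w, s in zip(weights, elites))
            for i in range(len(mean))]


# Usage: one update moves the mean toward a hypothetical target action.
target = [2.0, -1.0]


def cost(a):
    return sum((x - t) ** 2 for x, t in zip(a, target))


start = [0.0, 0.0]
updated = mppi_update(start, cost)
```

A lower temperature concentrates the weight on the best few samples, while a higher one approaches a plain average of the elites.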

configure

configure(
    *, action_space: Space, n_envs: int, config: Any
) -> None

Configure the solver with environment specifications.

solve

solve(
    info_dict: dict, init_action: Tensor | None = None
) -> dict

Solve the planning problem using MPPI.

GradientSolver

GradientSolver(
    model: Costable,
    n_steps: int,
    batch_size: int | None = None,
    var_scale: float = 1,
    num_samples: int = 1,
    action_noise: float = 0.0,
    device: str | device = 'cpu',
    seed: int = 1234,
    optimizer_cls: type[Optimizer] = SGD,
    optimizer_kwargs: dict | None = None,
)

Bases: Module

Gradient-based solver using backpropagation through the world model.

Parameters:

  • model (Costable) –

    World model implementing the Costable protocol.

  • n_steps (int) –

    Number of gradient descent iterations.

  • batch_size (int | None, default: None ) –

    Number of environments to process in parallel.

  • var_scale (float, default: 1 ) –

    Initial variance scale for action perturbations.

  • num_samples (int, default: 1 ) –

    Number of action samples to optimize in parallel.

  • action_noise (float, default: 0.0 ) –

    Noise added to actions during optimization.

  • device (str | device, default: 'cpu' ) –

    Device for tensor computations.

  • seed (int, default: 1234 ) –

    Random seed for reproducibility.

  • optimizer_cls (type[Optimizer], default: SGD ) –

    PyTorch optimizer class to use.

  • optimizer_kwargs (dict | None, default: None ) –

    Keyword arguments for the optimizer.
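
As a rough picture of what optimizing actions by gradient descent means, the sketch below plans through a toy one-dimensional model. It is an illustration under stated assumptions, not GradientSolver's code: the dynamics x_{t+1} = x_t + a_t and the terminal cost are made up, lr is a hypothetical learning rate (in the real solver it would presumably arrive through optimizer_kwargs), and the gradient is derived by hand where a PyTorch implementation would obtain it by backpropagation.

```python
def rollout(actions, x0):
    """Toy dynamics: each action is added to a scalar state."""
    x = x0
    for a in actions:
        x = x + a
    return x


def plan_by_gradient(x0, goal, horizon=5, n_steps=30, lr=0.05):
    """Plain gradient descent on an action sequence (hand-derived gradient)."""
    actions = [0.0] * horizon
    for _ in range(n_steps):
        x_final = rollout(actions, x0)
        # cost = (x_final - goal) ** 2, and x_final depends on every action
        # with slope 1, so d cost / d a_t is the same for every t.
        grad = 2.0 * (x_final - goal)
        actions = [a - lr * grad for a in actions]  # SGD step
    return actions


# Usage: drive the toy state from 0.0 to 3.0 within five steps.
actions = plan_by_gradient(x0=0.0, goal=3.0)
final_state = rollout(actions, 0.0)
```

With these values each iteration halves the remaining distance to the goal, so 30 iterations are more than enough for this toy problem.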

configure

configure(
    *, action_space: Space, n_envs: int, config: Any
) -> None

Configure the solver with environment specifications.

solve

solve(
    info_dict: dict, init_action: Tensor | None = None
) -> dict

Solve the planning problem using gradient descent.

PGDSolver

PGDSolver(
    model: Costable,
    n_steps: int,
    batch_size: int | None = None,
    var_scale: float = 1,
    num_samples: int = 1,
    action_noise: float = 0.0,
    device: str | device = 'cpu',
    seed: int = 1234,
)

Bases: Module

Projected Gradient Descent solver for discrete action optimization.

Parameters:

  • model (Costable) –

    World model implementing the Costable protocol.

  • n_steps (int) –

    Number of gradient descent iterations.

  • batch_size (int | None, default: None ) –

    Number of environments to process in parallel.

  • var_scale (float, default: 1 ) –

    Initial variance scale for action perturbations.

  • num_samples (int, default: 1 ) –

    Number of action samples to optimize in parallel.

  • action_noise (float, default: 0.0 ) –

    Noise added to actions during optimization.

  • device (str | device, default: 'cpu' ) –

    Device for tensor computations.

  • seed (int, default: 1234 ) –

    Random seed for reproducibility.

configure

configure(
    *, action_space: Space, n_envs: int, config: Any
) -> None

Configure the solver with environment specifications.

solve

solve(
    info_dict: dict,
    init_action: Tensor | None = None,
    from_scalar: bool = False,
) -> dict

Solve the planning problem using projected gradient descent.
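
One common way to apply projected gradient descent to discrete actions is to relax each action to a probability vector, take a gradient step, and project back onto the probability simplex. The reference above does not state which projection PGDSolver uses or what from_scalar controls, so the sketch below is an assumption for illustration only; project_to_simplex and pgd_step are hypothetical names.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_i >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumsum += u_k
        t = (cumsum - 1.0) / k
        if u_k - t > 0:
            theta = t  # the last k that passes this test fixes the shift
    return [max(x - theta, 0.0) for x in v]


def pgd_step(probs, grad, lr):
    """One projected gradient step on a relaxed discrete action."""
    stepped = [p - lr * g for p, g in zip(probs, grad)]
    return project_to_simplex(stepped)


# Usage: a gradient that penalizes action 0 and favors action 2.
probs = pgd_step([1 / 3, 1 / 3, 1 / 3], grad=[1.0, 0.0, -1.0], lr=0.3)
chosen = max(range(len(probs)), key=lambda i: probs[i])  # discrete action
```

After optimization, taking the argmax of each probability vector recovers a discrete action.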