Classic continuous control tasks from DeepMind Control Suite

Description

A collection of continuous control environments built on the DeepMind Control Suite. These environments cover a range of classic locomotion and manipulation tasks in the MuJoCo physics engine. Some environments support multiple tasks via the task constructor argument. In general, the task only affects how the reward is computed.

import stable_worldmodel as swm

# Default task
world = swm.World('swm/CheetahDMControl-v0', num_envs=4)

# Specify a task explicitly
world = swm.World('swm/CheetahDMControl-v0', num_envs=4, task='run-backward')

Available Environments

Environment Environment ID Default Task
Humanoid swm/HumanoidDMControl-v0 Walk forward at 1 m/s
Cheetah swm/CheetahDMControl-v0 Run forward
Hopper swm/HopperDMControl-v0 Hop forward
Reacher swm/ReacherDMControl-v0 Reach a target
Walker swm/WalkerDMControl-v0 Walk forward at 1 m/s
Quadruped swm/QuadrupedDMControl-v0 Walk forward
Acrobot swm/AcrobotDMControl-v0 Swing up and balance
Pendulum swm/PendulumDMControl-v0 Swing up and balance
Cartpole swm/CartpoleDMControl-v0 Swing up and balance
Ball in Cup swm/BallInCupDMControl-v0 Catch ball in cup
Finger swm/FingerDMControl-v0 Turn spinner to target
Manipulator swm/ManipulatorDMControl-v0 Grasp and place ball

Humanoid

A 21-DoF humanoid body that must learn to walk forward at a target speed. The task uses feature-based observations (joint angles, head height, extremities, torso orientation, center-of-mass velocity).

Default task: Walk forward at a speed of 1 m/s.

Task Description
stand Stand upright without moving
walk (default) Walk forward at 1 m/s
run Run forward at 10 m/s

world = swm.World('swm/HumanoidDMControl-v0', num_envs=4)

# Specify a task explicitly
world = swm.World('swm/HumanoidDMControl-v0', num_envs=4, task='run')

Environment Specs

Property Value
Action Space Box(-1, 1, shape=(21,)) — 21 joint torques
Observation Space Feature vector (joint angles, head height, extremities, torso vertical, CoM velocity)
Episode Length 1000 steps (25s at 0.025s timestep)
Environment ID swm/HumanoidDMControl-v0
Physics MuJoCo

Variation Space

Factor Type Description
agent.color Box(0, 1, shape=(3,)) Humanoid body RGB color
agent.torso_density Box(500, 1500, shape=(1,)) Torso geom density
agent.right_lower_arm_density Box(500, 1500, shape=(1,)) Right lower arm geom density
agent.left_knee_locked Discrete(2) Whether the left knee joint is locked
floor.friction Box(0, 1, shape=(1,)) Floor friction coefficient
floor.color Box(0, 1, shape=(2, 3)) Checkerboard floor colors
light.intensity Box(0, 1, shape=(1,)) Scene lighting intensity
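For data collection it is often useful to randomize these factors. The sketch below samples a configuration from a subset of the Humanoid variation space with plain NumPy; the sampler is hypothetical, and the actual swm API for reading and applying variation factors is not shown in this document.

```python
import numpy as np

rng = np.random.default_rng(0)

# Continuous (Box) bounds copied from the Humanoid variation space above.
box_factors = {
    'agent.color': (0.0, 1.0, (3,)),
    'agent.torso_density': (500.0, 1500.0, (1,)),
    'agent.right_lower_arm_density': (500.0, 1500.0, (1,)),
    'floor.friction': (0.0, 1.0, (1,)),
    'floor.color': (0.0, 1.0, (2, 3)),
    'light.intensity': (0.0, 1.0, (1,)),
}

# Uniformly sample each Box factor within its bounds.
sample = {
    name: rng.uniform(low, high, size=shape)
    for name, (low, high, shape) in box_factors.items()
}
# Discrete(2) factor: 0 or 1.
sample['agent.left_knee_locked'] = int(rng.integers(2))
```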

Cheetah

A planar biped (half-cheetah) that must learn to run forward as fast as possible. The task uses feature-based observations (joint angles and velocities).

Default task: Run forward.

Task Description
run (default) Run forward at 10 m/s
run-backward Run backward at 10 m/s
stand-front Stand on front foot
stand-back Stand on back foot
jump Jump (both feet off ground)
run-front Run forward while lifting back foot
run-back Run forward while lifting front foot
lie-down Lie down flat
legs-up Lie on back with legs up
flip Flip forward
flip-backward Flip backward

world = swm.World('swm/CheetahDMControl-v0', num_envs=4)

# Specify a task explicitly
world = swm.World('swm/CheetahDMControl-v0', num_envs=4, task='run-backward')

Environment Specs

Property Value
Action Space Box(-1, 1, shape=(6,)) — 6 joint torques
Observation Space Feature vector (joint angles, joint velocities)
Episode Length 1000 steps (25s at 0.025s timestep)
Environment ID swm/CheetahDMControl-v0
Physics MuJoCo

Variation Space

Factor Type Description
agent.color Box(0, 1, shape=(3,)) Cheetah body RGB color
agent.torso_density Box(500, 1500, shape=(1,)) Torso geom density
agent.back_foot_density Box(500, 1500, shape=(1,)) Back foot geom density
agent.back_foot_locked Discrete(2) Whether the back foot joint is locked
floor.friction Box(0, 1, shape=(1,)) Floor friction coefficient
floor.color Box(0, 1, shape=(2, 3)) Checkerboard floor colors
light.intensity Box(0, 1, shape=(1,)) Scene lighting intensity

Hopper

A planar one-legged hopper that must learn to hop forward. The task uses feature-based observations (joint angles, velocities, touch sensor).

Default task: Hop forward at 2 m/s.

Task Description
stand Balance upright without moving
hop (default) Hop forward at 2 m/s
hop-backward Hop backward at 1 m/s
flip Flip forward
flip-backward Flip backward

world = swm.World('swm/HopperDMControl-v0', num_envs=4)

# Specify a task explicitly
world = swm.World('swm/HopperDMControl-v0', num_envs=4, task='hop-backward')

Environment Specs

Property Value
Action Space Box(-1, 1, shape=(4,)) — 4 joint torques
Observation Space Feature vector (joint angles, velocities, touch)
Episode Length 1000 steps (20s at 0.02s timestep)
Environment ID swm/HopperDMControl-v0
Physics MuJoCo

Variation Space

Factor Type Description
agent.color Box(0, 1, shape=(3,)) Hopper body RGB color
agent.torso_density Box(500, 1500, shape=(1,)) Torso geom density
agent.foot_density Box(500, 1500, shape=(1,)) Foot geom density
agent.foot_locked Discrete(2) Whether the foot joint is locked
floor.friction Box(0, 1, shape=(1,)) Floor friction coefficient
floor.color Box(0, 1, shape=(2, 3)) Checkerboard floor colors
light.intensity Box(0, 1, shape=(1,)) Scene lighting intensity

Reacher

A planar two-link arm that must reach a small target. The task uses feature-based observations (joint angles, velocities, finger-to-target distance).

Default task: Move the fingertip to a randomly placed target (radius 0.015).

Task Description
easy Reach a large target (radius 0.05)
hard (default) Reach a small target (radius 0.015)
qpos_match Match a target joint configuration

world = swm.World('swm/ReacherDMControl-v0', num_envs=4)

# Specify a task explicitly
world = swm.World('swm/ReacherDMControl-v0', num_envs=4, task='easy')

Environment Specs

Property Value
Action Space Box(-1, 1, shape=(2,)) — 2 joint torques
Observation Space Feature vector (joint angles, velocities, finger-to-target vector)
Episode Length 1000 steps (20s at 0.02s timestep)
Environment ID swm/ReacherDMControl-v0
Physics MuJoCo

Variation Space

Factor Type Description
agent.color Box(0, 1, shape=(3,)) Reacher arm RGB color
agent.arm_density Box(500, 1500, shape=(1,)) Arm geom density
agent.finger_density Box(500, 1500, shape=(1,)) Finger geom density
agent.finger_locked Discrete(2) Whether the finger joint is locked
target.color Box(0, 1, shape=(3,)) Target RGB color
target.shape Discrete(2) Target shape (0: box, 1: sphere)
rendering.render_target Discrete(2) Whether to render the target (0: hidden, 1: visible)
floor.color Box(0, 1, shape=(2, 3)) Checkerboard floor colors
light.intensity Box(0, 1, shape=(1,)) Scene lighting intensity

Walker

A planar bipedal walker that must learn to walk forward at a target speed. The task uses feature-based observations (joint angles, velocities, body height, orientation).

Default task: Walk forward at a speed of 1 m/s.

Task Description
stand Stand upright without moving
walk (default) Walk forward at 1 m/s
run Run forward at 8 m/s
walk-backward Walk backward at 1 m/s
arabesque Stand on one foot in arabesque pose
lie_down Lie down flat
legs_up Lie on back with legs up
flip Flip upside down

world = swm.World('swm/WalkerDMControl-v0', num_envs=4)

# Specify a task explicitly
world = swm.World('swm/WalkerDMControl-v0', num_envs=4, task='run')

Environment Specs

Property Value
Action Space Box(-1, 1, shape=(6,)) — 6 joint torques
Observation Space Feature vector (joint angles, velocities, body height, orientation)
Episode Length 1000 steps (25s at 0.025s timestep)
Environment ID swm/WalkerDMControl-v0
Physics MuJoCo

Variation Space

Factor Type Description
agent.color Box(0, 1, shape=(3,)) Walker body RGB color
agent.torso_density Box(500, 1500, shape=(1,)) Torso geom density
agent.left_foot_density Box(500, 1500, shape=(1,)) Left foot geom density
agent.right_knee_locked Discrete(2) Whether the right knee joint is locked
floor.friction Box(0, 1, shape=(1,)) Floor friction coefficient
floor.color Box(0, 1, shape=(2, 3)) Checkerboard floor colors
floor.rotation_y Box(-10, 10, shape=(1,)) Floor rotation around Y axis (degrees)
light.intensity Box(0, 1, shape=(1,)) Scene lighting intensity

Quadruped

A four-legged quadruped robot that must learn to walk forward. The task uses feature-based observations (joint angles, velocities, torso orientation, end effector positions).

Default task: Walk forward at a speed of 0.5 m/s.

Task Description
walk (default) Walk forward at 0.5 m/s
run Run forward at 5 m/s

world = swm.World('swm/QuadrupedDMControl-v0', num_envs=4)

# Specify a task explicitly
world = swm.World('swm/QuadrupedDMControl-v0', num_envs=4, task='run')

Environment Specs

Property Value
Action Space Box(-1, 1, shape=(12,)) — 12 joint torques (4 legs × 3 joints)
Observation Space Feature vector (joint angles, velocities, torso orientation, end effectors)
Episode Length 1000 steps (20s at 0.02s timestep)
Environment ID swm/QuadrupedDMControl-v0
Physics MuJoCo

Variation Space

Factor Type Description
agent.color Box(0, 1, shape=(3,)) Quadruped body RGB color
agent.torso_density Box(500, 1500, shape=(1,)) Torso geom density
agent.foot_back_left_density Box(500, 1500, shape=(1,)) Back left foot geom density
agent.knee_back_left_locked Discrete(2) Whether the back left knee joint is locked
floor.friction Box(0, 1, shape=(1,)) Floor friction coefficient
floor.color Box(0, 1, shape=(2, 3)) Checkerboard floor colors
light.intensity Box(0, 1, shape=(1,)) Scene lighting intensity

Acrobot

A two-link acrobot that must swing up and balance. The task uses sparse rewards and feature-based observations (joint angles, velocities).

Task: Swing up and balance both links upright (sparse reward).

world = swm.World('swm/AcrobotDMControl-v0', num_envs=4)

Environment Specs

Property Value
Action Space Box(-1, 1, shape=(1,)) — 1 joint torque (elbow)
Observation Space Feature vector (joint angles, velocities)
Episode Length 500 steps (10s at 0.02s timestep)
Environment ID swm/AcrobotDMControl-v0
Physics MuJoCo

Variation Space

Factor Type Description
agent.color Box(0, 1, shape=(3,)) Acrobot body RGB color
agent.upper_arm_density Box(500, 1500, shape=(1,)) Upper arm geom density
agent.lower_arm_density Box(500, 1500, shape=(1,)) Lower arm geom density
agent.upper_arm_locked Discrete(2) Whether the upper arm joint is locked
target.color Box(0, 1, shape=(3,)) Target RGB color
target.shape Discrete(2) Target shape (0: box, 1: sphere)
floor.color Box(0, 1, shape=(2, 3)) Checkerboard floor colors
light.intensity Box(0, 1, shape=(1,)) Scene lighting intensity

Pendulum

A single-link pendulum that must swing up and balance. The task uses feature-based observations (angle, angular velocity).

Task: Swing up and balance the pendulum upright.

world = swm.World('swm/PendulumDMControl-v0', num_envs=4)

Environment Specs

Property Value
Action Space Box(-1, 1, shape=(1,)) — 1 joint torque
Observation Space Feature vector (angle, angular velocity)
Episode Length 1000 steps (20s at 0.02s timestep)
Environment ID swm/PendulumDMControl-v0
Physics MuJoCo

Variation Space

Factor Type Description
agent.color Box(0, 1, shape=(3,)) Pendulum body RGB color
agent.pole_density Box(500, 1500, shape=(1,)) Pole geom density
agent.mass_density Box(500, 1500, shape=(1,)) Tip mass geom density
agent.mass_shape Discrete(2) Tip mass shape (0: box, 1: sphere)
floor.color Box(0, 1, shape=(2, 3)) Checkerboard floor colors
light.intensity Box(0, 1, shape=(1,)) Scene lighting intensity

Cartpole

A cart-pole system that must swing up and balance. The task uses sparse rewards and feature-based observations (cart position, pole angle, velocities).

Task: Swing up and balance the pole upright (sparse reward).

world = swm.World('swm/CartpoleDMControl-v0', num_envs=4)

Environment Specs

Property Value
Action Space Box(-1, 1, shape=(1,)) — 1 cart force
Observation Space Feature vector (cart position, pole angle, velocities)
Episode Length 500 steps (10s at 0.02s timestep)
Environment ID swm/CartpoleDMControl-v0
Physics MuJoCo

Variation Space

Factor Type Description
agent.color Box(0, 1, shape=(3,)) Cartpole body RGB color
agent.cart_mass Box(0.5, 1.5, shape=(1,)) Cart geom mass
agent.pole_density Box(500, 1500, shape=(1,)) Pole geom density
agent.cart_shape Discrete(2) Cart shape (0: box, 1: sphere)
floor.color Box(0, 1, shape=(2, 3)) Checkerboard floor colors
light.intensity Box(0, 1, shape=(1,)) Scene lighting intensity

Ball in Cup

A planar ball-in-cup system where a cup must catch and hold a ball attached by a string. The task uses feature-based observations (cup position, ball position, velocities).

Task: Swing the ball into the cup and keep it there.

world = swm.World('swm/BallInCupDMControl-v0', num_envs=4)

Environment Specs

Property Value
Action Space Box(-1, 1, shape=(2,)) — 2 cup forces (x, z)
Observation Space Feature vector (cup position, ball position, velocities)
Episode Length 1000 steps (20s at 0.02s timestep)
Environment ID swm/BallInCupDMControl-v0
Physics MuJoCo

Variation Space

Factor Type Description
agent.color Box(0, 1, shape=(3,)) Cup RGB color
agent.density Box(500, 1500, shape=(1,)) Cup geom density
ball.color Box(0, 1, shape=(3,)) Ball RGB color
ball.density Box(500, 1500, shape=(1,)) Ball geom density
ball.size Box(0.01, 0.05, shape=(1,)) Ball radius
target.color Box(0, 1, shape=(3,)) Target RGB color
target.shape Discrete(2) Target shape (0: box, 1: sphere)
rendering.render_target Discrete(2) Whether to render the target (0: hidden, 1: visible)
floor.color Box(0, 1, shape=(2, 3)) Checkerboard floor colors
light.intensity Box(0, 1, shape=(1,)) Scene lighting intensity

Finger

A planar finger that must turn a spinner to reach a target angle. The task uses feature-based observations (finger joint angles, spinner angle, target position).

Default task: Turn the spinner so that a target on it reaches a goal position (target radius 0.03).

Task Description
spin Spin the spinner as fast as possible (no target/tip in observations)
turn_easy Turn to a target angle (target radius 0.07)
turn_hard (default) Turn to a target angle (target radius 0.03)

Note: the spin task produces different observations than the turn_* tasks (no target position or distance-to-target).

world = swm.World('swm/FingerDMControl-v0', num_envs=4)

# Specify a task explicitly
world = swm.World('swm/FingerDMControl-v0', num_envs=4, task='spin')

Environment Specs

Property Value
Action Space Box(-1, 1, shape=(2,)) — 2 joint torques
Observation Space Feature vector (finger joint angles, spinner angle, target position)
Episode Length 1000 steps (20s at 0.02s timestep)
Environment ID swm/FingerDMControl-v0
Physics MuJoCo

Variation Space

Factor Type Description
agent.color Box(0, 1, shape=(3,)) Finger body RGB color
agent.proximal_density Box(500, 1500, shape=(1,)) Proximal link geom density
agent.fingertip_density Box(500, 1500, shape=(1,)) Fingertip geom density
spinner.color Box(0, 1, shape=(3,)) Spinner RGB color
spinner.density Box(500, 1500, shape=(1,)) Spinner geom density
spinner.friction Box(0, 1, shape=(1,)) Spinner hinge friction loss
target.color Box(0, 1, shape=(3,)) Target RGB color
target.shape Discrete(2) Target shape (0: box, 1: sphere)
rendering.render_target Discrete(2) Whether to render the target (0: hidden, 1: visible)
floor.color Box(0, 1, shape=(2, 3)) Checkerboard floor colors
light.intensity Box(0, 1, shape=(1,)) Scene lighting intensity

Manipulator

A planar robotic arm with a gripper that must bring a ball to a target location. The task is fully observable and uses feature-based observations (arm joint angles, velocities, object and target positions).

Task: Grasp a ball and bring it to a target position.

world = swm.World('swm/ManipulatorDMControl-v0', num_envs=4)

Environment Specs

Property Value
Action Space Box(-1, 1, shape=(5,)) — 5 joint torques (arm + gripper)
Observation Space Feature vector (arm joints, velocities, object/target positions)
Episode Length 1000 steps (10s at 0.01s timestep)
Environment ID swm/ManipulatorDMControl-v0
Physics MuJoCo

Variation Space

Factor Type Description
agent.color Box(0, 1, shape=(3,)) Manipulator body RGB color
agent.upper_arm_density Box(500, 1500, shape=(1,)) Upper arm geom density
agent.hand_density Box(500, 1500, shape=(1,)) Hand geom density
agent.upper_arm_length Box(500, 1500, shape=(1,)) Upper arm length
target.color Box(0, 1, shape=(3,)) Target RGB color
target.shape Discrete(2) Target shape (0: box, 1: sphere)
rendering.render_target Discrete(2) Whether to render the target (0: hidden, 1: visible)
floor.color Box(0, 1, shape=(2, 3)) Checkerboard floor colors
light.intensity Box(0, 1, shape=(1,)) Scene lighting intensity

Expert Policy

An expert policy is available for all DMC environments. It can be used to collect high-quality demonstration datasets.

Training Details

The expert policies were trained using the Soft Actor-Critic (SAC) algorithm from Stable-Baselines3 via scripts/expert/train_policies.py. Each policy operates on the environment's feature-based observations (joint angles, velocities, etc.) and outputs continuous actions clipped to [-1, 1].

Observations are normalized at inference time using running statistics saved during training (VecNormalize from Stable-Baselines3). Each environment's expert comes as two files:

File Description
expert_policy.zip Stable-Baselines3 SAC model checkpoint
vec_normalize.pkl Observation normalization statistics (VecNormalize)
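Conceptually, VecNormalize tracks running mean and variance statistics of observations and normalizes with them. The following is a simplified NumPy sketch of that mechanism, not the actual Stable-Baselines3 class:

```python
import numpy as np

class RunningObsNormalizer:
    """Simplified running mean/variance normalizer in the spirit of
    Stable-Baselines3's VecNormalize (illustrative sketch only)."""

    def __init__(self, shape, clip=10.0, eps=1e-4):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps
        self.clip = clip

    def update(self, batch):
        # Combine running statistics with batch statistics (parallel variance).
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        n = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean = self.mean + delta * n / total
        m2 = self.var * self.count + batch_var * n + delta**2 * self.count * n / total
        self.var = m2 / total
        self.count = total

    def normalize(self, obs):
        # Standardize and clip, as VecNormalize does at inference time.
        return np.clip((obs - self.mean) / np.sqrt(self.var + 1e-8),
                       -self.clip, self.clip)
```

At inference time the expert applies the statistics stored in vec_normalize.pkl in this spirit before feeding observations to the SAC policy.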

Pre-trained checkpoints for all environments are available for download.

Installation

The expert policy requires stable-baselines3 as an additional dependency:

uv add stable-baselines3

Usage

import os

os.environ['MUJOCO_GL'] = 'egl'

import stable_worldmodel as swm
from stable_worldmodel.envs.dmcontrol import ExpertPolicy

world = swm.World(
    'swm/CheetahDMControl-v0',
    num_envs=3,
    image_shape=(224, 224),
    max_episode_steps=500,
)
world.set_policy(
    ExpertPolicy(
        ckpt_path='path/to/dmc/cheetah/expert_policy.zip',
        vec_normalize_path='path/to/dmc/cheetah/vec_normalize.pkl',
        device='cuda',
    )
)

world.record_video('./', max_steps=500)

API Reference

ExpertPolicy

ExpertPolicy(
    ckpt_path: str,
    vec_normalize_path: str,
    noise_std: float = 0.0,
    seed: int = None,
    device: str = 'cpu',
    **kwargs,
)

Bases: BasePolicy

Parameters:

  • ckpt_path (str) –

    Path to the Stable-Baselines3 .zip checkpoint of the trained policy.

  • vec_normalize_path (str) –

    Path to the .pkl file containing the normalization statistics.

  • noise_std (float, default: 0.0 ) –

    Standard deviation of Gaussian noise added to actions.

  • seed (int, default: None ) –

    Random seed for action noise.

  • device (str, default: 'cpu' ) –

    Device to load the model on, e.g., 'cpu' or 'cuda'.
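The effect of noise_std can be illustrated with a small, hypothetical helper: sample zero-mean Gaussian noise, add it to the policy's action, and clip back to the [-1, 1] action range. The real ExpertPolicy may differ in details.

```python
import numpy as np

def perturb_action(action, noise_std=0.1, rng=None):
    """Add zero-mean Gaussian noise to an action and clip to [-1, 1]."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = np.asarray(action) + rng.normal(0.0, noise_std, size=np.shape(action))
    return np.clip(noisy, -1.0, 1.0)
```

With noise_std=0.0 (the default) the action is returned unchanged, so the expert acts deterministically; a small positive value adds exploration-style jitter to collected demonstrations.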

get_action

get_action(info_dict, **kwargs)

Compute actions with the loaded SAC model for the observations in info_dict. Observations are normalized with the stored VecNormalize statistics before inference, and Gaussian noise (noise_std) is added to the resulting actions when configured.