Classic continuous control tasks from the DeepMind Control Suite

Description

A collection of continuous control environments built on the DeepMind Control Suite. They cover classic locomotion, balancing, and manipulation tasks simulated with the MuJoCo physics engine.

import stable_worldmodel as swm

# Example: Cheetah environment
world = swm.World('swm/CheetahDMControl-v0', num_envs=4)

Available Environments

| Environment | Environment ID | Task |
| --- | --- | --- |
| Humanoid | swm/HumanoidDMControl-v0 | Walk forward at 1 m/s |
| Cheetah | swm/CheetahDMControl-v0 | Run forward |
| Hopper | swm/HopperDMControl-v0 | Hop forward |
| Reacher | swm/ReacherDMControl-v0 | Reach a target |
| Walker | swm/WalkerDMControl-v0 | Walk forward at 1 m/s |
| Quadruped | swm/QuadrupedDMControl-v0 | Walk forward at 0.5 m/s |
| Acrobot | swm/AcrobotDMControl-v0 | Swing up and balance |
| Pendulum | swm/PendulumDMControl-v0 | Swing up and balance |
| Cartpole | swm/CartpoleDMControl-v0 | Swing up and balance |
| Ball in Cup | swm/BallInCupDMControl-v0 | Catch ball in cup |
| Finger | swm/FingerDMControl-v0 | Turn spinner to target |
| Manipulator | swm/ManipulatorDMControl-v0 | Grasp and place ball |
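
The IDs in the table follow a visible pattern: `swm/`, then the environment name in CamelCase, then `DMControl-v0`. As a minimal sketch, the helper below rebuilds an ID from a display name so the table can be looped over without retyping strings. `env_id` is a hypothetical name used for illustration, not part of `stable_worldmodel`.

```python
# Display names copied from the table above.
ENV_NAMES = [
    "Humanoid", "Cheetah", "Hopper", "Reacher", "Walker", "Quadruped",
    "Acrobot", "Pendulum", "Cartpole", "Ball in Cup", "Finger", "Manipulator",
]


def env_id(name: str) -> str:
    """Build 'swm/<CamelCaseName>DMControl-v0' from a display name (hypothetical helper)."""
    camel = "".join(word[:1].upper() + word[1:] for word in name.split())
    return f"swm/{camel}DMControl-v0"


ENV_IDS = {name: env_id(name) for name in ENV_NAMES}
```

Each value can then be passed to `swm.World(..., num_envs=4)` in the same way as the examples on this page.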

Humanoid

A 21-DoF humanoid body that must learn to walk forward at a target speed. The task uses feature-based observations (joint angles, head height, extremities, torso orientation, center-of-mass velocity).

Task: Walk forward at a speed of 1 m/s.

world = swm.World('swm/HumanoidDMControl-v0', num_envs=4)

Environment Specs

| Property | Value |
| --- | --- |
| Action Space | Box(-1, 1, shape=(21,)), 21 joint torques |
| Observation Space | Feature vector (joint angles, head height, extremities, torso vertical, CoM velocity) |
| Episode Length | 1000 steps (25 s at a 0.025 s timestep) |
| Environment ID | swm/HumanoidDMControl-v0 |
| Physics | MuJoCo |
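
The episode duration quoted in the specs is the step count multiplied by the control timestep. The short check below reproduces that arithmetic with values listed on this page; the function name is illustrative only.

```python
def episode_seconds(num_steps: int, timestep_s: float) -> float:
    """Simulated duration of one episode in seconds."""
    return num_steps * timestep_s


# (steps, timestep in seconds) as listed in the Environment Specs on this page.
SPECS = {
    "Humanoid": (1000, 0.025),
    "Hopper": (1000, 0.02),
    "Acrobot": (500, 0.02),
    "Manipulator": (1000, 0.01),
}

for name, (steps, dt) in SPECS.items():
    print(f"{name}: {episode_seconds(steps, dt):.1f} s")
```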

Variation Space

| Factor | Type | Description |
| --- | --- | --- |
| agent.color | Box(0, 1, shape=(3,)) | Humanoid body RGB color |
| agent.torso_density | Box(500, 1500, shape=(1,)) | Torso geom density |
| agent.right_lower_arm_density | Box(500, 1500, shape=(1,)) | Right lower arm geom density |
| agent.left_knee_locked | Discrete(2) | Whether the left knee joint is locked |
| floor.friction | Box(0, 1, shape=(1,)) | Floor friction coefficient |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
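
To make the `Box` and `Discrete` entries concrete, here is a minimal sketch of drawing one random value per factor. It uses plain-Python stand-ins (tuples and the `random` module) rather than the `stable_worldmodel` or Gymnasium API, and it assumes uniform sampling, which this page does not specify.

```python
import random

# ("box", low, high, shape) or ("discrete", n); a subset of the Humanoid factors above.
HUMANOID_FACTORS = {
    "agent.color": ("box", 0.0, 1.0, (3,)),
    "agent.torso_density": ("box", 500.0, 1500.0, (1,)),
    "agent.left_knee_locked": ("discrete", 2),
    "floor.color": ("box", 0.0, 1.0, (2, 3)),
}


def sample_factor(spec, rng):
    """Draw one value: an int for discrete factors, nested lists for boxes."""
    if spec[0] == "discrete":
        return rng.randrange(spec[1])
    _, low, high, shape = spec

    def fill(dims):
        # Recurse over the shape, producing one uniform draw per leaf element.
        if not dims:
            return rng.uniform(low, high)
        return [fill(dims[1:]) for _ in range(dims[0])]

    return fill(shape)


rng = random.Random(0)  # fixed seed so the draw is reproducible
variation = {name: sample_factor(spec, rng) for name, spec in HUMANOID_FACTORS.items()}
```

The resulting `variation` dictionary holds a 3-element list for `agent.color`, a 1-element list for the density, an integer flag for the knee lock, and a 2 × 3 nested list for the two checkerboard colors.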

Cheetah

A planar biped (half-cheetah) that must learn to run forward as fast as possible. The task uses feature-based observations (joint angles and velocities).

Task: Run forward (maximize forward velocity).

world = swm.World('swm/CheetahDMControl-v0', num_envs=4)

Environment Specs

| Property | Value |
| --- | --- |
| Action Space | Box(-1, 1, shape=(6,)), 6 joint torques |
| Observation Space | Feature vector (joint angles, joint velocities) |
| Episode Length | 1000 steps (25 s at a 0.025 s timestep) |
| Environment ID | swm/CheetahDMControl-v0 |
| Physics | MuJoCo |
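
With `num_envs=4`, a policy has to supply one 6-dimensional action per environment, with every component inside [-1, 1]. The sketch below builds such a batch with the standard library and clips out-of-range values. How actions are handed to the `World` object is not shown on this page, so that step is deliberately left out.

```python
import random

NUM_ENVS = 4      # matches num_envs=4 in the example above
ACTION_DIM = 6    # Cheetah: 6 joint torques


def clip_action(action, low=-1.0, high=1.0):
    """Clamp every component into the action-space bounds."""
    return [min(max(a, low), high) for a in action]


rng = random.Random(0)
# Deliberately sample slightly outside the bounds to show the clipping.
raw_batch = [[rng.uniform(-1.5, 1.5) for _ in range(ACTION_DIM)] for _ in range(NUM_ENVS)]
action_batch = [clip_action(a) for a in raw_batch]
```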

Variation Space

| Factor | Type | Description |
| --- | --- | --- |
| agent.color | Box(0, 1, shape=(3,)) | Cheetah body RGB color |
| agent.torso_density | Box(500, 1500, shape=(1,)) | Torso geom density |
| agent.back_foot_density | Box(500, 1500, shape=(1,)) | Back foot geom density |
| agent.back_foot_locked | Discrete(2) | Whether the back foot joint is locked |
| floor.friction | Box(0, 1, shape=(1,)) | Floor friction coefficient |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |

Hopper

A planar one-legged hopper that must learn to hop forward. The task uses feature-based observations (joint angles, velocities, touch sensor).

Task: Hop forward (maximize forward velocity).

world = swm.World('swm/HopperDMControl-v0', num_envs=4)

Environment Specs

| Property | Value |
| --- | --- |
| Action Space | Box(-1, 1, shape=(4,)), 4 joint torques |
| Observation Space | Feature vector (joint angles, velocities, touch) |
| Episode Length | 1000 steps (20 s at a 0.02 s timestep) |
| Environment ID | swm/HopperDMControl-v0 |
| Physics | MuJoCo |

Variation Space

| Factor | Type | Description |
| --- | --- | --- |
| agent.color | Box(0, 1, shape=(3,)) | Hopper body RGB color |
| agent.torso_density | Box(500, 1500, shape=(1,)) | Torso geom density |
| agent.foot_density | Box(500, 1500, shape=(1,)) | Foot geom density |
| agent.foot_locked | Discrete(2) | Whether the foot joint is locked |
| floor.friction | Box(0, 1, shape=(1,)) | Floor friction coefficient |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |

Reacher

A planar two-link arm that must reach a small target. The task uses feature-based observations (joint angles, velocities, finger-to-target vector).

Task: Move the fingertip to a randomly placed target (size 0.015).
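
As a rough illustration of what "reaching" means, the check below treats the target as a disc of radius 0.015 and tests whether the fingertip centre lies inside it. This is a simplified stand-in: the environment's actual reward may also account for the fingertip's own size, and the function name, the 2D coordinates, and the reading of "size" as a radius are all assumptions.

```python
import math

TARGET_SIZE = 0.015  # from the task description; assumed here to be a radius


def fingertip_on_target(finger_xy, target_xy, radius=TARGET_SIZE):
    """True if the fingertip centre lies within `radius` of the target centre."""
    dx = finger_xy[0] - target_xy[0]
    dy = finger_xy[1] - target_xy[1]
    return math.hypot(dx, dy) <= radius
```

For example, a fingertip 0.01 away from the target centre counts as on target, while one 0.02 away does not.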

world = swm.World('swm/ReacherDMControl-v0', num_envs=4)

Environment Specs

| Property | Value |
| --- | --- |
| Action Space | Box(-1, 1, shape=(2,)), 2 joint torques |
| Observation Space | Feature vector (joint angles, velocities, finger-to-target vector) |
| Episode Length | 1000 steps (20 s at a 0.02 s timestep) |
| Environment ID | swm/ReacherDMControl-v0 |
| Physics | MuJoCo |

Variation Space

| Factor | Type | Description |
| --- | --- | --- |
| agent.color | Box(0, 1, shape=(3,)) | Reacher arm RGB color |
| agent.arm_density | Box(500, 1500, shape=(1,)) | Arm geom density |
| agent.finger_density | Box(500, 1500, shape=(1,)) | Finger geom density |
| agent.finger_locked | Discrete(2) | Whether the finger joint is locked |
| target.color | Box(0, 1, shape=(3,)) | Target RGB color |
| target.shape | Discrete(2) | Target shape (0: box, 1: sphere) |
| rendering.render_target | Discrete(2) | Whether to render the target (0: hidden, 1: visible) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |

Walker

A planar bipedal walker that must learn to walk forward at a target speed. The task uses feature-based observations (joint angles, velocities, body height, orientation).

Task: Walk forward at a speed of 1 m/s.
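
One common way to express a "walk at 1 m/s" objective is a speed term that grows linearly with forward velocity and saturates at the target. The sketch below shows only that idea. It is not the environment's actual reward, which in the DeepMind Control Suite also includes posture terms, and `speed_term` is a hypothetical name.

```python
TARGET_SPEED = 1.0  # m/s, from the task description


def speed_term(forward_velocity: float, target: float = TARGET_SPEED) -> float:
    """Linear ramp from 0 (standing still or moving backward) to 1 (at or above target)."""
    return min(max(forward_velocity / target, 0.0), 1.0)
```

Under this shaping, walking at 0.5 m/s earns half of the speed term, and going faster than the target earns nothing extra.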

world = swm.World('swm/WalkerDMControl-v0', num_envs=4)

Environment Specs

| Property | Value |
| --- | --- |
| Action Space | Box(-1, 1, shape=(6,)), 6 joint torques |
| Observation Space | Feature vector (joint angles, velocities, body height, orientation) |
| Episode Length | 1000 steps (25 s at a 0.025 s timestep) |
| Environment ID | swm/WalkerDMControl-v0 |
| Physics | MuJoCo |

Variation Space

| Factor | Type | Description |
| --- | --- | --- |
| agent.color | Box(0, 1, shape=(3,)) | Walker body RGB color |
| agent.torso_density | Box(500, 1500, shape=(1,)) | Torso geom density |
| agent.left_foot_density | Box(500, 1500, shape=(1,)) | Left foot geom density |
| agent.right_knee_locked | Discrete(2) | Whether the right knee joint is locked |
| floor.friction | Box(0, 1, shape=(1,)) | Floor friction coefficient |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| floor.rotation_y | Box(-10, 10, shape=(1,)) | Floor rotation around Y axis (degrees) |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
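
`floor.rotation_y` is given in degrees, whereas MuJoCo orientations are commonly written as unit quaternions in (w, x, y, z) order. The conversion for a rotation about the Y axis is sketched below. How `stable_worldmodel` applies this factor internally is not documented here, so treat the code purely as the underlying math; both function names are illustrative.

```python
import math


def y_rotation_quat(degrees: float):
    """Unit quaternion (w, x, y, z) for a rotation of `degrees` about the Y axis."""
    half = math.radians(degrees) / 2.0
    return (math.cos(half), 0.0, math.sin(half), 0.0)


def grade_percent(degrees: float) -> float:
    """Slope of the tilted floor as rise over run, in percent."""
    return 100.0 * math.tan(math.radians(degrees))
```

At the extremes of the range (10 degrees either way) the floor slope is about 17.6 %, a noticeable incline or decline for a planar walker.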

Quadruped

A four-legged quadruped robot that must learn to walk forward. The task uses feature-based observations (joint angles, velocities, torso orientation, end effector positions).

Task: Walk forward at a speed of 0.5 m/s.

world = swm.World('swm/QuadrupedDMControl-v0', num_envs=4)

Environment Specs

| Property | Value |
| --- | --- |
| Action Space | Box(-1, 1, shape=(12,)), 12 joint torques (4 legs × 3 joints) |
| Observation Space | Feature vector (joint angles, velocities, torso orientation, end effectors) |
| Episode Length | 1000 steps (20 s at a 0.02 s timestep) |
| Environment ID | swm/QuadrupedDMControl-v0 |
| Physics | MuJoCo |
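
The 12-dimensional action can be viewed as a 4 × 3 grid, one row per leg. The sketch below does that regrouping. The leg ordering and the assumption that each leg's three values are contiguous are made for illustration only and should be checked against the model before relying on them.

```python
LEG_NAMES = ["front_left", "front_right", "back_right", "back_left"]  # assumed ordering
JOINTS_PER_LEG = 3


def split_by_leg(action):
    """Regroup a flat 12-vector into {leg_name: [3 torques]}."""
    expected = len(LEG_NAMES) * JOINTS_PER_LEG
    if len(action) != expected:
        raise ValueError(f"expected {expected} values, got {len(action)}")
    return {
        leg: list(action[i * JOINTS_PER_LEG:(i + 1) * JOINTS_PER_LEG])
        for i, leg in enumerate(LEG_NAMES)
    }
```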

Variation Space

| Factor | Type | Description |
| --- | --- | --- |
| agent.color | Box(0, 1, shape=(3,)) | Quadruped body RGB color |
| agent.torso_density | Box(500, 1500, shape=(1,)) | Torso geom density |
| agent.foot_back_left_density | Box(500, 1500, shape=(1,)) | Back left foot geom density |
| agent.knee_back_left_locked | Discrete(2) | Whether the back left knee joint is locked |
| floor.friction | Box(0, 1, shape=(1,)) | Floor friction coefficient |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |

Acrobot

A two-link acrobot that must swing up and balance. The task uses sparse rewards and feature-based observations (joint angles, velocities).

Task: Swing up and balance both links upright (sparse reward).
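
"Sparse reward" means the agent receives a non-zero signal only once the goal condition is met, with no shaping on the way there. A minimal contrast between a sparse and a shaped (dense) reward on the same distance-to-goal quantity is sketched below. The threshold and margin are placeholder parameters, not the environment's actual values, and both function names are hypothetical.

```python
def sparse_reward(distance: float, threshold: float) -> float:
    """1.0 only when the tip is within `threshold` of the upright goal, else 0.0."""
    return 1.0 if distance <= threshold else 0.0


def dense_reward(distance: float, threshold: float, margin: float) -> float:
    """Same goal region, but decays linearly to 0 over `margin` beyond it."""
    if distance <= threshold:
        return 1.0
    return max(0.0, 1.0 - (distance - threshold) / margin)
```

With the sparse variant, a random policy that never gets both links near upright sees a reward of zero for the whole episode, which is what makes exploration on this task hard.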

world = swm.World('swm/AcrobotDMControl-v0', num_envs=4)

Environment Specs

| Property | Value |
| --- | --- |
| Action Space | Box(-1, 1, shape=(1,)), 1 joint torque (elbow) |
| Observation Space | Feature vector (joint angles, velocities) |
| Episode Length | 500 steps (10 s at a 0.02 s timestep) |
| Environment ID | swm/AcrobotDMControl-v0 |
| Physics | MuJoCo |

Variation Space

| Factor | Type | Description |
| --- | --- | --- |
| agent.color | Box(0, 1, shape=(3,)) | Acrobot body RGB color |
| agent.upper_arm_density | Box(500, 1500, shape=(1,)) | Upper arm geom density |
| agent.lower_arm_density | Box(500, 1500, shape=(1,)) | Lower arm geom density |
| agent.upper_arm_locked | Discrete(2) | Whether the upper arm joint is locked |
| target.color | Box(0, 1, shape=(3,)) | Target RGB color |
| target.shape | Discrete(2) | Target shape (0: box, 1: sphere) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |

Pendulum

A single-link pendulum that must swing up and balance. The task uses feature-based observations (angle, angular velocity).

Task: Swing up and balance the pendulum upright.

world = swm.World('swm/PendulumDMControl-v0', num_envs=4)

Environment Specs

| Property | Value |
| --- | --- |
| Action Space | Box(-1, 1, shape=(1,)), 1 joint torque |
| Observation Space | Feature vector (angle, angular velocity) |
| Episode Length | 1000 steps (20 s at a 0.02 s timestep) |
| Environment ID | swm/PendulumDMControl-v0 |
| Physics | MuJoCo |

Variation Space

| Factor | Type | Description |
| --- | --- | --- |
| agent.color | Box(0, 1, shape=(3,)) | Pendulum body RGB color |
| agent.pole_density | Box(500, 1500, shape=(1,)) | Pole geom density |
| agent.mass_density | Box(500, 1500, shape=(1,)) | Tip mass geom density |
| agent.mass_shape | Discrete(2) | Tip mass shape (0: box, 1: sphere) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |

Cartpole

A cart-pole system that must swing up and balance. The task uses sparse rewards and feature-based observations (cart position, pole angle, velocities).

Task: Swing up and balance the pole upright (sparse reward).

world = swm.World('swm/CartpoleDMControl-v0', num_envs=4)

Environment Specs

| Property | Value |
| --- | --- |
| Action Space | Box(-1, 1, shape=(1,)), 1 cart force |
| Observation Space | Feature vector (cart position, pole angle, velocities) |
| Episode Length | 500 steps (10 s at a 0.02 s timestep) |
| Environment ID | swm/CartpoleDMControl-v0 |
| Physics | MuJoCo |

Variation Space

| Factor | Type | Description |
| --- | --- | --- |
| agent.color | Box(0, 1, shape=(3,)) | Cartpole body RGB color |
| agent.cart_mass | Box(0.5, 1.5, shape=(1,)) | Cart geom mass |
| agent.pole_density | Box(500, 1500, shape=(1,)) | Pole geom density |
| agent.cart_shape | Discrete(2) | Cart shape (0: box, 1: sphere) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |

Ball in Cup

A planar ball-in-cup system where a cup must catch and hold a ball attached by a string. The task uses feature-based observations (cup position, ball position, velocities).

Task: Swing the ball into the cup and keep it there.

world = swm.World('swm/BallInCupDMControl-v0', num_envs=4)

Environment Specs

| Property | Value |
| --- | --- |
| Action Space | Box(-1, 1, shape=(2,)), 2 cup forces (x, z) |
| Observation Space | Feature vector (cup position, ball position, velocities) |
| Episode Length | 1000 steps (20 s at a 0.02 s timestep) |
| Environment ID | swm/BallInCupDMControl-v0 |
| Physics | MuJoCo |

Variation Space

| Factor | Type | Description |
| --- | --- | --- |
| agent.color | Box(0, 1, shape=(3,)) | Cup RGB color |
| agent.density | Box(500, 1500, shape=(1,)) | Cup geom density |
| ball.color | Box(0, 1, shape=(3,)) | Ball RGB color |
| ball.density | Box(500, 1500, shape=(1,)) | Ball geom density |
| ball.size | Box(0.01, 0.05, shape=(1,)) | Ball radius |
| target.color | Box(0, 1, shape=(3,)) | Target RGB color |
| target.shape | Discrete(2) | Target shape (0: box, 1: sphere) |
| rendering.render_target | Discrete(2) | Whether to render the target (0: hidden, 1: visible) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
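
`ball.density` and `ball.size` jointly determine the ball's mass, since a geom's mass follows from its density and volume. Assuming a solid spherical ball and SI-style units (kg/m³ and metres, which is an assumption about this setup rather than something stated on this page), the range of masses these two factors span can be estimated as follows:

```python
import math


def sphere_mass(density: float, radius: float) -> float:
    """Mass of a solid sphere: density * (4/3) * pi * r**3."""
    return density * (4.0 / 3.0) * math.pi * radius ** 3


lightest = sphere_mass(500.0, 0.01)   # lowest density, smallest radius
heaviest = sphere_mass(1500.0, 0.05)  # highest density, largest radius
print(f"ball mass range: {lightest:.4f} kg to {heaviest:.4f} kg")
```

Under these assumptions the mass varies by a factor of 375 across the variation space. Most of that comes from the radius, because a 5× change in radius is a 125× change in volume, while density contributes only a factor of 3.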

Finger

A planar finger that must turn a spinner to reach a target angle. The task uses feature-based observations (finger joint angles, spinner angle, target position).

Task: Turn the spinner so that a target on it reaches a goal position (target radius 0.03).

world = swm.World('swm/FingerDMControl-v0', num_envs=4)

Environment Specs

| Property | Value |
| --- | --- |
| Action Space | Box(-1, 1, shape=(2,)), 2 joint torques |
| Observation Space | Feature vector (finger joint angles, spinner angle, target position) |
| Episode Length | 1000 steps (20 s at a 0.02 s timestep) |
| Environment ID | swm/FingerDMControl-v0 |
| Physics | MuJoCo |

Variation Space

| Factor | Type | Description |
| --- | --- | --- |
| agent.color | Box(0, 1, shape=(3,)) | Finger body RGB color |
| agent.proximal_density | Box(500, 1500, shape=(1,)) | Proximal link geom density |
| agent.fingertip_density | Box(500, 1500, shape=(1,)) | Fingertip geom density |
| spinner.color | Box(0, 1, shape=(3,)) | Spinner RGB color |
| spinner.density | Box(500, 1500, shape=(1,)) | Spinner geom density |
| spinner.friction | Box(0, 1, shape=(1,)) | Spinner hinge friction loss |
| target.color | Box(0, 1, shape=(3,)) | Target RGB color |
| target.shape | Discrete(2) | Target shape (0: box, 1: sphere) |
| rendering.render_target | Discrete(2) | Whether to render the target (0: hidden, 1: visible) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |

Manipulator

A planar robotic arm with a gripper that must bring a ball to a target location. The task is fully observable and uses feature-based observations (arm joint angles, velocities, object and target positions).

Task: Grasp a ball and bring it to a target position.

world = swm.World('swm/ManipulatorDMControl-v0', num_envs=4)

Environment Specs

| Property | Value |
| --- | --- |
| Action Space | Box(-1, 1, shape=(5,)), 5 joint torques (arm + gripper) |
| Observation Space | Feature vector (arm joints, velocities, object/target positions) |
| Episode Length | 1000 steps (10 s at a 0.01 s timestep) |
| Environment ID | swm/ManipulatorDMControl-v0 |
| Physics | MuJoCo |

Variation Space

| Factor | Type | Description |
| --- | --- | --- |
| agent.color | Box(0, 1, shape=(3,)) | Manipulator body RGB color |
| agent.upper_arm_density | Box(500, 1500, shape=(1,)) | Upper arm geom density |
| agent.hand_density | Box(500, 1500, shape=(1,)) | Hand geom density |
| agent.upper_arm_length | Box(500, 1500, shape=(1,)) | Upper arm length |
| target.color | Box(0, 1, shape=(3,)) | Target RGB color |
| target.shape | Discrete(2) | Target shape (0: box, 1: sphere) |
| rendering.render_target | Discrete(2) | Whether to render the target (0: hidden, 1: visible) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
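
Before applying a hand-written variation, it can help to check it against the bounds in the table. The validator below is a plain-Python sketch with a subset of the Manipulator factors hard-coded from this page. It does not use or represent the `stable_worldmodel` API, and `check_variation` is a hypothetical helper.

```python
# (low, high) for Box factors, or an int n for Discrete(n); a subset of the table above.
MANIPULATOR_BOUNDS = {
    "agent.color": (0.0, 1.0),
    "agent.upper_arm_density": (500.0, 1500.0),
    "agent.hand_density": (500.0, 1500.0),
    "target.shape": 2,
    "rendering.render_target": 2,
    "light.intensity": (0.0, 1.0),
}


def check_variation(variation: dict) -> list:
    """Return a list of human-readable problems; empty means the variation is valid."""
    problems = []
    for name, value in variation.items():
        if name not in MANIPULATOR_BOUNDS:
            problems.append(f"{name}: unknown factor")
            continue
        bound = MANIPULATOR_BOUNDS[name]
        if isinstance(bound, int):  # Discrete(n): integer in [0, n)
            if not (isinstance(value, int) and 0 <= value < bound):
                problems.append(f"{name}: expected integer in [0, {bound})")
        else:  # Box: every component within [low, high]
            low, high = bound
            values = value if isinstance(value, (list, tuple)) else [value]
            if not all(low <= v <= high for v in values):
                problems.append(f"{name}: values must lie in [{low}, {high}]")
    return problems
```

For instance, `check_variation({"agent.hand_density": [2000.0]})` reports one problem because 2000 lies outside the 500 to 1500 range, while a variation that stays within bounds returns an empty list.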