Description
A collection of continuous-control environments built on the DeepMind Control Suite. They cover classic locomotion, balancing, and manipulation tasks simulated with the MuJoCo physics engine.
import stable_worldmodel as swm
# Example: Cheetah environment
world = swm.World('swm/CheetahDMControl-v0', num_envs=4)
Available Environments
| Environment | Environment ID | Task |
|---|---|---|
| Humanoid | swm/HumanoidDMControl-v0 | Walk forward at 1 m/s |
| Cheetah | swm/CheetahDMControl-v0 | Run forward |
| Hopper | swm/HopperDMControl-v0 | Hop forward |
| Reacher | swm/ReacherDMControl-v0 | Reach a target |
| Walker | swm/WalkerDMControl-v0 | Walk forward at 1 m/s |
| Quadruped | swm/QuadrupedDMControl-v0 | Walk forward |
| Acrobot | swm/AcrobotDMControl-v0 | Swing up and balance |
| Pendulum | swm/PendulumDMControl-v0 | Swing up and balance |
| Cartpole | swm/CartpoleDMControl-v0 | Swing up and balance |
| Ball in Cup | swm/BallInCupDMControl-v0 | Catch ball in cup |
| Finger | swm/FingerDMControl-v0 | Turn spinner to target |
| Manipulator | swm/ManipulatorDMControl-v0 | Grasp and place ball |
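Every ID in the table above follows the pattern `swm/<Name>DMControl-v0`, with the display name written in CamelCase. The lookup below is a minimal sketch of that naming rule; `ENV_NAMES` and `env_id` are hypothetical helpers written for this page and are not part of `stable_worldmodel`.

```python
# Hypothetical helper (not part of stable_worldmodel): turn a display name
# from the table above into its environment ID.
ENV_NAMES = [
    "Humanoid", "Cheetah", "Hopper", "Reacher", "Walker", "Quadruped",
    "Acrobot", "Pendulum", "Cartpole", "Ball in Cup", "Finger", "Manipulator",
]


def env_id(name: str) -> str:
    """Build the ID as swm/<CamelCaseName>DMControl-v0."""
    if name not in ENV_NAMES:
        raise KeyError(f"unknown environment: {name!r}")
    # "Ball in Cup" -> "BallInCup"; single-word names are unchanged.
    camel = "".join(word.capitalize() for word in name.split())
    return f"swm/{camel}DMControl-v0"


ALL_IDS = [env_id(name) for name in ENV_NAMES]
```

The resulting string can be passed as the first argument to `swm.World`, as in the Cheetah example at the top of the page.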
Humanoid
A 21-DoF humanoid body that must learn to walk forward at a target speed. The task uses feature-based observations (joint angles, head height, extremities, torso orientation, center-of-mass velocity).
Task: Walk forward at a speed of 1 m/s.
world = swm.World('swm/HumanoidDMControl-v0', num_envs=4)
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(21,)) — 21 joint torques |
| Observation Space | Feature vector (joint angles, head height, extremities, torso vertical, CoM velocity) |
| Episode Length | 1000 steps (25s at 0.025s timestep) |
| Environment ID | swm/HumanoidDMControl-v0 |
| Physics | MuJoCo |
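The Episode Length entry is the step count multiplied by the control timestep. As a quick arithmetic check, the sketch below recomputes the duration for every environment on this page from the (steps, timestep) pairs listed in the specs tables. The names `EPISODES` and `episode_seconds` are illustrative only.

```python
# (steps, timestep in seconds) copied from the Environment Specs tables on this page.
EPISODES = {
    "Humanoid": (1000, 0.025),
    "Cheetah": (1000, 0.025),
    "Hopper": (1000, 0.02),
    "Reacher": (1000, 0.02),
    "Walker": (1000, 0.025),
    "Quadruped": (1000, 0.02),
    "Acrobot": (500, 0.02),
    "Pendulum": (1000, 0.02),
    "Cartpole": (500, 0.02),
    "Ball in Cup": (1000, 0.02),
    "Finger": (1000, 0.02),
    "Manipulator": (1000, 0.01),
}


def episode_seconds(steps: int, timestep: float) -> float:
    """Simulated duration of one episode in seconds."""
    return steps * timestep


# Rounded to avoid floating-point noise in the product.
DURATIONS = {name: round(episode_seconds(*spec), 6) for name, spec in EPISODES.items()}
```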
Variation Space
| Factor | Type | Description |
|---|---|---|
| agent.color | Box(0, 1, shape=(3,)) | Humanoid body RGB color |
| agent.torso_density | Box(500, 1500, shape=(1,)) | Torso geom density |
| agent.right_lower_arm_density | Box(500, 1500, shape=(1,)) | Right lower arm geom density |
| agent.left_knee_locked | Discrete(2) | Whether the left knee joint is locked |
| floor.friction | Box(0, 1, shape=(1,)) | Floor friction coefficient |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
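To make the Box and Discrete types in the variation table concrete, the sketch below writes the Humanoid factors as plain data and draws one random value per factor using only the Python standard library. It does not call the `stable_worldmodel` API. Uniform sampling and the names `HUMANOID_VARIATIONS`, `sample_box`, and `sample_variation` are assumptions made for illustration.

```python
import random

# Humanoid variation space from the table above, written as plain data.
# ("box", low, high, shape) mirrors Box(low, high, shape=...);
# ("discrete", n) mirrors Discrete(n).
HUMANOID_VARIATIONS = {
    "agent.color": ("box", 0.0, 1.0, (3,)),
    "agent.torso_density": ("box", 500.0, 1500.0, (1,)),
    "agent.right_lower_arm_density": ("box", 500.0, 1500.0, (1,)),
    "agent.left_knee_locked": ("discrete", 2),
    "floor.friction": ("box", 0.0, 1.0, (1,)),
    "floor.color": ("box", 0.0, 1.0, (2, 3)),
    "light.intensity": ("box", 0.0, 1.0, (1,)),
}


def sample_box(rng, low, high, shape):
    """Uniform sample returned as nested lists with the given shape."""
    if len(shape) == 1:
        return [rng.uniform(low, high) for _ in range(shape[0])]
    return [sample_box(rng, low, high, shape[1:]) for _ in range(shape[0])]


def sample_variation(spec, seed=None):
    """Draw one value per factor (assumes uniform sampling)."""
    rng = random.Random(seed)
    values = {}
    for name, (kind, *params) in spec.items():
        if kind == "box":
            values[name] = sample_box(rng, *params)
        else:  # "discrete": an integer in [0, n)
            values[name] = rng.randrange(params[0])
    return values


variation = sample_variation(HUMANOID_VARIATIONS, seed=0)
```

Here `floor.color` comes back as two RGB triples, one per checkerboard color, matching its `(2, 3)` shape.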
Cheetah

A planar biped (half-cheetah) that must learn to run forward as fast as possible. The task uses feature-based observations (joint angles and velocities).
Task: Run forward (maximize forward velocity).
world = swm.World('swm/CheetahDMControl-v0', num_envs=4)
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(6,)) — 6 joint torques |
| Observation Space | Feature vector (joint angles, joint velocities) |
| Episode Length | 1000 steps (25s at 0.025s timestep) |
| Environment ID | swm/CheetahDMControl-v0 |
| Physics | MuJoCo |
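With `num_envs=4` and a 6-dimensional action space, a policy has to produce one 6-vector per environment, each entry bounded to [-1, 1]. The sketch below builds such a batch in plain Python. The `(num_envs, action_dim)` layout is an assumption about how actions are batched, and `random_actions` and `clip_action` are hypothetical helpers, not library functions.

```python
import random


def random_actions(num_envs, action_dim, seed=None):
    """One uniformly random action in [-1, 1] per environment (assumed batch layout)."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-1.0, 1.0) for _ in range(action_dim)]
        for _ in range(num_envs)
    ]


def clip_action(action, low=-1.0, high=1.0):
    """Clamp each component to the Box bounds from the specs table."""
    return [min(max(a, low), high) for a in action]


# Cheetah: 4 environments, 6 joint torques each.
batch = random_actions(num_envs=4, action_dim=6, seed=0)
clipped = clip_action([1.7, -0.2, -3.0, 0.0, 0.5, 1.0])
```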
Variation Space
| Factor | Type | Description |
|---|---|---|
| agent.color | Box(0, 1, shape=(3,)) | Cheetah body RGB color |
| agent.torso_density | Box(500, 1500, shape=(1,)) | Torso geom density |
| agent.back_foot_density | Box(500, 1500, shape=(1,)) | Back foot geom density |
| agent.back_foot_locked | Discrete(2) | Whether the back foot joint is locked |
| floor.friction | Box(0, 1, shape=(1,)) | Floor friction coefficient |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
Hopper

A planar one-legged hopper that must learn to hop forward. The task uses feature-based observations (joint angles, velocities, touch sensor).
Task: Hop forward (maximize forward velocity).
world = swm.World('swm/HopperDMControl-v0', num_envs=4)
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(4,)) — 4 joint torques |
| Observation Space | Feature vector (joint angles, velocities, touch) |
| Episode Length | 1000 steps (20s at 0.02s timestep) |
| Environment ID | swm/HopperDMControl-v0 |
| Physics | MuJoCo |
Variation Space
| Factor | Type | Description |
|---|---|---|
| agent.color | Box(0, 1, shape=(3,)) | Hopper body RGB color |
| agent.torso_density | Box(500, 1500, shape=(1,)) | Torso geom density |
| agent.foot_density | Box(500, 1500, shape=(1,)) | Foot geom density |
| agent.foot_locked | Discrete(2) | Whether the foot joint is locked |
| floor.friction | Box(0, 1, shape=(1,)) | Floor friction coefficient |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
Reacher

A planar two-link arm that must reach a small target. The task uses feature-based observations (joint angles, velocities, finger-to-target vector).
Task: Move the fingertip to a randomly placed target (size 0.015).
world = swm.World('swm/ReacherDMControl-v0', num_envs=4)
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(2,)) — 2 joint torques |
| Observation Space | Feature vector (joint angles, velocities, finger-to-target vector) |
| Episode Length | 1000 steps (20s at 0.02s timestep) |
| Environment ID | swm/ReacherDMControl-v0 |
| Physics | MuJoCo |
Variation Space
| Factor | Type | Description |
|---|---|---|
| agent.color | Box(0, 1, shape=(3,)) | Reacher arm RGB color |
| agent.arm_density | Box(500, 1500, shape=(1,)) | Arm geom density |
| agent.finger_density | Box(500, 1500, shape=(1,)) | Finger geom density |
| agent.finger_locked | Discrete(2) | Whether the finger joint is locked |
| target.color | Box(0, 1, shape=(3,)) | Target RGB color |
| target.shape | Discrete(2) | Target shape (0: box, 1: sphere) |
| rendering.render_target | Discrete(2) | Whether to render the target (0: hidden, 1: visible) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
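The Discrete(2) factors `target.shape` and `rendering.render_target` are integer codes whose meanings are given in the table. When logging sampled variations, a small decoding table keeps them readable. This is an illustrative sketch. `DISCRETE_LABELS` and `describe` are hypothetical names, and only the two mappings stated in the table are included.

```python
# Label tables copied from the Reacher variation space above.
DISCRETE_LABELS = {
    "target.shape": {0: "box", 1: "sphere"},
    "rendering.render_target": {0: "hidden", 1: "visible"},
}


def describe(factor: str, value: int) -> str:
    """Return the human-readable label for a Discrete(2) factor value."""
    labels = DISCRETE_LABELS[factor]
    if value not in labels:
        raise ValueError(f"{factor} expects one of {sorted(labels)}, got {value}")
    return labels[value]
```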
Walker
A planar bipedal walker that must learn to walk forward at a target speed. The task uses feature-based observations (joint angles, velocities, body height, orientation).
Task: Walk forward at a speed of 1 m/s.
world = swm.World('swm/WalkerDMControl-v0', num_envs=4)
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(6,)) — 6 joint torques |
| Observation Space | Feature vector (joint angles, velocities, body height, orientation) |
| Episode Length | 1000 steps (25s at 0.025s timestep) |
| Environment ID | swm/WalkerDMControl-v0 |
| Physics | MuJoCo |
Variation Space
| Factor | Type | Description |
|---|---|---|
| agent.color | Box(0, 1, shape=(3,)) | Walker body RGB color |
| agent.torso_density | Box(500, 1500, shape=(1,)) | Torso geom density |
| agent.left_foot_density | Box(500, 1500, shape=(1,)) | Left foot geom density |
| agent.right_knee_locked | Discrete(2) | Whether the right knee joint is locked |
| floor.friction | Box(0, 1, shape=(1,)) | Floor friction coefficient |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| floor.rotation_y | Box(-10, 10, shape=(1,)) | Floor rotation around Y axis (degrees) |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
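To get a feel for the `floor.rotation_y` range, the sketch below converts a tilt in degrees into a slope (rise over run). It assumes that rotating the floor about the Y axis tilts it along the walker's direction of travel, which is plausible for a planar walker but is not stated in the table. `floor_grade` is a hypothetical helper.

```python
import math


def floor_grade(rotation_deg: float) -> float:
    """Rise over run for a floor tilted by rotation_deg degrees."""
    return math.tan(math.radians(rotation_deg))


# At the edge of the range the floor rises about 0.176 m per metre travelled,
# roughly a 17.6% grade; the sign of the angle decides uphill vs. downhill.
max_grade = floor_grade(10.0)
min_grade = floor_grade(-10.0)
```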
Quadruped
A four-legged robot that must learn to walk forward. The task uses feature-based observations (joint angles, velocities, torso orientation, end effector positions).
Task: Walk forward at a speed of 0.5 m/s.
world = swm.World('swm/QuadrupedDMControl-v0', num_envs=4)
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(12,)) — 12 joint torques (4 legs × 3 joints) |
| Observation Space | Feature vector (joint angles, velocities, torso orientation, end effectors) |
| Episode Length | 1000 steps (20s at 0.02s timestep) |
| Environment ID | swm/QuadrupedDMControl-v0 |
| Physics | MuJoCo |
Variation Space
| Factor | Type | Description |
|---|---|---|
| agent.color | Box(0, 1, shape=(3,)) | Quadruped body RGB color |
| agent.torso_density | Box(500, 1500, shape=(1,)) | Torso geom density |
| agent.foot_back_left_density | Box(500, 1500, shape=(1,)) | Back left foot geom density |
| agent.knee_back_left_locked | Discrete(2) | Whether the back left knee joint is locked |
| floor.friction | Box(0, 1, shape=(1,)) | Floor friction coefficient |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
Acrobot

A two-link acrobot that must swing up and balance. The task uses sparse rewards and feature-based observations (joint angles, velocities).
Task: Swing up and balance both links upright (sparse reward).
world = swm.World('swm/AcrobotDMControl-v0', num_envs=4)
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(1,)) — 1 joint torque (elbow) |
| Observation Space | Feature vector (joint angles, velocities) |
| Episode Length | 500 steps (10s at 0.02s timestep) |
| Environment ID | swm/AcrobotDMControl-v0 |
| Physics | MuJoCo |
Variation Space
| Factor | Type | Description |
|---|---|---|
| agent.color | Box(0, 1, shape=(3,)) | Acrobot body RGB color |
| agent.upper_arm_density | Box(500, 1500, shape=(1,)) | Upper arm geom density |
| agent.lower_arm_density | Box(500, 1500, shape=(1,)) | Lower arm geom density |
| agent.upper_arm_locked | Discrete(2) | Whether the upper arm joint is locked |
| target.color | Box(0, 1, shape=(3,)) | Target RGB color |
| target.shape | Discrete(2) | Target shape (0: box, 1: sphere) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
Pendulum

A single-link pendulum that must swing up and balance. The task uses feature-based observations (angle, angular velocity).
Task: Swing up and balance the pendulum upright.
world = swm.World('swm/PendulumDMControl-v0', num_envs=4)
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(1,)) — 1 joint torque |
| Observation Space | Feature vector (angle, angular velocity) |
| Episode Length | 1000 steps (20s at 0.02s timestep) |
| Environment ID | swm/PendulumDMControl-v0 |
| Physics | MuJoCo |
Variation Space
| Factor | Type | Description |
|---|---|---|
| agent.color | Box(0, 1, shape=(3,)) | Pendulum body RGB color |
| agent.pole_density | Box(500, 1500, shape=(1,)) | Pole geom density |
| agent.mass_density | Box(500, 1500, shape=(1,)) | Tip mass geom density |
| agent.mass_shape | Discrete(2) | Tip mass shape (0: box, 1: sphere) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
Cartpole

A cart-pole system that must swing up and balance. The task uses sparse rewards and feature-based observations (cart position, pole angle, velocities).
Task: Swing up and balance the pole upright (sparse reward).
world = swm.World('swm/CartpoleDMControl-v0', num_envs=4)
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(1,)) — 1 cart force |
| Observation Space | Feature vector (cart position, pole angle, velocities) |
| Episode Length | 500 steps (10s at 0.02s timestep) |
| Environment ID | swm/CartpoleDMControl-v0 |
| Physics | MuJoCo |
Variation Space
| Factor | Type | Description |
|---|---|---|
| agent.color | Box(0, 1, shape=(3,)) | Cartpole body RGB color |
| agent.cart_mass | Box(0.5, 1.5, shape=(1,)) | Cart geom mass |
| agent.pole_density | Box(500, 1500, shape=(1,)) | Pole geom density |
| agent.cart_shape | Discrete(2) | Cart shape (0: box, 1: sphere) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
Ball in Cup

A planar ball-in-cup system where a cup must catch and hold a ball attached by a string. The task uses feature-based observations (cup position, ball position, velocities).
Task: Swing the ball into the cup and keep it there.
world = swm.World('swm/BallInCupDMControl-v0', num_envs=4)
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(2,)) — 2 cup forces (x, z) |
| Observation Space | Feature vector (cup position, ball position, velocities) |
| Episode Length | 1000 steps (20s at 0.02s timestep) |
| Environment ID | swm/BallInCupDMControl-v0 |
| Physics | MuJoCo |
Variation Space
| Factor | Type | Description |
|---|---|---|
| agent.color | Box(0, 1, shape=(3,)) | Cup RGB color |
| agent.density | Box(500, 1500, shape=(1,)) | Cup geom density |
| ball.color | Box(0, 1, shape=(3,)) | Ball RGB color |
| ball.density | Box(500, 1500, shape=(1,)) | Ball geom density |
| ball.size | Box(0.01, 0.05, shape=(1,)) | Ball radius |
| target.color | Box(0, 1, shape=(3,)) | Target RGB color |
| target.shape | Discrete(2) | Target shape (0: box, 1: sphere) |
| rendering.render_target | Discrete(2) | Whether to render the target (0: hidden, 1: visible) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
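Because `ball.density` and `ball.size` vary independently, the ball's mass can change by a large factor between episodes. The sketch below works out the extremes. It assumes the ball is a solid sphere whose mass is density times volume, with density in kg/m³ and radius in metres. These units are an assumption, since the table does not state them. `sphere_mass` is a hypothetical helper.

```python
import math


def sphere_mass(density: float, radius: float) -> float:
    """Mass of a solid sphere: density * (4/3) * pi * r^3."""
    return density * (4.0 / 3.0) * math.pi * radius ** 3


# Corners of the variation ranges from the table above.
lightest = sphere_mass(500.0, 0.01)   # about 0.0021 under the assumed units
heaviest = sphere_mass(1500.0, 0.05)  # about 0.785 under the assumed units

# Density contributes a factor of 3 and radius a factor of 5**3 = 125.
mass_ratio = heaviest / lightest
```

The radius range dominates: a 5x change in radius scales the mass by 125, while the full density range scales it only by 3.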
Finger
A planar finger that must turn a spinner to reach a target angle. The task uses feature-based observations (finger joint angles, spinner angle, target position).
Task: Turn the spinner so that a target on it reaches a goal position (target radius 0.03).
world = swm.World('swm/FingerDMControl-v0', num_envs=4)
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(2,)) — 2 joint torques |
| Observation Space | Feature vector (finger joint angles, spinner angle, target position) |
| Episode Length | 1000 steps (20s at 0.02s timestep) |
| Environment ID | swm/FingerDMControl-v0 |
| Physics | MuJoCo |
Variation Space
| Factor | Type | Description |
|---|---|---|
| agent.color | Box(0, 1, shape=(3,)) | Finger body RGB color |
| agent.proximal_density | Box(500, 1500, shape=(1,)) | Proximal link geom density |
| agent.fingertip_density | Box(500, 1500, shape=(1,)) | Fingertip geom density |
| spinner.color | Box(0, 1, shape=(3,)) | Spinner RGB color |
| spinner.density | Box(500, 1500, shape=(1,)) | Spinner geom density |
| spinner.friction | Box(0, 1, shape=(1,)) | Spinner hinge friction loss |
| target.color | Box(0, 1, shape=(3,)) | Target RGB color |
| target.shape | Discrete(2) | Target shape (0: box, 1: sphere) |
| rendering.render_target | Discrete(2) | Whether to render the target (0: hidden, 1: visible) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
Manipulator
A planar robotic arm with a gripper that must bring a ball to a target location. The task is fully observable and uses feature-based observations (arm joint angles, velocities, object and target positions).
Task: Grasp a ball and bring it to a target position.
world = swm.World('swm/ManipulatorDMControl-v0', num_envs=4)
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(5,)) — 5 joint torques (arm + gripper) |
| Observation Space | Feature vector (arm joints, velocities, object/target positions) |
| Episode Length | 1000 steps (10s at 0.01s timestep) |
| Environment ID | swm/ManipulatorDMControl-v0 |
| Physics | MuJoCo |
Variation Space
| Factor | Type | Description |
|---|---|---|
| agent.color | Box(0, 1, shape=(3,)) | Manipulator body RGB color |
| agent.upper_arm_density | Box(500, 1500, shape=(1,)) | Upper arm geom density |
| agent.hand_density | Box(500, 1500, shape=(1,)) | Hand geom density |
| agent.upper_arm_length | Box(500, 1500, shape=(1,)) | Upper arm length |
| target.color | Box(0, 1, shape=(3,)) | Target RGB color |
| target.shape | Discrete(2) | Target shape (0: box, 1: sphere) |
| rendering.render_target | Discrete(2) | Whether to render the target (0: hidden, 1: visible) |
| floor.color | Box(0, 1, shape=(2, 3)) | Checkerboard floor colors |
| light.intensity | Box(0, 1, shape=(1,)) | Scene lighting intensity |
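Factor names throughout this page use an `entity.property` convention (`agent.*`, `target.*`, `rendering.*`, `floor.*`, `light.*`). When choosing which factors to vary, it can help to group them by entity, for example to separate purely visual factors from physical ones. The following is a minimal sketch using the Manipulator factors. `MANIPULATOR_FACTORS` and `group_by_entity` are hypothetical names and not part of the library.

```python
from collections import defaultdict

# Factor names copied from the Manipulator variation space above.
MANIPULATOR_FACTORS = [
    "agent.color",
    "agent.upper_arm_density",
    "agent.hand_density",
    "agent.upper_arm_length",
    "target.color",
    "target.shape",
    "rendering.render_target",
    "floor.color",
    "light.intensity",
]


def group_by_entity(factors):
    """Split 'entity.property' names into {entity: [property, ...]}."""
    groups = defaultdict(list)
    for name in factors:
        entity, prop = name.split(".", 1)
        groups[entity].append(prop)
    return dict(groups)


grouped = group_by_entity(MANIPULATOR_FACTORS)
```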