
Description
A 2D contact-rich manipulation task where an agent controls a circular end-effector to push a T-shaped block into a target pose. The environment uses Pymunk physics simulation with realistic friction and collision dynamics.
The agent must push the block to match both the target position and orientation, making this a challenging task that requires planning multi-step pushing sequences rather than simple point-to-point control.
Success criteria: an episode is successful when the block's position error is less than 20 pixels AND its orientation error is less than π/9 radians (~20°) relative to the target configuration.
```python
import stable_worldmodel as swm

world = swm.World('swm/PushT-v1', num_envs=4, image_shape=(128, 128))
```
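For reference, here is a minimal sketch of the success check, assuming the 7-dimensional state layout documented under Observation Details (agent pos, block pos, block angle, agent vel) and that the goal state uses the same layout; `is_success` is a hypothetical helper, not part of the library, and the exact check lives inside the environment.

```python
import numpy as np

def is_success(state, goal_state, pos_tol=20.0, angle_tol=np.pi / 9):
    """Sketch of the success criterion; assumes state and goal_state share the (7,) layout."""
    # Block position is state[2:4], block angle is state[4] (see Observation Details)
    pos_err = np.linalg.norm(state[2:4] - goal_state[2:4])
    # Wrap the angle difference into [-pi, pi] before comparing (assumed error convention)
    angle_err = (state[4] - goal_state[4] + np.pi) % (2 * np.pi) - np.pi
    return pos_err < pos_tol and abs(angle_err) < angle_tol
```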
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(2,)) — 2D relative velocity control |
| Observation Space | Dict(proprio=(4,), state=(7,)) |
| Reward | Negative distance to goal state (higher is better) |
| Episode Length | 200 steps (default) |
| Render Size | 224×224 (configurable) |
| Physics | Pymunk, 10 Hz control, PD controller |
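To make the spec concrete, a random-action rollout could look like the sketch below. It assumes that importing stable_worldmodel registers the single-environment id swm/PushT-v1 with Gymnasium; the World wrapper shown above is the documented entry point, so treat this as an illustrative sketch rather than the canonical usage.

```python
import gymnasium as gym
import stable_worldmodel as swm  # assumed to register the swm/ environment ids

env = gym.make('swm/PushT-v1')
obs, info = env.reset(seed=0)

for _ in range(200):  # default episode length
    action = env.action_space.sample()  # Box(-1, 1, shape=(2,)): 2D relative velocity
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
env.close()
```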
Observation Details
| Key | Shape | Description |
|---|---|---|
| proprio | (4,) | Agent position (x, y) and velocity (vx, vy) |
| state | (7,) | Full state: agent pos (2), block pos (2), block angle (1), agent vel (2) |
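For example, the flat state vector can be unpacked into named components following the layout above (a sketch; `obs` is the observation dict returned by reset or step):

```python
agent_pos   = obs['state'][0:2]   # (x, y)
block_pos   = obs['state'][2:4]   # (x, y)
block_angle = obs['state'][4]     # radians
agent_vel   = obs['state'][5:7]   # (vx, vy)
```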
Info Dictionary
The info dict returned by step() and reset() contains:
| Key | Description |
|---|---|
| goal | Goal image (H, W, 3) |
| goal_state | Goal state vector (7,) |
| goal_proprio | Goal proprioception (4,) |
| pos_agent | Current agent position |
| vel_agent | Current agent velocity |
| block_pose | Block position and angle (3,) |
| n_contacts | Number of contact points this step |
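Continuing the single-environment sketch above, a goal-conditioned setup might read these entries directly from the info dict (key names from the table above):

```python
obs, info = env.reset(seed=0)

goal_image = info['goal']         # (H, W, 3) rendering of the target configuration
goal_state = info['goal_state']   # (7,) target state vector
block_pose = info['block_pose']   # (3,) current block (x, y, angle)
contacts   = info['n_contacts']   # number of contact points this step
```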
Variation Space

The environment supports extensive customization through the variation space:
| Factor | Type | Description |
|---|---|---|
| agent.color | RGBBox | Agent color (default: RoyalBlue) |
| agent.scale | Box(20, 60) | Agent size |
| agent.shape | Discrete(8) | Shape index (default: circle) |
| agent.angle | Box(-2π, 2π) | Initial rotation |
| agent.start_position | Box([50, 50], [450, 450]) | Starting position |
| agent.velocity | Box([0, 0], [512, 512]) | Initial velocity |
| block.color | RGBBox | Block color (default: LightSlateGray) |
| block.scale | Box(20, 60) | Block size |
| block.shape | Discrete(7) | Shape: L, T, Z, square, I, small_tee, + |
| block.angle | Box(-2π, 2π) | Initial rotation |
| block.start_position | Box([100, 100], [400, 400]) | Starting position |
| goal.color | RGBBox | Goal overlay color (default: LightGreen) |
| goal.scale | Box(20, 60) | Goal size |
| goal.angle | Box(-2π, 2π) | Target rotation |
| goal.position | Box([50, 50], [450, 450]) | Target position |
| background.color | RGBBox | Background color (default: white) |
| rendering.render_goal | Discrete(2) | Whether to render the goal overlay (default: 1) |
Default Variations
By default, only these factors are randomized at each reset:
agent.start_position, block.start_position, block.angle
To randomize additional factors, pass them via the variation option:
```python
# Randomize colors for domain randomization
world.reset(options={'variation': ['agent.color', 'block.color', 'background.color']})

# Randomize everything
world.reset(options={'variation': ['all']})
```
Datasets
| Name | Episodes | Policy | Download |
|---|---|---|---|
| pusht_expert | 1000 | Weak Expert | — |
Expert Policy
This environment includes a built-in weak expert policy for data collection. The WeakPolicy generates actions that keep the agent near the block, increasing the probability of meaningful interactions during data collection.
```python
from stable_worldmodel.envs.pusht import WeakPolicy

policy = WeakPolicy(dist_constraint=100)
world.set_policy(policy)
```
WeakPolicy Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| dist_constraint | int | 100 | Pixel distance constraint around the block for sampling actions. Actions are clipped to keep the agent within this distance of the block center. |
Usage with Vectorized Environments
The WeakPolicy works seamlessly with both single and vectorized environments. When used with vectorized environments (like those created by World), it automatically detects the environment spec from sub-environments:
```python
import stable_worldmodel as swm
from stable_worldmodel.envs.pusht import WeakPolicy

# Works with vectorized environments
world = swm.World('swm/PushT-v1', num_envs=4, image_shape=(64, 64))
policy = WeakPolicy(dist_constraint=50)  # Tighter constraint for more interactions
world.set_policy(policy)

# Collect data
world.record_dataset(
    dataset_name='pusht_weak_expert',
    episodes=100,
    seed=42,
)
```
Discrete Action Space
The policy automatically detects and handles discrete action spaces when using swm/PushTDiscrete-v* environments:
```python
world = swm.World('swm/PushTDiscrete-v1', num_envs=4, image_shape=(64, 64))
policy = WeakPolicy()  # Automatically uses quantized actions
world.set_policy(policy)
```