Two-Room

A 2D navigation task through doorways


Description

A 2D navigation task where a circular agent must reach a target position in another room by navigating through doorways. Rendering and collision detection are implemented in PyTorch. A wall divides the space into two rooms, which are connected by one or more doors.

The agent starts in one room and must navigate to the target in the opposite room. The task requires planning a path through the door openings rather than simple point-to-point navigation.

Success criteria: The episode terminates when the agent is within 16 pixels of the target.
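The success criterion can be sketched as a plain distance check. This is a minimal illustration rather than the environment's own code: the helper name `is_success` is hypothetical, and treating "within 16 pixels" as a strict inequality is an assumption.

```python
import math

SUCCESS_RADIUS = 16.0  # pixels, taken from the success criterion above


def is_success(pos_agent, pos_target, radius=SUCCESS_RADIUS):
    """Return True when the agent lies within `radius` pixels of the target.

    Hypothetical helper: the strict `<` comparison is an assumption, since
    the text only says "within 16 pixels".
    """
    dx = pos_agent[0] - pos_target[0]
    dy = pos_agent[1] - pos_target[1]
    return math.hypot(dx, dy) < radius


# Agent 10 px from the target: success. Agent 30 px away: not yet.
print(is_success((100.0, 100.0), (110.0, 100.0)))
print(is_success((100.0, 100.0), (130.0, 100.0)))
```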

import stable_worldmodel as swm

# Create 4 parallel Two-Room environments with 128×128 image observations
world = swm.World('swm/TwoRoom-v1', num_envs=4, image_shape=(128, 128))

Environment Specs

Property	Value
Action Space	`Box(-1, 1, shape=(2,))` — 2D velocity direction
Observation Space	`Box(0, 224, shape=(10,))` — state vector
Reward	0 (sparse)
Episode Length	Until target reached or timeout
Render Size	224×224 (fixed)
Physics	Torch-based, 10 Hz control

Fixed Geometry Constants

Constant	Value	Description
`IMG_SIZE`	224	Image dimensions
`BORDER_SIZE`	14	Border thickness
`WALL_CENTER`	112	Wall position (center)
`MAX_DOOR`	3	Maximum number of doors
`MAX_SPEED`	10.5	Maximum agent speed

Observation Details

The observation is a flat state vector of shape (10,):

Index	Description
0-1	Agent position (x, y)
2-3	Target position (x, y)
4-9	Door center positions (up to 3 doors × 2 coords)
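To make the index layout concrete, the sketch below splits a 10-element state vector into its parts. The helper `unpack_observation` is hypothetical, and it assumes that the door entries are stored as consecutive (x, y) pairs; the table only states that indices 4-9 hold up to 3 doors × 2 coordinates.

```python
def unpack_observation(obs):
    """Split a flat 10-element state vector into agent, target, and doors.

    Hypothetical helper. Assumes doors are laid out as consecutive
    (x, y) pairs in indices 4-9, which the table does not confirm.
    """
    values = [float(v) for v in obs]
    if len(values) != 10:
        raise ValueError(f"expected 10 values, got {len(values)}")
    agent = tuple(values[0:2])    # indices 0-1
    target = tuple(values[2:4])   # indices 2-3
    # indices 4-9: one (x, y) pair per door slot, MAX_DOOR = 3 slots
    doors = [tuple(values[i:i + 2]) for i in range(4, 10, 2)]
    return agent, target, doors


agent, target, doors = unpack_observation(range(10))
print(agent, target, doors)
```

How unused door slots are filled when fewer than three doors are present is not specified here, so downstream code should not assume a particular padding value.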

Info Dictionary

The info dict returned by step() and reset() contains:

Key	Description
`target`	Target image (3, H, W) — agent rendered at target position
`pos_agent`	Current agent position as numpy array
`pos_target`	Target position as numpy array
`target_pos`	Target position from variation space
`state`	Agent position tensor
`distance_to_target`	Euclidean distance to target

Variation Space


The environment supports extensive customization through the variation space:

Factor	Type	Description
`agent.color`	RGBBox	Agent color (default: red)
`agent.radius`	Box(7, 14)	Agent radius in pixels
`agent.position`	Box	Starting position (can be either room)
`agent.speed`	Box(1.75, 10.5)	Movement speed in pixels/step
`target.color`	RGBBox	Target color (default: green)
`target.radius`	Box(7, 14)	Target radius in pixels
`target.position`	Box	Target position (opposite room from agent)
`wall.color`	RGBBox	Wall color (default: black)
`wall.thickness`	Discrete(7, 35)	Wall thickness in pixels
`wall.axis`	Discrete(2)	0: horizontal, 1: vertical
`wall.border_color`	RGBBox	Border color (default: black)
`door.color`	RGBBox	Door color (default: white)
`door.number`	Discrete(1, 3)	Number of doors
`door.size`	MultiDiscrete(1, 21)	Half-extent size of each door
`door.position`	MultiDiscrete(0, 224)	Center position of each door along wall
`background.color`	RGBBox	Background color (default: white)
`rendering.render_target`	Discrete(2)	Whether to render target (0: no, 1: yes)
`task.min_steps`	Discrete(15, 100)	Minimum steps required to reach target

Constraints

Agent position: Must not overlap with wall (collision-constrained)
Target position: Must be in opposite room from agent
Door size: At least one door must fit the agent (door_size ≥ 1.1 × agent_radius)
Min steps: Target is sampled such that path length / speed ≥ min_steps
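The door-size and min-steps constraints above can be written as simple predicates. The sketch below is illustrative only: the function names are hypothetical, and measuring path length as the straight-line route agent → door center → target is an assumption, since the text does not define how path length is computed.

```python
import math


def door_fits_agent(door_sizes, agent_radius, margin=1.1):
    """True if at least one door satisfies door_size >= margin * agent_radius."""
    return any(size >= margin * agent_radius for size in door_sizes)


def path_length_via_door(agent, door, target):
    """Assumed path model: two straight segments meeting at the door center."""
    return math.dist(agent, door) + math.dist(door, target)


def satisfies_min_steps(agent, door, target, speed, min_steps):
    """True if path length / speed >= min_steps under the assumed path model."""
    return path_length_via_door(agent, door, target) / speed >= min_steps


# A radius-10 agent needs a door size of at least 11.
print(door_fits_agent([5, 12], agent_radius=10))
# Path of 5 + 6 = 11 px at 0.5 px/step takes 22 steps.
print(satisfies_min_steps((0, 0), (3, 4), (3, 10), speed=0.5, min_steps=20))
```

A sampler built on these checks would typically redraw the target (or door configuration) until every predicate holds, though the environment's actual sampling procedure may differ.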

Default Variations

By default, these factors are randomized at each reset:

agent.position
target.position

To randomize additional factors:

# Randomize colors for domain randomization
world.reset(options={'variation': ['agent.color', 'target.color', 'background.color']})

# Randomize everything
world.reset(options={'variation': ['all']})

Datasets

Name	Episodes	Policy	Download
`tworoom_expert`	1000	Weak Expert	—

Expert Policy

This environment includes a built-in weak expert policy for data collection:

from stable_worldmodel.envs.two_room import ExpertPolicy

# Attach the built-in weak expert to the world created above
policy = ExpertPolicy()
world.set_policy(policy)