
Description
A 2D navigation task where a circular agent must reach a target position in another room by navigating through doorways. The environment uses PyTorch-based rendering and collision detection with a wall dividing the space into two rooms connected by one or more doors.
The agent starts in one room and must navigate to the target in the opposite room. The task requires planning a path through the door openings rather than simple point-to-point navigation.
Success criteria: The episode terminates when the agent is within 16 pixels of the target.
import stable_worldmodel as swm
world = swm.World('swm/TwoRoom-v1', num_envs=4, image_shape=(128, 128))
Environment Specs
| Property | Value |
|---|---|
| Action Space | Box(-1, 1, shape=(2,)) — 2D velocity direction |
| Observation Space | Box(0, 224, shape=(10,)) — state vector |
| Reward | 0 (sparse) |
| Episode Length | Until target reached or timeout |
| Render Size | 224×224 (fixed) |
| Physics | Torch-based, 10 Hz control |
Fixed Geometry Constants
| Constant | Value | Description |
|---|---|---|
IMG_SIZE |
224 | Image dimensions |
BORDER_SIZE |
14 | Border thickness |
WALL_CENTER |
112 | Wall position (center) |
MAX_DOOR |
3 | Maximum number of doors |
MAX_SPEED |
10.5 | Maximum agent speed |
Observation Details
The observation is a flat state vector of shape (10,):
| Index | Description |
|---|---|
| 0-1 | Agent position (x, y) |
| 2-3 | Target position (x, y) |
| 4-9 | Door center positions (up to 3 doors × 2 coords) |
Info Dictionary
The info dict returned by step() and reset() contains:
| Key | Description |
|---|---|
target |
Target image (3, H, W) — agent rendered at target position |
pos_agent |
Current agent position as numpy array |
pos_target |
Target position as numpy array |
target_pos |
Target position from variation space |
state |
Agent position tensor |
distance_to_target |
Euclidean distance to target |
Variation Space

The environment supports extensive customization through the variation space:
| Factor | Type | Description |
|---|---|---|
agent.color |
RGBBox | Agent color (default: red) |
agent.radius |
Box(7, 14) | Agent radius in pixels |
agent.position |
Box | Starting position (can be either room) |
agent.speed |
Box(1.75, 10.5) | Movement speed in pixels/step |
target.color |
RGBBox | Target color (default: green) |
target.radius |
Box(7, 14) | Target radius in pixels |
target.position |
Box | Target position (opposite room from agent) |
wall.color |
RGBBox | Wall color (default: black) |
wall.thickness |
Discrete(7, 35) | Wall thickness in pixels |
wall.axis |
Discrete(2) | 0: horizontal, 1: vertical |
wall.border_color |
RGBBox | Border color (default: black) |
door.color |
RGBBox | Door color (default: white) |
door.number |
Discrete(1, 3) | Number of doors |
door.size |
MultiDiscrete(1, 21) | Half-extent size of each door |
door.position |
MultiDiscrete(0, 224) | Center position of each door along wall |
background.color |
RGBBox | Background color (default: white) |
rendering.render_target |
Discrete(2) | Whether to render target (0: no, 1: yes) |
task.min_steps |
Discrete(15, 100) | Minimum steps required to reach target |
Constraints
- Agent position: Must not overlap with wall (collision-constrained)
- Target position: Must be in opposite room from agent
- Door size: At least one door must fit the agent (door_size ≥ 1.1 × agent_radius)
- Min steps: Target is sampled such that path length / speed ≥ min_steps
Default Variations
By default, these factors are randomized at each reset:
agent.positiontarget.position
To randomize additional factors:
# Randomize colors for domain randomization
world.reset(options={'variation': ['agent.color', 'target.color', 'background.color']})
# Randomize everything
world.reset(options={'variation': ['all']})
Datasets
| Name | Episodes | Policy | Download |
|---|---|---|---|
tworoom_expert |
1000 | Weak Expert | — |
Expert Policy
This environment includes a built-in weak expert policy for data collection:
from stable_worldmodel.envs.two_room import ExpertPolicy
policy = ExpertPolicy()
world.set_policy(policy)