Design Principles
Simulator Lifecycle and Callback Integration
The MineStudio simulator follows a standard reinforcement learning environment lifecycle, including `reset`, `step`, `render`, and `close` methods. A key design principle is the integration of a flexible callback system, allowing users to hook into these lifecycle methods to customize behavior without modifying the core simulator code.
Callbacks are executed in the order they are provided in the `callbacks` list during `MinecraftSim` initialization.
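Because each callback receives the value returned by the one before it, the order of the list can change the result. The sketch below illustrates this with two hypothetical `before_step` callbacks and a small `run_before_step` helper that mimics the simulator's loop. The class names, the helper, and the `camera` action key are assumptions made for illustration, not part of MineStudio.

```python
class ScaleCamera:
    """Hypothetical callback: halves the camera movement in the action."""

    def before_step(self, sim, action):
        action = dict(action)  # copy so the caller's dict is left untouched
        action["camera"] = [c * 0.5 for c in action["camera"]]
        return action


class ClampCamera:
    """Hypothetical callback: limits camera movement to [-10, 10]."""

    def before_step(self, sim, action):
        action = dict(action)
        action["camera"] = [max(-10.0, min(10.0, c)) for c in action["camera"]]
        return action


def run_before_step(callbacks, action, sim=None):
    """Mimics the simulator loop: each callback sees the previous one's output."""
    for callback in callbacks:
        action = callback.before_step(sim, action)
    return action


action = {"camera": [30.0, -4.0]}
scale_then_clamp = run_before_step([ScaleCamera(), ClampCamera()], action)
clamp_then_scale = run_before_step([ClampCamera(), ScaleCamera()], action)
```

Scaling first and clamping second bounds the final value by the clamp limit, while the reverse order bounds it by half that limit, so the two lists are not interchangeable.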
`reset()`: Initializes or resets the environment to a starting state.
- `before_reset(self, sim, reset_flag: bool) -> bool`: Executed for each callback before the main reset logic. It receives the simulator instance (`sim`) and a `reset_flag`. A callback can return `False` to suppress the underlying `self.env.reset()` call (e.g., for a custom fast reset), provided no later callback returns `True`. The `reset_flag` passed to each subsequent callback is the value returned by the previous one.
- The core environment reset (`self.env.reset()`) is called if `reset_flag` remains `True` after all `before_reset` calls.
- A fixed number of no-op actions (`self.num_empty_frames`) are then performed to skip initial loading frames.
- The observation and info are wrapped by `_wrap_obs_info`.
- `after_reset(self, sim, obs, info)`: Executed for each callback after the main reset logic and initial frame skipping. It receives the simulator instance, the initial observation (`obs`), and the info dictionary (`info`). Callbacks can modify `obs` and `info` here. The modified `obs` and `info` are passed to subsequent callbacks.
- The final `obs` and `info` are returned.
    # Simplified structure of MinecraftSim.reset()
    def reset(self) -> Tuple[np.ndarray, Dict]:
        reset_flag = True
        for callback in self.callbacks:
            reset_flag = callback.before_reset(self, reset_flag)  # Hook before reset
        if reset_flag:  # Main environment reset
            self.env.reset()
            self.already_reset = True
        for _ in range(self.num_empty_frames):  # Skip initial frames
            action = self.env.action_space.no_op()
            obs, reward, done, info = self.env.step(action)
        obs, info = self._wrap_obs_info(obs, info)  # Wrap observation and info
        for callback in self.callbacks:  # Hook after reset, can modify obs and info
            obs, info = callback.after_reset(self, obs, info)
        self.obs, self.info = obs, info  # Update internal state
        return obs, info
Hint
Use Cases for `reset` callbacks:
- Custom Initialization: Use `after_reset` to send commands (e.g., `/time set day`, `/give`), set player properties, or log initial state.
- Fast Reset: Implement `before_reset` to return `False` and handle resetting the agent’s state (e.g., teleport, clear inventory) without a full environment reload. `after_reset` can then finalize this custom reset.
- Observation/Info Augmentation: Add task-specific information or modify the initial observation in `after_reset`.
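The fast-reset use case can be sketched as follows. This is a minimal illustration, not MineStudio's own implementation: the class name, the `soft_reset` info key, and the `SimpleNamespace` stand-in for the simulator are hypothetical, and the sketch assumes the `already_reset` attribute set in `reset()` is readable from a callback.

```python
from types import SimpleNamespace


class SoftResetSketch:
    """Illustrative callback: skip the hard reset once the world is loaded."""

    def __init__(self):
        self.skipped_hard_reset = False

    def before_reset(self, sim, reset_flag: bool) -> bool:
        # `already_reset` is set by reset() after the first hard reset; getattr
        # guards against it being absent (an assumption about the simulator).
        self.skipped_hard_reset = bool(getattr(sim, "already_reset", False))
        return False if self.skipped_hard_reset else reset_flag

    def after_reset(self, sim, obs, info):
        if self.skipped_hard_reset:
            # A real callback would restore the agent here (teleport, clear inventory).
            info = {**info, "soft_reset": True}  # hypothetical info key
        return obs, info


# Stand-in for MinecraftSim, used only to exercise the callback.
fake_sim = SimpleNamespace(already_reset=False)
cb = SoftResetSketch()
first = cb.before_reset(fake_sim, True)    # first episode: hard reset still requested
fake_sim.already_reset = True              # what reset() does after env.reset()
second = cb.before_reset(fake_sim, True)   # later episodes: hard reset suppressed
_, info = cb.after_reset(fake_sim, obs=None, info={})
```

Note that the no-op frames in `reset()` are still stepped when the hard reset is skipped, so `after_reset` always receives a fresh observation.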
`step(action)`: Executes one time step in the environment.
- If `action_type` is `'agent'`, the input `action` is first converted to the environment’s action format using `agent_action_to_env_action`.
- `before_step(self, sim, action)`: Executed for each callback. It receives the simulator instance and the (potentially converted) `action`. Callbacks can modify the `action` before it’s passed to the environment. The modified `action` is passed to subsequent callbacks.
- The core environment step (`self.env.step(action.copy())`) is performed.
- The `terminated` and `truncated` flags are set (both to `done` in the current implementation).
- The observation and info are wrapped by `_wrap_obs_info`.
- `after_step(self, sim, obs, reward, terminated, truncated, info)`: Executed for each callback. It receives the simulator instance and the results from `self.env.step()`. Callbacks can modify these values. The modified values are passed to subsequent callbacks.
- The final `obs`, `reward`, `terminated`, `truncated`, and `info` are returned.
    # Simplified structure of MinecraftSim.step()
    def step(self, action: Dict[str, Any]) -> Tuple[np.ndarray, float, bool, bool, Dict[str, Any]]:
        if self.action_type == 'agent':
            env_action = self.agent_action_to_env_action(action)
            # ... action dictionary manipulation ...
            action.update(env_action)
        for callback in self.callbacks:
            action = callback.before_step(self, action)  # Hook before step
        obs, reward, done, info = self.env.step(action.copy())  # Main environment step
        terminated, truncated = done, done  # Determine termination
        obs, info = self._wrap_obs_info(obs, info)  # Wrap observation and info
        for callback in self.callbacks:  # Hook after step, can modify results
            obs, reward, terminated, truncated, info = callback.after_step(self, obs, reward, terminated, truncated, info)
        self.obs, self.info = obs, info  # Update internal state
        return obs, reward, terminated, truncated, info
Hint
Use Cases for `step` callbacks:
- Action Masking/Modification: Change or restrict actions in `before_step`.
- Custom Reward Shaping: Modify the `reward` in `after_step` based on `obs` or `info`.
- Trajectory Recording: Log `obs`, `action`, `reward`, `info` in `after_step`.
- Early Termination: Modify `terminated` or `truncated` flags in `after_step` based on custom conditions.
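Reward shaping and early termination can be combined in a single `after_step` hook. The following is a small sketch under stated assumptions: the class name, the per-step penalty, and the step budget are hypothetical choices for illustration, and the callback is exercised directly with `None` in place of a simulator.

```python
class StepBudgetSketch:
    """Illustrative callback: per-step penalty plus truncation after a budget."""

    def __init__(self, max_steps: int, step_penalty: float = 0.01):
        self.max_steps = max_steps
        self.step_penalty = step_penalty
        self.steps = 0

    def after_reset(self, sim, obs, info):
        self.steps = 0  # start counting again for each episode
        return obs, info

    def after_step(self, sim, obs, reward, terminated, truncated, info):
        self.steps += 1
        reward = reward - self.step_penalty  # reward shaping
        if self.steps >= self.max_steps:
            truncated = True                 # early termination by budget
        return obs, reward, terminated, truncated, info


cb = StepBudgetSketch(max_steps=2, step_penalty=0.25)
cb.after_reset(None, obs=None, info={})
step1 = cb.after_step(None, None, 1.0, False, False, {})
step2 = cb.after_step(None, None, 0.0, False, False, {})
```

Setting `truncated` rather than `terminated` keeps the distinction between a time limit and a true end of episode, which matters to learning code that bootstraps from the final state.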
`render()`: Renders the current environment observation.
- Retrieves the current observation image (`self.obs['image']`).
- `before_render(self, sim, image)`: Executed for each callback. Receives the simulator instance and the current `image`. Callbacks can modify the `image` (e.g., add overlays, annotations) before the main rendering logic (if any) or before it’s passed to subsequent callbacks.
- `after_render(self, sim, image)`: Executed for each callback. Receives the simulator instance and the `image` (potentially modified by `before_render`). Callbacks can further process the `image`.
- The final `image` is returned.
    # Structure of MinecraftSim.render()
    def render(self) -> None:
        image = self.obs['image']
        for callback in self.callbacks:
            image = callback.before_render(self, image)  # Hook before rendering modifications
        # ! core logic (currently, core logic is minimal, focus is on callbacks)
        for callback in self.callbacks:
            image = callback.after_render(self, image)  # Hook after rendering modifications
        return image
Hint
Use Cases for `render` callbacks:
- Visualization Augmentation: Use `before_render` or `after_render` to draw debug information, agent stats, or highlight important elements on the frame.
- Image Preprocessing for Display: Resize or format the image for specific display requirements.
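As an illustration of visualization augmentation, the sketch below draws a crosshair through the center of the frame in `before_render`. It assumes the image is a height x width x 3 `uint8` NumPy array; the class name and the color are hypothetical.

```python
import numpy as np


class CrosshairOverlaySketch:
    """Illustrative callback: draws a crosshair through the frame center."""

    def __init__(self, color=(255, 0, 0)):
        self.color = np.array(color, dtype=np.uint8)

    def before_render(self, sim, image):
        out = np.array(image, copy=True)  # leave the stored observation untouched
        height, width = out.shape[:2]
        out[height // 2, :] = self.color  # horizontal line
        out[:, width // 2] = self.color   # vertical line
        return out


frame = np.zeros((4, 6, 3), dtype=np.uint8)
annotated = CrosshairOverlaySketch().before_render(None, frame)
```

Copying the image before drawing avoids writing the overlay into `self.obs['image']`, which other code may still read as the raw observation.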
`close()`: Cleans up and closes the environment.
- `before_close(self, sim)`: Executed for each callback before the underlying environment is closed. Useful for saving final data or logs.
- The core environment close (`self.env.close()`) is called.
- `after_close(self, sim)`: Executed for each callback after the underlying environment has been closed. Useful for final cleanup tasks that depend on the environment being closed.
- The status from `self.env.close()` is returned.
    # Structure of MinecraftSim.close()
    def close(self) -> None:
        for callback in self.callbacks:
            callback.before_close(self)  # Hook before closing
        close_status = self.env.close()  # Main environment close
        for callback in self.callbacks:
            callback.after_close(self)  # Hook after closing
        return close_status
Hint
Use Cases for `close` callbacks:
- Final Data Saving: Save recorded trajectories, statistics, or model checkpoints in `before_close`.
- Resource Release: Release any resources acquired by callbacks during the simulation.
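A callback that records data during the episode and writes it out on close might look like the sketch below. The class name, the JSON layout, and the temporary file path are assumptions chosen for illustration; the callback is driven by hand here rather than by a simulator.

```python
import json
import os
import tempfile


class RewardLogSketch:
    """Illustrative callback: collects rewards and saves them on close."""

    def __init__(self, path: str):
        self.path = path
        self.rewards = []

    def after_step(self, sim, obs, reward, terminated, truncated, info):
        self.rewards.append(float(reward))  # record, but pass values through unchanged
        return obs, reward, terminated, truncated, info

    def before_close(self, sim):
        # Save while the environment is still open.
        with open(self.path, "w") as f:
            json.dump({"rewards": self.rewards}, f)

    def after_close(self, sim):
        self.rewards = []  # release what the callback was holding


log_path = os.path.join(tempfile.mkdtemp(), "rewards.json")
cb = RewardLogSketch(log_path)
cb.after_step(None, None, 1.0, False, False, {})
cb.after_step(None, None, 0.5, False, False, {})
cb.before_close(None)
cb.after_close(None)
with open(log_path) as f:
    saved = json.load(f)
```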
Callbacks Base Class
Callbacks are classes that inherit from `MinecraftCallback` and can override any of its methods to inject custom logic at different points in the simulator’s lifecycle. All callback methods receive the simulator instance (`sim`) as their first argument, allowing them to access and potentially modify the simulator’s state or data.
The base `MinecraftCallback` class defines the following methods, all of which simply pass through the data by default:
    class MinecraftCallback:

        def before_step(self, sim, action):
            """Called before `env.step()`.

            Args:
                sim: The MinecraftSim instance.
                action: The action to be taken.
            Returns:
                The potentially modified action.
            """
            return action

        def after_step(self, sim, obs, reward, terminated, truncated, info):
            """Called after `env.step()`.

            Args:
                sim: The MinecraftSim instance.
                obs: The observation from the environment.
                reward: The reward from the environment.
                terminated: The terminated flag from the environment.
                truncated: The truncated flag from the environment.
                info: The info dictionary from the environment.
            Returns:
                A tuple of (obs, reward, terminated, truncated, info), potentially modified.
            """
            return obs, reward, terminated, truncated, info

        def before_reset(self, sim, reset_flag: bool) -> bool:
            """Called before `env.reset()`.

            Args:
                sim: The MinecraftSim instance.
                reset_flag: Boolean indicating if a hard reset should occur.
            Returns:
                Boolean indicating if the hard reset should still occur.
            """
            return reset_flag

        def after_reset(self, sim, obs, info):
            """Called after `env.reset()` and initial frame skipping.

            Args:
                sim: The MinecraftSim instance.
                obs: The initial observation.
                info: The initial info dictionary.
            Returns:
                A tuple of (obs, info), potentially modified.
            """
            return obs, info

        def before_close(self, sim):
            """Called before `env.close()`.

            Args:
                sim: The MinecraftSim instance.
            """
            return

        def after_close(self, sim):
            """Called after `env.close()`.

            Args:
                sim: The MinecraftSim instance.
            """
            return

        def before_render(self, sim, image):
            """Called before the main rendering logic in `sim.render()`.

            Args:
                sim: The MinecraftSim instance.
                image: The current image to be rendered.
            Returns:
                The potentially modified image.
            """
            return image

        def after_render(self, sim, image):
            """Called after the main rendering logic in `sim.render()`.

            Args:
                sim: The MinecraftSim instance.
                image: The image after initial rendering/modifications.
            Returns:
                The potentially modified image.
            """
            return image
By implementing one or more of these methods in a custom callback class, users can precisely control and extend the behavior of the MineStudio simulator.
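The sketch below shows what overriding a single hook looks like in practice. To stay runnable without MineStudio installed, it defines a trimmed stand-in for `MinecraftCallback` with only two pass-through methods; in real use, a callback would inherit from the library's base class instead. `StepCounter`, the `step_count` info key, and the `jump` action key are hypothetical.

```python
class MinecraftCallback:
    """Trimmed stand-in for the base class: only two pass-through hooks."""

    def before_step(self, sim, action):
        return action

    def after_step(self, sim, obs, reward, terminated, truncated, info):
        return obs, reward, terminated, truncated, info


class StepCounter(MinecraftCallback):
    """Hypothetical callback that overrides only `after_step`."""

    def __init__(self):
        self.count = 0

    def after_step(self, sim, obs, reward, terminated, truncated, info):
        self.count += 1
        info = {**info, "step_count": self.count}  # hypothetical info key
        return obs, reward, terminated, truncated, info


cb = StepCounter()
action = cb.before_step(None, {"jump": 1})                 # inherited pass-through
result = cb.after_step(None, None, 0.0, False, False, {})  # overridden hook
```

Hooks that are not overridden keep the pass-through default, so a callback only needs to define the methods it actually changes.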