Design Principles#
Simulator Lifecycle and Callback Integration#
The MineStudio simulator follows a standard reinforcement learning environment lifecycle, including reset, step, render, and close methods. A key design principle is the integration of a flexible callback system, allowing users to hook into these lifecycle methods to customize behavior without modifying the core simulator code.
Callbacks are executed in the order they are provided in the callbacks list during MinecraftSim initialization.
reset(): Initializes or resets the environment to a starting state.before_reset(self, sim, reset_flag: bool) -> bool: Executed for each callback before the main reset logic. It receives the simulator instance (sim) and areset_flag. A callback can returnFalseto potentially suppress the underlyingself.env.reset()call (e.g., for a custom fast reset). Thereset_flagpassed to subsequent callbacks is the result of the previous one.The core environment reset (
self.env.reset()) is called ifreset_flagremainsTrueafter allbefore_resetcalls.A fixed number of no-op actions (
self.num_empty_frames) are then performed to skip initial loading frames.The observation and info are wrapped by
_wrap_obs_info.after_reset(self, sim, obs, info): Executed for each callback after the main reset logic and initial frame skipping. It receives the simulator instance, the initial observation (obs), and info dictionary (info). Callbacks can modifyobsandinfohere. The modifiedobsandinfoare passed to subsequent callbacks.The final
obsandinfoare returned.
# Simplified structure of MinecraftSim.reset() def reset(self) -> Tuple[np.ndarray, Dict]: reset_flag = True for callback in self.callbacks: reset_flag = callback.before_reset(self, reset_flag) # Hook before reset if reset_flag: # Main environment reset self.env.reset() self.already_reset = True for _ in range(self.num_empty_frames): # Skip initial frames action = self.env.action_space.no_op() obs, reward, done, info = self.env.step(action) obs, info = self._wrap_obs_info(obs, info) # Wrap observation and info for callback in self.callbacks: # Hook after reset, can modify obs and info obs, info = callback.after_reset(self, obs, info) self.obs, self.info = obs, info # Update internal state return obs, info
Hint
Use Cases for
resetcallbacks:Custom Initialization: Use
after_resetto send commands (e.g.,/time set day,/give), set player properties, or log initial state.Fast Reset: Implement
before_resetto returnFalseand handle resetting the agent’s state (e.g., teleport, clear inventory) without a full environment reload.after_resetcan then finalize this custom reset.Observation/Info Augmentation: Add task-specific information or modify the initial observation in
after_reset.
step(action): Executes one time-step in the environment.If
action_typeis'agent', the inputactionis first converted to the environment’s action format usingagent_action_to_env_action.before_step(self, sim, action): Executed for each callback. It receives the simulator instance and the (potentially converted)action. Callbacks can modify theactionbefore it’s passed to the environment. The modifiedactionis passed to subsequent callbacks.The core environment step (
self.env.step(action.copy())) is performed.terminatedandtruncatedflags are set (both todonein the current implementation).The observation and info are wrapped by
_wrap_obs_info.after_step(self, sim, obs, reward, terminated, truncated, info): Executed for each callback. It receives the simulator instance and the results fromself.env.step(). Callbacks can modify these values. The modified values are passed to subsequent callbacks.The final
obs,reward,terminated,truncated, andinfoare returned.
# Simplified structure of MinecraftSim.step() def step(self, action: Dict[str, Any]) -> Tuple[np.ndarray, float, bool, bool, Dict[str, Any]]: if self.action_type == 'agent': env_action = self.agent_action_to_env_action(action) # ... action dictionary manipulation ... action.update(env_action) for callback in self.callbacks: action = callback.before_step(self, action) # Hook before step obs, reward, done, info = self.env.step(action.copy()) # Main environment step terminated, truncated = done, done # Determine termination obs, info = self._wrap_obs_info(obs, info) # Wrap observation and info for callback in self.callbacks: # Hook after step, can modify results obs, reward, terminated, truncated, info = callback.after_step(self, obs, reward, terminated, truncated, info) self.obs, self.info = obs, info # Update internal state return obs, reward, terminated, truncated, info
Hint
Use Cases for
stepcallbacks:Action Masking/Modification: Change or restrict actions in
before_step.Custom Reward Shaping: Modify the
rewardinafter_stepbased onobsorinfo.Trajectory Recording: Log
obs,action,reward,infoinafter_step.Early Termination: Modify
terminatedortruncatedflags inafter_stepbased on custom conditions.
render(): Renders the current environment observation.Retrieves the current observation image (
self.obs['image']).before_render(self, sim, image): Executed for each callback. Receives the simulator instance and the currentimage. Callbacks can modify theimage(e.g., add overlays, annotations) before the main rendering logic (if any) or before it’s passed to subsequent callbacks.after_render(self, sim, image): Executed for each callback. Receives the simulator instance and the (potentially modified bybefore_render)image. Callbacks can further process theimage.The final
imageis returned.
# Structure of MinecraftSim.render() def render(self) -> None: image = self.obs['image'] for callback in self.callbacks: image = callback.before_render(self, image) # Hook before rendering modifications # ! core logic (currently, core logic is minimal, focus is on callbacks) for callback in self.callbacks: image = callback.after_render(self, image) # Hook after rendering modifications return image
Hint
Use Cases for
rendercallbacks:Visualization Augmentation: Use
before_renderorafter_renderto draw debug information, agent stats, or highlight important elements on the frame.Image Preprocessing for Display: Resize or format the image for specific display requirements.
close(): Cleans up and closes the environment.before_close(self, sim): Executed for each callback before the underlying environment is closed. Useful for saving final data or logs.The core environment close (
self.env.close()) is called.after_close(self, sim): Executed for each callback after the underlying environment has been closed. Useful for final cleanup tasks that depend on the environment being closed.The status from
self.env.close()is returned.
# Structure of MinecraftSim.close() def close(self) -> None: for callback in self.callbacks: callback.before_close(self) # Hook before closing close_status = self.env.close() # Main environment close for callback in self.callbacks: callback.after_close(self) # Hook after closing return close_status
Hint
Use Cases for
closecallbacks:Final Data Saving: Save recorded trajectories, statistics, or model checkpoints in
before_close.Resource Release: Release any resources acquired by callbacks during the simulation.
Callbacks Base Class#
Callbacks are classes that inherit from MinecraftCallback and can override any of its methods to inject custom logic at different points in the simulator’s lifecycle. All callback methods receive the simulator instance (sim) as their first argument, allowing them to access and potentially modify the simulator’s state or data.
The base MinecraftCallback class defines the following methods, all of which simply pass through the data by default:
class MinecraftCallback:
def before_step(self, sim, action):
"""Called before `env.step()`.
Args:
sim: The MinecraftSim instance.
action: The action to be taken.
Returns:
The potentially modified action.
"""
return action
def after_step(self, sim, obs, reward, terminated, truncated, info):
"""Called after `env.step()`.
Args:
sim: The MinecraftSim instance.
obs: The observation from the environment.
reward: The reward from the environment.
terminated: The terminated flag from the environment.
truncated: The truncated flag from the environment.
info: The info dictionary from the environment.
Returns:
A tuple of (obs, reward, terminated, truncated, info), potentially modified.
"""
return obs, reward, terminated, truncated, info
def before_reset(self, sim, reset_flag: bool) -> bool:
"""Called before `env.reset()`.
Args:
sim: The MinecraftSim instance.
reset_flag: Boolean indicating if a hard reset should occur.
Returns:
Boolean indicating if the hard reset should still occur.
"""
return reset_flag
def after_reset(self, sim, obs, info):
"""Called after `env.reset()` and initial frame skipping.
Args:
sim: The MinecraftSim instance.
obs: The initial observation.
info: The initial info dictionary.
Returns:
A tuple of (obs, info), potentially modified.
"""
return obs, info
def before_close(self, sim):
"""Called before `env.close()`.
Args:
sim: The MinecraftSim instance.
"""
return
def after_close(self, sim):
"""Called after `env.close()`.
Args:
sim: The MinecraftSim instance.
"""
return
def before_render(self, sim, image):
"""Called before the main rendering logic in `sim.render()`.
Args:
sim: The MinecraftSim instance.
image: The current image to be rendered.
Returns:
The potentially modified image.
"""
return image
def after_render(self, sim, image):
"""Called after the main rendering logic in `sim.render()`.
Args:
sim: The MinecraftSim instance.
image: The image after initial rendering/modifications.
Returns:
The potentially modified image.
"""
return image
By implementing one or more of these methods in a custom callback class, users can precisely control and extend the behavior of the MineStudio simulator.