Design Principles

Simulator Lifecycle

The simulator lifecycle is divided into three stages: reset, step, and close.

  • reset: This method is called when the environment is initialized. It returns the initial observation and information. The reset method of our simulator wrapper looks like this:

    def reset(self):
        reset_flag = True
        for callback in self.callbacks:
            reset_flag = callback.before_reset(self, reset_flag)
        ... # some other code
        if reset_flag:
            obs, info = self.env.reset()
        else:
            obs, info = ...
        for callback in self.callbacks:
            obs, info = callback.after_reset(self, obs, info)
        return obs, info
    

    Hint

    We can use callbacks to preprocess obs or info before they are returned to the agent.

    For example, we can add task information to the observation when the environment is reset, so that the agent knows what task it is going to perform.

    We can also implement a fast reset: if a callback returns False from before_reset and the final reset_flag is False, the call to self.env.reset() is skipped.

  • step: This method is called when the agent takes an action. It returns the observation, reward, termination and truncation flags, and information. The step method code looks like this:

    def step(self, action):
        for callback in self.callbacks:
            action = callback.before_step(self, action)
        obs, reward, terminated, truncated, info = self.env.step(action.copy()) 
        ... # some other code
        for callback in self.callbacks:
            obs, reward, terminated, truncated, info = callback.after_step(
                self, obs, reward, terminated, truncated, info
            )
        return obs, reward, terminated, truncated, info
    

    Hint

    We can use callbacks to preprocess the action before it is passed to the environment. For example, we can mask out actions that we do not want the agent to use.

    We can also use callbacks to post-process the observation, reward, termination status, and information before they are returned to the agent.

    Callbacks are executed sequentially, in the order they were added to the simulator, and each callback receives the output of the previous one.

  • close: This method is called when the environment is closed. The close method code looks like this:

    def close(self):
        for callback in self.callbacks:
            callback.before_close(self)
        close_status = self.env.close()
        for callback in self.callbacks:
            callback.after_close(self)
        return close_status
    

    Hint

    We can use callbacks to do cleanup work before the environment is closed. For example, we can save the trajectories or write logs.
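The reset hints above can be illustrated with a toy wrapper. This is only a minimal sketch, not the project's implementation: `ToyEnv`, `ToySimulator`, `TaskInfoCallback`, and `FastResetCallback` are hypothetical names. Reusing the last raw observation when the reset is suppressed is an assumption, since the original `else` branch is elided.

```python
class ToyEnv:
    """Hypothetical stand-in for the wrapped environment."""

    def __init__(self):
        self.reset_calls = 0

    def reset(self):
        self.reset_calls += 1
        return {"frame": self.reset_calls}, {}


class TaskInfoCallback:
    """Attaches a task description to the observation after each reset."""

    def __init__(self, task):
        self.task = task

    def before_reset(self, sim, reset_flag):
        return reset_flag  # does not influence whether the env is reset

    def after_reset(self, sim, obs, info):
        return {**obs, "task": self.task}, info


class FastResetCallback:
    """Suppresses the internal env reset after the first full reset."""

    def __init__(self):
        self.did_full_reset = False

    def before_reset(self, sim, reset_flag):
        if self.did_full_reset:
            return False  # skip the expensive env.reset()
        self.did_full_reset = True
        return reset_flag

    def after_reset(self, sim, obs, info):
        return obs, info


class ToySimulator:
    """Mirrors the reset flow shown above, with an assumed fallback branch."""

    def __init__(self, env, callbacks):
        self.env = env
        self.callbacks = callbacks
        self._last = None  # assumption: reuse the last raw (obs, info) on fast reset

    def reset(self):
        reset_flag = True
        for callback in self.callbacks:
            reset_flag = callback.before_reset(self, reset_flag)
        if reset_flag:
            obs, info = self.env.reset()
            self._last = (obs, info)
        else:
            obs, info = self._last
        for callback in self.callbacks:
            obs, info = callback.after_reset(self, obs, info)
        return obs, info


env = ToyEnv()
sim = ToySimulator(env, [FastResetCallback(), TaskInfoCallback("collect wood")])
first_obs, _ = sim.reset()   # full reset: env.reset() is called
second_obs, _ = sim.reset()  # fast reset: env.reset() is skipped
print(first_obs["task"], env.reset_calls)
```

Because `after_reset` hooks still run on a suppressed reset, the task description is attached in both cases, while the underlying environment is reset only once.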

Callbacks

Callbacks are used to customize the environment. All the callbacks are optional, and you can use them in any combination.

The structure of a callback is as follows:

class MinecraftCallback:
    
    def before_step(self, sim, action):
        return action
    
    def after_step(self, sim, obs, reward, terminated, truncated, info):
        return obs, reward, terminated, truncated, info
    
    def before_reset(self, sim, reset_flag: bool) -> bool:  # returns whether env.reset() still needs to be called
        return reset_flag
    
    def after_reset(self, sim, obs, info):
        return obs, info
    
    def before_close(self, sim):
        return
    
    def after_close(self, sim):
        return
    
    def before_render(self, sim):
        return
    
    def after_render(self, sim):
        return
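As a sketch of how this interface might be used, the subclasses below override only the hooks they need. `MaskActionCallback` and `EpisodeLogCallback` are hypothetical names invented for this example. The base class is repeated in shortened form so the block runs on its own, and the action is assumed to be a dict.

```python
class MinecraftCallback:
    """Shortened copy of the base class above: every hook is a pass-through."""

    def before_step(self, sim, action):
        return action

    def after_step(self, sim, obs, reward, terminated, truncated, info):
        return obs, reward, terminated, truncated, info

    def before_close(self, sim):
        return


class MaskActionCallback(MinecraftCallback):
    """Forces selected action keys to zero before the env sees them."""

    def __init__(self, masked_keys):
        self.masked_keys = set(masked_keys)

    def before_step(self, sim, action):
        # Assumption: actions are dicts mapping action names to values.
        return {k: (0 if k in self.masked_keys else v) for k, v in action.items()}


class EpisodeLogCallback(MinecraftCallback):
    """Accumulates rewards and builds a summary before close."""

    def __init__(self):
        self.rewards = []
        self.summary = None

    def after_step(self, sim, obs, reward, terminated, truncated, info):
        self.rewards.append(reward)
        return obs, reward, terminated, truncated, info

    def before_close(self, sim):
        self.summary = {"steps": len(self.rewards), "return": sum(self.rewards)}


# Drive the hooks by hand, in the same order the simulator's step/close would.
callbacks = [MaskActionCallback(["attack"]), EpisodeLogCallback()]
action = {"forward": 1, "attack": 1}
for cb in callbacks:
    action = cb.before_step(None, action)

result = ({"pov": None}, 1.5, False, False, {})  # pretend env.step output
for cb in callbacks:
    result = cb.after_step(None, *result)
for cb in callbacks:
    cb.before_close(None)

print(action)
print(callbacks[1].summary)
```

Keeping each callback focused on one concern (masking, logging) makes them easy to combine, since every hook simply passes its result on to the next callback in the list.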