# Offline Training
Pre-training is a crucial approach for equipping policy models with diverse behaviors, as demonstrated by VPT. MineStudio supports pre-training on offline data, letting users launch a run from a straightforward configuration file.
> **Note:** The MineStudio offline module is built on PyTorch Lightning, providing high flexibility and enabling users to customize it to suit their specific needs.
## Quick Start
### Basic Arguments
`minestudio.offline.trainer.MineLightning` is the core class for offline training. It is a subclass of `lightning.LightningModule` and provides a simple interface for users to customize their training process.
| Argument | Description |
|---|---|
| `mine_policy` | The policy model to be trained. |
| `callbacks` | A list of objective callbacks to be used during training. |
| `hyperparameters` | A dictionary of hyperparameters to be logged to `wandb`. |
| `log_freq` | The frequency (in training steps) at which logs are uploaded to `wandb`. |
| `learning_rate` | The learning rate for the optimizer. |
| `weight_decay` | The weight decay for the optimizer. |
| `warmup_steps` | The number of warm-up steps for the learning rate scheduler; warm-up is important when training transformer-like networks. |
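A minimal training script wires these arguments together roughly as follows. This is a sketch: `policy` and `data_module` are placeholders you must construct yourself, and the hyperparameter values are illustrative.

```python
import lightning as L
from minestudio.offline.trainer import MineLightning
from minestudio.offline.mine_callbacks import BehaviorCloneCallback

# `policy` is any MinePolicy subclass; `data_module` is a Lightning data
# module whose batches contain the keys your callbacks expect.
mine_lightning = MineLightning(
    mine_policy=policy,
    callbacks=[BehaviorCloneCallback(weight=1.0)],
    hyperparameters={'learning_rate': 1e-5},  # extra values to log
    log_freq=20,
    learning_rate=1e-5,
    weight_decay=0.01,
    warmup_steps=1000,
)

trainer = L.Trainer(max_epochs=1, precision='bf16-mixed')
trainer.fit(mine_lightning, datamodule=data_module)
```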
> **Note:** We use `AdamW` as the default optimizer, with a linear learning-rate scheduler for the warm-up stage.
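The warm-up schedule ramps the learning rate linearly from near zero to its target value over the first `warmup_steps` optimizer steps. Conceptually, it behaves like the following sketch built on PyTorch's `LambdaLR` (not MineStudio's exact implementation):

```python
import torch

policy = torch.nn.Linear(4, 4)  # stand-in module; in practice your MinePolicy
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5, weight_decay=0.01)

warmup_steps = 1000
# Scale the LR linearly from ~0 up to its target over the first
# `warmup_steps` optimizer steps, then hold it constant.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)
```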
**Long-Trajectory Training.** Thanks to our advanced data structure, the offline trainer seamlessly supports long-trajectory training. Set `episode_continuous_batch=True` when creating the data module and implement a memory-based policy (for example, a TransformerXL-based policy); the trainer will then automatically manage memory iteration across batches for you, as sketched below.
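For illustration, enabling this on the data side might look like the following. `MineDataModule` and every argument except `episode_continuous_batch` follow common MineStudio usage but are assumptions here; check the data documentation for the exact signature.

```python
from minestudio.data import MineDataModule  # import path assumed

data_module = MineDataModule(
    data_params=dict(
        mode='raw',                       # illustrative dataset configuration
        dataset_dirs=['/path/to/dataset'],
        win_len=128,
    ),
    batch_size=8,
    # Consecutive batches continue the same episodes, so a memory-based
    # policy can carry its recurrent state across batch boundaries.
    episode_continuous_batch=True,
)
```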
## Objective Callbacks
The loss function is a key component that users often wish to customize when developing new algorithms. MineStudio standardizes this interface and offers a selection of built-in loss functions that users can utilize directly.
The objective callback template is simple:
```python
class ObjectiveCallback:

    def __init__(self):
        ...

    def __call__(
        self,
        batch: Dict[str, Any],
        batch_idx: int,
        step_name: str,
        latents: Dict[str, torch.Tensor],
        mine_policy: MinePolicy
    ) -> Dict[str, torch.Tensor]:
        return {
            'loss': ...,
            'other_key': ...,
        }
```
> **Hint:** `latents` is returned by the `MinePolicy` object, so users can pass any objective-related information to the callback via the `latents` dictionary.
> **Warning:** The `loss` term will be added to the final loss function; all other keys will only be logged to `wandb` or other loggers.
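To define your own objective, subclass `ObjectiveCallback` and return a dictionary containing at least `loss`. For instance, a hypothetical value-regression objective might look like this (a sketch: the `returns` batch key and the import path of `ObjectiveCallback` are assumptions for illustration):

```python
from typing import Dict

import torch

from minestudio.offline.mine_callbacks import ObjectiveCallback  # import path assumed


class ValueRegressionCallback(ObjectiveCallback):
    """Hypothetical objective: regress the value head toward a `returns`
    tensor that the dataloader is assumed to provide."""

    def __init__(self, weight: float = 1.0):
        super().__init__()
        self.weight = weight

    def __call__(self, batch, batch_idx, step_name, latents, mine_policy) -> Dict[str, torch.Tensor]:
        vpred = latents['vpred'].squeeze(-1)  # predicted values, shape (B, T)
        value_loss = (vpred - batch['returns']).pow(2).mean()
        return {
            'loss': value_loss * self.weight,  # added to the final loss
            'value_loss': value_loss,          # logged only
        }
```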
Here are some examples of built-in objective callbacks:
**Behavior Cloning Callback.** The built-in `minestudio.offline.mine_callbacks.BehaviorCloneCallback` looks like this:
```python
class BehaviorCloneCallback(ObjectiveCallback):

    def __init__(self, weight: float = 1.0):
        super().__init__()
        self.weight = weight

    def __call__(
        self,
        batch: Dict[str, Any],
        batch_idx: int,
        step_name: str,
        latents: Dict[str, torch.Tensor],
        mine_policy: MinePolicy,
    ) -> Dict[str, torch.Tensor]:
        assert 'agent_action' in batch, "key `agent_action` is required for behavior cloning."
        agent_action = batch['agent_action']
        pi_logits = latents['pi_logits']
        log_prob = mine_policy.pi_head.logprob(agent_action, pi_logits, return_dict=True)
        entropy = mine_policy.pi_head.entropy(pi_logits, return_dict=True)
        # Mask out null camera actions (bin 60 encodes "no camera movement"),
        # so only frames with actual camera motion contribute to the camera loss.
        camera_mask = (agent_action['camera'] != 60).float().squeeze(-1)
        global_mask = batch.get('mask', torch.ones_like(camera_mask))
        logp_camera = (log_prob['camera'] * global_mask * camera_mask).sum(-1)
        logp_buttons = (log_prob['buttons'] * global_mask).sum(-1)
        entropy_camera = (entropy['camera'] * global_mask * camera_mask).sum(-1)
        entropy_buttons = (entropy['buttons'] * global_mask).sum(-1)
        camera_loss, button_loss = -logp_camera, -logp_buttons
        bc_loss = camera_loss + button_loss
        entropy = entropy_camera + entropy_buttons
        result = {
            'loss': bc_loss.mean() * self.weight,
            'camera_loss': camera_loss.mean(),
            'button_loss': button_loss.mean(),
            'entropy': entropy.mean(),
            'bc_weight': self.weight,
        }
        return result
```
When subclassing `MinePolicy`, one needs to return the latents (`pi_logits`) from the `forward` function:
```python
def forward(self, input, state_in, **kwargs):
    B, T = input["image"].shape[:2]
    first = torch.tensor([[False]], device=self.device).repeat(B, T)
    state_in = self.initial_state(B) if state_in is None else state_in
    (pi_h, v_h), state_out = self.net(input, state_in, context={"first": first})
    pi_logits = self.pi_head(pi_h)
    vpred = self.value_head(v_h)
    # Expose everything the objective callbacks may need via `latents`.
    latents = {'pi_logits': pi_logits, 'vpred': vpred}
    return latents, state_out
```
**Kullback–Leibler Divergence Callback.** The built-in `minestudio.offline.mine_callbacks.KLDivergenceCallback` looks like this:
```python
class KLDivergenceCallback(ObjectiveCallback):

    def __init__(self, weight: float = 1.0):
        super().__init__()
        self.weight = weight

    def __call__(
        self,
        batch: Dict[str, Any],
        batch_idx: int,
        step_name: str,
        latents: Dict[str, torch.Tensor],
        mine_policy: MinePolicy,
    ) -> Dict[str, torch.Tensor]:
        posterior_dist = latents['posterior_dist']
        prior_dist = latents['prior_dist']
        q_mu, q_log_var = posterior_dist['mu'], posterior_dist['log_var']
        p_mu, p_log_var = prior_dist['mu'], prior_dist['log_var']
        kl_div = self.kl_divergence(q_mu, q_log_var, p_mu, p_log_var)
        result = {
            'loss': kl_div.mean() * self.weight,
            'kl_div': kl_div.mean(),
            'kl_weight': self.weight,
        }
        return result

    def kl_divergence(self, q_mu, q_log_var, p_mu, p_log_var):
        # KL( N(q_mu, exp(q_log_var)) || N(p_mu, exp(p_log_var)) ) between
        # diagonal Gaussians; inputs have shape (B, D), output has shape (B,).
        KL = -0.5 * torch.sum(
            1
            + (q_log_var - p_log_var)
            - (q_log_var - p_log_var).exp()
            - (q_mu - p_mu).pow(2) / p_log_var.exp(),
            dim=-1,
        )
        return KL
```
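For reference, `kl_divergence` computes the standard closed form of the KL divergence between two diagonal Gaussians $q=\mathcal{N}(\mu_q,\sigma_q^2)$ and $p=\mathcal{N}(\mu_p,\sigma_p^2)$, with `log_var` storing $\log\sigma^2$:

$$
D_{\mathrm{KL}}(q\,\|\,p) = -\frac{1}{2}\sum_{d=1}^{D}\left(1 + \log\frac{\sigma_{q,d}^2}{\sigma_{p,d}^2} - \frac{\sigma_{q,d}^2}{\sigma_{p,d}^2} - \frac{(\mu_{q,d}-\mu_{p,d})^2}{\sigma_{p,d}^2}\right)
$$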
When subclassing `MinePolicy`, one needs to return the latents (`posterior_dist` and `prior_dist`) from the `forward` function. Taking GROOT's `forward` function as an example:
```python
def forward(self, input: Dict, memory: Optional[List[torch.Tensor]] = None) -> Dict:
    ...
    posterior_dist = self.video_encoder(reference)
    prior_dist = self.image_encoder(reference[:, 0])
    ...
    x, memory = self.decoder(x, memory)
    pi_h = v_h = x
    pi_logits = self.pi_head(pi_h)
    vpred = self.value_head(v_h)
    latents = {
        "pi_logits": pi_logits,
        "vpred": vpred,
        "posterior_dist": posterior_dist,
        "prior_dist": prior_dist,
    }
    return latents, memory
```
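Because each callback's `loss` term is added to the final loss, objectives compose naturally. A sketch of training a GROOT-style policy with both built-in callbacks (`policy` is a placeholder and the weights are illustrative):

```python
from minestudio.offline.trainer import MineLightning
from minestudio.offline.mine_callbacks import BehaviorCloneCallback, KLDivergenceCallback

model = MineLightning(
    mine_policy=policy,  # any MinePolicy whose forward returns the latents above
    callbacks=[
        BehaviorCloneCallback(weight=1.0),
        KLDivergenceCallback(weight=0.01),  # illustrative weight
    ],
)
```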