Getting Started#
Before you start, make sure you have installed MineStudio and its dependencies.
Installation
Note
If you encounter any issues during installation, please open an issue on GitHub.
Welcome to MineStudio! Please follow the tutorial below to install it.
Install JDK 8
To ensure that the simulator runs smoothly, please make sure that JDK 8 is installed on your system. We recommend using conda to manage the environment on Linux systems.
$ conda create -n minestudio python=3.10 -y
$ conda activate minestudio
$ conda install --channel=conda-forge openjdk=8 -y
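You can verify that JDK 8 is active in the environment:
$ java -version
The reported version should start with 1.8.0, which is JDK 8's internal version number.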
Install MineStudio
a. Install MineStudio from GitHub.
$ pip install git+https://github.com/CraftJarvis/MineStudio.git
b. Install MineStudio from PyPI.
$ pip install minestudio
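Either way, you can quickly check that the package is importable:
$ python -c "import minestudio"
If the command exits without errors, the installation succeeded.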
Install the rendering tool
For users with NVIDIA graphics cards, we recommend installing VirtualGL; for other users, we recommend Xvfb, which supports CPU rendering but is relatively slower.
Note
Installing rendering tools may require root permissions.
There are two options:
Option 1: Xvfb
$ apt update
$ apt install -y xvfb mesa-utils libegl1-mesa libgl1-mesa-dev libglu1-mesa-dev
Option 2: VirtualGL
Warning
Not all graphics cards support VirtualGL. If you do not have strict speed requirements, we recommend the easier-to-install Xvfb rendering tool.
First, install the required dependency packages:
$ apt update
$ apt install -y xvfb mesa-utils libegl1-mesa libgl1-mesa-dev libglu1-mesa-dev
Then download the VirtualGL 3.1 package (virtualgl_3.1_amd64.deb) from the VirtualGL project's release page and install it:
$ dpkg -i virtualgl_3.1_amd64.deb
Shut down the display manager.
$ service gdm stop
Configure VirtualGL.
$ /opt/VirtualGL/bin/vglserver_config
Note
First choose 1, then Yes, No, No, No, and finally enter X.
Start the display manager.
$ service gdm start
Start the VirtualGL server.
$ bash vgl_entrypoint.sh
Warning
Each time the system is restarted, it may be necessary to run vgl_entrypoint.sh.
Configure the environment variables.
$ export PATH="${PATH}:/opt/VirtualGL/bin"
$ export LD_LIBRARY_PATH="/usr/lib/libreoffice/program:${LD_LIBRARY_PATH}"
$ export VGL_DISPLAY="egl"
$ export VGL_REFRESHRATE="$REFRESH"
$ export DISPLAY=:1
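These exports only last for the current shell session. To avoid repeating them after every login, you can append them to your shell profile; a minimal sketch, assuming bash and the same paths as above (repeat for the remaining variables):
$ echo 'export PATH="${PATH}:/opt/VirtualGL/bin"' >> ~/.bashrc
$ echo 'export VGL_DISPLAY="egl"' >> ~/.bashrc
$ echo 'export DISPLAY=:1' >> ~/.bashrc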
Verify by running the simulator
Hint
The first time you run it, the script will ask whether to download the compiled model from Hugging Face; just choose Y.
If you are using Xvfb, run the following command:
$ python -m minestudio.simulator.entry
If you are using VirtualGL, run the following command:
$ MINESTUDIO_GPU_RENDER=1 python -m minestudio.simulator.entry
If you see output like the following, the installation was successful.
Speed Test Status:
Average Time: 0.03
Average FPS: 38.46
Total Steps: 50
Speed Test Status:
Average Time: 0.02
Average FPS: 45.08
Total Steps: 100
MineStudio Libraries Quickstart#
Click on the dropdowns for your desired library to get started:
Simulator: Customizable Minecraft Environment
Here is a minimal example of how to use the simulator:
from minestudio.simulator import MinecraftSim

sim = MinecraftSim(action_type="env")
obs, info = sim.reset()
for _ in range(100):
    action = sim.action_space.sample()
    obs, reward, terminated, truncated, info = sim.step(action)
sim.close()
You can also customize the environment by chaining multiple callbacks. Here is an example:
import numpy as np
from minestudio.simulator import MinecraftSim
from minestudio.simulator.callbacks import (
    SpeedTestCallback,
    RecordCallback,
    SummonMobsCallback,
    MaskActionsCallback,
    RewardsCallback,
    CommandsCallback,
    FastResetCallback
)

sim = MinecraftSim(
    action_type="env",
    callbacks=[
        SpeedTestCallback(50),
        SummonMobsCallback([{'name': 'cow', 'number': 10, 'range_x': [-5, 5], 'range_z': [-5, 5]}]),
        MaskActionsCallback(inventory=0, camera=np.array([0., 0.])),
        RecordCallback(record_path="./output", fps=30),
        RewardsCallback([{
            'event': 'kill_entity',
            'objects': ['cow', 'sheep'],
            'reward': 1.0,
            'identity': 'kill sheep or cow',
            'max_reward_times': 5,
        }]),
        CommandsCallback(commands=[
            '/give @p minecraft:iron_sword 1',
            '/give @p minecraft:diamond 64',
        ]),
        FastResetCallback(
            biomes=['mountains'],
            random_tp_range=1000,
        )
    ]
)
obs, info = sim.reset()
print(sim.action_space)
for i in range(100):
    action = sim.action_space.sample()
    obs, reward, terminated, truncated, info = sim.step(action)
sim.close()
Data: Flexible Data Structures and Fast Dataloaders
Here is a minimal example showing how to load a trajectory from the dataset.
from minestudio.data import load_dataset

dataset = load_dataset(
    mode='raw',
    dataset_dirs=['6xx', '7xx', '8xx', '9xx', '10xx'],
    enable_video=True,
    enable_action=True,
    frame_width=224,
    frame_height=224,
    win_len=128,
    split='train',
    split_ratio=0.9,
    verbose=True
)
item = dataset[0]
print(item.keys())
You may see output like this:
[08:14:15] [Kernel] Driver video load 15738 episodes.
[08:14:15] [Kernel] Driver action load 15823 episodes.
[08:14:15] [Kernel] episodes: 15655, frames: 160495936.
dict_keys(['text', 'timestamp', 'episode', 'progress', 'env_action', 'agent_action', 'env_prev_action', 'agent_prev_action', 'image', 'mask'])
Hint
Please note that the dataset_dirs parameter here is a list that can contain multiple dataset directories. In this example, we have loaded five dataset directories.
If an element in the list is one of 6xx, 7xx, 8xx, 9xx, or 10xx, the program will automatically download it from Hugging Face, so please ensure your network connection is stable and you have enough storage space.
If an element in the list is a directory like /nfs-shared/data/contractors/dataset_6xx, the program will load data directly from that directory.
You can also mix the two types of elements in the list.
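For example, a mixed list might look like the following sketch, which reuses the arguments from the snippet above (the local path is the illustrative one from this hint):
from minestudio.data import load_dataset

dataset = load_dataset(
    mode='raw',
    dataset_dirs=['6xx', '/nfs-shared/data/contractors/dataset_6xx'],  # Hugging Face shard + local directory
    enable_video=True,
    enable_action=True,
    frame_width=224,
    frame_height=224,
    win_len=128,
    split='train',
    split_ratio=0.9,
)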
Learn more about Raw Dataset
Alternatively, you can also load trajectories that have specific events, for example, loading all trajectories that contain the kill entity event.
from minestudio.data import load_dataset

dataset = load_dataset(
    mode='event',
    dataset_dirs=['7xx'],
    enable_video=True,
    enable_action=True,
    frame_width=224,
    frame_height=224,
    win_len=128,
    split='train',
    split_ratio=0.9,
    verbose=True,
    event_regex='minecraft.kill_entity:.*'
)
item = dataset[0]
print(item.keys())
You may see output like this:
[08:19:14] [Kernel] Driver video load 4617 episodes.
[08:19:14] [Kernel] Driver action load 4681 episodes.
[08:19:14] [Kernel] episodes: 4568, frames: 65291168.
[08:19:14] [Event Kernel] Number of loaded events: 58.
[08:19:14] [Event Dataset] Regex: minecraft.kill_entity:.*, Number of events: 58, number of items: 19652
dict_keys(['text', 'env_action', 'agent_action', 'env_prev_action', 'agent_prev_action', 'image', 'mask'])
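Since the dataset is indexable and each item is a dict of fixed-shape arrays, it can typically be wrapped in a standard PyTorch DataLoader for batched iteration. The following is only a minimal sketch, assuming the default collate function handles the returned dict; for full training pipelines, prefer the MineDataModule shown in the Offline section below.
from torch.utils.data import DataLoader

# Reusing `dataset` from the event-mode example above.
loader = DataLoader(dataset, batch_size=4, num_workers=2, shuffle=True)
batch = next(iter(loader))
print(batch['image'].shape)  # expected to be roughly (4, 128, 224, 224, 3)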
Learn more about Event Dataset
Models: Policy Template and Baselines
Here is an example that shows how to load OpenAI's VPT policy in the Minecraft environment.
from minestudio.simulator import MinecraftSim
from minestudio.simulator.callbacks import RecordCallback
from minestudio.models import load_vpt_policy, VPTPolicy

# load the policy from the local model files
policy = load_vpt_policy(
    model_path="/path/to/foundation-model-2x.model",
    weights_path="/path/to/foundation-model-2x.weights"
).to("cuda")
# or load the policy from the Hugging Face model hub
policy = VPTPolicy.from_pretrained("CraftJarvis/MineStudio_VPT.rl_from_early_game_2x").to("cuda")
policy.eval()

env = MinecraftSim(
    obs_size=(128, 128),
    callbacks=[RecordCallback(record_path="./output", fps=30, frame_type="pov")]
)
memory = None
obs, info = env.reset()
for i in range(1200):
    action, memory = policy.get_action(obs, memory, input_shape='*')
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
Hint
In this example, the recorded video will be saved in the ./output directory.
Offline: Pre-Training Policy with Offline Data
Tutorial: Fine-tuning VPT to a Hunter
Fine-tuning a VPT policy in MineStudio is really simple.
The following code snippet shows how to fine-tune a VPT policy to hunt animals in Minecraft using offline data.
Import some dependencies:
import lightning as L
from lightning.pytorch.loggers import WandbLogger
from lightning.pytorch.callbacks import LearningRateMonitor
# below are MineStudio dependencies
from minestudio.data import MineDataModule
from minestudio.offline import MineLightning
from minestudio.models import load_vpt_policy, VPTPolicy
from minestudio.offline.mine_callbacks import BehaviorCloneCallback
from minestudio.offline.lightning_callbacks import SmartCheckpointCallback, SpeedMonitorCallback
Configure the policy model and the training process:
policy = VPTPolicy.from_pretrained("CraftJarvis/MineStudio_VPT.foundation_model_2x")
mine_lightning = MineLightning(
    mine_policy=policy,
    learning_rate=0.00004,
    warmup_steps=2000,
    weight_decay=0.000181,
    callbacks=[BehaviorCloneCallback(weight=0.01)]
)
Configure the data module that contains all the kill_entity trajectory segments:
episode_continuous_batch = True
mine_data = MineDataModule(
    data_params=dict(
        mode='event',
        dataset_dirs=['10xx'],
        win_len=128,
        frame_width=128,
        frame_height=128,
        event_regex="minecraft.kill_entity:.*",
        bias=16,
        min_nearby=64,
    ),
    batch_size=8,
    num_workers=8,
    prefetch_factor=4,
    split_ratio=0.9,
    shuffle_episodes=True,
    episode_continuous_batch=episode_continuous_batch,
)
Warning
If episode_continuous_batch=True, the dataloader will automatically use our distributed batch sampler. When configuring the trainer, we then need to set use_distributed_sampler=False.
Configure the trainer with your preferred PyTorch Lightning callbacks:
L.Trainer(
    logger=WandbLogger(project="minestudio-vpt"),
    devices=8,
    precision="bf16",
    strategy='ddp_find_unused_parameters_true',
    use_distributed_sampler=not episode_continuous_batch,
    gradient_clip_val=1.0,
    callbacks=[
        LearningRateMonitor(logging_interval='step'),
        SpeedMonitorCallback(),
        SmartCheckpointCallback(
            dirpath='./weights',
            filename='weight-{epoch}-{step}',
            save_top_k=-1,
            every_n_train_steps=2000,
            save_weights_only=True,
        ),
        SmartCheckpointCallback(
            dirpath='./checkpoints',
            filename='ckpt-{epoch}-{step}',
            save_top_k=1,
            every_n_train_steps=2000+1,
            save_weights_only=False,
        )
    ]
).fit(model=mine_lightning, datamodule=mine_data)
Online: Finetuning Policy via Online Interaction
We provide a simple example in online/run. You can fine-tune VPT to complete the task of killing cows by directly running:
cd online/run
bash run.sh
Specifically, this process includes several important configurations:
Policy Generator, which accepts no parameters and directly returns a MinePolicy. As an example:
def policy_generator():
    from minestudio.models.openai_vpt.body import load_openai_policy
    model_path = 'pretrained/foundation-model-2x.model'
    weights_path = 'pretrained/bc-from-early-game-2x.weights'
    policy = load_openai_policy(model_path, weights_path)
    return policy
Environment Generator, which accepts no parameters and directly returns a MinecraftSim. As an example:
def env_generator():
    from minestudio.simulator import MinecraftSim
    from minestudio.simulator.callbacks import (
        SummonMobsCallback,
        MaskActionsCallback,
        RewardsCallback,
        CommandsCallback,
        JudgeResetCallback,
        FastResetCallback
    )
    sim = MinecraftSim(
        obs_size=(128, 128),
        preferred_spawn_biome="plains",
        action_type="agent",
        timestep_limit=1000,
        callbacks=[
            SummonMobsCallback([{'name': 'cow', 'number': 10, 'range_x': [-5, 5], 'range_z': [-5, 5]}]),
            MaskActionsCallback(inventory=0),
            RewardsCallback([{
                'event': 'kill_entity',
                'objects': ['cow'],
                'reward': 1.0,
                'identity': 'kill_cow',
                'max_reward_times': 30,
            }]),
            CommandsCallback(commands=[
                '/give @p minecraft:iron_sword 1',
                '/give @p minecraft:diamond 64',
                '/effect @p 5 9999 255 true',
            ]),
            FastResetCallback(
                biomes=['plains'],
                random_tp_range=1000,
            ),
            JudgeResetCallback(600),
        ]
    )
    return sim
Config, which provides the hyper-parameters for online training:
from omegaconf import OmegaConf

online_dict = {
    "trainer_name": "PPOTrainer",
    "detach_rollout_manager": True,
    "rollout_config": {
        "num_rollout_workers": 2,
        "num_gpus_per_worker": 1.0,
        "num_cpus_per_worker": 1,
        "fragment_length": 256,
        "to_send_queue_size": 6,
        "worker_config": {
            "num_envs": 12,
            "batch_size": 6,
            "restart_interval": 3600,  # 1h
            "video_fps": 20,
            "video_output_dir": "output/videos",
        },
        "replay_buffer_config": {
            "max_chunks": 4800,
            "max_reuse": 2,
            "max_staleness": 2,
            "fragments_per_report": 40,
            "fragments_per_chunk": 1,
            "database_config": {
                "path": "output/replay_buffer_cache",
                "num_shards": 8,
            },
        },
        "episode_statistics_config": {},
    },
    "train_config": {
        "num_workers": 2,
        "num_gpus_per_worker": 1.0,
        "num_iterations": 4000,
        "vf_warmup": 0,
        "learning_rate": 0.00002,
        "anneal_lr_linearly": False,  # assumed default: disable linear LR annealing
        "weight_decay": 0.04,
        "adam_eps": 1e-8,
        "batch_size_per_gpu": 1,
        "batches_per_iteration": 200,
        "gradient_accumulation": 10,
        "epochs_per_iteration": 1,
        "context_length": 64,
        "discount": 0.999,
        "gae_lambda": 0.95,
        "ppo_clip": 0.2,
        "clip_vloss": False,
        "max_grad_norm": 5,
        "zero_initial_vf": True,
        "ppo_policy_coef": 1.0,
        "ppo_vf_coef": 0.5,
        "kl_divergence_coef_rho": 0.2,
        "entropy_bonus_coef": 0.0,
        "coef_rho_decay": 0.9995,
        "log_ratio_range": 50,
        "normalize_advantage_full_batch": True,
        "use_normalized_vf": True,
        "num_readers": 4,
        "num_cpus_per_reader": 0.1,
        "prefetch_batches": 2,
        "save_interval": 10,
        "keep_interval": 40,
        "record_video_interval": 2,
        "fix_decoder": False,
        "resume": None,
        "resume_optimizer": True,
        "save_path": "output"
    },
    "logger_config": {
        "project": "minestudio_online",
        "name": "bow_cow"
    },
}
online_config = OmegaConf.create(online_dict)
After preparing all the above content, run:
from minestudio.online.rollout.start_manager import start_rolloutmanager
from minestudio.online.trainer.start_trainer import start_trainer

start_rolloutmanager(policy_generator, env_generator, online_config)
start_trainer(policy_generator, env_generator, online_config)
to start online training.
We use wandb to log and monitor the progress of the run. The corresponding parameters passed to wandb are in config.logger_config. When save_path is None, the checkpoint will be saved to Ray's working directory at ~/ray_results.
Inference: Parallel Inference and Record Demonstrations
Here is a minimal example of how to use the inference framework:
import ray
from functools import partial
from minestudio.inference import EpisodePipeline, MineGenerator, InfoBaseFilter
from minestudio.models import VPTPolicy
from minestudio.simulator import MinecraftSim

if __name__ == '__main__':
    ray.init()
    env_generator = partial(
        MinecraftSim,
        obs_size=(128, 128),
        preferred_spawn_biome="forest",
    )  # generate the environment
    agent_generator = lambda: VPTPolicy.from_pretrained("CraftJarvis/MineStudio_VPT.rl_from_early_game_2x")  # generate the agent
    worker_kwargs = dict(
        env_generator=env_generator,
        agent_generator=agent_generator,
        num_max_steps=12000,  # the maximum number of steps per episode
        num_episodes=2,       # the number of episodes for each worker
        tmpdir="./output",
        image_media="h264",
    )  # the worker kwargs
    pipeline = EpisodePipeline(
        episode_generator=MineGenerator(
            num_workers=8,    # the number of workers
            num_gpus=0.25,    # the fraction of a GPU allocated to each worker
            max_restarts=3,   # the maximum number of restarts for failed workers
            **worker_kwargs,
        ),
        episode_filter=InfoBaseFilter(
            key="mine_block",
            val="diamond_ore",
            num=1,
        ),  # InfoBaseFilter labels episodes that mined at least one diamond_ore
    )
    summary = pipeline.run()
    print(summary)
Benchmark: Benchmarking and Evaluation
Papers#
Our libraries directly support models from the following papers: