Conversion#
We provide a conversion script that converts raw data to the MineStudio format.
Warning
Converting the data is essential; without it, the engineering work we have put into the data pipeline cannot be used effectively.
Prepare Raw Trajectories#
The raw data should contain a video directory and an action directory. The video directory contains a list of mp4-format video files, and the action directory contains a list of dict-format action files. Each video file and its corresponding action file should have the same length and the same name (apart from the file extension).
/path/to/raw_episodes/videos directory:
video = [ 'episode_0001.mp4', 'episode_0002.mp4', 'episode_0003.mp4', ... ]
/path/to/raw_episodes/actions directory:
action = [ 'episode_0001.pkl', 'episode_0002.pkl', 'episode_0003.pkl', ... ]
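Before converting, it can help to confirm that the two directories actually pair up by name. The following is a minimal sketch and not part of MineStudio: the helper name `find_unpaired_episodes` is hypothetical, and it assumes videos end in `.mp4` and action files end in `.pkl`, as in the listings above.

```python
from pathlib import Path


def find_unpaired_episodes(video_dir, action_dir):
    """Return (videos_without_action, actions_without_video) as sorted base names."""
    # Compare file names without their extensions, e.g. 'episode_0001'.
    video_names = {p.stem for p in Path(video_dir).glob("*.mp4")}
    action_names = {p.stem for p in Path(action_dir).glob("*.pkl")}
    return sorted(video_names - action_names), sorted(action_names - video_names)
```

Calling `find_unpaired_episodes('/path/to/raw_episodes/videos', '/path/to/raw_episodes/actions')` should return two empty lists when every episode has both files. This sketch only checks names; it does not check that a video and its action file have the same length.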
Note
Each action file is a dict object that contains the following keys:
dict_keys(['back', 'drop', 'forward', 'hotbar.1', 'hotbar.2', 'hotbar.3', 'hotbar.4', 'hotbar.5', 'hotbar.6', 'hotbar.7', 'hotbar.8', 'hotbar.9', 'inventory', 'jump', 'left', 'right', 'sneak', 'sprint', 'camera', 'attack', 'use'])
The shape of attack: (4376,)
The shape of camera: (4376, 2)
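The structure described in this note can be checked with a short script. This is a sketch under stated assumptions, not MineStudio code: it assumes each `.pkl` file is a pickled dict, that every key holds one entry per step, and that `camera` holds two values per step (as the shapes above suggest). The names `EXPECTED_KEYS`, `check_action_dict`, and `load_and_check` are hypothetical.

```python
import pickle

# Keys listed in the note above.
EXPECTED_KEYS = [
    'back', 'drop', 'forward',
    'hotbar.1', 'hotbar.2', 'hotbar.3', 'hotbar.4', 'hotbar.5',
    'hotbar.6', 'hotbar.7', 'hotbar.8', 'hotbar.9',
    'inventory', 'jump', 'left', 'right', 'sneak', 'sprint',
    'camera', 'attack', 'use',
]


def check_action_dict(actions):
    """Validate keys and per-key lengths; return the number of steps."""
    missing = [key for key in EXPECTED_KEYS if key not in actions]
    if missing:
        raise ValueError(f"missing action keys: {missing}")
    # len() works for both NumPy arrays and plain sequences.
    lengths = {key: len(actions[key]) for key in EXPECTED_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"action lengths differ: {lengths}")
    num_steps = lengths['camera']
    # Assumption: camera has two components per step, i.e. shape (num_steps, 2).
    if num_steps > 0 and len(actions['camera'][0]) != 2:
        raise ValueError("camera entries should have 2 components")
    return num_steps


def load_and_check(path):
    """Load one action file (assumed to be a pickled dict) and validate it."""
    with open(path, 'rb') as f:
        actions = pickle.load(f)
    return check_action_dict(actions)
```

For a valid file such as the one described above, `load_and_check` would return the episode length (4376 in that example). Only unpickle files from sources you trust.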
Convert Raw Trajectories to MineStudio format#
Convert action to MineStudio format:
python -m minestudio.data.minecraft.tools.convert_lmdb \
    --num-workers 4 \
    --input-dir '/path/to/raw_episodes/actions' \
    --action-dir '/path/to/raw_episodes/actions' \
    --output-dir '/path/to/output/dataset' \
    --source-type 'action'
Convert video to MineStudio format:
python -m minestudio.data.minecraft.tools.convert_lmdb \
    --num-workers 4 \
    --input-dir '/path/to/raw_episodes/videos' \
    --action-dir '/path/to/raw_episodes/actions' \
    --output-dir '/path/to/output/dataset' \
    --source-type 'video'
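The two invocations differ only in `--input-dir` and `--source-type`, so they can be driven from one small Python wrapper. This is a hedged sketch: `build_convert_command` and `convert_all` are hypothetical helpers, the `videos` and `actions` subdirectory names follow the example paths above, and only the flags shown in the commands above are used. It uses `sys.executable` in place of `python` so the same interpreter runs the module.

```python
import subprocess
import sys


def build_convert_command(source_type, input_dir, action_dir, output_dir, num_workers=4):
    """Build the argument list for one convert_lmdb run."""
    return [
        sys.executable, "-m", "minestudio.data.minecraft.tools.convert_lmdb",
        "--num-workers", str(num_workers),
        "--input-dir", input_dir,
        "--action-dir", action_dir,
        "--output-dir", output_dir,
        "--source-type", source_type,
    ]


def convert_all(raw_root, output_dir, num_workers=4):
    """Run the action conversion, then the video conversion."""
    action_dir = f"{raw_root}/actions"
    video_dir = f"{raw_root}/videos"
    for source_type, input_dir in [("action", action_dir), ("video", video_dir)]:
        command = build_convert_command(
            source_type, input_dir, action_dir, output_dir, num_workers
        )
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(command, check=True)
```

A call such as `convert_all('/path/to/raw_episodes', '/path/to/output/dataset')` would then run both steps in sequence, provided MineStudio is installed in the current environment.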
Note
The --num-workers argument specifies the number of conversion workers. It also determines the number of resulting MineStudio dataset files.
The resulting MineStudio dataset files will be stored in the /path/to/output/dataset directory.
tree /path/to/output/dataset
├── action
│   ├── action-1000
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── action-1500
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── action-1904
│   │   ├── data.mdb
│   │   └── lock.mdb
│   └── action-500
│       ├── data.mdb
│       └── lock.mdb
└── video
    ├── video-1428
    │   ├── data.mdb
    │   └── lock.mdb
    ├── video-1903
    │   ├── data.mdb
    │   └── lock.mdb
    ├── video-476
    │   ├── data.mdb
    │   └── lock.mdb
    └── video-952
        ├── data.mdb
        └── lock.mdb
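The layout above can also be checked from Python, for example to confirm that each source type produced one shard directory per worker. The sketch below is illustrative only: `list_shards` is a hypothetical helper, and it assumes the two-level layout shown in the tree, where each shard directory contains a `data.mdb` file.

```python
from pathlib import Path


def list_shards(dataset_dir):
    """Map each source type ('action', 'video', ...) to its sorted shard directory names."""
    shards = {}
    # Each shard is a directory of the form <source>/<shard>/ holding data.mdb.
    for data_file in Path(dataset_dir).glob("*/*/data.mdb"):
        shard_dir = data_file.parent
        shards.setdefault(shard_dir.parent.name, []).append(shard_dir.name)
    return {source: sorted(names) for source, names in shards.items()}
```

For the example tree, `list_shards('/path/to/output/dataset')` would list four shards under `action` and four under `video`, matching `--num-workers 4`.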
Check the MineStudio-Format Dataset#
You can check the generated MineStudio dataset files using the following Python code:
from minestudio.data import load_dataset

dataset = load_dataset(
    mode='raw',
    dataset_dirs=['/path/to/output/dataset'],
    frame_width=224,
    frame_height=224,
    win_len=128,
    split='train',
    split_ratio=0.9,
    verbose=True,
)

# Fetch the first item and print the fields it contains.
item = dataset[0]
print(item.keys())