Conversion#

We provide a conversion script that converts raw data to the MineStudio format.

Warning

Conversion is essential: the engineering work we have put into the data pipeline only takes effect on data in the MineStudio format.

Prepare Raw Trajectories#

The raw data directory should contain two subdirectories, videos and actions. The videos directory holds the video files in mp4 format, and the actions directory holds the action files, each of which stores a dict. Every video file should have a corresponding action file with the same name and the same length.

  • /path/to/raw_episodes/videos directory:

    video = [
        'episode_0001.mp4',
        'episode_0002.mp4',
        'episode_0003.mp4',
        ...
    ]
    
  • /path/to/raw_episodes/actions directory:

    action = [
        'episode_0001.pkl',
        'episode_0002.pkl',
        'episode_0003.pkl',
        ...
    ]
    

    Note

    Each action file is a dict object that contains the following keys:

    dict_keys(['back', 'drop', 'forward', 'hotbar.1', 'hotbar.2', 'hotbar.3', 'hotbar.4', 'hotbar.5', 'hotbar.6', 'hotbar.7', 'hotbar.8', 'hotbar.9', 'inventory', 'jump', 'left', 'right', 'sneak', 'sprint', 'camera', 'attack', 'use'])
    

    Each key maps to an array whose first dimension is the episode length. In one example episode, attack has shape (4376,) and camera has shape (4376, 2).
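
The layout described above can be checked before running the conversion. The following is a minimal sketch, not part of MineStudio: the helper names (`unpaired_episodes`, `load_action_file`, `check_action_dict`) are hypothetical, and it assumes that the `.pkl` files are plain pickle dumps of a dict and that every key shares the same first dimension, as the attack and camera shapes suggest.

```python
import pickle
import tempfile
from pathlib import Path

# Key names copied from the dict_keys listing above.
EXPECTED_KEYS = {
    'back', 'drop', 'forward',
    'hotbar.1', 'hotbar.2', 'hotbar.3', 'hotbar.4', 'hotbar.5',
    'hotbar.6', 'hotbar.7', 'hotbar.8', 'hotbar.9',
    'inventory', 'jump', 'left', 'right', 'sneak', 'sprint',
    'camera', 'attack', 'use',
}


def unpaired_episodes(video_dir, action_dir):
    """Return episode names that appear in only one of the two directories."""
    videos = {p.stem for p in Path(video_dir).glob('*.mp4')}
    actions = {p.stem for p in Path(action_dir).glob('*.pkl')}
    return sorted(videos ^ actions)


def load_action_file(path):
    # Assumption: the file is a plain pickle dump of a dict.
    with open(path, 'rb') as f:
        return pickle.load(f)


def check_action_dict(actions):
    """Return a list of problems found in one action dict (empty if none)."""
    problems = []
    missing = EXPECTED_KEYS - set(actions)
    if missing:
        problems.append(f'missing keys: {sorted(missing)}')
    # Assumption: all keys share the same first dimension (the episode length).
    lengths = {key: len(value) for key, value in actions.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f'keys differ in length: {lengths}')
    camera = actions.get('camera')
    if camera is not None and any(len(row) != 2 for row in camera):
        problems.append('camera rows should have 2 entries')
    return problems


# Demo on a tiny synthetic episode (3 steps) written to a temporary file.
steps = 3
demo = {key: [0] * steps for key in EXPECTED_KEYS}
demo['camera'] = [[0.0, 0.0] for _ in range(steps)]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'episode_0001.pkl'
    path.write_bytes(pickle.dumps(demo))
    print(check_action_dict(load_action_file(path)))  # prints []
```

The checks rely only on `len()` and iteration, so they should work on both Python lists and NumPy arrays. On real data, `unpaired_episodes('/path/to/raw_episodes/videos', '/path/to/raw_episodes/actions')` would list any episode that is missing its video or its action file.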

Convert Raw Trajectories to MineStudio format#

  • Convert the actions to MineStudio format:

    python -m minestudio.data.minecraft.tools.convert_lmdb \
           --num-workers 4 \
           --input-dir '/path/to/raw_episodes/actions' \
           --action-dir '/path/to/raw_episodes/actions' \
           --output-dir '/path/to/output/dataset' \
           --source-type 'action'
    
  • Convert the videos to MineStudio format:

    python -m minestudio.data.minecraft.tools.convert_lmdb \
           --num-workers 4 \
           --input-dir '/path/to/raw_episodes/videos' \
           --action-dir '/path/to/raw_episodes/actions' \
           --output-dir '/path/to/output/dataset' \
           --source-type 'video'
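
The two commands differ only in `--input-dir` and `--source-type`, so they can be driven from one small script. The sketch below is one possible wrapper, not something MineStudio provides: `build_convert_command` and `convert_all` are hypothetical names, and only the module path and flags shown in the commands above are used. By default it performs a dry run that prints the commands without executing them.

```python
import subprocess
import sys


def build_convert_command(source_type, input_dir, action_dir, output_dir, num_workers=4):
    """Build the argument list for one convert_lmdb invocation."""
    return [
        sys.executable, '-m', 'minestudio.data.minecraft.tools.convert_lmdb',
        '--num-workers', str(num_workers),
        '--input-dir', input_dir,
        '--action-dir', action_dir,
        '--output-dir', output_dir,
        '--source-type', source_type,
    ]


def convert_all(raw_dir, output_dir, num_workers=4, dry_run=True):
    """Convert actions, then videos. With dry_run=True, only print the commands."""
    action_dir = f'{raw_dir}/actions'
    video_dir = f'{raw_dir}/videos'
    commands = [
        build_convert_command('action', action_dir, action_dir, output_dir, num_workers),
        build_convert_command('video', video_dir, action_dir, output_dir, num_workers),
    ]
    for command in commands:
        if dry_run:
            print(' '.join(command))
        else:
            # check=True raises CalledProcessError if the conversion fails.
            subprocess.run(command, check=True)
    return commands


# Dry run: prints both commands and executes nothing.
convert_all('/path/to/raw_episodes', '/path/to/output/dataset')
```

Passing `dry_run=False` would run both conversions in sequence and stop at the first failure.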
    

Note

The --num-workers argument specifies the number of conversion workers. It also determines the number of resulting MineStudio dataset files.

The resulting MineStudio dataset files will be stored in the /path/to/output/dataset directory.

tree /path/to/output/dataset
├── action
│   ├── action-1000 
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── action-1500
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── action-1904
│   │   ├── data.mdb
│   │   └── lock.mdb
│   └── action-500
│       ├── data.mdb
│       └── lock.mdb
└── video
    ├── video-1428  
    │   ├── data.mdb
    │   └── lock.mdb
    ├── video-1903
    │   ├── data.mdb
    │   └── lock.mdb
    ├── video-476
    │   ├── data.mdb
    │   └── lock.mdb
    └── video-952
        ├── data.mdb
        └── lock.mdb
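
A quick way to confirm that the conversion produced a complete layout is to list the shard directories that contain both `data.mdb` and `lock.mdb`. This is a minimal sketch using only the standard library; `list_shards` is a hypothetical helper, and it assumes the two-level layout shown in the tree above.

```python
from pathlib import Path


def list_shards(dataset_dir):
    """Map each source type (e.g. 'action', 'video') to its complete shard directories."""
    shards = {}
    for source_dir in sorted(Path(dataset_dir).iterdir()):
        if not source_dir.is_dir():
            continue
        shards[source_dir.name] = [
            shard.name
            for shard in sorted(source_dir.iterdir())
            # A shard counts as complete only if both LMDB files are present.
            if shard.is_dir()
            and (shard / 'data.mdb').is_file()
            and (shard / 'lock.mdb').is_file()
        ]
    return shards


dataset_dir = Path('/path/to/output/dataset')
if dataset_dir.is_dir():
    for source_type, names in list_shards(dataset_dir).items():
        print(source_type, len(names), names)
```

With `--num-workers 4`, as in the commands above, each source type would be expected to list four shards.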

Check the MineStudio-Format Dataset#

You can check the generated MineStudio dataset files by loading them with the following Python snippet:

from minestudio.data import load_dataset

dataset = load_dataset(
    mode='raw', 
    dataset_dirs=['/path/to/output/dataset'], 
    frame_width=224, 
    frame_height=224,
    win_len=128, 
    split='train', 
    split_ratio=0.9, 
    verbose=True,
)
item = dataset[0]
print(item.keys())
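
To see more than the key names, the returned item can be summarized by shape. The helper below is a hypothetical sketch, not a MineStudio function: it assumes only that the item is dict-like, as the `item.keys()` call above implies, and that array or tensor values expose a `shape` attribute. The keys in the demo are placeholders, not the keys a real item contains.

```python
def describe_item(item):
    """Map each key to its value's shape, or to the value's type name if it has no shape."""
    summary = {}
    for key in item.keys():
        value = item[key]
        shape = getattr(value, 'shape', None)
        summary[key] = tuple(shape) if shape is not None else type(value).__name__
    return summary


# Demo with placeholder keys and plain Python values.
print(describe_item({'example_a': 'hello', 'example_b': [1, 1, 0]}))
```

On a loaded dataset, `describe_item(dataset[0])` would print one entry per key.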