# Data

We design a trajectory structure for storing Minecraft data. Based on this data structure, users can store and retrieve arbitrary trajectory segments efficiently.

```{toctree}
:caption: MineStudio Data

dataset-raw
dataset-event
visualization
convertion
```

## Quick Start

````{include} quick-data.md
````

## Data Structure

We classify and save the data by modality, with each modality's data being a sequence over time. Sequences from different modalities can be aligned in chronological order. For example, the "action" modality stores the mouse and keyboard actions taken at each time step of the trajectory; the "video" modality stores the observations returned by the environment at each time step of the trajectory.

```{note}
The data of different modalities is stored independently. The benefits are:
(1) Users can selectively read data from different modalities according to their requirements;
(2) Users can easily add new modalities to the dataset without affecting the existing data.
```

For each modality, we store the sequence data in segments of a fixed length (e.g., 32), which simplifies reading and writing the data.

```{note}
For video data, random access is usually slow because frames must be decoded during reading. An extreme alternative would be to save every frame as an individual image, which would allow fast reads but take up a large amount of storage space.

We adopt a compromise by saving the video data in video segments, which allows relatively fast reads without occupying too much storage space. When a user wants to read a sequence of continuous frames, we only need to retrieve the corresponding segments and decode them.
```

```{image} ./read_video_fig.png
:width: 80%
```

````{dropdown} Learn more about the details

Segmented sequence data is stored in individual [lmdb](https://lmdb.readthedocs.io/en/release/) files, each of which contains the following metadata:
```python
{
    "__num_episodes__": int,      # total number of episodes in this lmdb file
    "__num_total_frames__": int,  # total number of frames in this lmdb file
    "__chunk_size__": int,        # length of each segment (e.g. 32)
    "__chunk_infos__": dict,      # information about the episode parts in this lmdb file, e.g. start and end index, episode name
}
```

Once you know the episode name and which segment you want to read, you can locate the corresponding segment bytes in the lmdb file and decode them to get the data.
```python
with lmdb_handler.begin() as txn:
    key = str((episode_idx, chunk_id)).encode()
    chunk_bytes = txn.get(key)
```

```{hint}
In practice, you don't need to worry about these details, as we have packaged these operations for you. You just need to call the corresponding API. The class responsible for managing these details is `minestudio.data.minecraft.core.LMDBDriver`.
```

With ``LMDBDriver``, you can perform the following operations on an lmdb file:
- Get the trajectory list:
    ```python
    trajectory_list = lmdb_driver.get_trajectory_list()
    ```
- Get the total number of frames of several trajectories:
    ```python
    lmdb_driver.get_total_frames([
        "trajectory_1",
        "trajectory_2",
        "trajectory_3"
    ])
    ```
- Read a sequence of frames from a trajectory:
    ```python
    frames, mask = lmdb_driver.read_frames(
        eps="trajectory_1",
        start_frame=11,
        win_len=33,
        merge_fn=merge_fn,
        extract_fn=extract_fn,
        padding_fn=padding_fn,
    )
    ```

```{note}
``merge_fn``, ``extract_fn``, and ``padding_fn`` are functions used to process the data; they are specific to the data modality.
```
````

### Built-in Modalities

We provide the following built-in modalities for users to store data:

| Modality | Description | Data Format |
| --- | --- | --- |
| video | Observations returned by the environment | np.ndarray |
| action | Mouse and keyboard actions | Dict |
| contractor info | Information of the contractor | Dict |
| segment info | Information of the segment | Dict |

````{admonition} Video and Segmentation Visualization
:class: dropdown admonition-youtube

```{youtube} QYBUxus3esI
```
````
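
### Example: Mapping a Frame Window to Segments

To make the fixed-length segment layout from the Data Structure section concrete, the sketch below works out which segments cover a requested window of frames and which offsets to keep inside each one. It is only an illustration: `segments_for_window` is a hypothetical helper, not part of MineStudio, and it assumes segments are numbered from 0 and each holds `chunk_size` consecutive frames. It makes no claim about how `LMDBDriver` forms its lmdb keys.

```python
from typing import List, Tuple


def segments_for_window(
    start_frame: int, win_len: int, chunk_size: int = 32
) -> List[Tuple[int, int, int]]:
    """Return (segment_index, first_offset, end_offset) for each segment
    that overlaps the window [start_frame, start_frame + win_len).

    Offsets are relative to the start of the segment; end_offset is exclusive.
    Hypothetical helper for illustration only.
    """
    if win_len <= 0:
        return []
    end_frame = start_frame + win_len           # exclusive end of the window
    first_seg = start_frame // chunk_size       # segment holding the first frame
    last_seg = (end_frame - 1) // chunk_size    # segment holding the last frame
    plan = []
    for seg in range(first_seg, last_seg + 1):
        seg_start = seg * chunk_size
        lo = max(start_frame, seg_start) - seg_start
        hi = min(end_frame, seg_start + chunk_size) - seg_start
        plan.append((seg, lo, hi))
    return plan


# Same window as the read_frames call: start_frame=11, win_len=33
plan = segments_for_window(start_frame=11, win_len=33)
print(plan)
```

For `start_frame=11` and `win_len=33` with a segment length of 32, the window spans two segments: offsets 11 to 31 of segment 0 and offsets 0 to 11 of segment 1. Only those two segments need to be fetched and decoded.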