Data Processing Callbacks
MineStudio employs a flexible callback mechanism to handle the loading, conversion, and visualization of data across different modalities. This design aims to achieve separation of concerns, decoupling data processing logic from the core data loading framework. Users can easily extend the system’s functionality for custom raw data formats or new data modalities by implementing specific callback classes, without needing to modify the core code.
1. Design Philosophy
The core advantages of the callback mechanism are:
Decoupling: Separates the processing logic for specific modalities (e.g., decoding, transformation, augmentation, visualization) from the generic data loaders (RawDataset, EventDataset) and data conversion tools (ConvertManager). This makes the core framework more versatile and stable.
Extensibility: Users can add support for new data modalities or custom data formats by implementing the corresponding callback interfaces.
Customizability: Users can tailor the processing of existing modalities to their specific needs, such as modifying data augmentation pipelines, changing how visual information is presented, or adjusting the details of data conversion.
Code Reusability: Common callback logic (like LMDB reading/writing) can be implemented in base classes, while specific modality callbacks focus on their unique processing tasks.
MineStudio defines three main base callback classes, serving runtime data loading, raw data format conversion, and data visualization respectively:
ModalKernelCallback: Defines how to process modality-specific data read from LMDB during data loading (e.g., within a Dataset’s __getitem__).
ModalConvertCallback: Defines how to convert the user’s raw data files (e.g., .mp4 videos, .jsonl action sequences) into the LMDB format used by MineStudio.
DrawFrameCallback: Defines how to draw modality-specific information onto video frames during data visualization.
2. Detailed Explanation of Core Callback Types
The following provides a detailed introduction to these three core callback types and the key methods that need to be implemented.
2.1. ModalKernelCallback
ModalKernelCallback (defined in minestudio.data.minecraft.callbacks.callback.ModalKernelCallback) is used by ModalKernel and KernelManager during the data loading and processing pipeline. It is responsible for handling single data chunks or sequences of data chunks read from LMDB and transforming them into the format required for model training.
Main Responsibilities:
Decode raw byte data read from LMDB.
Merge multiple data chunks (if necessary).
Slice data according to a given time window and frame skipping parameters.
Pad data to meet fixed length requirements.
Perform data post-processing, such as data augmentation, format conversion, etc.
Key Methods to Implement/Override:
__init__(self, read_bias: int = 0, win_bias: int = 0): Constructor. read_bias and win_bias are used to adjust the starting position of the window when reading data.
name(self) -> str (property): Returns the name of the modality handled by this callback (e.g., "image", "action"). This is typically inferred automatically from the class name.
filter_dataset_paths(self, dataset_paths: List[Union[str, Path]]) -> List[Path]: (Optional) Filters the list of dataset paths provided to ModalKernel. By default, it looks for subdirectories matching the modality name.
do_decode(self, chunk: bytes, **kwargs) -> Any: [Core] Decodes a single raw byte chunk (chunk) read from LMDB into its original format (e.g., np.ndarray for images, dict for actions).
do_merge(self, chunk_list: List[bytes], **kwargs) -> Union[List, Dict]: [Core] Merges multiple decoded data chunks (chunk_list) into a single data structure. This is crucial for sequential data that spans multiple chunks.
do_slice(self, data: Union[List, Dict], start: int, end: int, skip_frame: int, **kwargs) -> Union[List, Dict]: [Core] Extracts a subsequence from the merged data (data) based on start (start frame index), end (end frame index), and skip_frame (frame skip count).
do_pad(self, data: Union[List, Dict], pad_len: int, pad_pos: Literal["left", "right"], **kwargs) -> Tuple[Union[List, Dict], np.ndarray]: [Core] If the sliced data is shorter than pad_len, pads it at the position specified by pad_pos ("left" or "right"). Also returns a mask indicating which frames are valid and which are padded.
do_postprocess(self, data: Dict, **kwargs) -> Dict: (Optional) Performs post-processing on the final data, such as applying data augmentations or converting to PyTorch tensors.
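The decode, merge, slice, and pad steps listed above can be sketched with a small stand-in class. This is only an illustration under stated assumptions: ToyActionKernel is hypothetical and does not inherit from the real ModalKernelCallback, each chunk is assumed to be a JSON object mapping an action key to per-frame values, skip_frame is assumed to act as a slice stride, and the mask is a plain list rather than an np.ndarray so that the sketch needs only the standard library.

```python
import json
from typing import Dict, List, Tuple

ActionSeq = Dict[str, List[int]]


class ToyActionKernel:
    """Hypothetical stand-in that mirrors the do_* method names described above."""

    def do_decode(self, chunk: bytes, **kwargs) -> ActionSeq:
        # Assumption: a chunk is UTF-8 JSON of the form {"key": [per-frame values]}.
        return json.loads(chunk.decode("utf-8"))

    def do_merge(self, chunk_list: List[ActionSeq], **kwargs) -> ActionSeq:
        # Concatenate the per-key sequences of consecutive decoded chunks.
        merged: ActionSeq = {}
        for chunk in chunk_list:
            for key, values in chunk.items():
                merged.setdefault(key, []).extend(values)
        return merged

    def do_slice(self, data: ActionSeq, start: int, end: int,
                 skip_frame: int, **kwargs) -> ActionSeq:
        # Assumption: skip_frame is used as the stride of the slice.
        return {key: values[start:end:skip_frame] for key, values in data.items()}

    def do_pad(self, data: ActionSeq, pad_len: int, pad_pos: str,
               **kwargs) -> Tuple[ActionSeq, List[int]]:
        length = len(next(iter(data.values())))
        missing = max(pad_len - length, 0)
        fill = [0] * missing  # pad with zero actions
        padded = {
            key: (fill + values if pad_pos == "left" else values + fill)
            for key, values in data.items()
        }
        valid = [1] * length  # 1 marks a real frame, 0 marks padding
        mask = fill + valid if pad_pos == "left" else valid + fill
        return padded, mask


kernel = ToyActionKernel()
raw_chunks = [b'{"jump": [0, 1, 0, 0]}', b'{"jump": [1, 1, 0, 1]}']
decoded = [kernel.do_decode(chunk) for chunk in raw_chunks]
merged = kernel.do_merge(decoded)
window = kernel.do_slice(merged, start=2, end=8, skip_frame=2)
padded, mask = kernel.do_pad(window, pad_len=4, pad_pos="right")
```

Keeping the mask alongside the padded data lets downstream code ignore padded frames, for example when computing a loss.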
2.2. ModalConvertCallback
ModalConvertCallback (defined in minestudio.data.minecraft.callbacks.callback.ModalConvertCallback) is used by ConvertManager and ConvertWorker during the data preprocessing stage. It is responsible for converting user-provided raw trajectory data (e.g., video files, action logs) into MineStudio’s LMDB database format.
Main Responsibilities:
Discover and load raw data files from specified input directories.
Convert raw data file content into a sequence of byte chunks suitable for storage in LMDB.
(Optional) Generate frame skip flags for skipping certain frames during conversion.
Key Methods to Implement/Override:
__init__(self, input_dirs: List[str], chunk_size: int): Constructor. input_dirs is a list of directories containing raw data files, and chunk_size defines the number of frames (or other units) contained in each LMDB data chunk.
load_episodes(self) -> Dict[str, List[Tuple[str, str]]]: [Core] Scans self.input_dirs, discovers all raw data files, and organizes them into a dictionary. The dictionary keys are episode IDs, and the values are lists of file paths (or other metadata) associated with that episode. The returned structure is typically OrderedDict[eps_id, List[Tuple[modal_name, file_path]]] or similar, indicating which modality files each episode contains.
do_convert(self, eps_id: str, skip_frames: List[List[bool]], modal_file_path: List[Union[str, Path]]) -> Tuple[List, List]: [Core] Performs the actual conversion for a single episode (eps_id) and its corresponding raw file paths (modal_file_path). skip_frames is an optional list of frame skip flags. This method should read the raw files, process their content, split it into multiple data chunks (each corresponding to chunk_size frames), and then encode each chunk into a byte string. Returns a tuple (key_list, chunk_list): key_list holds the keys for each chunk (usually frame numbers or timestamps), and chunk_list holds the corresponding encoded byte chunks.
gen_frame_skip_flags(self, file_name: str) -> List[bool]: (Optional) Generates a boolean list for a given raw data file, indicating which frames should be skipped during conversion. Can return None or a list of all False if no frames need to be skipped.
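The chunking step that do_convert performs can be sketched for a .jsonl action file as follows. This is a minimal illustration, not the library's implementation: convert_jsonl is a hypothetical free function, the chunk key is assumed to be the index of the chunk's first kept frame, a shorter final chunk is kept rather than dropped, and the LMDB write itself is left out.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def convert_jsonl(file_path: Path, chunk_size: int,
                  skip_flags: Optional[List[bool]] = None) -> Tuple[List[int], List[bytes]]:
    """Split a .jsonl file into chunk_size-frame chunks, each encoded as JSON bytes."""
    with open(file_path, "r", encoding="utf-8") as f:
        frames = [json.loads(line) for line in f if line.strip()]
    if skip_flags is not None:
        # Drop every frame whose flag is True before chunking.
        frames = [frame for frame, skip in zip(frames, skip_flags) if not skip]
    key_list: List[int] = []
    chunk_list: List[bytes] = []
    for start in range(0, len(frames), chunk_size):
        key_list.append(start)  # assumption: key = index of the first frame in the chunk
        chunk = frames[start:start + chunk_size]
        chunk_list.append(json.dumps(chunk).encode("utf-8"))
    return key_list, chunk_list


with tempfile.TemporaryDirectory() as tmp:
    # Write a toy five-frame episode, one JSON object per line.
    path = Path(tmp) / "episode_0.jsonl"
    path.write_text("\n".join(json.dumps({"frame": i}) for i in range(5)), encoding="utf-8")
    key_list, chunk_list = convert_jsonl(path, chunk_size=2)
    skipped_keys, skipped_chunks = convert_jsonl(
        path, chunk_size=2, skip_flags=[False, True, False, False, False]
    )
```

Each (key, chunk) pair would then be written to LMDB by the surrounding conversion machinery.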
2.3. DrawFrameCallback
DrawFrameCallback (defined in minestudio.data.minecraft.callbacks.callback.DrawFrameCallback) is used during data visualization to draw modality-specific information onto video frames, for example displaying action data as text or overlaying segmentation masks on images.
Main Responsibilities:
Receive a batch of video frames and corresponding modality data.
Draw the modality data onto the corresponding video frames in graphical or textual form.
Key Methods to Implement/Override:
draw_frames(self, frames: Union[np.ndarray, List], infos: Dict, sample_idx: int, **kwargs) -> np.ndarray: [Core] Receives a batch of video frames (frames), usually a NumPy array of shape (B, T, H, W, C) or (T, H, W, C), where B is the batch size and T is the sequence length, and a dictionary (infos) containing the corresponding modality data. sample_idx indicates the current sample in the batch. The keys of the infos dictionary are modality names (e.g., "action", "segmentation"), and the values are the data for that modality. This method needs to iterate through the frames and the corresponding infos, drawing the information onto the frames (e.g., using OpenCV drawing functions). Returns the video frames with the information drawn on them (NumPy array).
3. Examples of Built-in Callbacks
MineStudio provides a series of built-in callback implementations for common Minecraft data modalities. These implementations are located in the minestudio.data.minecraft.callbacks directory.
3.1. Image Callbacks
ImageKernelCallback (in image.py):
Purpose: Processes image data loaded from LMDB.
do_decode: Decodes byte strings into NumPy image arrays.
do_merge: Concatenates a list of image chunks into a video sequence (NumPy array of shape (T, H, W, C)).
do_slice: Extracts a specified range of frames.
do_pad: Pads the frame sequence.
do_postprocess: Optionally applies VideoAugmentation for data augmentation and converts images to PyTorch tensors.
Parameters: frame_width, frame_height, enable_video_aug (whether to enable video augmentation), image_format (e.g., “CHW”, “HWC”).
ImageConvertCallback (in image.py):
Purpose: Converts raw video files (e.g., .mp4) or image sequence directories to LMDB format.
load_episodes: Scans input directories for video files.
do_convert: Uses OpenCV to read video frames, encoding every chunk_size frames into a JPEG byte string (or other format) as one LMDB data chunk.
Parameters: input_dirs, chunk_size, thread_pool (number of threads for parallel video encoding).
3.2. Action Callbacks
ActionKernelCallback (in action.py):
Purpose: Processes action data (usually a sequence of dictionaries) loaded from LMDB.
do_decode: Decodes JSON-encoded byte strings into action dictionaries.
do_merge: Merges a list of action dictionaries.
do_slice: Extracts a specified range of actions.
do_pad: Pads the action sequence (usually with zero actions or the last valid action).
do_postprocess: Optionally includes the previous frame’s action (enable_prev_action).
Parameters: enable_prev_action, prev_action_pad_val, read_bias, win_bias.
VectorActionKernelCallback (in action.py, inherits from ActionKernelCallback):
Purpose: Specifically handles vectorized action representations.
do_postprocess: Builds upon ActionKernelCallback to convert dictionary-form actions into fixed-dimension vectors, or vice versa.
Provides vector_to_action and action_to_vector methods.
Parameters: action_chunk_size (similar to chunk_size, but specific to context length for action vectorization), return_type (“vector” or “dict”).
ActionDrawFrameCallback (in action.py):
Purpose: Draws textual representations of action information onto video frames.
draw_frames: Iterates through action data, formats action key-value pairs for each timestep into strings, and draws them onto frames at specified positions using OpenCV’s putText.
Parameters: start_point (starting coordinates for text drawing).
ActionConvertCallback (in action.py):
Purpose: Converts raw action files (typically .jsonl files, where each line is a JSON object representing one frame’s actions) to LMDB.
load_episodes: Scans input directories for .jsonl files.
do_convert: Reads .jsonl files; every chunk_size actions are JSON-encoded and stored as one LMDB data chunk.
Parameters: input_dirs, chunk_size, action_transformer_kwargs (parameters for initializing ActionTransformer, allowing action transformation and filtering).
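The text-formatting step that ActionDrawFrameCallback's draw_frames is described as performing can be sketched without any image library. The helper below is hypothetical, and the layout of infos["action"] (one sequence of per-timestep values per action key) is an assumption made for illustration. The actual drawing call is omitted; each returned string would be passed to OpenCV's putText on successive rows starting at start_point.

```python
from typing import Dict, List, Sequence


def format_action_lines(action: Dict[str, Sequence], t: int) -> List[str]:
    """Return one 'key: value' string per action key for timestep t."""
    return [f"{key}: {values[t]}" for key, values in action.items()]


# Toy modality data for a two-frame clip (assumed layout, not the library's schema).
infos = {
    "action": {
        "forward": [1, 0],
        "jump": [0, 1],
        "camera": [[0.0, 2.5], [1.0, 0.0]],
    }
}
num_steps = 2
# overlay[t] holds the text lines to draw on frame t.
overlay = [format_action_lines(infos["action"], t) for t in range(num_steps)]
```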
3.3. MetaInfo Callbacks
MetaInfoKernelCallback (in meta_info.py):
Purpose: Processes game metadata, such as player position, health, time, etc.
do_decode: Decodes JSON-encoded byte strings into metainfo dictionaries.
do_merge, do_slice, do_pad: Similar to ActionKernelCallback for handling sequences of dictionaries.
Parameters: No special constructor parameters; relies on the base class.
MetaInfoDrawFrameCallback (in meta_info.py):
Purpose: Draws metainfo onto video frames.
draw_frames: Formats specific key-value pairs from the metainfo dictionary and draws them onto frames.
Parameters: start_point.
MetaInfoConvertCallback (in meta_info.py):
Purpose: Converts raw metainfo files (typically .jsonl or .pkl files containing metainfo dictionaries) to LMDB.
load_episodes: Scans input directories for metainfo files.
do_convert: Reads files; every chunk_size frames of metainfo are JSON-encoded and stored as one LMDB data chunk.
Parameters: input_dirs, chunk_size.
3.4. Segmentation Callbacks
SegmentationKernelCallback (in segmentation.py):
Purpose: Processes image segmentation mask data.
do_decode: Decodes byte strings (possibly RLE-encoded masks or other compressed formats) into segmentation masks (NumPy arrays).
do_merge, do_slice, do_pad: Perform the corresponding processing on sequences of segmentation masks.
do_postprocess: May include resizing masks, remapping class IDs, etc.
Parameters: frame_width, frame_height (target frame dimensions), seg_re_map (class ID remapping dictionary).
SegmentationDrawFrameCallback (in segmentation.py):
Purpose: Overlays segmentation masks or related information (like target points) onto video frames.
draw_frames: Can draw segmentation masks with different colors or highlight specific objects.
Parameters: start_point, draw_point, draw_mask, draw_event, draw_frame_id, draw_frame_range, and a color list COLORS.
SegmentationConvertCallback (in segmentation.py):
Purpose: Converts raw segmentation data files (e.g., .pkl files containing RLE-encoded masks) to LMDB.
load_episodes: Scans input directories for segmentation data files.
do_convert: Reads raw segmentation data, encodes it (if necessary), and stores every chunk_size frames of segmentation data as one LMDB data chunk.
Parameters: input_dirs, chunk_size.
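The class ID remapping mentioned for SegmentationKernelCallback's do_postprocess might look roughly like the sketch below. It is an assumption-laden illustration: remap_mask is a hypothetical helper, masks are nested lists standing in for NumPy arrays, the example mapping is made up, and IDs missing from seg_re_map are left unchanged, which may differ from the library's behavior.

```python
from typing import Dict, List

Mask = List[List[int]]  # one frame: rows of class IDs (stand-in for a NumPy array)


def remap_mask(mask: Mask, seg_re_map: Dict[int, int]) -> Mask:
    """Replace each class ID using seg_re_map; unmapped IDs are kept as-is."""
    return [[seg_re_map.get(class_id, class_id) for class_id in row] for row in mask]


seg_re_map = {1: 3, 2: 4}  # illustrative mapping only
frames = [
    [[0, 1], [2, 2]],
    [[1, 1], [0, 5]],
]
# Apply the same mapping to every frame in the sequence.
remapped = [remap_mask(frame, seg_re_map) for frame in frames]
```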
Through these callbacks, MineStudio provides a powerful and flexible framework for handling various Minecraft data. Users can modify these built-in callbacks or create their own callback implementations for entirely new data types.