Quick benchmark#
This document outlines the structure and workflow of the benchmark module, a comprehensive framework designed for evaluating agent performance in Minecraft environments.
Code structure#
Below is the structure of the benchmark module, which organizes task definitions and testing scripts for evaluation:
benchmark/
├── task_configs/
│   ├── simple/
│   │   └── Task definitions for simple tasks.
│   └── hard/
│       └── Task definitions for complex tasks.
├── test_pipeline.py
│   └── Example script for parallelized and batched task execution.
├── test.py
│   └── Example script for running batch tests.
└── utility/
    └── Functionality for input reading and callback features.
Workflow Overview#
Task Configuration#
Tasks are defined in YAML files located in the task_configs/ directory, under the appropriate difficulty subdirectory (e.g., simple/ or hard/).
Example YAML:
custom_init_commands:
- /give @s minecraft:water_bucket 3
- /give @s minecraft:stone 64
- /give @s minecraft:dirt 64
- /give @s minecraft:shovel{Enchantments:[{id:"minecraft:efficiency",lvl:1}]} 1
text: Build a waterfall in your Minecraft world.
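A config like the one above can be sanity-checked before it is handed to the simulator. The sketch below assumes the YAML has already been parsed into a Python dict (for example with PyYAML's yaml.safe_load). The function name validate_task_config is hypothetical and not part of the benchmark module. Treating a missing custom_init_commands key as an empty list is also an assumption.

```python
from typing import Any


def validate_task_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed task config (empty if none)."""
    problems: list[str] = []

    # Assumption: a task may omit custom_init_commands, which is treated as "no commands".
    commands = config.get("custom_init_commands", [])
    if not isinstance(commands, list):
        problems.append("custom_init_commands must be a list")
    else:
        for index, command in enumerate(commands):
            # Minecraft commands in the example all start with a slash.
            if not isinstance(command, str) or not command.startswith("/"):
                problems.append(f"command {index} is not a slash command: {command!r}")

    # The task description must be a non-empty string.
    text = config.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("text must be a non-empty string")

    return problems


# This dict mirrors part of the YAML example after parsing.
example = {
    "custom_init_commands": [
        "/give @s minecraft:water_bucket 3",
        "/give @s minecraft:stone 64",
    ],
    "text": "Build a waterfall in your Minecraft world.",
}
print(validate_task_config(example))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a new task file at once.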
Key Elements of the YAML File:
custom_init_commands: Specifies commands to initialize the Minecraft environment for the task.
    Examples:
    /give @s minecraft:water_bucket 3: Gives the agent three water buckets.
    /give @s minecraft:stone 64: Provides a stack of stone blocks.
    These commands ensure the agent has the necessary tools and resources to perform the task.
text: Provides a natural language description of the task.
    Example: "Build a waterfall in your Minecraft world."
Running Tests#
Individual or Small-Scale Tests:
Use test.py for running specific tasks or testing new configurations.
$ python test.py
Batch Testing with Parallelization:
Use test_pipeline.py for executing tasks in parallel.
$ python test_pipeline.py
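To make the idea of batched, parallel execution concrete, here is a minimal sketch using only the Python standard library. It is not the contents of test_pipeline.py, which may use a different mechanism. The helpers discover_task_configs, make_batches, run_task, and run_all are hypothetical names, and run_task is a placeholder that only returns a string where a real worker would create the environment and roll out a policy.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def discover_task_configs(root: Path) -> list[Path]:
    """Collect every .yaml file under root (e.g. simple/ and hard/), sorted for reproducibility."""
    return sorted(root.rglob("*.yaml"))


def make_batches(items: list[Path], batch_size: int) -> list[list[Path]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_task(config_path: Path) -> str:
    """Hypothetical placeholder: a real worker would build the environment and run the task."""
    return f"finished {config_path.stem}"


def run_all(root: Path, batch_size: int = 4, workers: int = 2) -> list[str]:
    """Run every discovered task, one batch at a time, with a small worker pool per batch."""
    results: list[str] = []
    for batch in make_batches(discover_task_configs(root), batch_size):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map preserves the input order of the batch.
            results.extend(pool.map(run_task, batch))
    return results
```

Calling run_all(Path("task_configs")) would walk both difficulty subdirectories. For real simulator workloads, separate processes are likely a better fit than threads, since each environment is heavy and may hold a GPU.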
An Example: test.py#
This script demonstrates how to evaluate tasks using YAML-based configurations. Below is an outline of its workflow:
Task Setup:
    Load configuration files from task_configs/simple.
    Parse YAML files into callbacks using convert_yaml_to_callbacks.
Environment Initialization:
    Use MinecraftSim to create a simulation environment.
    Add callbacks:
        RecordCallback: Saves video frames for evaluation.
        CommandsCallback: Initializes the environment.
        TaskCallback: Implements task-specific behavior.
Task Execution:
    Reset the environment and run the task for multiple steps.
    Save observations, actions, and outputs for analysis.
Result Storage:
    Videos and logs are saved in the output/ directory.
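# Import paths below are assumed from the MineStudio package layout; adjust them to your installation.
from minestudio.simulator import MinecraftSim
from minestudio.simulator.callbacks import RecordCallback, CommandsCallback, TaskCallback
from minestudio.models import load_vpt_policy
from minestudio.benchmark.utility.read_conf import convert_yaml_to_callbacks

# Parse the task YAML into the inputs for the two task-related callbacks.
commands_callback, task_callback = convert_yaml_to_callbacks("./task_configs/simple/build_waterfall.yaml")

env = MinecraftSim(
    obs_size=(128, 128),
    callbacks=[
        RecordCallback(record_path="./output/", fps=30, frame_type="pov"),  # saves video frames
        CommandsCallback(commands_callback),  # initializes the environment
        TaskCallback(task_callback),  # task-specific behavior
    ]
)
policy = load_vpt_policy(
    model_path="/nfs-shared/jarvisbase/pretrained/foundation-model-2x.model",
    weights_path="/nfs-shared/jarvisbase/pretrained/foundation-model-2x.weights"
).to("cuda")

memory = None  # no recurrent state before the first step (assumed initial value)
obs, info = env.reset()
for i in range(12000):
    action, memory = policy.get_action(obs, memory, input_shape='*')
    obs, reward, terminated, truncated, info = env.step(action)
env.close()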
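The loop above always runs for a fixed 12000 steps. If the environment reports episode termination through the terminated and truncated flags, a rollout helper can stop early and still guarantee that the environment is closed. The sketch below is one possible way to structure that. The rollout function and the DummyEnv and DummyPolicy classes are hypothetical stand-ins so the example runs without Minecraft, and whether the simulator actually sets these flags for benchmark tasks is an assumption.

```python
def rollout(env, policy, max_steps: int) -> int:
    """Run one episode and return the number of steps actually taken."""
    memory = None  # assumed initial recurrent state
    obs, info = env.reset()
    steps = 0
    try:
        for _ in range(max_steps):
            action, memory = policy.get_action(obs, memory, input_shape="*")
            obs, reward, terminated, truncated, info = env.step(action)
            steps += 1
            if terminated or truncated:
                break  # stop as soon as the episode ends
    finally:
        env.close()  # release the environment even if a step raises
    return steps


class DummyEnv:
    """Hypothetical stand-in that ends the episode after `horizon` steps."""

    def __init__(self, horizon: int):
        self.horizon = horizon
        self.t = 0
        self.closed = False

    def reset(self):
        self.t = 0
        return {"t": 0}, {}

    def step(self, action):
        self.t += 1
        return {"t": self.t}, 0.0, self.t >= self.horizon, False, {}

    def close(self):
        self.closed = True


class DummyPolicy:
    """Hypothetical stand-in that always returns action 0 and passes memory through."""

    def get_action(self, obs, memory, input_shape="*"):
        return 0, memory


print(rollout(DummyEnv(horizon=5), DummyPolicy(), max_steps=100))  # prints 5
```

With the real simulator, the same helper would be called as rollout(env, policy, max_steps=12000) in place of the explicit loop.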