# Quick benchmark

This document outlines the structure and workflow of the benchmark module, a framework for evaluating agent performance in Minecraft environments.

## Code structure

Below is the structure of the benchmark module, which organizes task definitions and testing scripts for evaluation:

```plaintext
benchmark/
├── task_configs/
│   ├── simple/
│   │   └── Task definitions for simple tasks.
│   └── hard/
│       └── Task definitions for complex tasks.
├── test_pipeline.py
│   └── Example script for parallelized and batched task execution.
├── test.py
│   └── Example script for running individual or small-scale tests.
└── utility/
    └── Functionality for input reading and callback features.
```

## Workflow Overview

### Task Configuration

Tasks are defined in YAML files located in the `task_configs/` directory, under the appropriate difficulty subdirectory (e.g., `simple/` or `hard/`).

Example YAML:

```yaml
custom_init_commands:
- /give @s minecraft:water_bucket 3
- /give @s minecraft:stone 64
- /give @s minecraft:dirt 64
- /give @s minecraft:shovel{Enchantments:[{id:"minecraft:efficiency",lvl:1}]} 1
text: Build a waterfall in your Minecraft world.
```

Key elements of the YAML file:

1. **`custom_init_commands`**:
   - Specifies commands to initialize the Minecraft environment for the task.
   - Examples:
     - `/give @s minecraft:water_bucket 3`: Gives the agent three water buckets.
     - `/give @s minecraft:stone 64`: Provides a stack of stone blocks.
   - These commands ensure the agent has the tools and resources needed to perform the task.
2. **`text`**:
   - Provides a natural language description of the task.
   - Example: `"Build a waterfall in your Minecraft world."`

### Running Tests

1. **Individual or Small-Scale Tests**:
   - Use `test.py` for running specific tasks or testing new configurations.

   ```console
   $ python test.py
   ```

2. **Batch Testing with Parallelization**:
   - Use `test_pipeline.py` for executing tasks in parallel.

   ```console
   $ python test_pipeline.py
   ```

---

#### An Example: `test.py`

This script demonstrates how to evaluate tasks using YAML-based configurations. Below is an outline of its workflow:

1. **Task Setup**:
   - Load configuration files from `task_configs/simple`.
   - Parse YAML files into callbacks using `convert_yaml_to_callbacks`.
2. **Environment Initialization**:
   - Use `MinecraftSim` to create a simulation environment.
   - Add callbacks:
     - `RecordCallback`: Saves video frames for evaluation.
     - `CommandsCallback`: Initializes the environment.
     - `TaskCallback`: Implements task-specific behavior.
3. **Task Execution**:
   - Reset the environment and run the task for multiple steps.
   - Save observations, actions, and outputs for analysis.
4. **Result Storage**:
   - Videos and logs are saved in the `output/` directory.

```python
# Parse the task YAML into inputs for CommandsCallback and TaskCallback.
commands_callback, task_callback = convert_yaml_to_callbacks("./task_configs/simple/build_waterfall.yaml")

# Create the simulation environment with recording, init-command, and task callbacks.
env = MinecraftSim(
    obs_size=(128, 128),
    callbacks=[
        RecordCallback(record_path="./output/", fps=30, frame_type="pov"),
        CommandsCallback(commands_callback),
        TaskCallback(task_callback),
    ]
)

# Load the pretrained VPT policy and move it to the GPU.
policy = load_vpt_policy(
    model_path="/nfs-shared/jarvisbase/pretrained/foundation-model-2x.model",
    weights_path="/nfs-shared/jarvisbase/pretrained/foundation-model-2x.weights"
).to("cuda")

obs, info = env.reset()
memory = None  # recurrent policy state, initialized on the first call
for i in range(12000):
    action, memory = policy.get_action(obs, memory, input_shape='*')
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```
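To sweep the whole `task_configs/simple/` suite in this single-process style (as opposed to the parallel execution handled by `test_pipeline.py`), the same setup can simply be repeated per YAML file. The sketch below is illustrative, not the actual contents of `test_pipeline.py`: it reuses the names from the excerpt above (`convert_yaml_to_callbacks`, `MinecraftSim`, the three callbacks, and the already-loaded `policy`, with imports matching `test.py`), and the per-task output subdirectory and fixed step budget are assumed choices.

```python
from pathlib import Path

# Evaluate every task YAML in task_configs/simple/ sequentially.
# Assumes convert_yaml_to_callbacks, MinecraftSim, RecordCallback, CommandsCallback,
# TaskCallback, and the loaded `policy` are already available, as in the excerpt above.
for yaml_path in sorted(Path("./task_configs/simple").glob("*.yaml")):
    commands_callback, task_callback = convert_yaml_to_callbacks(str(yaml_path))
    env = MinecraftSim(
        obs_size=(128, 128),
        callbacks=[
            # Write each task's recordings to its own subdirectory (illustrative layout).
            RecordCallback(record_path=f"./output/{yaml_path.stem}/", fps=30, frame_type="pov"),
            CommandsCallback(commands_callback),
            TaskCallback(task_callback),
        ],
    )
    obs, info = env.reset()
    memory = None  # reset the recurrent state for each task
    for _ in range(12000):
        action, memory = policy.get_action(obs, memory, input_shape='*')
        obs, reward, terminated, truncated, info = env.step(action)
    env.close()
```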