# Benchmark

This tutorial provides an overview of the codebase for automating and batch-testing tasks in the MineStudio benchmark. It covers the structure, purpose, and main functionality of the framework.

```{toctree}
:caption: MineStudio Benchmark

quick-benchmark
automatic-evaluation
```

---

## Overview

The MineStudio benchmark is a comprehensive framework for evaluating agent performance across a wide range of Minecraft-based tasks. It offers the following key features:

- **Diverse Task Support**: Evaluate agents on tasks such as building, mining, crafting, and collecting.
- **Game Mode Variability**: Includes both simple and hard game modes to test agents under varying levels of difficulty.
- **Batch Task Execution**: Run multiple tasks simultaneously and record task-completion videos for analysis.
- **VLM-Based Evaluation**: Leverage Vision-Language Models (VLMs) to analyze and score task videos.

---

## How to Use

1. **Run Batch Tests**:
   - Use `test_pipeline.py` or `test.py` to execute tasks (see the batch-runner sketch below).
   - Ensure your environment supports GPU acceleration for optimal performance.

2. **Analyze Results**:
   - Review the generated videos and metrics in the `eval_video` folder.
   - Use criteria files to score and validate task completion (see the VLM scoring sketch below).
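As a starting point for batch testing, here is a minimal driver sketch that runs `test_pipeline.py` once per task configuration. The `--config` flag and the `task_configs/` directory of per-task YAML files are assumptions for illustration, not the documented `test_pipeline.py` interface; adapt them to your checkout.

```python
# Hedged batch-test driver: launches test_pipeline.py once per task config.
# The --config flag and task_configs/ layout are hypothetical placeholders.
import subprocess
import sys
from pathlib import Path

TASK_CONFIG_DIR = Path("task_configs")  # hypothetical: one YAML file per task
OUTPUT_DIR = Path("eval_video")         # folder where completion videos land

def run_task(config_path: Path) -> bool:
    """Run a single task through test_pipeline.py and report success."""
    result = subprocess.run(
        [sys.executable, "test_pipeline.py", "--config", str(config_path)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(f"[FAIL] {config_path.name}\n{result.stderr}")
    return result.returncode == 0

def main() -> None:
    OUTPUT_DIR.mkdir(exist_ok=True)
    configs = sorted(TASK_CONFIG_DIR.glob("*.yaml"))
    passed = sum(run_task(cfg) for cfg in configs)
    print(f"{passed}/{len(configs)} tasks completed without errors")

if __name__ == "__main__":
    main()
```

Running tasks as separate subprocesses keeps one crashed Minecraft instance from taking down the whole batch; a failed task is logged and the sweep continues.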
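For step 2, the sketch below shows one way VLM-based scoring can work: sample a few frames from each video in `eval_video`, pair them with the task's criteria text, and ask a vision-language model for a score. The criteria-file naming convention (`<video>.txt`) and the use of the OpenAI client are assumptions for illustration; see the automatic-evaluation page for the framework's actual pipeline.

```python
# Hedged VLM scoring sketch: frame sampling via OpenCV, grading via a VLM.
# The criteria-file naming and model choice are assumptions, not MineStudio's API.
import base64
from pathlib import Path

import cv2
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def sample_frames(video_path: Path, num_frames: int = 8) -> list[str]:
    """Return base64-encoded JPEG frames sampled evenly across the video."""
    cap = cv2.VideoCapture(str(video_path))
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in range(0, max(total, 1), max(total // num_frames, 1)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        _, buf = cv2.imencode(".jpg", frame)
        frames.append(base64.b64encode(buf.tobytes()).decode())
    cap.release()
    return frames[:num_frames]

def score_video(video_path: Path, criteria: str) -> str:
    """Ask the VLM to grade one task video against its criteria text."""
    content = [{
        "type": "text",
        "text": ("Score this Minecraft task attempt against the criteria "
                 f"below, from 0 to 10, and explain briefly:\n{criteria}"),
    }]
    for b64 in sample_frames(video_path):
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model works here
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content

for video in sorted(Path("eval_video").glob("*.mp4")):
    criteria_file = video.with_suffix(".txt")  # hypothetical naming convention
    if criteria_file.exists():
        print(video.name, "->", score_video(video, criteria_file.read_text()))
```

Sampling a small, evenly spaced set of frames keeps the request cheap while still covering the start, middle, and end of the attempt, which is usually enough context for the model to judge task completion.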