Benchmark

This tutorial gives an overview of the codebase for automating and batch-testing tasks in the MineStudio benchmark. It covers the structure, purpose, and main functionality of the framework.


Overview

The MineStudio benchmark is a comprehensive framework for evaluating agent performance across a wide range of Minecraft-based tasks. It offers the following key features:

  • Diverse Task Support: Evaluate agents on tasks such as building, mining, crafting, collecting, and more.

  • Game Mode Variability: Test agents in both simple and hard game modes, covering different levels of difficulty.

  • Batch Task Execution: Run multiple tasks simultaneously and record task completion videos for analysis.

  • VLM-Based Evaluation: Leverage Vision-Language Models to analyze and score task videos.


How to Use

  1. Run Batch Tests:

    • Use test_pipeline.py or test.py to execute tasks.

    • Ensure your environment supports GPU acceleration for optimal performance.

  2. Analyze Results:

    • Review generated videos and metrics in the eval_video folder.

    • Use criteria files to score and validate task completion.
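As a starting point for the analysis step, the recorded videos can be collected into a per-task index before scoring. The sketch below is only an illustration and is not part of MineStudio: the helper name index_videos, the ".mp4" suffix, and the idea that each task writes into its own subfolder of eval_video are all assumptions, so adjust them to match what your runs actually produce.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: videos are saved as .mp4 files. Extend this set if your runs differ.
VIDEO_SUFFIXES = {".mp4"}


def index_videos(root):
    """Group video files under `root` by the name of their parent folder.

    Hypothetical helper: it assumes one subfolder per task, which may not
    match the actual layout of the eval_video folder.
    """
    root = Path(root)
    if not root.is_dir():
        # Nothing has been recorded yet (or the path is wrong).
        return {}

    videos = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in VIDEO_SUFFIXES:
            videos[path.parent.name].append(path)
    return dict(videos)


if __name__ == "__main__":
    # Print how many videos were found for each (assumed) task folder.
    for task, paths in index_videos("eval_video").items():
        print(f"{task}: {len(paths)} video(s)")
```

An index like this makes it easy to spot tasks that produced no video at all before handing the rest to a criteria-based or VLM-based scorer.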