# Launching Online Training: The `run.py` Script

The primary entry point for initiating an online training session within MineStudio is the `run.py` script, typically located at `minestudio/online/run/run.py`. This script serves as the orchestrator, setting up the environment and launching the distributed components necessary for the agent to learn and interact with the Minecraft world.

## Core Responsibilities

The `run.py` script has several key responsibilities:

1. **Configuration Management**: At its heart, the script is responsible for loading and interpreting the training configuration. This configuration dictates every aspect of the training session, from the specifics of the Minecraft environment to the neural network architecture of the policy and the hyperparameters that guide the learning process. MineStudio leverages Hydra for flexible configuration management, allowing users to switch between predefined setups (e.g., `gate_kl`, `another_setup`) or customize their own.
2. **Service Orchestration**: Once the configuration is loaded, `run.py` acts as a conductor, initializing and starting the essential Ray actors that form the backbone of the online training pipeline. The two most critical actors are the `RolloutManager`, which oversees the collection of experience data from the environment, and the `Trainer` (e.g., `PPOTrainer`), which optimizes the agent's policy based on the collected data.
3. **Training Lifecycle Initiation**: While the step-by-step training loop resides within the `Trainer` actor, `run.py` is the catalyst that sets the process in motion. It ensures all components are correctly initialized and interconnected before signaling the `Trainer` to begin its work.

## Execution Flow: A Step-by-Step Breakdown

The execution of `minestudio/online/run/run.py` follows a logical sequence to ensure a smooth start to the training process:

### 1. Configuration Loading and Parsing

The journey begins with loading the appropriate configuration (a minimal sketch of this step appears after the note below):

* A `config_name` (e.g., `"gate_kl"`) is typically provided as an argument or set as a default. This name corresponds to a Python configuration file in the `minestudio/online/run/config/` directory.
* The script dynamically imports the Python module associated with this `config_name` (for instance, `minestudio.online.run.config.gate_kl`).
* From this imported module, several key components are extracted:
  * `env_generator`: A callable (usually a function or a class) that, when invoked, returns a new instance of the Minecraft environment (`MinecraftSim`). This allows fresh environments to be created as needed by the rollout workers.
  * `policy_generator`: Similar to `env_generator`, a callable that produces an instance of the agent's policy model (e.g., `MinePolicy`).
  * `online_dict`: A Python dictionary containing all the settings for the online training session. This dictionary is converted into an `OmegaConf` object (commonly named `online_cfg`), which provides a structured and powerful way to access configuration values.
* For logging and reproducibility, the entire content of the chosen configuration file is often read into a string variable (e.g., `whole_config`). This allows the `Trainer` to save the exact configuration used for a particular training run.

```{note}
The use of Hydra and OmegaConf provides a highly flexible system. Users can override configuration parameters directly from the command line, making experimentation and fine-tuning more accessible without needing to modify the core configuration files for every small change.
```
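The configuration-loading step can be pictured with a short sketch. This is an illustration rather than the actual MineStudio source: the helper name `load_online_config` and the exact attribute names on the imported module are assumptions, chosen to mirror the components described above.

```python
import importlib
from omegaconf import OmegaConf


def load_online_config(config_name: str = "gate_kl"):
    """Hypothetical sketch of the configuration-loading step described above."""
    # Dynamically import the Python config module,
    # e.g. minestudio.online.run.config.gate_kl
    module = importlib.import_module(f"minestudio.online.run.config.{config_name}")

    # Extract the three key components the rest of run.py relies on.
    env_generator = module.env_generator        # callable -> MinecraftSim instance
    policy_generator = module.policy_generator  # callable -> MinePolicy instance
    online_cfg = OmegaConf.create(module.online_dict)  # plain dict -> OmegaConf object

    # Keep the raw file contents so the Trainer can log the exact config used.
    with open(module.__file__, "r") as f:
        whole_config = f.read()

    return env_generator, policy_generator, online_cfg, whole_config
```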
### 2. Ray Initialization (Prerequisite)

Before `run.py` can launch any Ray actors, the Ray environment itself must be initialized. This typically involves:

* A call to `ray.init()`. This might be done with specific arguments, such as a `namespace` (e.g., `"online"`), to logically group actors and services within a Ray cluster.
* This initialization step connects the script to an existing Ray cluster or starts a new one if running locally.

In many production or research setups, helper scripts (like `start_headnode.sh`) or cluster management tools handle the setup of the Ray cluster. The `run.py` script then simply connects to this pre-existing infrastructure.

### 3. Launching the Rollout Manager

With the configuration in place and Ray ready, the script starts the experience collection machinery:

* The function `start_rolloutmanager(policy_generator, env_generator, online_cfg)` is invoked.
* This function, typically residing in `minestudio.online.rollout.start_manager` (or a similar utility module), is tasked with:
  * Creating and launching the `RolloutManager` as a Ray actor.
  * The `RolloutManager`, upon its own initialization, uses the provided `policy_generator`, `env_generator`, and the relevant sections of `online_cfg` (specifically `online_cfg.rollout_config` and `online_cfg.env_config`) to configure and spawn its distributed fleet of `RolloutWorkerWrapper` actors. These wrappers, in turn, manage individual `RolloutWorker` instances and `EnvWorker` processes.

### 4. Launching the Trainer

Once the `RolloutManager` is operational and ready to supply data, the `Trainer` is brought online:

* The function `start_trainer(policy_generator, env_generator, online_cfg, whole_config)` is called.
* This function, often found in `minestudio.online.trainer.start_trainer`, handles the setup of the learning component:
  * It configures the Ray Train environment, specifying details such as the number of training workers (if distributed training is used), GPU allocation per worker, and other scaling parameters, usually derived from `online_cfg.train_config.scaling_config`.
  * It instantiates the chosen trainer class. The specific class (e.g., `PPOTrainer`) is determined by `online_cfg.trainer_name`.
  * The trainer is initialized with the `policy_generator`, `env_generator`, a handle to the now-running `RolloutManager` actor (which it can obtain via a utility like `get_rollout_manager`), and the detailed training configuration from `online_cfg.train_config`. The `whole_config` string is also passed along for logging.
  * Crucially, the trainer's main training method (commonly `train()` or `fit()`) is then invoked. This call is typically blocking and marks the beginning of the iterative process of sampling data and updating the policy.

### 5. Sustained Operation

After successfully launching the `RolloutManager` and the `Trainer` actors, the `run.py` script itself might appear to do very little. Often, it enters a long `time.sleep()` loop or a similar mechanism to keep the main process alive. The full launch sequence is summarized in a sketch after the tip below.

```{tip}
This behavior is characteristic of actor-based distributed systems. The primary work of data collection and model training is performed asynchronously by the Ray actors running in the background, potentially across multiple processes or even multiple machines in a cluster. The `run.py` script's main role after initialization is to ensure these background services remain operational.
```
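Putting the steps together, the body of `run.py` can be sketched roughly as follows. The `start_rolloutmanager` and `start_trainer` calls, their module paths, and the `"online"` namespace come from the description above; the surrounding glue (the hypothetical `load_online_config` helper from the earlier sketch, argument defaults, and the sleep interval) is an assumption included only to illustrate the overall flow.

```python
import time

import ray

# Module paths as described above; treat them as illustrative if your layout differs.
from minestudio.online.rollout.start_manager import start_rolloutmanager
from minestudio.online.trainer.start_trainer import start_trainer


def main(config_name: str = "gate_kl"):
    # Step 1: load generators and the OmegaConf config (see the earlier sketch).
    env_generator, policy_generator, online_cfg, whole_config = load_online_config(config_name)

    # Step 2: connect to (or start) a Ray cluster, grouping actors under one namespace.
    ray.init(namespace="online")

    # Step 3: launch the RolloutManager actor, which spawns the rollout/env workers.
    start_rolloutmanager(policy_generator, env_generator, online_cfg)

    # Step 4: launch the Trainer (e.g., PPOTrainer); its train()/fit() call drives learning.
    start_trainer(policy_generator, env_generator, online_cfg, whole_config)

    # Step 5: keep the driver process alive while the Ray actors do the real work.
    while True:
        time.sleep(3600)


if __name__ == "__main__":
    main()
```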
In summary, `run.py` is the conductor of the online training orchestra. It doesn't play an instrument itself during the main performance, but it is indispensable for selecting the music (configuration), assembling the musicians (Ray actors), and cuing them to start. The complex harmonies of learning and interaction then unfold within the dedicated `Trainer` and `RolloutManager` components.