Projects

JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse

MCU: An Evaluation Framework for Open-Ended Game Agents

ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models

GROOT: Learning to Follow Instructions by Watching Gameplay Videos