Talk

Building Open-Ended Embodied Agents with Internet-Scale Knowledge and Large Language Models

Guanzhi Wang
2023/10/19
Embodied AI
LLM-Agents
Robotic Learning

Abstract

MineDojo: Autonomous agents have made great strides in specialist domains like Atari games and Go. However, they typically learn tabula rasa in isolated environments with limited and manually conceived objectives, thus failing to generalize across a wide spectrum of tasks and capabilities. Inspired by how humans continually learn and adapt in the open world, we advocate a trinity of ingredients for building generalist agents: 1) an environment that supports a multitude of tasks and goals, 2) a large-scale database of multimodal knowledge, and 3) a flexible and scalable agent architecture. We introduce MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base with Minecraft videos, tutorials, wiki pages, and forum discussions. Using MineDojo’s data, we propose a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function. Our agent is able to solve a variety of open-ended tasks specified in free-form language without any manually designed dense shaping reward.

Speaker

Guanzhi Wang is a third-year Ph.D. student at Caltech, advised by Prof. Georgia Gkioxari and Prof. Yisong Yue. He is also a research intern at the NVIDIA GEAR lab, working with Dr. Jim Fan and Prof. Yuke Zhu. He obtained his M.S. degree from Stanford University, advised by Prof. Fei-Fei Li, Prof. Yuke Zhu, Dr. Jim Fan and Dr. Shyamal Buch. He obtained his B.S. degree from the Hong Kong University of Science and Technology, where he was fortunate to work with Prof. Chi-Keung Tang and Prof. Yu-Wing Tai. His research interests lie in foundation models, robotics, and embodied agents. He is passionate about building embodied foundation agents that can discover and pursue complex, open-ended objectives and understand how the world works through massive pre-trained knowledge.

Extra Details

Speaker Website / Paper Link / Paper Code / Paper Project Page