Learning Compositional Models of the World

Abstract

To build intelligent embodied agents, those agents must generalize to settings beyond the ones they have encountered before. This idea is illustrated with compositional generative models that represent parts of the world, enabling generalization to scenarios for which no prior data is available. Compositional generative modeling lets generative models operate outside their training distribution by constructing complex generative models from smaller, constituent components. The discussion begins with an introduction to energy-based models and shows how they facilitate compositional generative modeling. These compositional models enable the synthesis of complex plans for novel tasks at inference time. Compositionality is then extended to multiple foundation models trained on various types of Internet data, which enables decision-making systems capable of hierarchical planning and of solving long-horizon problems in a zero-shot manner.
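As a rough illustration of how energy-based models can be composed, the toy sketch below sums the energies of two independent one-dimensional models (equivalent to multiplying their unnormalized densities) and draws samples from the result with unadjusted Langevin dynamics. It is not taken from the speaker's work: the quadratic energies, the helper names (`make_quadratic_energy`, `compose`, `langevin_sample`), and all step sizes are assumptions chosen only to keep the example small and checkable.

```python
import math
import random


def make_quadratic_energy(center, scale):
    """Hypothetical component model: E(x) = (x - center)^2 / (2 * scale^2)."""
    def energy(x):
        return (x - center) ** 2 / (2.0 * scale ** 2)

    def grad(x):
        return (x - center) / scale ** 2

    return energy, grad


def compose(*models):
    """Compose models by summing energies, i.e. multiplying exp(-E_i(x))."""
    def energy(x):
        return sum(e(x) for e, _ in models)

    def grad(x):
        return sum(g(x) for _, g in models)

    return energy, grad


def langevin_sample(grad, rng, n_steps=300, step=0.01):
    """Unadjusted Langevin dynamics: gradient step on the energy plus noise."""
    x = rng.gauss(0.0, 1.0)
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x - step * grad(x) + math.sqrt(2.0 * step) * noise
    return x


rng = random.Random(0)
model_a = make_quadratic_energy(center=-1.0, scale=1.0)
model_b = make_quadratic_energy(center=3.0, scale=1.0)
composed_energy, composed_grad = compose(model_a, model_b)

# Neither component was defined around x = 1, yet the composition
# concentrates there: the product of the two Gaussians has mean 1.
samples = [langevin_sample(composed_grad, rng) for _ in range(1000)]
sample_mean = sum(samples) / len(samples)
print(f"mean of composed samples: {sample_mean:.2f}")
```

With equal scales, the product of the two component densities is a Gaussian centered midway between the two centers, so the sample mean should land close to 1. Real compositional models replace the quadratic energies with learned networks, but the composition step (summing energies or their gradients) has the same shape.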

Speaker

Yilun Du is a Senior Research Scientist at Google DeepMind. He will join Harvard University as an Assistant Professor in the Kempner Institute and the Department of Computer Science starting in Fall 2025. He completed a PhD in Electrical Engineering and Computer Science at MIT under the supervision of Prof. Leslie Kaelbling, Prof. Tomas Lozano-Perez, and Prof. Joshua B. Tenenbaum. Prior to that, he earned a bachelor’s degree from MIT, served as a Research Fellow at OpenAI, and worked as an intern and visiting researcher at FAIR and Google DeepMind. Additionally, he was awarded a gold medal at the International Biology Olympiad. His research focuses on generative models, decision-making, robot learning, embodied agents, and the application of these tools in scientific domains.

Video

Extra Details

Speaker Website / Paper Link / Paper Code / Paper Project Page