Abstract
OSWorld is a novel, scalable environment designed to evaluate autonomous digital agents across diverse real-world computer tasks. Supporting multiple operating systems like Ubuntu, Windows, and macOS, OSWorld enables comprehensive, execution-based evaluations of agents in interactive settings involving web and desktop applications. Our benchmark includes 369 tasks derived from actual use cases, highlighting the current limitations of state-of-the-art agents, which achieve only a 12.24% success rate compared to humans’ 72.36%. This platform provides crucial insights for advancing multimodal agent development. Resources are publicly available to encourage further exploration in this promising field.
Speaker
Tianbao Xie is currently a second-year PhD student at the University of Hong Kong, where he is advised by Tao Yu (primary), Lingpeng Kong, and Ben Kao. His primary research interests lie in Artificial Intelligence and Natural Language Processing, with a particular focus on developing large-scale neural-symbolic AI systems and autonomous agents.
Video
Extra Details
Speaker Website / Paper Link / Paper Code / Paper Project Page