Talk

Prompting—The New API for Interactive Visual Intelligence

Kaiyang Zhou
2023/10/25
Prompt Learning
Visual Foundation Models

Abstract

Originating in natural language processing, the new paradigm of prompting has recently swept through the computer vision community, bringing disruptive changes to a wide spectrum of domains such as image classification and image generation. Compared with a traditional architecture that is fixed once trained, like a linear classifier trained to recognize a specific set of categories, prompting offers greater flexibility and more opportunities for novel applications. This is because prompting allows the model to perform new tasks, such as recognizing unseen categories, by tuning textual instructions or modifying a small number of parameters in the model's input space while keeping most of the pre-trained parameters untouched. This paradigm pushes human-AI interaction to unprecedented levels. This talk will discuss how to prompt vision foundation models, such as vision-language models, for generic visual perception tasks, e.g., classification and detection.
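To make the idea concrete, below is a minimal sketch of prompt-based zero-shot classification in the style of CLIP-like vision-language models. The embedding table and the prompt template are toy stand-ins invented for illustration; a real system would use a frozen pre-trained text encoder and image encoder instead.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy text embeddings for class prompts. In practice these would come from
# a frozen pre-trained text encoder, not a hand-written table.
TEXT_EMB = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a bird": [0.0, 0.1, 0.9],
}

def classify(image_emb, class_names, template="a photo of a {}"):
    """Score an image embedding against one textual prompt per class
    and predict the highest-scoring class name."""
    prompts = [template.format(c) for c in class_names]
    scores = {c: cosine(image_emb, TEXT_EMB[p])
              for c, p in zip(class_names, prompts)}
    return max(scores, key=scores.get)

# An image whose (toy) embedding lies closest to the "cat" prompt.
print(classify([0.8, 0.2, 0.1], ["cat", "dog", "bird"]))  # -> cat
```

Note how the classifier is defined entirely by the prompts: adding a new category only requires adding a new textual prompt, with no retraining of the backbone, which is the flexibility the abstract contrasts with a fixed linear classifier.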

Speaker

Dr. Kaiyang Zhou is an Assistant Professor in the Department of Computer Science at Hong Kong Baptist University. He was previously a postdoc at Nanyang Technological University, Singapore, and received his PhD in computer science from the University of Surrey, UK. His research lies at the intersection of computer vision and machine learning and has been published in top-tier journals and conferences in these fields, such as TPAMI, IJCV, and CVPR. He is an associate editor of the International Journal of Computer Vision (IJCV) and the creator of Torchreid, a popular open-source library for person re-identification.

Video

Extra Details

Speaker Website / Paper Link / Paper Code / Paper Project Page