Abstract
Multi-view stereo reconstruction (MVS) in the wild requires to first estimate the camera parameters e.g. intrinsic and extrinsic parameters. These are usually tedious and cumbersome to obtain, yet they are mandatory to triangulate corresponding pixels in 3D space, which is the core of all best performing MVS algorithms. In this work, we take an opposite stance and introduce DUSt3R, a radically novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections, i.e. operating without prior information about camera calibration nor viewpoint poses. We cast the pairwise reconstruction problem as a regression of pointmaps, relaxing the hard constraints of usual projective camera models. We show that this formulation smoothly unifies the monocular and binocular reconstruction cases. In the case where more than two images are provided, we further propose a simple yet effective global alignment strategy that expresses all pairwise pointmaps in a common reference frame. We base our network architecture on standard Transformer encoders and decoders, allowing us to leverage powerful pretrained models. Our formulation directly provides a 3D model of the scene as well as depth information, but interestingly, we can seamlessly recover from it, pixel matches, relative and absolute camera. Exhaustive experiments on all these tasks showcase that the proposed DUSt3R can unify various 3D vision tasks and set new SoTAs on monocular/multi-view depth estimation as well as relative pose estimation. In summary, DUSt3R makes geometric 3D vision tasks easy.
Speaker
Shuzhe Wang is currently in the final year of his Ph.D. studies at the Department of Computer Science, Aalto University, under the supervision of Prof. Juho Kannala. He has gained valuable experience as a research intern at NAVER LABS Europe in Grenoble, France, where he worked alongside Dr. Jerome Revaud and Dr. Boris Chidlovskii. Before his time in France, Wang held a position as a visiting researcher at the Computer Vision and Geometry Group at ETH Zurich in 2022, advised by Prof. Marc Pollefeys and Dr. Daniel Barath. He completed his master’s degree in Control, Robotics, and Autonomous Systems at the Department of Electrical Engineering and Automation, Aalto University. Wang’s research interests are centered around geometric computer vision and 3D scene understanding, with a particular focus on visual localization, image matching, 3D reconstruction, and 3D visual grounding.
Video
Extra Details
Speaker Website / Paper Link / Project Video/ Paper Project Page / Paper Code