Real-time 3D visual SLAM with a hand-held RGB-D camera

N Engelhard, F Endres, J Hess, J Sturm… - Proc. of the RGB-D …, 2011 - cvai.cit.tum.de
Proc. of the RGB-D Workshop on 3D Perception in Robotics at the European …, 2011
The practical applications of 3D model acquisition are manifold. In this paper, we present our RGB-D SLAM system, i.e., an approach to generate colored 3D models of objects and indoor scenes using the hand-held Microsoft Kinect sensor.
Our approach consists of four processing steps, as illustrated in Figure 1. First, we extract SURF features from the incoming color images. Then we match these features against features from the previous images. By evaluating the depth images at the locations of these feature points, we obtain a set of point-wise 3D correspondences between any two frames. Based on these correspondences, we estimate the relative transformation between the frames using RANSAC. The third step is to improve this initial estimate using a variant of the ICP algorithm [1]. As the pair-wise pose estimates between frames are not necessarily globally consistent, we optimize the resulting pose graph in the fourth step using a pose graph solver [4]. The output of our algorithm is a globally consistent 3D model of the perceived environment, represented as a colored point cloud.
The full source code of our system is available as open source [2]. With an earlier version of our system, we participated in the ROS 3D challenge organized by Willow Garage and won the first prize in the category “most useful”. Our approach is similar to the recent work of Henry et al. [5], but it applies SURF instead of SIFT features. Additionally, our source code is available online.
Figures 2 and 3 illustrate the quality of the resulting 3D models. For both experiments, we slowly moved the Kinect around the object and acquired around 12 RGB-D frames. Computing the model took approximately 2 seconds per frame on an Intel i7 at 2 GHz. We also applied our approach to a large variety of other objects. Videos with more results are available online [3]. We will demo our system during the RGB-D workshop.
Further, we plan to evaluate our system using ground truth information in the near future. Our approach enables a robot to generate 3D models of the objects in the scene, but applications outside of robotics are also possible. For example, our system could be used by interior designers to generate models of flats, digitally refurbish them, and show them to potential customers. At the moment, we do not deal with the problem of automatic viewpoint selection but assume instead that the user is moving the camera through the scene.
N. Engelhard, F. Endres, J. Hess, and W. Burgard are with the Autonomous Intelligent Systems Lab, Computer Science Department, University of Freiburg, Germany. {engelhar, endres, hess, burgard}@informatik.uni-freiburg.de
J. Sturm is with the Computer Vision and Pattern Recognition Group, Computer Science Department, Technical University of Munich, Germany. sturmju@in.tum.de
Figure 1 (processing steps):
Input: stream of RGB-D images
1. Feature extraction and matching (SURF)
2. Pose estimation (RANSAC)
3. Pose refinement (ICP)
4. Pose graph optimization (HOGMAN)
Output: 3D model (colored point cloud)
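The depth lookup and the RANSAC pose estimation described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the iteration count, the inlier threshold, and the SVD-based (Kabsch) least-squares fit are all choices made here for illustration, and the demo uses synthetic points instead of matched SURF features.

```python
import numpy as np

def backproject(pixels, depths, fx, fy, cx, cy):
    """Lift pixel coordinates (u, v) with metric depth z to 3D camera
    coordinates, assuming a pinhole model (intrinsics are hypothetical)."""
    u, v = pixels[:, 0], pixels[:, 1]
    x = (u - cx) * depths / fx
    y = (v - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)

def fit_rigid_transform(src, dst):
    """Least-squares R, t such that dst ~= R @ src + t (Kabsch / SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def ransac_rigid_transform(src, dst, iterations=200,
                           inlier_threshold=0.03, seed=0):
    """Estimate R, t from 3D-3D correspondences that may contain outliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        # Three correspondences are the minimal set for a rigid transform.
        sample = rng.choice(len(src), size=3, replace=False)
        R, t = fit_rigid_transform(src[sample], dst[sample])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < inlier_threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("RANSAC found too few inliers")
    # Refit on all inliers of the best hypothesis.
    R, t = fit_rigid_transform(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers

# Synthetic demo: 60 correspondences, the first 15 corrupted as outliers.
rng = np.random.default_rng(42)
points_a = rng.uniform(-1.0, 1.0, (60, 3)) + np.array([0.0, 0.0, 2.0])
angle = np.deg2rad(10.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.10, -0.05, 0.02])
points_b = points_a @ R_true.T + t_true
points_b[:15] += rng.uniform(0.5, 1.0, (15, 3))   # simulated false matches

R_est, t_est, inlier_mask = ransac_rigid_transform(points_a, points_b)
print("inliers:", int(inlier_mask.sum()), "of", len(points_a))
print("rotation error:", np.abs(R_est - R_true).max())
print("translation error:", np.abs(t_est - t_true).max())
```

In a real pipeline, `points_a` and `points_b` would come from calling `backproject` on the matched feature locations of two frames, and the resulting transform would then serve as the initial estimate for the ICP refinement step.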