SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) oral presentation, June 2015

Shuran Song, Samuel P. Lichtenberg, Jianxiong Xiao

Although RGB-D sensors have enabled major breakthroughs for several vision tasks, such as 3D reconstruction, we have not achieved a similar performance jump for high-level scene understanding. Perhaps one of the main reasons for this is the lack of a benchmark of reasonable size with 3D annotations for training and 3D metrics for evaluation. In this paper, we present an RGB-D benchmark suite with the goal of advancing the state of the art in all major scene understanding tasks. Our dataset is captured by four different sensors and contains 10,000 RGB-D images, at a scale similar to PASCAL VOC. The whole dataset is densely annotated and includes 146,617 2D polygons and 58,657 3D bounding boxes with accurate object orientations, as well as a 3D room layout and a scene category for each image. This dataset enables us to train data-hungry algorithms for scene-understanding tasks, evaluate them using direct and meaningful 3D metrics, avoid overfitting to a small testing set, and study cross-sensor bias.

Shuran Song, Samuel P. Lichtenberg, and Jianxiong Xiao.
"SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite."
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) oral presentation, June 2015.

@inproceedings{Song:2015:SRA,
   author = "Shuran Song and Samuel P. Lichtenberg and Jianxiong Xiao",
   title = "{SUN} {RGB-D}: A {RGB-D} Scene Understanding Benchmark Suite",
   booktitle = "IEEE Conference on Computer Vision and Pattern Recognition (CVPR) oral
      presentation",
   year = "2015",
   month = jun
}