The more you look, the more you see: towards general object understanding through recursive refinement
Winter Conference on Applications of Computer Vision (WACV), March 2018
Abstract
Comprehensive object understanding is a central challenge in visual recognition, yet most advances with deep neural networks reason about each aspect in isolation. In this work, we present a unified framework to tackle this broader object understanding problem. We formalize a refinement module that recursively develops understanding across space and semantics: “the more it looks, the more it sees.” More concretely, we cluster the objects within each semantic category into fine-grained subcategories; our recursive model extracts features for each region of interest, recursively predicts the location and the content of the region, and selectively chooses a small subset of the regions to process in the next step. Our model can quickly determine whether an object is present, then its class (“Is this a person?”), and finally report fine-grained predictions (“Is this person standing?”). Our experiments demonstrate the advantages of joint reasoning about spatial layout and fine-grained semantics. On the PASCAL VOC dataset, our proposed model simultaneously achieves strong performance on instance segmentation, part segmentation, and keypoint detection in a single efficient pipeline that does not require explicit training for each task. One reason for this strong performance is the ability to naturally leverage highly engineered architectures, such as Faster R-CNN, within our pipeline.
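The recursive loop described in the abstract (predict location and content for each region, then keep only a small subset for the next step) can be illustrated with a minimal sketch. This is not the authors' implementation: `Region`, `refine`, the `keep` schedule, and the three toy predictors are hypothetical names introduced here, and the hand-written predictors merely stand in for learned network heads.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Region:
    """Hypothetical container for one region of interest."""
    box: Box
    label: str    # coarsest-to-finest label predicted so far
    score: float  # confidence used to decide which regions survive


# A predictor maps a region to a refined region (new box, label, score).
Predictor = Callable[[Region], Region]


def refine(regions: Sequence[Region],
           predictors: Sequence[Predictor],
           keep: Sequence[int]) -> List[Region]:
    """Apply one predictor per step, keeping only the top-k regions after each."""
    current = list(regions)
    for predict, k in zip(predictors, keep):
        current = [predict(r) for r in current]
        current = sorted(current, key=lambda r: r.score, reverse=True)[:k]
    return current


# Toy stand-ins for learned heads (assumptions, for illustration only).
def toy_objectness(r: Region) -> Region:
    # Step 1: "Is an object present?"
    return Region(r.box, "object", r.score)


def toy_class(r: Region) -> Region:
    # Step 2: "Is this a person?" Also tightens the box by one pixel per side.
    x1, y1, x2, y2 = r.box
    return Region((x1 + 1, y1 + 1, x2 - 1, y2 - 1), "person", r.score * 0.9)


def toy_subcategory(r: Region) -> Region:
    # Step 3: "Is this person standing?"
    return Region(r.box, r.label + "/standing", r.score * 0.9)


proposals = [
    Region((0, 0, 10, 10), "?", 0.9),
    Region((5, 5, 20, 20), "?", 0.2),
    Region((2, 2, 8, 8), "?", 0.6),
]
result = refine(proposals,
                [toy_objectness, toy_class, toy_subcategory],
                keep=[2, 1, 1])
```

With these toy inputs, the lowest-scoring proposal is dropped after the first step and only the highest-scoring one reaches the fine-grained step, where it ends up labeled `person/standing`. The shrinking `keep` schedule is what makes later, more expensive steps cheap: each step processes fewer regions than the one before.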
Citation
Jingyan Wang, Olga Russakovsky, and Deva Ramanan.
"The more you look, the more you see: towards general object understanding through recursive refinement."
Winter Conference on Applications of Computer Vision (WACV), March 2018.
BibTeX
@inproceedings{Wang:2018:TMY,
  author = "Jingyan Wang and Olga Russakovsky and Deva Ramanan",
  title = "The more you look, the more you see: towards general object understanding through recursive refinement",
  booktitle = "Winter Conference on Applications of Computer Vision (WACV)",
  year = "2018",
  month = mar
}