Overview

This webpage provides visualizations of the scene graphs produced in our experiments (see Section 6 of our paper) and reports exact precision and recall statistics. Each table below corresponds to an experiment: the first column lists the methods used to analyze a dataset (our method is labeled Hierarchy), and the last column links to visualizations in which each scene graph produced by an automatic algorithm is presented as an interactive tree with the root node on the left. You can drag the tree while holding down the left mouse button, zoom in and out with the scroll wheel, and collapse or expand intermediate nodes of the hierarchy by clicking on them.

Hierarchical parsing results

We evaluate how well the scene graphs predicted by our hierarchical parsing algorithm match the ground-truth data.

For each dataset, the topology of the grammar is given by the 'Labels' and 'Rules' links.
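As a rough illustration of the metric, the sketch below computes node-level precision and recall by comparing the labeled nodes of a predicted scene graph against the ground truth. The node representation (a label plus the set of leaf objects under that node) and the exact-match criterion are assumptions made for illustration, not the precise matching procedure from the paper.

# Hedged sketch: node-level precision/recall for hierarchical parses.
# A node is represented as (label, frozenset_of_leaf_object_ids); this
# representation is an assumption made for illustration only.

def precision_recall(predicted_nodes, ground_truth_nodes):
    predicted = set(predicted_nodes)
    truth = set(ground_truth_nodes)
    matched = predicted & truth  # nodes with the same label and leaf set
    precision = len(matched) / len(predicted) if predicted else 0.0
    recall = len(matched) / len(truth) if truth else 0.0
    return precision, recall

# Example with hypothetical node sets:
pred = [("bed", frozenset({1, 2})), ("nightstand", frozenset({3}))]
gt   = [("bed", frozenset({1, 2})), ("lamp", frozenset({3}))]
print(precision_recall(pred, gt))   # (0.5, 0.5)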

Bedroom
Grammar: Labels Rules
Methods   Precision   Recall   Visualization
Hierarchy 0.880145 0.890344 Link

Classroom
Grammar: Labels Rules
Methods   Precision   Recall   Visualization
Hierarchy 0.767421 0.800043 Link

Library
Grammar: Labels Rules
Methods   Precision   Recall   Visualization
Hierarchy 0.819319 0.709925 Link

Small bedroom
Grammar: Labels Rules
Methods   Precision   Recall   Visualization
Hierarchy 0.977429 0.975109 Link

Small library
Grammar: Labels Rules
Methods   Precision   Recall   Visualization
Hierarchy 1 1 Link

Comparison to alternative methods

We compare our method to two alternative methods:
  1. Shape only. We select the label that maximizes the geometry term for each input node. This method can be treated as an object classification method based on shape descriptors (see the sketch after this list).
  2. Flat grammar. We run our algorithm with a flattened grammar that has only one production rule, connecting all terminals directly to the axiom. In contrast to the first alternative, this method leverages context information to label objects, so it can be treated as an implementation of the context-based approach of [Fisher and Hanrahan 2010], [Fisher et al. 2011], [Fisher et al. 2012], and [Xu et al. 2013].
Since neither alternative method produces a hierarchy, we only compare labeling performance on the leaf nodes (i.e., single objects).
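A minimal sketch of the first alternative (shape only), assuming the geometry term is exposed as a per-label score; the function name geometry_score is a hypothetical placeholder, not part of our implementation.

# Hedged sketch of the shape-only baseline: each input node independently
# receives the label with the highest geometry (shape descriptor) score.
# geometry_score(node, label) is a hypothetical stand-in for the geometry term.

def label_nodes_shape_only(nodes, labels, geometry_score):
    """Assign to every node the label maximizing its geometry score."""
    return {node: max(labels, key=lambda lab: geometry_score(node, lab))
            for node in nodes}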

Bedroom
Methods   Precision   Recall   Visualization
Shape-only 0.669876 0.669876
Flat 0.653519 0.653519 Link
Hierarchy 0.751039 0.751039 Link

Classroom
Methods   Precision   Recall   Visualization
Shape-only 0.651062 0.651062
Flat 0.729044 0.729044 Link
Hierarchy 0.766699 0.766699 Link

Library
Methods   Precision   Recall   Visualization
Shape-only 0.762636 0.762636
Flat 0.8499 0.8499 Link
Hierarchy 0.893907 0.893907 Link

Small bedroom
Methods   Precision   Recall   Visualization
Shape-only 0.786705 0.786705
Flat 0.829056 0.829056 Link
Hierarchy 0.956722 0.956722 Link

Small library
Methods   Precision   Recall   Visualization
Shape-only 0.372517 0.372517
Flat 0.47015 0.47015 Link
Hierarchy 1 1 Link

Handling over-segmentation

We test whether our method can parse scene graphs with moderate levels of over-segmentation. In this test, the leaves of the input scene graphs do not necessarily represent basic-category objects; instead, they can represent parts of objects, as determined by the leaf nodes of the scene graphs originally downloaded from the Trimble 3D Warehouse.

The grammars for the over-segmented datasets are identical to those of the corresponding datasets without over-segmentation.

Bedroom
Methods   Overall precision/recall   Leaf precision/recall   Internal precision/recall   Visualization
Shape-only N/A 0.431168/0.341712 N/A
Flat N/A 0.492733/0.492733 N/A Link
Hierarchy 0.79163/0.808069 0.611334/0.611334 0.845021/0.867613 Link

Classroom
Methods   Overall precision/recall   Leaf precision/recall   Internal precision/recall   Visualization
Shape-only N/A 0.499364/0.339412 N/A
Flat N/A 0.670745/0.670745 N/A Link
Hierarchy 0.717406/0.719436 0.722383/0.722383 0.711483/0.715906 Link

Library
Methods   Overall precision/recall   Leaf precision/recall   Internal precision/recall   Visualization
Shape-only N/A 0.718936/0.523755 N/A
Flat N/A 0.760513/0.760513 N/A Link
Hierarchy 0.733464/0.738336 0.788351/0.788351 0.576991/0.592037 Link

Parsing other datasets

We test whether our algorithm can learn a hierarchical grammar on one dataset and then use it to parse a different dataset. For this test, we downloaded the Sketch2Scene Bedroom dataset [Xu et al. 2013] and parsed each of its scene graphs using the grammar learned on our Bedroom dataset.

Since the Sketch2Scene ground-truth label set differs from ours, we created a mapping from our label set to theirs so that labels predicted by our parser can be compared to their ground truth. The mapping can be found here; Sketch2Scene labels appear to the left of ':', and ours to the right.
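For reference, a small sketch that reads a mapping file in the format described above (one "Sketch2Scene label : our label" pair per line) and inverts it so that our predicted labels can be translated to Sketch2Scene labels; the filename is hypothetical.

# Hedged sketch: parse the label mapping file described above.
# Each line has the form "<Sketch2Scene label> : <our label>".
# The filename "label_mapping.txt" is a hypothetical placeholder.

def load_label_mapping(path="label_mapping.txt"):
    ours_to_sketch2scene = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or ":" not in line:
                continue
            sketch2scene_label, our_label = (s.strip() for s in line.split(":", 1))
            ours_to_sketch2scene[our_label] = sketch2scene_label
    return ours_to_sketch2scene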

Methods   Precision   Recall   Visualization
Shape-only 0.471161 0.471161
Flat 0.363325 0.363325 Link
Hierarchy 0.631096 0.631096 Link

Impact of size of training set

We tested how the performance of our algorithm is affected by the size of the training set. For each scene graph in the Bedroom dataset, we trained a grammar on a randomly selected X% of the other scenes (X = 10, 40, 70, 100), used that grammar to parse the scene, and then evaluated the results.
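The evaluation loop can be sketched roughly as follows; train_grammar, parse, and evaluate are hypothetical placeholders for the corresponding stages of our pipeline, and the random subsampling is only an illustrative assumption about how the X% is drawn.

import random

# Hedged sketch of the training-set-size experiment: for every scene,
# train on a random fraction of the remaining scenes and parse the
# held-out scene. train_grammar, parse, and evaluate are placeholders.

def training_size_experiment(scenes, fraction, train_grammar, parse, evaluate):
    results = []
    for i, test_scene in enumerate(scenes):
        others = scenes[:i] + scenes[i + 1:]
        k = max(1, round(fraction * len(others)))
        training_set = random.sample(others, k)
        grammar = train_grammar(training_set)
        results.append(evaluate(parse(grammar, test_scene), test_scene))
    return results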

Bedroom
Fraction of training data   Overall precision/recall   Leaf precision/recall   Internal precision/recall
0.1 0.601441/0.626581 0.348913/0.348913 0.676243/0.71443
0.4 0.817/0.819559 0.672372/0.672372 0.856617/0.865193
0.7 0.839816/0.843568 0.720561/0.720561 0.871797/0.881506
1.0 0.850199/0.857874 0.751039/0.751039 0.880145/0.890344

Library
Fraction of training data   Overall precision/recall   Leaf precision/recall   Internal precision/recall
0.4 0.562281/0.542761 0.656502/0.656502 0.124244/0.103265
0.7 0.514331/0.565454 0.637575/0.637575 0.193314/0.286776
1.0 0.880245/0.856082 0.893907/0.893907 0.819319/0.709925

Classroom
Fraction of training data   Overall precision/recall   Leaf precision/recall   Internal precision/recall
0.1 0.230389/0.226377 0.208231/0.208231 0.25798/0.248107
0.4 0.71131/0.732411 0.723641/0.723641 0.697448/0.742914
0.7 0.708463/0.719453 0.691966/0.691966 0.727568/0.752368
1.0 0.767035/0.781873 0.766699/0.766699 0.767421/0.800043

Bedroom (over-segmented)
Fraction of training data   Overall precision/recall   Leaf precision/recall   Internal precision/recall
0.1 0.584148/0.601218 0.262714/0.262714 0.662521/0.703916
0.4 0.761873/0.782055 0.569025/0.569025 0.816758/0.843666
0.7 0.775328/0.794054 0.57987/0.57987 0.830233/0.857611
1.0 0.79163/0.808069 0.611334/0.611334 0.845021/0.867613

Impact of individual energy terms

We ran experiments to show the impact of each energy term on the final results by disabling each term in turn and re-running the first set of experiments.
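A hedged sketch of the ablation setup: the total energy is treated here as a sum of a shape descriptor term, a spatial relation term, and a cardinality term, each of which can be switched off independently. The individual term functions are hypothetical placeholders, and the purely additive combination is an illustrative assumption rather than the exact formulation from the paper.

# Hedged sketch of the ablation: each energy term can be disabled on its
# own before re-running the parsing experiment. shape_term, spatial_term,
# and cardinality_term are hypothetical placeholders.

def total_energy(parse_tree, shape_term, spatial_term, cardinality_term,
                 use_shape=True, use_spatial=True, use_cardinality=True):
    energy = 0.0
    if use_shape:
        energy += shape_term(parse_tree)
    if use_spatial:
        energy += spatial_term(parse_tree)
    if use_cardinality:
        energy += cardinality_term(parse_tree)
    return energy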

Bedroom
Setting   Overall precision/recall   Leaf precision/recall   Internal precision/recall
Shape descriptor off 0.714798/0.752047 0.512107/0.512107 0.777085/0.812646
Spatial relation off 0.845096/0.840544 0.7088/0.7088 0.882887/0.881999
Cardinality off 0.837763/0.859981 0.672696/0.672696 0.887713/0.892996
All on 0.850199/0.857874 0.751039/0.751039 0.880145/0.890344

Library
Setting   Overall precision/recall   Leaf precision/recall   Internal precision/recall
Shape descriptor off 0.72535/0.732613 0.787876/0.787876 0.49497/0.519074
Spatial relation off 0.719292/0.721068 0.775664/0.775664 0.504057/0.510109
Cardinality off 0.67925/0.719658 0.769467/0.769467 0.408882/0.527195
All on 0.880245/0.856082 0.893907/0.893907 0.819319/0.709925

Classroom
Setting   Overall precision/recall   Leaf precision/recall   Internal precision/recall
Shape descriptor off 0.568747/0.580433 0.502463/0.502463 0.644693/0.673802
Spatial relation off 0.710386/0.712051 0.729593/0.729593 0.687503/0.691045
Cardinality off 0.571335/0.649819 0.655568/0.655568 0.493855/0.642935
All on 0.767035/0.781873 0.766699/0.766699 0.767421/0.800043

Bedroom (over-segmented)
Setting   Overall precision/recall   Leaf precision/recall   Internal precision/recall
Shape descriptor off 0.623883/0.647082 0.38415/0.38415 0.696833/0.727959
Spatial relation off 0.779978/0.789252 0.590994/0.590994 0.837573/0.848216
Cardinality off 0.778301/0.794799 0.55161/0.55161 0.850771/0.850106
All on 0.79163/0.808069 0.611334/0.611334 0.845021/0.867613