Overview

This webpage provides visualizations of the scene graphs produced in our experiments (see Section 6 of our paper) and reports exact precision and recall statistics. Each table below corresponds to an experiment: the first column lists the methods used to analyze a dataset (our method is labeled Hierarchy), and the last column links to visualizations in which each scene graph produced by an automatic algorithm is presented as an interactive tree with the root node on the left. You can drag the tree while holding down the left mouse button, zoom in and out with the scroll wheel, and collapse or expand intermediate nodes of the hierarchy by clicking on them.

Hierarchical parsing results

We evaluate how well the scene graphs predicted by our hierarchical parsing algorithm match the ground-truth data.

For each dataset, the topology of the grammar is given by the 'Labels' and 'Rules' links.
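As a rough illustration of the metric, the sketch below computes node-level precision and recall by comparing the labeled nodes of a predicted scene graph against the ground truth. The node representation (a label plus the set of leaf objects under that node) and the exact-match criterion are assumptions made for illustration, not the precise matching procedure from the paper.

# Hedged sketch: node-level precision/recall for hierarchical parses.
# A node is represented as (label, frozenset_of_leaf_object_ids); this
# representation is an assumption made for illustration only.

def precision_recall(predicted_nodes, ground_truth_nodes):
    predicted = set(predicted_nodes)
    truth = set(ground_truth_nodes)
    matched = predicted & truth  # nodes with the same label and leaf set
    precision = len(matched) / len(predicted) if predicted else 0.0
    recall = len(matched) / len(truth) if truth else 0.0
    return precision, recall

# Example with hypothetical node sets:
pred = [("bed", frozenset({1, 2})), ("nightstand", frozenset({3}))]
gt   = [("bed", frozenset({1, 2})), ("lamp", frozenset({3}))]
print(precision_recall(pred, gt))   # (0.5, 0.5)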

Bedroom
Grammar: Labels Rules
Methods   Precision   Recall   Visualization
Hierarchy 0.880145 0.890344 Link

Classroom
Grammar: Labels Rules
Methods   Precision   Recall   Visualization
Hierarchy 0.767421 0.800043 Link

Library
Grammar: Labels Rules
Methods   Precision   Recall   Visualization
Hierarchy 0.819319 0.709925 Link

Small bedroom
Grammar: Labels Rules
Methods   Precision   Recall   Visualization
Hierarchy 0.977429 0.975109 Link

Small library
Grammar: Labels Rules
Methods   Precision   Recall   Visualization
Hierarchy 1 1 Link

Comparison to alternative methods

We compare our method to two alternative methods:
  1. Shape only. We select the label that maximizes the geometry term for each input node. This method can be treated as an object classification method based on shape descriptors (see the sketch after this list).
  2. Flat grammar. We run our algorithm with a flattened grammar that has only one production rule, connecting all terminals directly to the axiom. In contrast to the first alternative, this method leverages context information to label objects, so it can be treated as an implementation of the context-based approach of [Fisher and Hanrahan 2010], [Fisher et al. 2011], [Fisher et al. 2012], and [Xu et al. 2013].
Since neither alternative method produces a hierarchy, we only compare labeling performance on the leaf nodes (i.e., single objects).
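A minimal sketch of the first alternative (shape only), assuming the geometry term is exposed as a per-label score; the function name geometry_score is a hypothetical placeholder, not part of our implementation.

# Hedged sketch of the shape-only baseline: each input node independently
# receives the label with the highest geometry (shape descriptor) score.
# geometry_score(node, label) is a hypothetical stand-in for the geometry term.

def label_nodes_shape_only(nodes, labels, geometry_score):
    """Assign to every node the label maximizing its geometry score."""
    return {node: max(labels, key=lambda lab: geometry_score(node, lab))
            for node in nodes}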

Bedroom
Methods   Precision   Recall   Visualization
Shape-only 0.669876 0.669876
Flat 0.653519 0.653519 Link
Hierarchy 0.751039 0.751039 Link

Classroom
Methods   Precision   Recall   Visualization
Shape-only 0.651062 0.651062
Flat 0.729044 0.729044 Link
Hierarchy 0.766699 0.766699 Link

Library
Methods   Precision   Recall   Visualization
Shape-only 0.762636 0.762636
Flat 0.8499 0.8499 Link
Hierarchy 0.893907 0.893907 Link

Small bedroom
Methods   Precision   Recall   Visualization
Shape-only 0.786705 0.786705
Flat 0.829056 0.829056 Link
Hierarchy 0.956722 0.956722 Link

Small library
Methods   Precision   Recall   Visualization
Shape-only 0.372517 0.372517
Flat 0.47015 0.47015 Link
Hierarchy 1 1 Link

Handling over-segmentation

We test whether our method can parse scene graphs with moderate levels of over-segmentation. In this test, the leaves of the input scene graphs do not necessarily represent basic-category objects; instead, they can represent parts of objects, as determined by the leaf nodes of the scene graphs originally downloaded from the Trimble 3D Warehouse.

The grammars for the over-segmented datasets are identical to those of the corresponding datasets without over-segmentation.

Bedroom
Methods   Overall precision/recall   Leaf precision/recall   Internal precision/recall   Visualization
Shape-only N/A 0.431168/0.341712 N/A
Flat N/A 0.492733/0.492733 N/A Link
Hierarchy 0.79163/0.808069 0.611334/0.611334 0.845021/0.867613 Link

Classroom
Methods   Overall precision/recall   Leaf precision/recall   Internal precision/recall   Visualization
Shape-only N/A 0.499364/0.339412 N/A
Flat N/A 0.670745/0.670745 N/A Link
Hierarchy 0.717406/0.719436 0.722383/0.722383 0.711483/0.715906 Link

Library
Methods   Overall precision/recall   Leaf precision/recall   Internal precision/recall   Visualization
Shape-only N/A 0.718936/0.523755 N/A
Flat N/A 0.760513/0.760513 N/A Link
Hierarchy 0.733464/0.738336 0.788351/0.788351 0.576991/0.592037 Link

Parsing other datasets

We test whether our algorithm can learn a hierarchical grammar on one dataset and then use it to parse a different dataset. For this test, we downloaded the Sketch2Scene Bedroom dataset [Xu et al. 2013] and parsed each of its scene graphs using the grammar learned on our Bedroom dataset.

Since the Sketch2Scene ground-truth label set differs from ours, we created a mapping from our label set to theirs so that labels predicted by our parser can be compared to their ground truth. The mapping can be found here; Sketch2Scene labels appear to the left of ':', and ours to the right.
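For reference, a small sketch that reads a mapping file in the format described above (one "Sketch2Scene label : our label" pair per line) and inverts it so that our predicted labels can be translated to Sketch2Scene labels; the filename is hypothetical.

# Hedged sketch: parse the label mapping file described above.
# Each line has the form "<Sketch2Scene label> : <our label>".
# The filename "label_mapping.txt" is a hypothetical placeholder.

def load_label_mapping(path="label_mapping.txt"):
    ours_to_sketch2scene = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or ":" not in line:
                continue
            sketch2scene_label, our_label = (s.strip() for s in line.split(":", 1))
            ours_to_sketch2scene[our_label] = sketch2scene_label
    return ours_to_sketch2scene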

Methods   Precision   Recall   Visualization
Shape-only 0.471161 0.471161
Flat 0.363325 0.363325 Link
Hierarchy 0.631096 0.631096 Link

Impact of size of training set

We tested how the performance of our algorithm is affected by the size of the training set. For each scene graph in the Bedroom dataset, we trained a grammar on a randomly selected X% of the other scenes (X = 10, 40, 70, 100), used that grammar to parse the scene, and then evaluated the results.
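The evaluation loop can be sketched roughly as follows; train_grammar, parse, and evaluate are hypothetical placeholders for the corresponding stages of our pipeline, and the random subsampling is only an illustrative assumption about how the X% is drawn.

import random

# Hedged sketch of the training-set-size experiment: for every scene,
# train on a random fraction of the remaining scenes and parse the
# held-out scene. train_grammar, parse, and evaluate are placeholders.

def training_size_experiment(scenes, fraction, train_grammar, parse, evaluate):
    results = []
    for i, test_scene in enumerate(scenes):
        others = scenes[:i] + scenes[i + 1:]
        k = max(1, round(fraction * len(others)))
        training_set = random.sample(others, k)
        grammar = train_grammar(training_set)
        results.append(evaluate(parse(grammar, test_scene), test_scene))
    return results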

Bedroom
Fraction of training data   Overall precision/recall   Leaf precision/recall   Internal precision/recall
0.1 0.601441/0.626581 0.348913/0.348913 0.676243/0.71443
0.4 0.817/0.819559 0.672372/0.672372 0.856617/0.865193
0.7 0.839816/0.843568 0.720561/0.720561 0.871797/0.881506
1.0 0.850199/0.857874 0.751039/0.751039 0.880145/0.890344

Library
Fraction of training data   Overall precision/recall   Leaf precision/recall   Internal precision/recall
0.4 0.562281/0.542761 0.656502/0.656502 0.124244/0.103265
0.7 0.514331/0.565454 0.637575/0.637575 0.193314/0.286776
1.0 0.880245/0.856082 0.893907/0.893907 0.819319/0.709925

Classroom
Fraction of training data   Overall precision/recall   Leaf precision/recall   Internal precision/recall
0.1 0.230389/0.226377 0.208231/0.208231 0.25798/0.248107
0.4 0.71131/0.732411 0.723641/0.723641 0.697448/0.742914
0.7 0.708463/0.719453 0.691966/0.691966 0.727568/0.752368
1.0 0.767035/0.781873 0.766699/0.766699 0.767421/0.800043

Bedroom (over-segmented)
Fraction of training data   Overall precision/recall   Leaf precision/recall   Internal precision/recall
0.1 0.584148/0.601218 0.262714/0.262714 0.662521/0.703916
0.4 0.761873/0.782055 0.569025/0.569025 0.816758/0.843666
0.7 0.775328/0.794054 0.57987/0.57987 0.830233/0.857611
1.0 0.79163/0.808069 0.611334/0.611334 0.845021/0.867613

Impact of individual energy terms

We ran experiments to show the impact of each energy term on the final results by disabling each term in turn and re-running the first set of experiments.
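A hedged sketch of the ablation setup: the total energy is treated here as a sum of a shape descriptor term, a spatial relation term, and a cardinality term, each of which can be switched off independently. The individual term functions are hypothetical placeholders, and the purely additive combination is an illustrative assumption rather than the exact formulation from the paper.

# Hedged sketch of the ablation: each energy term can be disabled on its
# own before re-running the parsing experiment. shape_term, spatial_term,
# and cardinality_term are hypothetical placeholders.

def total_energy(parse_tree, shape_term, spatial_term, cardinality_term,
                 use_shape=True, use_spatial=True, use_cardinality=True):
    energy = 0.0
    if use_shape:
        energy += shape_term(parse_tree)
    if use_spatial:
        energy += spatial_term(parse_tree)
    if use_cardinality:
        energy += cardinality_term(parse_tree)
    return energy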

Bedroom
Setting   Overall precision/recall   Leaf precision/recall   Internal precision/recall
Shape descriptor off 0.714798/0.752047 0.512107/0.512107 0.777085/0.812646
Spatial relation off 0.845096/0.840544 0.7088/0.7088 0.882887/0.881999
Cardinality off 0.837763/0.859981 0.672696/0.672696 0.887713/0.892996
All on 0.850199/0.857874 0.751039/0.751039 0.880145/0.890344

Library
Setting   Overall precision/recall   Leaf precision/recall   Internal precision/recall
Shape descriptor off 0.72535/0.732613 0.787876/0.787876 0.49497/0.519074
Spatial relation off 0.719292/0.721068 0.775664/0.775664 0.504057/0.510109
Cardinality off 0.67925/0.719658 0.769467/0.769467 0.408882/0.527195
All on 0.880245/0.856082 0.893907/0.893907 0.819319/0.709925

Classroom
Setting   Overall precision/recall   Leaf precision/recall   Internal precision/recall
Shape descriptor off 0.568747/0.580433 0.502463/0.502463 0.644693/0.673802
Spatial relation off 0.710386/0.712051 0.729593/0.729593 0.687503/0.691045
Cardinality off 0.571335/0.649819 0.655568/0.655568 0.493855/0.642935
All on 0.767035/0.781873 0.766699/0.766699 0.767421/0.800043

Bedroom (over-segmented)
Setting   Overall precision/recall   Leaf precision/recall   Internal precision/recall
Shape descriptor off 0.623883/0.647082 0.38415/0.38415 0.696833/0.727959
Spatial relation off 0.779978/0.789252 0.590994/0.590994 0.837573/0.848216
Cardinality off 0.778301/0.794799 0.55161/0.55161 0.850771/0.850106
All on 0.79163/0.808069 0.611334/0.611334 0.845021/0.867613