Supplementary Material for AutoStyle: Details of User Study

Due to the page limit of the EGSR submission, we provide the details of our user studies here, including the detailed setup and results.

We conducted two user studies:

Study 1 asks users to select the correct name of the style that is used to generate the style transfer result.
Study 2 asks users to select the correct style transfer result that is generated with the given input image and style.

We generate the test cases as follows:

Given a pair consisting of an input image and a style (the correct style), we first randomly select 4 other styles as the wrong styles.
Transferring the correct style to the input image gives the correct style transfer result.
Transferring each of the wrong styles to the input image gives the four wrong style transfer results.
For Study 1, a test case shows users the input image, the correct style transfer result, and five candidate style names: the correct style name mixed with the four wrong style names.
For Study 2, a test case shows users the input image, the correct style name, and five candidate style transfer results: the correct result mixed with the four wrong results.
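
The generation steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used in the study: the function name make_test_cases, the dictionary layout, and the transfer callable are hypothetical, and fake_transfer merely stands in for the actual style transfer routine.

```python
import random


def make_test_cases(input_image, correct_style, all_styles, transfer, rng, n_wrong=4):
    """Build one Study 1 and one Study 2 test case for an (image, style) pair.

    `transfer(image, style)` is a hypothetical stand-in for the style
    transfer routine; `rng` is a random.Random instance.
    """
    # Randomly select the wrong styles among all styles except the correct one.
    candidates = [s for s in all_styles if s != correct_style]
    wrong_styles = rng.sample(candidates, n_wrong)

    correct_result = transfer(input_image, correct_style)
    wrong_results = [transfer(input_image, s) for s in wrong_styles]

    # Study 1: the user sees the correct result and picks among 5 style names.
    name_options = [correct_style] + wrong_styles
    rng.shuffle(name_options)
    study1 = {
        "input": input_image,
        "shown_result": correct_result,
        "options": name_options,
        "answer": correct_style,
    }

    # Study 2: the user sees the correct style name and picks among 5 results.
    result_options = [correct_result] + wrong_results
    rng.shuffle(result_options)
    study2 = {
        "input": input_image,
        "shown_style": correct_style,
        "options": result_options,
        "answer": correct_result,
    }
    return study1, study2


def fake_transfer(image, style):
    # Placeholder: a real implementation would return a stylized image.
    return f"{image}@{style}"


styles = ["rust", "sunset", "nightclub", "beach", "grass"]
study1, study2 = make_test_cases(
    "photo.jpg", "beach", styles, fake_transfer, random.Random(0)
)
```

Each returned test case holds five shuffled options, exactly one of which is correct.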

Note that all style transfer results use the top-ranked clusters of their respective styles. We prepared 20 pairs of input image and style, and therefore created 20 test cases for each study.

The studies are conducted on Amazon Mechanical Turk. We open our tasks to turkers all over the world. In each task, we expect to get users' opinions on 15 test cases. To control the quality of submissions from users, we design two methods to filter bad submissions:

5 of the 15 test cases are repeated. A submission is rejected if fewer than 3 of the 5 repeated tests match their original answers.
We add 5 easy tests, i.e., hand-crafted tests whose wrong candidates are significantly different and easily distinguished from the correct one. A submission is rejected if fewer than 3 of the 5 easy tests match the ground truth.

Note that a submission is accepted only if it passes both validations above. Therefore, in each task, we ask users to work on 25 tests: 15 test cases, 5 repeated tests, and 5 easy tests.
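
The two validation rules can be written as a short check. The sketch below is a hypothetical illustration: the function name passes_quality_checks and the answer-list layout are assumptions, and only the thresholds (at least 3 of 5 in each check) come from the description above.

```python
def passes_quality_checks(first_answers, repeat_answers, easy_answers, easy_truth,
                          min_matches=3):
    """Return True if a submission passes both validations.

    first_answers / repeat_answers: the user's answers to the 5 repeated
    tests on their first and second appearance (same order).
    easy_answers / easy_truth: the user's answers to the 5 easy tests and
    the corresponding ground truth.
    """
    # Validation 1: consistency on the repeated tests.
    repeat_matches = sum(a == b for a, b in zip(first_answers, repeat_answers))
    # Validation 2: correctness on the easy tests.
    easy_matches = sum(a == t for a, t in zip(easy_answers, easy_truth))
    # A submission is rejected if either count is below the threshold.
    return repeat_matches >= min_matches and easy_matches >= min_matches


# Made-up answers for illustration only.
truth = ["a", "b", "c", "d", "e"]
consistent_user = passes_quality_checks(
    ["a", "b", "c", "d", "e"], ["a", "b", "c", "x", "y"], truth, truth
)
inconsistent_user = passes_quality_checks(
    ["a", "b", "c", "d", "e"], ["a", "b", "x", "y", "z"], truth, truth
)
```

Here consistent_user is accepted (3 of 5 repeats match and all easy tests are correct), while inconsistent_user is rejected because only 2 of 5 repeats match.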

Additional User Studies for Three Styles

Using only the top-ranked clusters can fail to produce results with a distinctive visual appearance. This happens for three styles in our user studies: rust, sunset, and nightclub. For these three styles, for both Study 1 and Study 2, we conduct an additional user study with lower-ranked clusters that have a more distinctive visual appearance.

In the additional user studies, we expect to get users' opinions on 5 test cases. Three of them are for rust, sunset, and nightclub, using our selected lower-ranked clusters that have a more distinctive visual appearance. The remaining two are beach and grass, with exactly the same setup as in the main user study, to validate the reproducibility of our user study results. As in the main user study, we repeat 5 tests (so here we actually repeat all 5 tests), and we also add the same 5 easy tests to control the quality of submissions.

The Easy Tests Used in Our User Study

Follow this link to see the easy tests used in our user study.

Statistics of Submissions from Amazon Mechanical Turk

The tasks are open to turkers all over the world on Amazon Mechanical Turk.

Study	# of Submissions	# of Accepted Submissions
Study 1 (Main)	83	56
Study 2 (Main)	101	60
Study 1 (Additional)	61	49
Study 2 (Additional)	61	41
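
For convenience, the acceptance rate of each study can be derived from the counts in the table. The snippet below is a small sketch that only re-computes ratios from the numbers above; the variable names are illustrative, and the rates themselves are not reported separately in the original material.

```python
# (number of submissions, number of accepted submissions) from the table.
submissions = {
    "Study 1 (Main)": (83, 56),
    "Study 2 (Main)": (101, 60),
    "Study 1 (Additional)": (61, 49),
    "Study 2 (Additional)": (61, 41),
}

# Fraction of submissions that passed both quality checks, per study.
rates = {study: accepted / total for study, (total, accepted) in submissions.items()}

for study, rate in rates.items():
    total, accepted = submissions[study]
    print(f"{study}: {accepted}/{total} accepted ({rate:.1%})")

total_submissions = sum(total for total, _ in submissions.values())
total_accepted = sum(accepted for _, accepted in submissions.values())
```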

Detailed Results

In the detailed results, you will see the correct and wrong style names and style transfer results, along with how many users selected each of them.