B Supplementary material

This appendix collects links to documentation, code vignettes, and animations related to the thesis content. The material for Chapter 4 extends the analysis of the user study, covering the visual displays, the participant demographics, and a parallel analysis regressing on log response time.

B.1 Thesis

| Content | Link |
|---|---|
| Thesis repository | https://tinyurl.com/bddbs6sr |
| Thesis, pdf format | https://tinyurl.com/5c54s7bf |
| Thesis, html format | https://tinyurl.com/2p8m92bs |
| Penguins radial tour, Figure 1.2 | https://vimeo.com/676723431 |
| Penguins grand tour, Figure 2.4 | https://vimeo.com/676723441 |

B.3 Chapter 4, user study extended analysis

This section extends the analysis of the user study. First, the different visuals are illustrated. Then, the participant demographics are covered. Lastly, a parallel modeling analysis on log response time is conducted.

B.3.1 Visual methods

Figure B.1 illustrates the three visual methods evaluated in the user study. Data were collected through a shiny application that displayed pre-rendered gif files based on the selected inputs. The instructional video shown to participants at the start of the study can be viewed at https://vimeo.com/712674984.

Figure B.1: Examples of the application displays for PCA, grand tour, and radial tour.

B.3.2 Survey participant demographics

The target population is relatively well-educated people, as linear projections may prove difficult for general consumption. Hence, Prolific.co participants were restricted to those with an undergraduate degree (58,700 of the 150,400 users at the time of the study). From this cohort, 108 completed the study, and 84 of these submitted the post-study survey. All participants were compensated for their time at 7.50 per hour, with a mean time of about 16 minutes. Figure B.2 shows a heatmap of the demographics for the 84 survey respondents.
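
The proportions implied by these counts can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the payment figure assumes every participant took exactly the quoted mean time.

```python
# Figures quoted in the paragraph above; variable names are illustrative.
eligible_users = 58_700    # Prolific.co users with an undergraduate degree
all_users = 150_400        # all Prolific.co users at the time of the study
completed_study = 108      # participants who completed the study
completed_survey = 84      # of those, participants who submitted the survey
hourly_rate = 7.50         # compensation per hour (currency not stated in the text)
mean_minutes = 16          # approximate mean time per participant

eligible_share = eligible_users / all_users
survey_rate = completed_survey / completed_study
# Assumption: payment for a participant who takes exactly the mean time.
mean_payment = hourly_rate * mean_minutes / 60

print(f"Eligible share of the user pool: {eligible_share:.1%}")
print(f"Post-study survey response rate: {survey_rate:.1%}")
print(f"Approximate payment at the mean time: {mean_payment:.2f}")
```

Roughly 39% of the user pool was eligible, and about 78% of those who completed the study also returned the survey.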

Figure B.2: Heatmaps of survey participant demographics; counts of age group by completed education as faceted across preferred pronouns. Our sample tended to be between 18 and 35 years of age with an undergraduate or graduate degree.

B.3.3 Response time

As a secondary response variable, response time is considered. Response time is first log-transformed to reduce its right skew. The same modeling procedure is then repeated for this response: 1) Compare the performance of a battery of additive and multiplicative models. Table B.1 shows the higher-level performance of these models over increasing model complexity. 2) Select the model with the same effect terms as in the main analysis, \(\alpha \times \beta + \gamma + \delta\), which has relatively high conditional \(R^2\) without becoming overly complex from interaction terms. The coefficients of this model are displayed in Table B.2.
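
To illustrate why the log transformation is applied, the following minimal sketch compares the skewness of right-skewed values before and after taking logs. It uses synthetic log-normal draws, not the study data. The `skewness` helper, the seed, and the distribution parameters are all arbitrary choices made for illustration, and the natural logarithm is assumed.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


random.seed(2022)  # arbitrary seed, only for reproducibility

# Synthetic right-skewed "response times" in seconds (NOT the study data).
times = [random.lognormvariate(2.7, 0.6) for _ in range(1000)]
log_times = [math.log(t) for t in times]  # natural log assumed

raw_skew = skewness(times)
log_skew = skewness(log_times)

print(f"skewness of raw times: {raw_skew:.2f}")
print(f"skewness of log times: {log_skew:.2f}")
```

The raw values have a clearly positive skewness, while the log-transformed values are close to symmetric, which is better suited to a model with normally distributed errors.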

Table B.1: Performance of random effect models regressing on log response time [seconds], \(\widehat{Y_2}\). Conditional \(R^2\) includes the random effects, while marginal \(R^2\) does not. The \(\alpha \times \beta + \gamma + \delta\) model is selected for further examination as it has relatively high marginal \(R^2\) while being much less complex than the complete interaction model. In the table, a, b, c, and d denote \(\alpha\), \(\beta\), \(\gamma\), and \(\delta\), and bold marks the best value in each column.

| Fixed effects | No. levels | No. terms | AIC | BIC | R2 cond. | R2 marg. | RMSE |
|---|---|---|---|---|---|---|---|
| a | 1 | 3 | **1448** | **1475** | 0.645 | 0.007 | 0.553 |
| a+b+c+d | 4 | 8 | 1467 | 1516 | 0.647 | 0.017 | 0.552 |
| a*b+c+d | 5 | 12 | 1474 | 1541 | 0.656 | 0.024 | 0.548 |
| a*b*c+d | 8 | 28 | 1488 | 1627 | 0.673 | 0.054 | 0.536 |
| a*b*c*d | 15 | 54 | 1537 | 1792 | **0.7** | **0.062** | **0.523** |

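
The trade-off summarized in Table B.1 can be made explicit by picking the best model per criterion. The sketch below hard-codes the table values. It assumes the usual conventions that lower AIC, BIC, and RMSE are better and that higher \(R^2\) is better. The data layout and names are illustrative.

```python
# Rows of Table B.1: (fixed effects, AIC, BIC, R2 cond., R2 marg., RMSE).
models = [
    ("a",       1448, 1475, 0.645, 0.007, 0.553),
    ("a+b+c+d", 1467, 1516, 0.647, 0.017, 0.552),
    ("a*b+c+d", 1474, 1541, 0.656, 0.024, 0.548),
    ("a*b*c+d", 1488, 1627, 0.673, 0.054, 0.536),
    ("a*b*c*d", 1537, 1792, 0.700, 0.062, 0.523),
]

# Column index and the direction that counts as "best" for each criterion.
criteria = {
    "AIC": (1, min),
    "BIC": (2, min),
    "R2 cond.": (3, max),
    "R2 marg.": (4, max),
    "RMSE": (5, min),
}

best = {}
for name, (index, pick) in criteria.items():
    best_row = pick(models, key=lambda row: row[index])
    best[name] = best_row[0]
    print(f"{name:>9}: best model is {best_row[0]}")
```

The information criteria favor the simplest model, while the fit measures favor the complete interaction model. The selected model sits between these two extremes.
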
Table B.2: Model coefficients for log response time [seconds], \(\widehat{Y_2} = \alpha \times \beta + \gamma + \delta\), with visual = pca, location = 0/100%, shape = EEE, and dim = 4 held as baselines. Visual = grand is the term with the strongest evidence (p = 0.049) and is associated with shorter response times. None of the visual-by-location interaction terms shows strong evidence.

| Term | Estimate | Std. Error | df | t value | Pr(>\|t\|) | |
|---|---|---|---|---|---|---|
| (Intercept) | 2.71 | 0.14 | 42.6 | 19.06 | 0.000 | *** |
| Visual | | | | | | |
| Visualgrand | -0.23 | 0.12 | 567.6 | -1.97 | 0.049 | |
| Visualradial | 0.16 | 0.12 | 573.5 | 1.34 | 0.181 | |
| Fixed effects | | | | | | |
| Location33/66% | 0.05 | 0.14 | 40.9 | 0.34 | 0.737 | |
| Location50/50% | -0.05 | 0.14 | 42.1 | -0.35 | 0.729 | |
| ShapeEEV | -0.15 | 0.09 | 8.3 | -1.61 | 0.145 | |
| Shapebanana | -0.13 | 0.09 | 8.3 | -1.42 | 0.192 | |
| Dim6 | 0.14 | 0.08 | 8.3 | 1.90 | 0.093 | |
| Interactions | | | | | | |
| Visualgrand:Location33/66% | 0.24 | 0.18 | 580.9 | 1.34 | 0.181 | |
| Visualradial:Location33/66% | -0.24 | 0.18 | 582.4 | -1.32 | 0.188 | |
| Visualgrand:Location50/50% | 0.12 | 0.18 | 578.6 | 0.69 | 0.491 | |
| Visualradial:Location50/50% | 0.05 | 0.18 | 584.4 | 0.25 | 0.800 | |
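
Because the response is on the log scale, the estimates in Table B.2 are easier to read after exponentiating. The sketch below back-transforms a few of them. It assumes the natural logarithm was used and ignores the random effects, so the intercept is only a typical time at the baseline levels. The visual effects are multiplicative changes at the baseline location, since visual interacts with location. Variable names are illustrative.

```python
import math

# Selected estimates from Table B.2, in log seconds (rounded as reported).
estimates = {
    "(Intercept)": 2.71,
    "Visualgrand": -0.23,
    "Visualradial": 0.16,
}

# Typical response time at the baselines (pca, 0/100%, EEE, dim 4),
# assuming a natural-log transform and ignoring random effects.
baseline_seconds = math.exp(estimates["(Intercept)"])
print(f"baseline: about {baseline_seconds:.1f} seconds")

# Each remaining coefficient acts as a multiplicative ratio on response time.
ratios = {}
for term, estimate in estimates.items():
    if term == "(Intercept)":
        continue
    ratios[term] = math.exp(estimate)
    change = (ratios[term] - 1) * 100
    print(f"{term}: ratio {ratios[term]:.2f} ({change:+.0f}% versus pca)")
```

Under these assumptions, the grand tour is associated with roughly a fifth less time than PCA at the baseline location. The radial tour takes somewhat longer, although the evidence for that difference is weak.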