Visualizing ordinal data

ggplot2
ordinal
R
A quick tutorial on how to visualize scalar data using ggplot2.
Author

Guilherme D. Garcia

Published

October 23, 2023

Ordinal data can be a bit tricky to visualize. The goal here is to share one type of figure I personally like to use when I analyze this type of data. Because scalar data is rarely normally distributed, ggplot2 functions such as stat_summary() aren’t particularly useful, since means aren’t that informative in such scenarios. Furthermore, since we tend to use ordinal models in these situations, one way to maximize the alignment between visualization and modeling is to use the actual categories from the scale in our figure.
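
To see why means can mislead here, consider a minimal sketch with two made-up sets of responses on a 1–4 scale. The vectors polarized and moderate are hypothetical and not part of the data used below. Both have the same mean, but their distributions across categories are very different:

```r
# Two hypothetical sets of 100 responses on a 1-4 certainty scale
polarized = rep(c(1, 4), times = c(50, 50))  # half the responses at each extreme
moderate  = rep(c(2, 3), times = c(50, 50))  # all responses in the middle

# The means are identical...
mean(polarized)
mean(moderate)

# ...but the proportions by category tell two different stories
prop.table(table(factor(polarized, levels = 1:4)))
prop.table(table(factor(moderate, levels = 1:4)))
```

A figure based on the proportion of responses per category keeps this difference visible, whereas a figure based on means would hide it.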

Preparing the data

Let’s load some packages and our data (viz.RData), which you can download here. These are hypothetical data used in Garcia (2023).

Code
library(tidyverse)
library(scales)

load("viz.RData")

glimpse(viz)
#> Rows: 600
#> Columns: 8
#> $ ID          <fct> s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1…
#> $ Item        <fct> i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i1…
#> $ L1          <fct> English, English, English, English, English, English, Engl…
#> $ Age         <int> 21, 18, 40, 31, 35, 50, 38, 38, 27, 24, 26, 32, 38, 42, 42…
#> $ Proficiency <fct> Beg, Beg, Beg, Beg, Beg, Beg, Beg, Beg, Beg, Beg, Beg, Beg…
#> $ Certainty   <ord> 1, 4, 4, 1, 2, 3, 3, 1, 1, 3, 3, 1, 2, 3, 2, 2, 2, 2, 1, 3…
#> $ RT          <dbl> 2.7625297, 3.1230339, 3.7038581, 9.6426679, 4.6210513, 0.1…
#> $ Response    <fct> Correct, Incorrect, Incorrect, Incorrect, Correct, Incorre…

We now create a summary where we calculate the proportion of responses in each certainty category by L1 and Proficiency, the two variables of interest in our example. In the code below, Dark is a dummy column that flags the two highest certainty categories (3 and 4); it will be useful shortly.

Code
prop = viz |> 
  summarize(n = n(), .by = c(L1, Proficiency, Certainty)) |> 
  mutate(Prop = n / sum(n), .by = c(L1, Proficiency),
         Dark = if_else(Certainty %in% c("3", "4"), "yes", "no"))

prop |> 
  slice_sample(n = 3)
#> # A tibble: 3 × 6
#>   L1      Proficiency Certainty     n  Prop Dark 
#>   <fct>   <fct>       <ord>     <int> <dbl> <chr>
#> 1 Spanish Adv         2            12  0.12 no   
#> 2 English Adv         2             9  0.09 no   
#> 3 Spanish Beg         2            35  0.35 no
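
A quick sanity check on a summary like this is that the proportions add up to 1 within each combination of L1 and Proficiency. The sketch below illustrates the idea with a small made-up data set: toy is hypothetical and only mimics the relevant columns of viz. It assumes dplyr 1.1.0 or later, which is needed for the .by argument.

```r
library(dplyr)

# Hypothetical toy data mimicking the relevant columns of viz
set.seed(1)
toy = tibble(
  L1 = factor(rep(c("English", "Spanish"), each = 40)),
  Proficiency = factor(rep(rep(c("Beg", "Adv"), each = 20), times = 2)),
  Certainty = factor(sample(1:4, size = 80, replace = TRUE),
                     levels = 1:4, ordered = TRUE)
)

# Same steps as above: counts per cell, then proportions within L1 x Proficiency
toy_prop = toy |> 
  summarize(n = n(), .by = c(L1, Proficiency, Certainty)) |> 
  mutate(Prop = n / sum(n), .by = c(L1, Proficiency))

# Each L1 x Proficiency group should sum to 1
check = toy_prop |> 
  summarize(Total = sum(Prop), .by = c(L1, Proficiency))

check
```

If any Total differs from 1, the grouping in the mutate() step most likely does not match the panels and bars you intend to plot.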

Next, let’s create our figure, which will have several layers.

Code
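# Stacked bars: one bar per proficiency level, one segment per certainty category
ggplot(data = prop, aes(x = Proficiency, y = Prop, fill = Certainty)) + 
  geom_col(color = "steelblue", linewidth = 0.5) + 
  facet_grid(~L1) +
  # "Futura" must be installed on your system; swap in another font if needed
  theme_classic(base_family = "Futura", base_size = 15) + 
  scale_fill_brewer(palette = 1) + 
  # Percentage labels, centered in each segment and sized by proportion
  geom_text(aes(label = str_c(Prop * 100, "%"), color = Dark, size = Prop), 
            position = position_stack(vjust = 0.5), 
            fontface = "bold", family = "Futura", show.legend = FALSE) + 
  # Black text on light segments, white text on dark segments
  scale_color_manual(values = c(no = "black", yes = "white")) +
  # Reverse the axis so that category 1 ends up on the left after flipping
  scale_y_reverse() +
  coord_flip() +
  theme(legend.position = "none",
        axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.text.x = element_blank()) +
  labs(y = "\nCertainty scale: 1--4",
       x = "Proficiency")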

Plotting ordinal data using size and colors

The figure uses bars and coord_flip() to emulate the original scale in the data. The proportions of each scale category are shown as text, and their size is proportional to the actual percentage. Finally, we use colors to match lower and upper bounds of the scale, so it’s easy to see that advanced learners are more certain in both language groups compared to beginners.
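
The percentage labels in the figure are built with str_c(Prop * 100, "%"). Since scales is already loaded, one alternative is label_percent(), which also takes care of rounding when proportions have many decimal places. A minimal sketch, using the three proportions from the sample output shown earlier plus one made-up value (1/3):

```r
library(scales)

# Three proportions from the sample output, plus 1/3 (made up for illustration)
props = c(0.12, 0.09, 0.35, 1/3)

# accuracy = 1 rounds to whole percentages
labels = label_percent(accuracy = 1)(props)
labels
```

Inside geom_text(), this would correspond to aes(label = label_percent(accuracy = 1)(Prop)). Whether the extra rounding is needed depends on the data; with cell counts out of 100, as in this example, the original approach already yields whole numbers.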

The dummy column Dark plays a minor but visually important role here: it’s there to make sure that the color of the percentages and the color of the bars have enough contrast. This figure is adapted from Garcia (2021) (chapters 5 and 8).
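
Hard-coding "3" and "4" works for a 4-point scale. For scales of other lengths, one possible generalization, offered here only as a sketch and not taken from the original analysis, is to flag the upper half of the ordered levels. The helper is_dark() below is hypothetical:

```r
# Hypothetical helper: flag categories in the upper half of an ordered scale,
# where a sequential palette produces darker fills
is_dark = function(x) {
  stopifnot(is.factor(x))
  ifelse(as.integer(x) > nlevels(x) / 2, "yes", "no")
}

# On a 4-point scale, this reproduces the Dark column used above
certainty = factor(1:4, levels = 1:4, ordered = TRUE)
is_dark(certainty)

# On a 7-point scale, categories 4 to 7 are flagged
likert7 = factor(1:7, levels = 1:7, ordered = TRUE)
is_dark(likert7)
```

The cutoff is a judgment call: depending on the palette, the middle category of an odd-numbered scale may still be light enough for black text, so it is worth checking the figure and adjusting the threshold if needed.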


Copyright © 2024 Guilherme Duarte Garcia

References

Garcia, Guilherme D. 2021. Data Visualization and Analysis in Second Language Research. New York, NY: Routledge.
Garcia, Guilherme D. 2023. “Quantitative Data Visualization.” In The Encyclopedia of Applied Linguistics, edited by Carol A. Chapelle. John Wiley & Sons, Ltd.