MÉMOVISUELDAGRONOMIE

The causes of colony loss according to American beekeepers


The data for this visualization comes from the TidyTuesday project, which offers a dataset every Tuesday for training in data analysis.

The dataset we will use here records bee colony losses in the united states of america depending on various stresses.

We will represent the temporal evolution of these stresses with a bump chart, which are designed for exploring changes in rank over time.

1. Load packages and data

In R, bump charts may be easily created with the the {ggbump} package. In addition to this library, we will the {tidyverse} package to process and plot data:

# To install {ggstream}:
# install.packages("ggbump") 

# Load pacakges
library(tidyverse)
library(ggbump)   

About the dataset, there is two different tables :

  • colony, with the percent of colony losses per state and per period;

  • stressor, with the percent of colonies affected by various stressors.

We will load and merge these tables as follows:

# Loading tables
colony <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-11/colony.csv')
stressor <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-11/stressor.csv')

# Merge both tables by time period and state
mrg  <- stressor%>%
  left_join(colony,by=c("year","months","state"))

head(mrg)
## # A tibble: 6 × 12
##    year months        state  stressor stress_pct colony_n colony_max colony_lost
##   <dbl> <chr>         <chr>  <chr>         <dbl>    <dbl>      <dbl>       <dbl>
## 1  2015 January-March Alaba… Varroa …       10       7000       7000        1800
## 2  2015 January-March Alaba… Other p…        5.4     7000       7000        1800
## 3  2015 January-March Alaba… Disesas…       NA       7000       7000        1800
## 4  2015 January-March Alaba… Pestici…        2.2     7000       7000        1800
## 5  2015 January-March Alaba… Other           9.1     7000       7000        1800
## 6  2015 January-March Alaba… Unknown         9.4     7000       7000        1800
## # ℹ 4 more variables: colony_lost_pct <dbl>, colony_added <dbl>,
## #   colony_reno <dbl>, colony_reno_pct <dbl>

2. Prepare the data

Each year is divided into four periods, with the variable “months”. To create a single time variable, we’ll assign a value to each period and then add it to the year.

clean <- mrg%>%
  mutate(dec=case_when(
    months=="January-March"~0,
    months=="April-June"~0.25,
    months=="July-September"~0.5,
    months=="October-December"~0.75,
  ))%>%
  mutate(fac=year+dec)

We can now sum up the number of colonies lost per stressor and per period:

clean <- clean%>%
  group_by(fac,stressor)%>%
  summarize(
    count=sum(na.omit(stress_pct*colony_n)),
    # Keep information regarding year and season in table:
    year=year[1],
    season=months[1]
  )%>%
  ungroup()

Finally, we will assign a rank to each stress factor and period (starting with 1 for the most stressful factor).

clean <- clean%>%
  group_by(fac)%>%
  arrange(-count,.by_group=TRUE)%>%
  # Rabk of each stress factor per period:
  mutate(id = row_number())%>%
  ungroup()

3. Make bump plot

We are now ready to plot the data.

# Bump chart with lines
ggplot(
    data=clean,
    aes(x=fac,y=-id,color=stressor)
  )+
  geom_line(linewidth=2,alpha=0.8)+
  theme_light()

After a little research, one point seems strange. It can be removed as follows:

# Bump chart with lines
ggplot(
    data=clean%>%filter(fac!=2019.25),
    aes(x=fac,y=-id,color=stressor)
  )+
  geom_line(linewidth=2,alpha=0.8)+
  theme_light()

{ggbump} allows you to create more elegant graphics, with curves rather than straight lines. this can be achieved simply by replacing geom_line() by geom_bump() :

# Bump chart with lines
p_bump<-ggplot(
    data=clean%>%filter(fac!=2019.25),
    aes(x=fac,y=-id,color=stressor)
  )+
  geom_bump(linewidth=2,alpha=0.8)+
  theme_light()

p_bump

We may add points to illustrate the different period/season of a given year:

# Add  points:
p_bump<-p_bump+
  geom_point(
    aes(pch=season,fill=stressor),
    size=4,color="white"
  )+
  scale_shape_manual(
    values = c(21,22,23,24)
  )

p_bump

This makes it transparent that we have chosen to omit a measurement point.

For ease of reading, it’s also important to add labels directly to the graph:

# Prepare labels :
lab<-clean%>%
  # Keep only last period
  filter(fac==2021.25)%>%
  # Label with rank and stressor
  mutate(lab=glue::glue("{id}. {stressor}"))

# Make plot
p_bump<-p_bump+
  # Add label at the end of each line
  geom_text(
    data=lab,
    mapping=aes(
      x=fac+0.1,label=lab,
      color=stressor
    ),
    hjust=0
  )+
  # Remove legends
  guides(
    color='none',fill='none',pch='none'
  )+
  # Expand x-axis
  scale_x_continuous(
    limits=c(2015,2022.5)
  )+
  theme(
    panel.grid = element_blank(),
    axis.text.y=element_blank(),
    axis.title.y=element_blank()
  )

p_bump

We can see that varroa is the main stress responsible for causing colony collapse. The effect of pesticides is greatest in spring and summer, when treatments are more frequent.

References