Data for farming

class: center, middle, inverse, title-slide

# Data for farming
## 2. Data exploration and mapping
### Benjamin Nowak
### December 2021

---

background-image: url(fig/back/hill.jpg)
background-size: cover
class: center, bottom, inverse

# **Getting started**

### Installation of **R** and **RStudio**

---

# Getting started

1. [Download R](https://cran.r-project.org/) on CRAN
> *"The “Comprehensive R Archive Network” (CRAN) is a collection of sites which carry identical material, consisting of the R distribution(s), the contributed extensions, documentation for R, and binaries."*

2. [Download R Studio](https://www.rstudio.com/products/rstudio/download/#download)
> *"RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management."*

- Choose 'Free RStudio Desktop' to run RStudio on your local machine

---

# Getting started

- Once RStudio is installed, we are ready to create a new **Project**

- *File > New Project > New Directory*
.center[
 <img src="fig/R/projects_new.png" width=50%></img>
]

- Doing so, the **current working directory** is set to the **project directory** (where we can for example store the data sets related to this project in a **'Data'** folder)

---

# Getting started

.center[
 <img src="fig/R/Rproj.png" width=50%></img>
]

.footnote[
**Source:** [Musings on R (Martin Chan)](https://martinctc.github.io/blog/rstudio-projects-and-working-directories-a-beginner's-guide/)
]

---

# Getting started

- We can create **a new script** inside this project

- *File > New File > R Script*
  
- Although RStudio keeps track of the changes made to the script, we will save it before we start

- *File > Save as*
  
  - You may save it in a **'Script'** folder, as suggested in the previous slide

.center[**We are now ready to process data!**]

---

background-image: url(fig/back/sheep.jpg)
background-size: cover
class: center, bottom, inverse

# **First example**

### World's leading milk producers

---

# Load packages

- **The [{tidyverse}](https://www.tidyverse.org/):** Collection of R packages for data science. All packages share an underlying design philosophy, grammar, and data structures.

![](fig/R/tidyverse.png)

.footnote[
**Source** [Barnier, 2021](https://juba.github.io/tidyverse/index.html)
]

---

# Load packages

- **The [{tidyverse}](https://www.tidyverse.org/):** Collection of R packages for data science. All packages share an underlying design philosophy, grammar, and data structures.

- First lines of code will be to install and load these extensions in our R session

```r
# Install tidyverse (to do only once)
# install.packages("tidyverse")

# Load tidyverse (to repeat at each session)
library(tidyverse)            
```
         
---

# Import data

- Data in this tutorial are related to **worlwide milk production** and **milk trades** (since 1990).
  - Data sets extacted from the [Food and Agriculture Organization (FAO)](https://www.fao.org/statistics/en/) data base.
  
- With the [{tidyverse}](https://www.tidyverse.org/), we may use [{readr}](https://readr.tidyverse.org/) to load data sets into our session
  - Compared to base functions, {readr} functions are much faster to import data sets, and include a progress bar for large data set that take a longer time to load

---

# Import data

- The data are available online, and can be loaded directly into R, without downloading
  - You may copy/paste [this link]('https://raw.githubusercontent.com/BjnNowak/CultivatedPlanet/main/data/FAO_AnimalProductivity_subset.csv') inside **read_csv()** to import the data

```r
# Import data
data <- readr::read_csv(
 'https://raw.githubusercontent.com/BjnNowak/CultivatedPlanet/main/data/FAO_AnimalProductivity_subset.csv'
)
# To take a look at 3 first lines
# head(data,3) 
```

---

# Data structure

- This data table is already close to the 'tidy' syntax and will require little modification

.center[
 <img src="fig/R/tidydata_1.jpg" width=70%></img>
]

.footnote[ 
  **Picture** [Alison Horst](https://github.com/allisonhorst) 
]

---

# Data processing

- Thanks to the pipe operator (**%>%**), we may perform a sequence of actions on the same data set.

- Most of table modifications are supported by the [{dplyr}](https://dplyr.tidyverse.org/) package. 
  - Below some examples of data processing:

```r
data<-data%>%
 # Rename columns with space
 dplyr::rename(
 # New name = Old name
 AreaCode = 'Area Code',
 ItemCode = 'Item Code',
 ElementCode = 'Element Code'
 )%>%
 # Suppress column that we don't need
 dplyr::select(-'Year Code')%>%
 # Remove one row
 dplyr::filter(Area!='China, mainland')
```

---

# Data processing

- In this example, we want to identify the main milk producers. First, we must identify the factors corresponding to milk production in our dataset.

- To do so, we will detect factors with 'milk' inside their names with [**str_detect()**](https://www.rdocumentation.org/packages/stringr/versions/1.4.0/topics/str_detect), then extract the name of these factors with [**pull()**](https://dplyr.tidyverse.org/reference/pull.html) and store them in a *'milks'* vector for further filtering

```r
milks<-data%>%
 # Keep only one 'case study'
 filter(Area=='World'&Year==2018)%>%
 # Keep only milk products
 filter(
 stringr::str_detect(
 Item, # Column to search
 fixed(
 'milk', # Pattern to search
 ignore_case=TRUE # To ignore case
 )))%>%
 dplyr::pull(Item) # Column to extract
```
---

# Data processing

- We may now use this vector to filter our data set and keep only rows related to **milk production of each country** since 1990.

```r
countries <- data%>%
 # Keep only rows related to milk production
 filter(Item %in% milks)%>%
 # Keep only data at country scale
 # (Code >5000 for continental & world statistics)
 filter(AreaCode<5000)
```

---

# Statistics for each country

- To calculate statistics for each country, we need to convert our table into a 'grouped' table with [**group_by()**](https://dplyr.tidyverse.org/reference/group_by.html)

.pull-left[

<img src="fig/R/group_by_ungroup.png" width=100%></img>
 
 **Picture** [Alison Horst](https://github.com/allisonhorst)
]

.pull-right[

- Look of the table will not change but the functions will be applied per group (not on the whole table)

- After **group_by()**, two possibilities: add a new column with **mutate()** OR create a resume of the statistics with **summarize()**

]

- Finally, use **ungroup()** to remove grouping.

---

# Statistics for each country

- Here we will use **group_by()** to compute **total milk production** since 1990 **for each country** and **for each type of milk**.

```r
total<-countries %>%
 group_by(Area,Item)%>% # Group by Country & Milk type
 summarise(
 production = sum(Value) # Sum annual production
 )%>%
 ungroup()

head(total,5)                # Show first 5 rows
```

```
## # A tibble: 5 x 3
## Area Item production
## <chr> <chr> <dbl>
## 1 Afghanistan Milk, whole fresh cow 41346067
## 2 Afghanistan Milk, whole fresh goat 3055601
## 3 Afghanistan Milk, whole fresh sheep 5843398
## 4 Albania Milk, whole fresh cow 24911316
## 5 Albania Milk, whole fresh goat 2207112
```

---

# Top 3 producers

- The [**slice()**](https://dplyr.tidyverse.org/reference/slice.html) function allows to extract specific rows from a data table.

-  It is accompanied by a number of helpers for common use cases
  - **slice_sample()** for random selection
  - **slice_min()** for selection of rows with lowest values...
  
- In order to extract **the top 3 producers of each type of milk**, we will use a combination of **group_by()** and **slice_max()**

---

# Top 3 producers

```r
ranking<-total%>%
 group_by(Item)%>% # To get top producers for each milk type
 dplyr::slice_max(
 production, # Get max values for production
 n=3 # Only keep top 3 producers
 )%>%
 ungroup()

ranking
```

```
## # A tibble: 9 x 3
## Area Item production
## <chr> <chr> <dbl>
## 1 United States of America Milk, whole fresh cow 2437081988
## 2 India Milk, whole fresh cow 1389268000
## 3 Russian Federation Milk, whole fresh cow 936533056
## 4 India Milk, whole fresh goat 119981705
## 5 Bangladesh Milk, whole fresh goat 54769040
## 6 Sudan (former) Milk, whole fresh goat 26470691
## 7 China Milk, whole fresh sheep 31689509
## 8 Turkey Milk, whole fresh sheep 28828040
## 9 Greece Milk, whole fresh sheep 22886995
```

---

# Main cow's milk producers?

.center[ **You may take some time to explore the main cow's milk producers** *Top 20 producers?* *Percentage of world's milk produced by each country?* *...*]

---

background-image: url(fig/back/cows.jpg)
background-size: cover
class: center, bottom, inverse

# **2nd example**

### Mapping **cow's milk** trade balances in Europe

---

# Import data

- For this example, we will work with a data set summarizing **fresh milk trades** between countries since 1990, also extracted from [the FAO database](https://www.fao.org/statistics/en/)
  - More precisely, we will work on a subset: **cow's whole milk trade for 2018**

- Data set is avaible at [this link](https://raw.githubusercontent.com/BjnNowak/CultivatedPlanet/main/data/FAO_MilkTrade_subset.csv) (copy/paste into **read_csv()** to import)

```r
trade <-read_csv(
 # Paste link inside read_csv()
 'https://raw.githubusercontent.com/BjnNowak/CultivatedPlanet/main/data/FAO_MilkTrade_subset.csv'
 )%>%
 # Subset: only cow whole milk trade for year 2018
 filter(Item=='Milk, whole fresh cow')%>%
 filter(Year=='2018')
```

---

# Filter imports and exports

- The dataset includes milk exports and imports, expressed in quantity (*tons*) and value (*dollars*)
  - We will focus on the **quantities** traded

- We will create two tables: one for the imports, one for the exports

```r
# One table with imports 
imp<-trade%>%
 filter(Element=='Import Quantity')%>%
 rename(Import=Value)

# One table with exports
exp<-trade%>%
 filter(Element=='Export Quantity')%>%
 rename(Export=Value)
```

---

# Compute trade balance

- We will now join the imports and exports table by country name to compute each country trade balance (*Exports-Imports*).

- Main possible [**table joins**](https://dplyr.tidyverse.org/reference/mutate-joins.html) are:

.center[
 <img src="fig/R_2/joins.png" width=70%></img>
]

---

# Compute trade balance

- Here, we will perform a **full_join()** between imports and exports tables, to get all trades (even for countries with only imports or exports)

```r
data <- exp%>%
 full_join(
 imp, # Order does not matter for full_join()
 by='Area' # Merge by country names
 )
```

- Doing so, empty values (ex: countries with no exports) have been replaced with NA. Before computing the trade balance, we may replace these values by 0.

---

# Compute trade balance

- Replace NA by 0 with **mutate()** and **case_when()**:

```r
data <- data %>%
 mutate(Export=case_when(
 is.na(Export)~0,
 TRUE~Export
 ))%>%
 mutate(Import=case_when(
 is.na(Import)~0,
 TRUE~Import
 ))
```

---

# Compute trade balance

- Compute trade balance and keep only few columns:

```r
data <- data%>%
 mutate(
 Balance = Export-Import
 )%>%
 select(Area,Export,Import,Balance)%>%
 arrange(Balance) # Sort table by trade balance

# Show most negative trade balance
head(data,3)
```

```
## # A tibble: 3 x 4
## Area Export Import Balance
## <chr> <dbl> <dbl> <dbl>
## 1 Italy 46764 1113711 -1066947
## 2 Germany 1611316 2353680 -742364
## 3 Ireland 32221 757811 -725590
```

---

# Mapping trade balance

.pull-left[
.center[
 <img src="fig/R_2/sf.jpg" width=100%></img>
]
**Picture** [Alison Horst](https://github.com/allisonhorst)
]

.pull-right[
- To map trade balances, we will use the [**{sf}**](https://github.com/r-spatial/sf/) extension

- *'sf'* means *'simple features'* 
> [Simple Features](https://en.wikipedia.org/wiki/Simple_Features) is a set of standards that specify a common storage and access model of geographic feature

]

---

# Load extensions

- In addition to **{sf}**, we will use the [**{maps}**](https://cran.r-project.org/web/packages/maps/maps.pdf) extension, which provides some basic maps for R (such as world countries map, that we will use as a basemap to show the computed trade balances)

```r
# Install extensions (to do only once)
# install.packages("sf")
# install.packages("maps")

# Load extensions (to repeat at each session)
library(sf)
library(maps)
```

---

# Load world basemap

- Now, with these two extension, we are ready to **(i)** load world basemap and **(ii)** convert it to an {sf} object with [st_as_sf()](https://r-spatial.github.io/sf/reference/st_as_sf.html)

```r
world<-maps::map( # Load world map
 database="world", 
 plot = FALSE, # Hide plot output
 fill = TRUE
 )%>%
 # Convert foreign object 
 # to an sf object
 sf::st_as_sf()
```

---

# Plotting maps with {ggplot2}

- **{sf}** object may be plotted with [{ggplot2}](https://ggplot2.tidyverse.org/) using [**geom_sf()**](https://ggplot2.tidyverse.org/reference/ggsf.html)

- **Basic structure of a ggplot:**

```r
plot1 <- ggplot(
 # Specify data set
 data = data_set,
 # Specify x- and y- axis
 aes(x=variable1, y=variable2)
 )+ # Add new layer to plot with '+'
 geom_point(
 # May also use data/aes inside geom
 aes(color=variable3)
 )
```

---

# World basemap

- For our case study, we may start by plotting world basemap as follows:

.pull-left[

```r
mp <- ggplot(data=world)+
 geom_sf(
 fill='grey80',
 color='white',
 size=0.3
 )

mp
```
]

.pull-right[
![](2nd_session_R_files/figure-html/p1-out-1.png)
]

---

# Europe basemap

- If we want to focus on Europe, we can center our map on this continent

.pull-left[

```r
# Bounding box for Europe
lon=c(-14,33)
lat=c(35,63)

# Zoom on Europe
mp<-mp+
 scale_x_continuous(
 limits=lon
 )+
 scale_y_continuous(
 limits=lat
 )+
 # Better theme for maps
 theme_void()
mp
```
]

.pull-right[
![](2nd_session_R_files/figure-html/p2-out-1.png)
]

---

# Merge trade data

- In order to add trade data to the map, we need to merge this table with world basemap (this time with **left_join()** to keep only rows with trade balance values)

```r
# New column named 'Area' (same name as in trade data) in world map
world<-world%>%mutate(Area=ID)

# Adding spatial information to trade balance table:
data<-data%>%
 # Name 'cleaning' (for European countries) before merging
 mutate(Area=case_when(
 # Name in trade table ~ Name in world map
 Area=='Czechia'~'Czech Republic',
 Area=='North Macedonia'~'Macedonia',
 Area=='Republic of Moldova'~'Moldova',
 Area=='Russian Federation'~'Russia',
 Area=='United Kingdom of Great Britain and Northern Ireland'~'UK',
 TRUE~Area
 ))%>%
 left_join(world)
```

---

# Trade balance in Europe

- We can now add trade balance to our map

.pull-left[

```r
mp<-mp+
 geom_sf(
 data=data,
 aes(
 fill=Balance,
 # Spatial info stored in 'geom' column
 geometry=geom
 ),
 color='white',size=0.3)
mp
```
]

.pull-right[
![](2nd_session_R_files/figure-html/p3-out-1.png)
]

---

# Map customization

- As for 'classic' ggplot objects, we may adjust [color gradient](https://ggplot2.tidyverse.org/reference/scale_gradient.html) (here with **scale_fill_gradient2()** to create a divergent palette)

.pull-left[

```r
mp<-mp+
 scale_fill_gradient2(
 low = "#de425b",
 mid = "#fff18f",
 high = "#488f31",
 midpoint = 0,
 breaks = c(-1000000,-500000,0,500000),
 labels = glue::glue("{c(-1,-0.5,0,0.5)}")
 )

mp
```
]

.pull-right[
![](2nd_session_R_files/figure-html/p4-out-1.png)
]

.footnote[
**Note:** Useful [link](https://learnui.design/tools/data-color-picker.html#divergent) for divergent color palette by **learnui.design**
]

---

# Map customization

- We may also customize labels

.pull-left[

```r
mp<-mp+
 labs(
 title="Cow's milk trade",
 fill="Balance\n(million tons)",
 caption="Data: FAO (Year 2018)"
 )

mp
```
]

.pull-right[
![](2nd_session_R_files/figure-html/p5-out-1.png)
]

---

# Create custom theme

- As we did in our [first session](https://bjnnowak.github.io/Lessons/introduction_R#1), we may create a custom theme for more customization

```r
# Save custom theme
custom_theme<-theme(
 plot.title=element_text(size=25,color='grey20',face='bold'),
 legend.title=element_text(size=20,color='grey20'),
 legend.text=element_text(size=15,color='grey40'),
 plot.caption=element_text(size=15,color='grey40')
)

# Apply custom theme to plot
mp<-mp+
 custom_theme
```

---

# Main conclusions

- What do we learn with this map?

.pull-left[

- Most negative balance for Italy, most positive balance for Austria

- Although Ireland and Germany are important producers, they import more milk than they export

- Overall, trades are relatively low compared to countries production (>5%)

]

.pull-right[
![](2nd_session_R_files/figure-html/p6-out-1.png)
]

---

background-image: url(fig/back/cow2.jpg)
background-size: cover
class: center, bottom, inverse

# **Going further**

### Example of plots made with {ggplot2}

---

# Going further...

- Following a similar workflow, we may add milk production for each country to the same map and [more text customization](https://bjnnowak.netlify.app/2021/09/05/r-changing-plot-fonts/)

.center[
 <img src="fig/R_2/MilkTrade.png" width=60%></img>
]

---

# Going further...

- Mix [chloropleth and time series](https://bjnnowak.netlify.app/2021/09/28/r-hybrid-map-chlorpleth-x-time-series/) with [{geofacet}](https://cran.r-project.org/web/packages/geofacet/vignettes/geofacet.html)

.center[
 <img src="fig/R_2/milk.png" width=100%></img>
]