Crop model with R and {tidyverse}

Benjamin Nowak

Introduction

In this tutorial, we will use the tidyverse to code the first part of a crop model: the estimation of the number of plant leaves from temperature data
The model used in this tutorial is based on the work of Ringeval et al. (2021)
This model will be applied to corn

Model structure

Estimation of corn yield is done in three steps:

1. Estimation of number of leaves (based on daily temperature)

2. Estimation of the amount of photosynthetically active radiation intercepted by the plants

3. Conversion of this amount of radiation into biomass (first plant then grain)

Model equations: Equation 1

For this tutorial, we will only code the equations related to the first part of the model: the estimation of the number of leaves from daily temperature

1. Thermal time

For any day, the thermal time (\(TT\), in °C) is computed from the daily mean temperature (\(tas\), in °C) by using a reference temperature under which plant growth stops (\(T_{0}\)): \[ \quad\text{(Eq. 1)}\quad \begin{cases} TT(day) = tas(day)-T_{0}, & tas(day)>T_{0} \\ TT(day) =0 ,& tas(day)\leq T_{0} \end{cases} \]

Model equations: Equation 2

For this tutorial, we will only code the equations related to the first part of the model: the estimation of the number of leaves from daily temperature

2. Sum of degree days

Then, the sum of growing degree days (\(GDD\), in °C), may be defined as follows: \[ \quad\text{(Eq. 2)}\quad GDD(day)=\sum_{i} TT(i) \]

Model equations: Equation 3

For this tutorial, we will only code the equations related to the first part of the model: the estimation of the number of leaves from daily temperature

3. Number of leaves

Finally, the number of leaves per plant (\(nleaf\)) is computed from GDD and two parameters: one representing the thermal requirement for the emergence of one leaf (\(GDD_{1leaf}\), in °C), the other the maximum number of leaves per plant (\(max_{nleaf}\))
\[ \quad\text{(Eq. 3)}\quad nleaf = min(max_{nleaf},\frac{GDD(day)}{GDD_{1leaf}}) \]

Model equations: Summary

For this tutorial, we will only code the equations related to the first part of the model: the estimation of the number of leaves from daily temperature

Summary

The model is made off:

One input variable: the daily mean T° (\(tas\))
Two internal variables: the thermal time (\(TT\)) and the sum of degree days (\(GDD\))
Three parameters: the reference T° (\(T_{0}\)), the phyllochron (\(GDD_{1leaf}\)) and the max number of leaves \(max_{nleaf}\))
One output variable: the number of leafs (\(nleaf\))

How are we going to code that model?

Welcome to the tidyverse

The tidyverse is a collection of extensions designed to work together and based on a common philosophy, making data manipulation and plotting easier

Core packages of the tidyverse (Barnier, 2021)

Welcome to the tidyverse

The “tidy” philosophy (Image credit: Allison Horst)

Welcome to the tidyverse

Ready? Let’s start by loading the tidyverse packages

# Install tidyverse (to do once)
# install.packages("tidyverse")

# Load tidyverse (to repeat at each session)
library(tidyverse)

Next step will be to load the input file with weather data

Load input file

For this tutorial, we will compare corn yield potential for two cities in the USA :

Des Moines (Iowa)
Sandstone (Minnesota)

We will start with Des Moines

Load input file

The input file may be downloaded from this link, or loaded directly in R as follows:

# Load data 
input <- readr::read_delim(
  file="https://raw.githubusercontent.com/BjnNowak/CropModel/main/Weather_DesMoines.csv",
  delim=";"
)
# Check first lines
head(input)

# A tibble: 6 × 28
  WBANNO LST_DATE CRX_VN LONGITUDE LATITUDE T_DAILY_MAX T_DAILY_MIN T_DAILY_MEAN
   <dbl>    <dbl>  <dbl>     <dbl>    <dbl>       <dbl>       <dbl>        <dbl>
1  54902 20200101   2.62     -93.3     41.6         9.8        -5.6          2.1
2  54902 20200102   2.62     -93.3     41.6         9.5         0.1          4.8
3  54902 20200103   2.62     -93.3     41.6         2.5        -0.9          0.8
4  54902 20200104   2.62     -93.3     41.6         0.2        -5.7         -2.8
5  54902 20200105   2.62     -93.3     41.6         7.4        -2.9          2.2
6  54902 20200106   2.62     -93.3     41.6         6.7        -6.1          0.3
# ℹ 20 more variables: T_DAILY_AVG <dbl>, P_DAILY_CALC <dbl>,
#   SOLARAD_DAILY <dbl>, SUR_TEMP_DAILY_TYPE <chr>, SUR_TEMP_DAILY_MAX <dbl>,
#   SUR_TEMP_DAILY_MIN <dbl>, SUR_TEMP_DAILY_AVG <dbl>, RH_DAILY_MAX <dbl>,
#   RH_DAILY_MIN <dbl>, RH_DAILY_AVG <dbl>, SOIL_MOISTURE_5_DAILY <dbl>,
#   SOIL_MOISTURE_10_DAILY <dbl>, SOIL_MOISTURE_20_DAILY <dbl>,
#   SOIL_MOISTURE_50_DAILY <dbl>, SOIL_MOISTURE_100_DAILY <dbl>,
#   SOIL_TEMP_5_DAILY <dbl>, SOIL_TEMP_10_DAILY <dbl>, …

Clean input file

As a first step, we will:

select only the column with date, daily mean temperature and solar radiation (the only climate variables used in the model)
rename the variables in the same way as in the model, for more clarity

# Creating 'data' from 'input', 
# with only date, mean T° and radiation
data<-input%>%                  
  dplyr::select(                
    date = LST_DATE,         # New name = Old name
    tas = T_DAILY_MEAN,
    rsds = SOLARAD_DAILY        
  ) 

# Check first lines
head(data)

# A tibble: 6 × 3
      date   tas  rsds
     <dbl> <dbl> <dbl>
1 20200101   2.1  6.81
2 20200102   4.8  7.23
3 20200103   0.8  1.49
4 20200104  -2.8  3.17
5 20200105   2.2  7.75
6 20200106   0.3  5.87

Clean input file

You may have notice that the date format is not very easy to read

Using the mutate() function, we will replace this column with one containing the day’s number (starting from 1 for January 1, 2020)

# Creating 'data' from 'input', 
# with only date, mean T° and radiation
data<-data%>%
  # Arrange table by chronological order
  dplyr::arrange(date)%>%
  # Add a new column with day number
  dplyr::mutate(                
    day_number = row_number()     
  )%>%
  # Drop old date column and reorganize columns
  select(day_number,tas,rsds)

# Check first lines
head(data)

# A tibble: 6 × 3
  day_number   tas  rsds
       <int> <dbl> <dbl>
1          1   2.1  6.81
2          2   4.8  7.23
3          3   0.8  1.49
4          4  -2.8  3.17
5          5   2.2  7.75
6          6   0.3  5.87

Clean input file

Last cleaning step!

In this tutorial, we are interested in corn development

Therefore, we will keep only dates between standard sowing and harvest dates for the area

To do so, we will use the filter() function.

day_sowing<-92                # Sowing after 1st May
day_harvest<-287              # Harvest ~ mid-october

data<-data%>%
  dplyr::filter(
    day_number>=day_sowing,                  
    day_number<=day_harvest
  )

head(data)

# A tibble: 6 × 3
  day_number   tas  rsds
       <int> <dbl> <dbl>
1         92   9.7 19.4 
2         93  13.6 10.2 
3         94   4.3  1.72
4         95   3.1 18.1 
5         96   7.7 22.2 
6         97  13.2  4.78

Clean input file: Summary

Lost? All cleaning steps may be summarized by the following “pipe”:

day_sowing<-92                # Sowing after 1st May
day_harvest<-287              # Harvest ~ mid-october

data<-readr::read_delim("https://raw.githubusercontent.com/BjnNowak/CropModel/main/Weather_DesMoines.csv",delim=";")%>%
  arrange(LST_DATE)%>%
  mutate(day_number = row_number())%>%
  select(day_number,tas = T_DAILY_MEAN,rsds = SOLARAD_DAILY)%>%
  filter(day_number>=day_sowing, day_number<=day_harvest)

# Check first lines
head(data)

# A tibble: 6 × 3
  day_number   tas  rsds
       <int> <dbl> <dbl>
1         92   9.7 19.4 
2         93  13.6 10.2 
3         94   4.3  1.72
4         95   3.1 18.1 
5         96   7.7 22.2 
6         97  13.2  4.78

We may now start to code the model!

Coding model equations: Equation 1

The first internal variable to be calculated is the thermal time (\(TT\), in °C)
It is computed as follows: \[ \quad\text{(Eq. 1)}\quad \begin{cases} TT(day) = tas(day)-T_{0}, & tas(day)>T_{0} \\ TT(day) =0 ,& tas(day)\leq T_{0} \end{cases} \]

We will perform this calculation by adding a new column to our data table, combining the mutate() function with the case_when() function to implement conditionality.

T0<-6 # Set T0 for corn: 6°C

model<-data%>%
  dplyr::mutate(
    TT=dplyr::case_when(
      tas<T0~0,             # Condition 1 ~ Column value
      tas>=T0~tas-T0        # Condition 2 ~ Column value
    )
  )

Coding model equations: Equation 2

Once \(TT\) is calculated, we can directly calculate the sum of growing degree days (\(GDD\)) following (Eq.2): \[ \begin{align}\quad\text{(Eq. 2)}\quad GDD(day)=\sum_{i} TT(i)\end{align} \]

This will be done with the cumsum() function from base R

model<-model%>%
  mutate(GDD = cumsum(TT))

# Print last rows
tail(model)

# A tibble: 6 × 5
  day_number   tas  rsds    TT   GDD
       <int> <dbl> <dbl> <dbl> <dbl>
1        282  15.4  15.5   9.4 2418.
2        283  21.7  14.7  15.7 2434.
3        284  19.6  15.1  13.6 2447.
4        285  19.8  14.6  13.8 2461.
5        286  14.4  16.3   8.4 2469.
6        287  12.7  16.0   6.7 2476

Coding model equations: Equation 3

Following (Eq.3), we are now ready to compute the number of leafs \(nleaf\) as the ratio between the sum of temperature \(GDD\) and the phyllochron \(GDD_{1leaf}\): \[ \quad\text{(Eq. 3)}\quad nleaf = min(max_{nleaf},\frac{GDD(day)}{GDD_{1leaf}}) \]

We are going to separate this equation into two parts:

calculating the number of potential leaves
then compare with the maximum number of possible leaves

# Set parameters: 
GDD_1leaf<-50 # Phyllochron
max_nleaf<-20 # Max number of leaves

model<-model%>%
  mutate(
    # Potential number of leaves (no max values)
    pot_nleaf = GDD/GDD_1leaf,
    # Estimated number of leaves (including max)
    nleaf = case_when(
      pot_nleaf<=max_nleaf~round(pot_nleaf),
      pot_nleaf>max_nleaf~max_nleaf
    )
  )

Use the model: Plot the results

Now we are ready to make a first representation of the model output with {ggplot2}: the evolution of the number of leaves during the growing season

p1<-ggplot2::ggplot(
    data=model,                           
    aes(x=day_number, y=nleaf)    
  )+
  geom_point(color="forestgreen")+
  labs(                                  
    title = "Modelisation of the number of leaves",
    x = "Day number",
    y = "Number of leaves"
  )

p1

Use the model: Make a function

To be able to reuse the model easily, it’s better to integrate it into a function

The following code shows the structure of a function in R

function_name <- function(arguments) {
    instructions
    return(results)
}

Use the model: Make a function

To be able to reuse the model easily, it’s better to integrate it into a function

Applied to our case study, a code example for the modelisation of the number of leaves is summarized here

Should the data cleaning stage also be integrated?
Should we add the base temperature, \(T_{0}\), as a function argument?

fun_model = function(data,GDD_1leaf){
  
  model<-data%>%
    mutate(TT=dplyr::case_when(tas<T0~0,tas>=T0~tas-T0))%>%
    mutate(GDD = cumsum(TT))%>%
    mutate(
      pot_nleaf = GDD/GDD_1leaf,
      nleaf = case_when(
        pot_nleaf<=max_nleaf~round(pot_nleaf),
        pot_nleaf>max_nleaf~max_nleaf
  ))

  return(model)
}

Use the model: Applications!

Using the newly created function, make a plot to visualize the effect of a reduction of the phyllochron, \(GDD_{1leaf}\), from 50°C to 40°C (thanks to plant breeding for example)

p2<-ggplot2::ggplot()+
  geom_point(
    data=fun_model(data=data,GDD_1leaf=50),                           
    aes(x=day_number, y=nleaf),  
    color="forestgreen"
  )+
    geom_point(
    data=fun_model(data=data,GDD_1leaf=40),                           
    aes(x=day_number, y=nleaf),  
    color="red"
  )+
  labs(                                  
    title = "Modelisation of the number of leaves",
    x = "Day number",
    y = "Number of leaves"
  )

p2

Use the model: Applications!

Using the newly created function, make a plot to compare the evolution of the number of leaves in DesMoines and Sandstone

The input file for Sandstone may be downloaded from this link, or loaded directly in R as follows:

# Load Sandstone data 
input_sandstone <- readr::read_delim(
  file="https://raw.githubusercontent.com/BjnNowak/CropModel/main/Weather_Sandstone.csv",
  delim=";"
)

End of the first part

Next: Model optimization