In this tutorial, we will use the tidyverse to code the first part of a crop model: the estimation of the number of plant leaves from temperature data
The model used in this tutorial is based on the work of Ringeval et al. (2021)
This model will be applied to corn
Estimation of corn yield is done in three steps:
1. Estimation of number of leaves (based on daily temperature)
2. Estimation of the amount of photosynthetically active radiation intercepted by the plants
3. Conversion of this amount of radiation into biomass (first plant then grain)
For this tutorial, we will only code the equations related to the first part of the model: the estimation of the number of leaves from daily temperature
For any day, the thermal time (\(TT\), in °C) is computed from the daily mean temperature (\(tas\), in °C) by using a reference temperature under which plant growth stops (\(T_{0}\)): \[ \quad\text{(Eq. 1)}\quad \begin{cases} TT(day) = tas(day)-T_{0}, & tas(day)>T_{0} \\ TT(day) =0 ,& tas(day)\leq T_{0} \end{cases} \]
For this tutorial, we will only code the equations related to the first part of the model: the estimation of the number of leaves from daily temperature
Then, the sum of growing degree days (\(GDD\), in °C), may be defined as follows: \[ \quad\text{(Eq. 2)}\quad GDD(day)=\sum_{i} TT(i) \]
For this tutorial, we will only code the equations related to the first part of the model: the estimation of the number of leaves from daily temperature
Finally, the number of leaves per plant (\(nleaf\)) is computed from GDD and two parameters: one representing the thermal requirement for the emergence of one leaf (\(GDD_{1leaf}\), in °C), the other the maximum number of leaves per plant (\(max_{nleaf}\))
\[
\quad\text{(Eq. 3)}\quad nleaf = min(max_{nleaf},\frac{GDD(day)}{GDD_{1leaf}})
\]
For this tutorial, we will only code the equations related to the first part of the model: the estimation of the number of leaves from daily temperature
The model is made off:
One input variable: the daily mean T° (\(tas\))
Two internal variables: the thermal time (\(TT\)) and the sum of degree days (\(GDD\))
Three parameters: the reference T° (\(T_{0}\)), the phyllochron (\(GDD_{1leaf}\)) and the max number of leaves \(max_{nleaf}\))
One output variable: the number of leafs (\(nleaf\))
The tidyverse is a collection of extensions designed to work together and based on a common philosophy, making data manipulation and plotting easier
Core packages of the tidyverse (Barnier, 2021)
The “tidy” philosophy (Image credit: Allison Horst)
Ready? Let’s start by loading the tidyverse packages
Next step will be to load the input file with weather data
For this tutorial, we will compare corn yield potential for two cities in the USA :
Des Moines (Iowa)
Sandstone (Minnesota)
We will start with Des Moines
The input file may be downloaded from this link, or loaded directly in R as follows:
# Load data
input <- readr::read_delim(
file="https://raw.githubusercontent.com/BjnNowak/CropModel/main/Weather_DesMoines.csv",
delim=";"
)
# Check first lines
head(input)
# A tibble: 6 × 28
WBANNO LST_DATE CRX_VN LONGITUDE LATITUDE T_DAILY_MAX T_DAILY_MIN T_DAILY_MEAN
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 54902 20200101 2.62 -93.3 41.6 9.8 -5.6 2.1
2 54902 20200102 2.62 -93.3 41.6 9.5 0.1 4.8
3 54902 20200103 2.62 -93.3 41.6 2.5 -0.9 0.8
4 54902 20200104 2.62 -93.3 41.6 0.2 -5.7 -2.8
5 54902 20200105 2.62 -93.3 41.6 7.4 -2.9 2.2
6 54902 20200106 2.62 -93.3 41.6 6.7 -6.1 0.3
# ℹ 20 more variables: T_DAILY_AVG <dbl>, P_DAILY_CALC <dbl>,
# SOLARAD_DAILY <dbl>, SUR_TEMP_DAILY_TYPE <chr>, SUR_TEMP_DAILY_MAX <dbl>,
# SUR_TEMP_DAILY_MIN <dbl>, SUR_TEMP_DAILY_AVG <dbl>, RH_DAILY_MAX <dbl>,
# RH_DAILY_MIN <dbl>, RH_DAILY_AVG <dbl>, SOIL_MOISTURE_5_DAILY <dbl>,
# SOIL_MOISTURE_10_DAILY <dbl>, SOIL_MOISTURE_20_DAILY <dbl>,
# SOIL_MOISTURE_50_DAILY <dbl>, SOIL_MOISTURE_100_DAILY <dbl>,
# SOIL_TEMP_5_DAILY <dbl>, SOIL_TEMP_10_DAILY <dbl>, …
As a first step, we will:
select only the column with date, daily mean temperature and solar radiation (the only climate variables used in the model)
rename the variables in the same way as in the model, for more clarity
# Creating 'data' from 'input',
# with only date, mean T° and radiation
data<-input%>%
dplyr::select(
date = LST_DATE, # New name = Old name
tas = T_DAILY_MEAN,
rsds = SOLARAD_DAILY
)
# Check first lines
head(data)
# A tibble: 6 × 3
date tas rsds
<dbl> <dbl> <dbl>
1 20200101 2.1 6.81
2 20200102 4.8 7.23
3 20200103 0.8 1.49
4 20200104 -2.8 3.17
5 20200105 2.2 7.75
6 20200106 0.3 5.87
You may have notice that the date format is not very easy to read
Using the mutate() function, we will replace this column with one containing the day’s number (starting from 1 for January 1, 2020)
# Creating 'data' from 'input',
# with only date, mean T° and radiation
data<-data%>%
# Arrange table by chronological order
dplyr::arrange(date)%>%
# Add a new column with day number
dplyr::mutate(
day_number = row_number()
)%>%
# Drop old date column and reorganize columns
select(day_number,tas,rsds)
# Check first lines
head(data)
# A tibble: 6 × 3
day_number tas rsds
<int> <dbl> <dbl>
1 1 2.1 6.81
2 2 4.8 7.23
3 3 0.8 1.49
4 4 -2.8 3.17
5 5 2.2 7.75
6 6 0.3 5.87
Last cleaning step!
In this tutorial, we are interested in corn development
Therefore, we will keep only dates between standard sowing and harvest dates for the area
To do so, we will use the filter() function.
day_sowing<-92 # Sowing after 1st May
day_harvest<-287 # Harvest ~ mid-october
data<-data%>%
dplyr::filter(
day_number>=day_sowing,
day_number<=day_harvest
)
head(data)
# A tibble: 6 × 3
day_number tas rsds
<int> <dbl> <dbl>
1 92 9.7 19.4
2 93 13.6 10.2
3 94 4.3 1.72
4 95 3.1 18.1
5 96 7.7 22.2
6 97 13.2 4.78
Lost? All cleaning steps may be summarized by the following “pipe”:
day_sowing<-92 # Sowing after 1st May
day_harvest<-287 # Harvest ~ mid-october
data<-readr::read_delim("https://raw.githubusercontent.com/BjnNowak/CropModel/main/Weather_DesMoines.csv",delim=";")%>%
arrange(LST_DATE)%>%
mutate(day_number = row_number())%>%
select(day_number,tas = T_DAILY_MEAN,rsds = SOLARAD_DAILY)%>%
filter(day_number>=day_sowing, day_number<=day_harvest)
# Check first lines
head(data)
# A tibble: 6 × 3
day_number tas rsds
<int> <dbl> <dbl>
1 92 9.7 19.4
2 93 13.6 10.2
3 94 4.3 1.72
4 95 3.1 18.1
5 96 7.7 22.2
6 97 13.2 4.78
The first internal variable to be calculated is the thermal time (\(TT\), in °C)
It is computed as follows: \[
\quad\text{(Eq. 1)}\quad
\begin{cases}
TT(day) = tas(day)-T_{0}, & tas(day)>T_{0} \\
TT(day) =0 ,& tas(day)\leq T_{0}
\end{cases}
\]
We will perform this calculation by adding a new column to our data table, combining the mutate() function with the case_when() function to implement conditionality.
Once \(TT\) is calculated, we can directly calculate the sum of growing degree days (\(GDD\)) following (Eq.2): \[
\begin{align}\quad\text{(Eq. 2)}\quad GDD(day)=\sum_{i} TT(i)\end{align}
\]
This will be done with the cumsum() function from base R
# A tibble: 6 × 5
day_number tas rsds TT GDD
<int> <dbl> <dbl> <dbl> <dbl>
1 282 15.4 15.5 9.4 2418.
2 283 21.7 14.7 15.7 2434.
3 284 19.6 15.1 13.6 2447.
4 285 19.8 14.6 13.8 2461.
5 286 14.4 16.3 8.4 2469.
6 287 12.7 16.0 6.7 2476
Following (Eq.3), we are now ready to compute the number of leafs \(nleaf\) as the ratio between the sum of temperature \(GDD\) and the phyllochron \(GDD_{1leaf}\): \[
\quad\text{(Eq. 3)}\quad nleaf = min(max_{nleaf},\frac{GDD(day)}{GDD_{1leaf}})
\]
We are going to separate this equation into two parts:
calculating the number of potential leaves
then compare with the maximum number of possible leaves
# Set parameters:
GDD_1leaf<-50 # Phyllochron
max_nleaf<-20 # Max number of leaves
model<-model%>%
mutate(
# Potential number of leaves (no max values)
pot_nleaf = GDD/GDD_1leaf,
# Estimated number of leaves (including max)
nleaf = case_when(
pot_nleaf<=max_nleaf~round(pot_nleaf),
pot_nleaf>max_nleaf~max_nleaf
)
)
Now we are ready to make a first representation of the model output with {ggplot2}:
To be able to reuse the model easily, it’s better to integrate it into a function
To be able to reuse the model easily, it’s better to integrate it into a function
Applied to our case study, a code example for the modelisation of the number of leaves is summarized here
Should the data cleaning stage also be integrated?
Should we add the base temperature, \(T_{0}\), as a function argument?
Using the newly created function, make a plot to visualize the effect of a reduction of the phyllochron, \(GDD_{1leaf}\), from 50°C to 40°C (thanks to plant breeding for example)
p2<-ggplot2::ggplot()+
geom_point(
data=fun_model(data=data,GDD_1leaf=50),
aes(x=day_number, y=nleaf),
color="forestgreen"
)+
geom_point(
data=fun_model(data=data,GDD_1leaf=40),
aes(x=day_number, y=nleaf),
color="red"
)+
labs(
title = "Modelisation of the number of leaves",
x = "Day number",
y = "Number of leaves"
)
p2
Using the newly created function, make a plot to compare the evolution of the number of leaves in DesMoines and Sandstone
The input file for Sandstone may be downloaded from this link, or loaded directly in R as follows:
Next: Model optimization
Crop model with R and {tidyverse}