This dataset includes heights and diameters of trees grown under different fertilisation regimes over a measurement period of more than four years. Fertiliser treatments were applied at the plot level, with measurements made on multiple trees within each plot. We wish to calculate growth increments after summarising plot averages for diameter on each measurement date.

First we load libraries, read the data and convert variables to appropriate classes.

```
# load libraries
library(tidyverse)
library(zoo)
# read in tree data and look at first few rows
hfetree <- read.csv('data/HFEIFbytree.csv')
head(hfetree)
```

```
## ID plotnr treat Date height diameter
## 1 7445 17 C 2008-01-01 2.88 2.18
## 2 9451 25 C 2008-01-01 2.52 1.55
## 3 7066 17 C 2008-01-01 2.34 1.52
## 4 3194 9 C 2008-01-01 2.32 1.56
## 5 2108 7 C 2008-01-01 2.28 1.34
## 6 7120 17 C 2008-01-01 2.26 1.45
```

```
# convert dates to 'Date' format
hfetree$Date <- as.Date(hfetree$Date)
unique(hfetree$Date)
```

```
## [1] "2008-01-01" "2008-03-01" "2008-04-01" "2008-05-01" "2008-07-01"
## [6] "2008-09-01" "2008-10-01" "2008-12-01" "2009-01-01" "2009-03-01"
## [11] "2009-07-01" "2009-12-01" "2010-02-01" "2010-03-01" "2010-09-01"
## [16] "2011-06-01" "2012-04-15"
```

`length(unique(hfetree$Date))`

`## [1] 17`

```
# convert plot numbers to 'factor')
hfetree$plotnr <- factor(hfetree$plotnr)
length(levels(hfetree$plotnr))
```

`## [1] 16`

Now that we have our dataframe ready we can specify groups and calculate group means.

```
# calculate average diameter for each plot and date
hfeplot <- hfetree %>%
group_by(Date, treat, plotnr) %>%
summarise(diameter=mean(diameter))
hfeplot
```

```
## # A tibble: 272 x 4
## # Groups: Date, treat [68]
## Date treat plotnr diameter
## <date> <fct> <fct> <dbl>
## 1 2008-01-01 C 7 0.902
## 2 2008-01-01 C 9 1.07
## 3 2008-01-01 C 17 1.25
## 4 2008-01-01 C 25 0.961
## 5 2008-01-01 F 3 1.58
## 6 2008-01-01 F 8 0.738
## 7 2008-01-01 F 14 0.46
## 8 2008-01-01 F 22 1.02
## 9 2008-01-01 I 2 1.12
## 10 2008-01-01 I 10 0.852
## # â€¦ with 262 more rows
```

```
# 17 measurement dates so 17 observations for each plot
summary(hfeplot)
```

```
## Date treat plotnr diameter
## Min. :2008-01-01 C :68 1 : 17 Min. : 0.460
## 1st Qu.:2008-07-01 F :68 2 : 17 1st Qu.: 2.744
## Median :2009-01-01 I :68 3 : 17 Median : 5.497
## Mean :2009-05-25 IL:68 7 : 17 Mean : 6.697
## 3rd Qu.:2010-02-01 8 : 17 3rd Qu.:10.311
## Max. :2012-04-15 9 : 17 Max. :17.073
## (Other):170
```

Measurements were taken at intervals with different durations, so we need to calculate increments for both `diameter`

and `Date`

and then divide the former by the latter. We can do this using the `lag`

function from the `dplyr`

package (note that this masks the function of the same name from the `stats`

package). `lag`

works by shifting a vector by certain number of observations (`k`

) â€“ `k = 1`

means that the vector is shifted so that it starts one observation earlier.

```
# calculate growth increment between measurement dates
hfegrowth <- hfeplot %>%
group_by(plotnr) %>%
# remove groups where n <= 1 since can't calculate lags for these
filter(n() > 1) %>%
# sort by Date to ensure calculations correct
arrange(Date) %>%
# calculate increments for diameter and day relative to the current observation,
# then calculate rate per day based on increments
mutate(incr.diam = diameter - lag(diameter, 1),
incr.days = as.numeric(Date - lag(Date, 1)),
growth.rate = incr.diam / incr.days)
# NAs in the new columns are for the first observations in the data
# (no available lagged observations)
summary(hfegrowth)
```

```
## Date treat plotnr diameter
## Min. :2008-01-01 C :68 1 : 17 Min. : 0.460
## 1st Qu.:2008-07-01 F :68 2 : 17 1st Qu.: 2.744
## Median :2009-01-01 I :68 3 : 17 Median : 5.497
## Mean :2009-05-25 IL:68 7 : 17 Mean : 6.697
## 3rd Qu.:2010-02-01 8 : 17 3rd Qu.:10.311
## Max. :2012-04-15 9 : 17 Max. :17.073
## (Other):170
## incr.diam incr.days growth.rate
## Min. :-0.4104 Min. : 28.00 Min. :-0.007286
## 1st Qu.: 0.3137 1st Qu.: 31.00 1st Qu.: 0.005069
## Median : 0.7988 Median : 61.00 Median : 0.009891
## Mean : 0.8554 Mean : 97.88 Mean : 0.011299
## 3rd Qu.: 1.3340 3rd Qu.:129.75 3rd Qu.: 0.016591
## Max. : 2.5737 Max. :319.00 Max. : 0.044474
## NA's :16 NA's :16 NA's :16
```

There is a weather station at the experimental site, which gathers data that might be useful for explaining variation in growth rates on each date. These data are measured ona half-hourly basis. `HFEmet2008.csv`

includes weather data from 2008.

```
# read in weather data and look at summary
hfemet <- read.csv('data/HFEmet2008.csv')
summary(hfemet)
```

```
## DateTime Tair AirPress RH
## 1/1/2008 0:00 : 1 Min. :-3.048 Min. : 95.3 Min. :14.85
## 1/1/2008 0:30 : 1 1st Qu.:11.730 1st Qu.:101.3 1st Qu.:57.27
## 1/1/2008 1:00 : 1 Median :16.690 Median :101.8 Median :79.80
## 1/1/2008 1:30 : 1 Mean :16.291 Mean :101.7 Mean :73.51
## 1/1/2008 10:00: 1 3rd Qu.:20.800 3rd Qu.:102.3 3rd Qu.:93.20
## 1/1/2008 10:30: 1 Max. :37.740 Max. :103.4 Max. :97.60
## (Other) :17562 NA's :1 NA's :67 NA's :1
## VPD PAR Rain wind
## Min. :0.01453 Min. : 0.0 Min. : 0.00000 Min. :0.0000
## 1st Qu.:0.10679 1st Qu.: 0.0 1st Qu.: 0.00000 1st Qu.:0.0000
## Median :0.36148 Median : 6.4 Median : 0.00000 Median :0.2380
## Mean :0.66822 Mean : 352.3 Mean : 0.05188 Mean :0.6471
## 3rd Qu.:0.98770 3rd Qu.: 559.0 3rd Qu.: 0.00000 3rd Qu.:1.1140
## Max. :4.76178 Max. :2428.7 Max. :21.20000 Max. :7.6900
## NA's :1 NA's :67 NA's :1 NA's :1
## winddirection
## Min. : 0.086
## 1st Qu.:112.600
## Median :164.300
## Mean :171.106
## 3rd Qu.:225.900
## Max. :354.500
## NA's :1
```

```
# convert dates and times and look at summary, unique dates
# Note, lubridate functions do not work well with %>%
hfemet$DateTime <- mdy_hm(hfemet$DateTime)
hfemet$Date <- as.Date(hfemet$DateTime)
summary(hfemet)
```

```
## DateTime Tair AirPress
## Min. :2008-01-01 00:00:00 Min. :-3.048 Min. : 95.3
## 1st Qu.:2008-04-01 11:52:30 1st Qu.:11.730 1st Qu.:101.3
## Median :2008-07-01 23:45:00 Median :16.690 Median :101.8
## Mean :2008-07-01 23:45:00 Mean :16.291 Mean :101.7
## 3rd Qu.:2008-10-01 11:37:30 3rd Qu.:20.800 3rd Qu.:102.3
## Max. :2008-12-31 23:30:00 Max. :37.740 Max. :103.4
## NA's :1 NA's :67
## RH VPD PAR Rain
## Min. :14.85 Min. :0.01453 Min. : 0.0 Min. : 0.00000
## 1st Qu.:57.27 1st Qu.:0.10679 1st Qu.: 0.0 1st Qu.: 0.00000
## Median :79.80 Median :0.36148 Median : 6.4 Median : 0.00000
## Mean :73.51 Mean :0.66822 Mean : 352.3 Mean : 0.05188
## 3rd Qu.:93.20 3rd Qu.:0.98770 3rd Qu.: 559.0 3rd Qu.: 0.00000
## Max. :97.60 Max. :4.76178 Max. :2428.7 Max. :21.20000
## NA's :1 NA's :1 NA's :67 NA's :1
## wind winddirection Date
## Min. :0.0000 Min. : 0.086 Min. :2008-01-01
## 1st Qu.:0.0000 1st Qu.:112.600 1st Qu.:2008-04-01
## Median :0.2380 Median :164.300 Median :2008-07-01
## Mean :0.6471 Mean :171.106 Mean :2008-07-01
## 3rd Qu.:1.1140 3rd Qu.:225.900 3rd Qu.:2008-10-01
## Max. :7.6900 Max. :354.500 Max. :2008-12-31
## NA's :1 NA's :1
```

`length(unique(hfemet$Date))`

`## [1] 366`

The half-hourly measures are too fine a resolution for trying to explain variation in growth, estimates of average temperature or total precipitation over a longer timespan prior to growth measurements are more valuable here. We can calculate estimates for rolling means and sums using functions in the `zoo`

library.

```
# calculate a rolling average for Tair, rolling sum for Rain for each month
hferoll <- hfemet %>%
# first generate a summary for Tair (mean) and Rain (sum) by Date
# to ensure that there is only one observation per date, otherwise rolling estimates
# will be based on 'k' half hourly measurements
group_by(Date) %>%
summarise(Tair=mean(Tair),
Rain=sum(Rain)) %>%
# then calculate rolling average/sum over a 30-day window prior to the observation date
# align='right' ensures averages are based on prior observations
arrange(Date) %>%
mutate(Tair.30=rollmean(Tair, 30, fill=NA, na.rm=T, align='right'),
Rain.30=rollsum(Rain, 30, fill=NA, na.rm=T, align='right')) %>%
select(Date, Tair.30, Rain.30)
# NAs in the new columns are for the first 29 observations in the data
# (no available lagged observations)
summary(hferoll)
```

```
## Date Tair.30 Rain.30
## Min. :2008-01-01 Min. : 8.727 Min. : 1.40
## 1st Qu.:2008-04-01 1st Qu.:12.187 1st Qu.: 30.80
## Median :2008-07-01 Median :16.462 Median : 65.00
## Mean :2008-07-01 Mean :15.766 Mean : 74.94
## 3rd Qu.:2008-09-30 3rd Qu.:20.044 3rd Qu.: 94.20
## Max. :2008-12-31 Max. :22.739 Max. :289.60
## NA's :29 NA's :29
```

Now we have two dataframes containing unique information but that can be joined one (or more) variables that are shared between the two. The â€˜dplyrâ€™ package has a series of â€˜joinâ€™ functions that can do this, the choice of function depends on the desired output. Here we are only interested in matching weather data from dates relevant to growth measurements.

```
# merge growth and weather data using 'join'
# we don't need to keep all observations from 'hferoll' so we use 'left_join'
# and enter 'hfegrowth' as the table on the left
dfr <- left_join(hfegrowth, hferoll, by='Date')
summary(dfr)
```

```
## Date treat plotnr diameter
## Min. :2008-01-01 C :68 1 : 17 Min. : 0.460
## 1st Qu.:2008-07-01 F :68 2 : 17 1st Qu.: 2.744
## Median :2009-01-01 I :68 3 : 17 Median : 5.497
## Mean :2009-05-25 IL:68 7 : 17 Mean : 6.697
## 3rd Qu.:2010-02-01 8 : 17 3rd Qu.:10.311
## Max. :2012-04-15 9 : 17 Max. :17.073
## (Other):170
## incr.diam incr.days growth.rate Tair.30
## Min. :-0.4104 Min. : 28.00 Min. :-0.007286 Min. : 9.487
## 1st Qu.: 0.3137 1st Qu.: 31.00 1st Qu.: 0.005069 1st Qu.:12.539
## Median : 0.7988 Median : 61.00 Median : 0.009891 Median :15.129
## Mean : 0.8554 Mean : 97.88 Mean : 0.011299 Mean :16.021
## 3rd Qu.: 1.3340 3rd Qu.:129.75 3rd Qu.: 0.016591 3rd Qu.:20.030
## Max. : 2.5737 Max. :319.00 Max. : 0.044474 Max. :20.310
## NA's :16 NA's :16 NA's :16 NA's :160
## Rain.30
## Min. : 36.60
## 1st Qu.: 40.00
## Median : 68.40
## Mean : 88.11
## 3rd Qu.:101.00
## Max. :243.60
## NA's :160
```

```
# clean up the data, we don't need all variables and NAs are not informative
dfr <- dfr %>%
drop_na %>%
# naming some variables strategically means that we can use 'tidyselect' helpers
select(Date, treat, plotnr, growth.rate, ends_with('.30')) %>%
# then rename variables for ease of use
rename(Tair=Tair.30, Rain=Rain.30)
dfr
```

```
## # A tibble: 112 x 6
## # Groups: plotnr [16]
## Date treat plotnr growth.rate Tair Rain
## <date> <fct> <fct> <dbl> <dbl> <dbl>
## 1 2008-03-01 C 7 0.0143 20.3 244.
## 2 2008-03-01 C 9 0.0194 20.3 244.
## 3 2008-03-01 C 17 0.0195 20.3 244.
## 4 2008-03-01 C 25 0.0165 20.3 244.
## 5 2008-03-01 F 3 0.0229 20.3 244.
## 6 2008-03-01 F 8 0.0199 20.3 244.
## 7 2008-03-01 F 14 0.0136 20.3 244.
## 8 2008-03-01 F 22 0.0238 20.3 244.
## 9 2008-03-01 I 2 0.0206 20.3 244.
## 10 2008-03-01 I 10 0.0130 20.3 244.
## # â€¦ with 102 more rows
```

Now we can plot relationships between growth rate, air temperature and rainfall. Growth rate appears to be positively correlated with both.

```
# plot growth rate by temperature and treatment
ggplot(dfr, aes(Tair, growth.rate, col=treat)) +
geom_point() +
xlab(expression(Mean~ ~air~ ~ temperature~ ~(''^'o'*C))) +
ylab(expression(Growth~ ~increment~ ~(mm~d^-1)))
```

```
ggplot(dfr, aes(Rain, growth.rate, col=treat)) +
geom_point() +
xlab('Sum rainfall (mm)') +
ylab(expression(Growth~ ~increment~ ~(mm~d^-1)))
```