The environment.csv dataset (from Beckerman and Petchey’s textbook, Getting started with R: An introduction for biologists) includes measures of root biomass (in g/m\(^2\)) for 10 sites as a function of altitude (in m), temperature (in degrees C) and rainfall (in m).
enviro <- read.csv("environment.csv")
str(enviro)
## 'data.frame': 10 obs. of 5 variables:
## $ site : int 1 2 3 4 5 6 7 8 9 10
## $ altitude : int 13 160 100 205 45 84 349 509 399 30
## $ temperature: int 24 18 17 15 20 21 14 11 13 19
## $ rainfall : num 0.01 0.5 0.6 1.1 0.09 0.2 1.2 0.6 0.8 0.5
## $ biomass : int 20 120 110 200 45 70 150 275 220 38
Estimate the parameters of the model including the three predictors: biomass ~ altitude + temperature + rainfall. Does the inclusion of the three predictors in the same model cause problems? Justify your answer.
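Below is a minimal base-R sketch of this first step. The data frame is transcribed from the str(enviro) output above so that the example runs without the CSV file. When the file is available, enviro <- read.csv("environment.csv") replaces it. The sketch prints the coefficient table, the pairwise correlations among the predictors and a variance inflation factor (VIF) for each predictor. Each VIF is computed by hand as 1/(1 - R^2), where R^2 comes from regressing that predictor on the other two. The name vif_manual is illustrative only. For models like this one, with numeric predictors only, car::vif() reports the same quantity.

```r
# Root biomass data, transcribed from the str(enviro) output above
enviro <- data.frame(
  site        = 1:10,
  altitude    = c(13, 160, 100, 205, 45, 84, 349, 509, 399, 30),
  temperature = c(24, 18, 17, 15, 20, 21, 14, 11, 13, 19),
  rainfall    = c(0.01, 0.5, 0.6, 1.1, 0.09, 0.2, 1.2, 0.6, 0.8, 0.5),
  biomass     = c(20, 120, 110, 200, 45, 70, 150, 275, 220, 38)
)

# Full model with the three predictors
mod_full <- lm(biomass ~ altitude + temperature + rainfall, data = enviro)
print(summary(mod_full))

# Pairwise correlations among the predictors
preds <- c("altitude", "temperature", "rainfall")
print(round(cor(enviro[, preds]), 2))

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j
# on the other two predictors
vif_manual <- sapply(preds, function(p) {
  fit <- lm(reformulate(setdiff(preds, p), response = p), data = enviro)
  1 / (1 - summary(fit)$r.squared)
})
print(round(vif_manual, 1))
```

Large VIFs mean that the predictors carry overlapping information. This inflates the standard errors of their coefficients. A common rule of thumb flags values above about 5 or 10.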
Propose several alternative models for this dataset, including the null model (no predictors) and models with 1 or 2 predictors (without interactions). Avoid using highly correlated predictors in the same model. Create a table comparing these models according to their AICc.
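One way to build such a table in base R is sketched below. The data are again transcribed from str(enviro). The sketch computes \(\text{AICc} = -2\log L + 2K + \frac{2K(K+1)}{n-K-1}\), where \(K\) counts the estimated parameters including the residual standard deviation, and then derives the \(\Delta\)AICc values and the Akaike weights. A pair of predictors is excluded when their absolute correlation is 0.7 or more. This cutoff is an assumption for illustration and is not given in the exercise. aicc() is a hypothetical helper. The aictab() function of the AICcmodavg package produces a comparable table from a list of fitted models.

```r
# Root biomass data, transcribed from the str(enviro) output above
enviro <- data.frame(
  altitude    = c(13, 160, 100, 205, 45, 84, 349, 509, 399, 30),
  temperature = c(24, 18, 17, 15, 20, 21, 14, 11, 13, 19),
  rainfall    = c(0.01, 0.5, 0.6, 1.1, 0.09, 0.2, 1.2, 0.6, 0.8, 0.5),
  biomass     = c(20, 120, 110, 200, 45, 70, 150, 275, 220, 38)
)

# Hypothetical helper: AICc of a fitted model
aicc <- function(mod) {
  ll <- logLik(mod)
  k <- attr(ll, "df")  # estimated parameters, residual SD included
  n <- nobs(mod)
  -2 * as.numeric(ll) + 2 * k + 2 * k * (k + 1) / (n - k - 1)
}

preds <- c("altitude", "temperature", "rainfall")
r <- cor(enviro[, preds])

# Candidate predictor sets: none, each predictor alone, and each pair whose
# absolute correlation is below the assumed cutoff of 0.7
pairs <- Filter(function(p) abs(r[p[1], p[2]]) < 0.7,
                combn(preds, 2, simplify = FALSE))
sets <- c(list(character(0)), as.list(preds), pairs)

mods <- lapply(sets, function(s) {
  if (length(s) == 0) lm(biomass ~ 1, data = enviro)
  else lm(reformulate(s, response = "biomass"), data = enviro)
})
names(mods) <- vapply(sets, function(s)
  if (length(s) == 0) "null" else paste(s, collapse = " + "), character(1))

# Comparison table sorted by AICc, with Delta AICc and Akaike weights
tab <- data.frame(
  model = names(mods),
  K     = vapply(mods, function(m) attr(logLik(m), "df"), numeric(1)),
  AICc  = vapply(mods, aicc, numeric(1)),
  row.names = NULL
)
tab$delta  <- tab$AICc - min(tab$AICc)
tab$weight <- exp(-tab$delta / 2) / sum(exp(-tab$delta / 2))
tab <- tab[order(tab$AICc), ]
rownames(tab) <- NULL
print(tab, digits = 3)
```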
What is the best model for predicting root biomass at a new site similar to those sampled? Would it be useful to make average predictions from several models here? Justify your answer.
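The standard definitions behind model averaging are given here for reference. Let \(R\) be the number of candidate models and \(\Delta_i\) the AICc difference between model \(i\) and the best model. The Akaike weights \(w_i\) and the model-averaged prediction are then

\[
w_i = \frac{\exp(-\Delta_i/2)}{\sum_{j=1}^{R} \exp(-\Delta_j/2)}, \qquad
\hat{y}_{\text{avg}} = \sum_{i=1}^{R} w_i \, \hat{y}_i
\]

where \(\hat{y}_i\) is the prediction of model \(i\) for the new site. When one model carries nearly all of the weight, the averaged prediction is close to that model's own prediction.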
The file migration.csv contains data from Rubolini et al. (2005) on 28 bird species that migrate between Europe and Africa.
migr <- read.csv("migration.csv")
str(migr)
## 'data.frame': 28 obs. of 14 variables:
## $ speciesID : int 1 3 4 5 7 8 9 11 12 13 ...
## $ species1 : chr "Acrocephalus" "Acrocephalus" "Anthus" "Anthus" ...
## $ species2 : chr "arundinaceus" "scirpaceus" "campestris" "trivialis" ...
## $ migDate : num 33 38 32 27 35 30 31 30.8 30 28 ...
## $ latBreed : num 46 48 43.5 55.3 47.5 50.3 51 51.5 48.8 59 ...
## $ latWntr : num -10.3 0 6 -10 -7.5 18.5 -15 7.5 -10 7.5 ...
## $ sexDchrmt : num 0 0 0 0 4.3 2 2.3 7 17.3 16 ...
## $ nestSite : int 0 0 0 0 0 0 0 0 1 1 ...
## $ moult : int 1 1 0 0 1 0 1 0 0 0 ...
## $ mWngLn : num 96.8 66.8 91.6 88.7 192.1 ...
## $ fWngLn : num 92.3 66 86.9 84.7 194.3 ...
## $ numSpecies: int 641 546 140 3531 269 104 166 101 737 12837 ...
## $ X : num -10.3 0 6 -10 -7.5 18.5 -15 7.5 -10 7.5 ...
## $ Y : num 33 38 32 27 35 30 31 30.8 30 28 ...
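We want to predict the date of arrival in Europe (migDate, measured in days from April 1st) from four predictors: the latitude of the breeding site (latBreed), the latitude of the wintering site (latWntr), the nest site (nestSite, which distinguishes species that nest in existing cavities) and the moult site (moult, which distinguishes species that moult at the wintering site).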
In theory, birds are expected to arrive later if their breeding site is further north (due to climate and distance) and if they moult at the wintering site. Birds are expected to arrive earlier if their wintering grounds are at a higher latitude in Africa (less distance to travel) and if they nest in existing cavities.
Check the fit of the complete linear model including the 4 predictors. Interpret the values obtained for each of the coefficients of these predictors (but not the intercept). Are these results consistent with those expected in theory?
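Below is a minimal sketch for this step. It assumes migration.csv is in the working directory and skips the model-fitting part otherwise. The sketch fits the full model and draws the standard lm() diagnostic plots. It then tabulates each coefficient with its 95% confidence interval, next to the sign expected from the theory above. The helper check_signs() and the expected_sign vector are hypothetical. The signs for nestSite and moult assume that these 0/1 variables equal 1 for cavity nesters and for species that moult at the wintering site. Check the coding in the data before relying on them.

```r
# Theoretical sign of each coefficient, from the expectations stated above.
# ASSUMPTION: nestSite = 1 means nesting in existing cavities and
# moult = 1 means moulting at the wintering site.
expected_sign <- c(latBreed = 1, latWntr = -1, nestSite = -1, moult = 1)

# Hypothetical helper: estimates, 95% CIs and whether each sign matches
check_signs <- function(mod, expected = expected_sign) {
  est <- coef(mod)[names(expected)]
  ci <- confint(mod)[names(expected), , drop = FALSE]
  data.frame(estimate = est, lower = ci[, 1], upper = ci[, 2],
             expected = expected, matches = sign(est) == expected)
}

if (file.exists("migration.csv")) {
  migr <- read.csv("migration.csv")
  mod_migr <- lm(migDate ~ latBreed + latWntr + nestSite + moult, data = migr)
  par(mfrow = c(2, 2))
  plot(mod_migr)  # residuals vs fitted, normal Q-Q, scale-location, leverage
  print(check_signs(mod_migr))
}
```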
Using AICc, compare models including each of the following combinations of the 4 predictors:
How many models have \(\Delta\text{AICc} \le 2\)? According to the Akaike weights, what is the probability that the best model is among those?
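Once the AICc values are available, both answers follow from the \(\Delta\)AICc values and the Akaike weights. The minimal sketch below uses a hypothetical helper, top_set(). It takes a named vector of AICc values (one per candidate model) and returns three things: the models within the cutoff, their number and the sum of their weights.

```r
# Hypothetical helper: models within `cutoff` AICc units of the best one
# aicc_values: named numeric vector, one AICc value per candidate model
top_set <- function(aicc_values, cutoff = 2) {
  delta <- aicc_values - min(aicc_values)
  w <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike weights
  keep <- delta <= cutoff
  list(models = names(aicc_values)[keep][order(delta[keep])],
       n = sum(keep),
       total_weight = sum(w[keep]))
}
```

A table produced by AICcmodavg::aictab() gives the same quantities through its Delta_AICc and AICcWt columns, for example sum(tab$Delta_AICc <= 2) and sum(tab$AICcWt[tab$Delta_AICc <= 2]).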
The file migr_test.csv contains data for 10 other species from the Rubolini et al. (2005) study.
migr_test <- read.csv("migr_test.csv")
str(migr_test)
## 'data.frame': 10 obs. of 14 variables:
## $ speciesID : int 2 6 10 14 18 22 26 30 34 38
## $ species1 : chr "Acrocephalus" "Calandrella" "Delichon" "Hippolais" ...
## $ species2 : chr "schoenobaenus" "brachydactyla" "urbica" "icterina" ...
## $ migDate : num 35 27.5 29 39 31.2 28 35 27 22 22
## $ latBreed : num 57.5 39.5 48.5 56 54.5 49 45.5 56.5 48 44
## $ latWntr : num -7.5 15.5 -15 -19 13 -7.5 -12 -9 11 16
## $ sexDchrmt : num 0 0 0 0 0 9 19.3 0 5.7 2.3
## $ nestSite : int 0 0 0 0 0 0 0 0 0 1
## $ moult : int 1 0 1 1 1 0 1 1 0 1
## $ mWngLn : num 67.2 93.4 111.1 78.9 64.6 ...
## $ fWngLn : num 64.7 89.8 110 78 63.6 ...
## $ numSpecies: int 2524 138 1624 10297 63 1163 1525 24767 2658 410
## $ X : num -7.5 15.5 -15 -19 13 -7.5 -12 -9 11 16
## $ Y : num 35 27.5 29 39 31.2 28 35 27 22 22
Calculate the mean squared prediction error, i.e. the mean of (observation - prediction)\(^2\) over these 10 new observations, for (i) the best model identified by the AICc comparison in (b) and (ii) the weighted average prediction of all candidate models.
Tip: To obtain a vector of the average predictions, extract the mod.avg.pred component of the object returned by the modavgPred function (from the AICcmodavg package).
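Below is a minimal sketch of the comparison. mspe() is a one-line helper for the mean squared prediction error. compare_mspe() is a hypothetical wrapper that takes three inputs: a named list of the candidate lm fits from (b), the name of the best model and the test data. It uses predict() for the single model. For the averaged prediction, it follows the tip and uses the mod.avg.pred component of AICcmodavg::modavgPred(). It assumes AICcmodavg is installed.

```r
# Mean squared prediction error: mean of (observation - prediction)^2
mspe <- function(obs, pred) mean((obs - pred)^2)

# Hypothetical wrapper: MSPE of (i) the best model and (ii) the
# Akaike-weighted average prediction of all candidate models
# cand: named list of lm fits; best: name of the top model in cand
compare_mspe <- function(cand, best, newdata, response = "migDate") {
  obs <- newdata[[response]]
  pred_best <- predict(cand[[best]], newdata = newdata)
  avg <- AICcmodavg::modavgPred(cand, modnames = names(cand),
                                newdata = newdata)
  c(best = mspe(obs, pred_best),
    averaged = mspe(obs, avg$mod.avg.pred))
}
```

As a usage example, suppose the models from (b) are stored in a named list cand and the top-ranked one is named, hypothetically, "m1". The call would then be compare_mspe(cand, "m1", migr_test).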