Software notes:

I prepared all course materials in R version 4.0.2 (R Core Team 2020). However, the code should also run in older versions of R (> 3.6.0). You need to install the following packages and their dependencies:

  • data.table (Dowle and Srinivasan 2019)
  • raster (Hijmans 2019)
  • randomForest (Liaw and Wiener 2002)
  • lattice (Sarkar 2008)
  • RColorBrewer (Neuwirth 2014)
  • PresenceAbsence (Freeman and Moisen 2008)
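The installation step can be sketched as follows. This is a minimal sketch, not part of the original materials: it assumes an internet connection and uses the CRAN mirror `https://cloud.r-project.org` (an assumption; any CRAN mirror works). It installs only the packages that are not yet available. With its default `dependencies` setting, `install.packages()` also installs each package's required dependencies (Depends, Imports, LinkingTo).

```r
# Packages required for the course materials (as listed above)
pkgs <- c("data.table", "raster", "randomForest", "lattice",
          "RColorBrewer", "PresenceAbsence")

# Check which packages can already be loaded; requireNamespace() returns FALSE if missing
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!available]

# Install only the missing packages (plus their required dependencies)
if (length(to_install) > 0) {
  message("Installing: ", paste(to_install, collapse = ", "))
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

Re-running the snippet is harmless: once everything is installed, `to_install` is empty and nothing is downloaded.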

1 Background

Here, I provide a short, half-day introduction to species distribution modelling in R. The course gives a brief overview of the concept of species distribution modelling and introduces the main modelling steps. The code and data largely follow the materials from Zurell and Engler (2019), although we will use a different case study.

Species distribution models (SDMs) are a popular tool in quantitative ecology (Franklin 2010; Peterson et al. 2011; Guisan, Thuiller, and Zimmermann 2017) and constitute the most widely used modelling framework in global change impact assessments for projecting potential future range shifts of species (IPBES 2016). They are popular for several reasons: they are comparatively easy to use because many software packages (e.g. Thuiller et al. 2009; Phillips, Anderson, and Schapire 2006) and guidelines (e.g. Elith, Leathwick, and Hastie 2008; Elith et al. 2011; Merow, Smith, and Silander Jr 2013; Guisan, Thuiller, and Zimmermann 2017) are available, and they have comparatively low data requirements.

As input, SDMs require georeferenced biodiversity observations (e.g. individual locations, species' presence, species' counts, species richness; the response or dependent variable) and geographic layers of environmental information (e.g. climate, land cover, soil attributes; the predictor or independent variables). Such data are now widely available in digital format. For example, online repositories provide data on species distributions (e.g. GBIF and OBIS), on individual animal locations (e.g. Movebank), on climate (e.g. WorldClim and CHELSA), and on land cover and other remote sensing products (e.g. Copernicus). We can then relate the biodiversity observations at specific sites to the prevailing environmental conditions at those sites, using one of several available statistical and machine-learning algorithms. Once we have estimated this biodiversity-environment relationship, we can make predictions in space and in time by projecting the model onto available environmental layers (Figure 1.1).
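The two core steps described above, fitting the biodiversity-environment relationship and then projecting it onto environmental conditions, can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the sites, the predictors `temp` and `prec`, and the occurrence data are all simulated and hypothetical, and a binomial GLM from base R stands in for the algorithms used later in the course. It is not the course's case study.

```r
set.seed(42)

# Hypothetical survey: 200 sites with two simulated environmental predictors
n <- 200
sites <- data.frame(
  temp = runif(n, 0, 30),     # e.g. mean annual temperature (deg C)
  prec = runif(n, 200, 2000)  # e.g. annual precipitation (mm)
)

# Simulated "true" niche: occurrence probability peaks at intermediate temperatures
eta <- -8 + 1.2 * sites$temp - 0.04 * sites$temp^2 + 0.001 * sites$prec
sites$occ <- rbinom(n, size = 1, prob = plogis(eta))  # presence (1) / absence (0)

# Step 1: estimate the species-environment relationship
# (binomial GLM with a quadratic temperature term to allow a unimodal response)
m <- glm(occ ~ temp + I(temp^2) + prec, family = binomial, data = sites)

# Step 2: project the model onto new environmental conditions,
# e.g. the cells of an environmental layer or a future climate scenario
new_env <- expand.grid(temp = seq(0, 30, by = 5), prec = c(500, 1500))
new_env$p_occ <- predict(m, newdata = new_env, type = "response")
print(new_env)
```

In real applications, `new_env` would be replaced by gridded environmental layers (e.g. handled with the raster package), and the GLM could be swapped for a machine-learning algorithm such as a Random Forest from the randomForest package listed above.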