RStudio project

Open the RStudio project that we created in the first session. I recommend to use this RStudio project for the entire course and within the RStudio project create separate R scripts for each session.

  • Create a new empty R script by going to the tab “File”, select “New File” and then “R script”
  • In the new R script, type # Session b2: Environmental data and save the file in your folder “scripts” within your project folder, e.g. as “b2_EnvData.R”

In species distribution modelling, we aim to understand how species’ occurrence are related to environment. Thus, additional to our species data, we need environmental information. Many environmental data are now available at very high spatial resolution, e.g. lidar data (Bakx et al. 2019). However, often, high resolution data are not necessarily available globally - although the data are constantly improving. I can’t give you a full overview over all available data sets. Rather, you should get an idea how you process the data to make best use of them for your biodiversity models.

1 Climate data

The geodata package is offering direct access to some standard repositories; see the help pages ?geodata. We will use this for extracting climate data from the worldclim data base (http://worldclim.org/)(Hijmans et al. 2005). Please note that also other climatologies exist, e.g. the Chelsa climatologies (http://chelsa-climate.org/)(Karger et al. 2017). However, we here stick to the data offered through the geodata package.

1.1 Current climate

First, we download the 19 bioclimatic variables at a 10’ resolution, following the same procedure as in practical a1. Do you remember what the 19 bioclimatic variables are? See here: https://www.worldclim.org/data/bioclim.html. Remember to think about your folder structure, where you want to store the climate data!

library(geodata)
## Loading required package: terra
## terra 1.7.46
# Download global bioclimatic data from worldclim (you may have to set argument 'download=T' for first download, if 'download=F' it will attempt to read from file):
clim <- geodata::worldclim_global(var = 'bio', res = 10, download = F, path = 'data')

# Now, let's look at the data:
clim
## class       : SpatRaster 
## dimensions  : 1080, 2160, 19  (nrow, ncol, nlyr)
## resolution  : 0.1666667, 0.1666667  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## sources     : wc2.1_10m_bio_1.tif  
##               wc2.1_10m_bio_2.tif  
##               wc2.1_10m_bio_3.tif  
##               ... and 16 more source(s)
## names       : wc2.1~bio_1, wc2.1~bio_2, wc2.1~bio_3, wc2.1~bio_4, wc2.1~bio_5, wc2.1~bio_6, ... 
## min values  :   -54.72435,     1.00000,    9.131122,       0.000,   -29.68600,   -72.50025, ... 
## max values  :    30.98764,    21.14754,  100.000000,    2363.846,    48.08275,    26.30000, ...
# Can you explain, what a raster stack is?
plot(clim)

Remember that the terra package offers different functionalities to manipulate the spatial data, for example aggregating the data to coarser resolutions (aggregate), cropping (crop()), and adding spatial layers to a SpatRaster object (c()):

terra::aggregate(clim[[1]], fact=6, fun="mean")

1.2 Future climate scenarios

The Chelsa and worldclim data bases also offer downscaled climate scenarios. The scenarios stem from the World Climate Research Programme Coupled Model Intercomparison Projects (CMIPs). The most recent is the CMIP6 and the corresponding scenarios can be downloaded form the Chelsa or worlclim websites. For the latter, the downscaled climate scenarios are again accessible through the geodata package (?geodata::cmip6_world). In the function geodata::cmip6_world(), we have to indicate which model (global circulation model, GCM) we want to download, which ssp (shared socioeconomic pathway, SSP) and which time period (projection period; e.g., 2041-2060). More information on the model abbreviations and the available SSPs can be found here: https://www.worldclim.org/data/cmip6/cmip6_clim10m.html. As above, we have to provide var and res arguments as well.

# Download future climate scenario from 'ACCESS-ESM1-5' climate model.
# Please note that you have to set download=T if you haven't downloaded the data before:
clim_fut <- geodata::cmip6_world(model='ACCESS-ESM1-5', ssp='245', time='2041-2060', var='bioc', download=F, res=10, path='data')

# Inspect the SpatRaster object:
clim_fut
## class       : SpatRaster 
## dimensions  : 1080, 2160, 19  (nrow, ncol, nlyr)
## resolution  : 0.1666667, 0.1666667  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : wc2.1_10m_bioc_ACCESS-ESM1-5_ssp245_2041-2060.tif 
## names       : bio01, bio02, bio03,  bio04, bio05, bio06, ... 
## min values  : -52.8,   0.0,   0.3,   11.1, -28.1, -70.2, ... 
## max values  :  33.3,  21.5,  94.7, 2299.4,  51.7,  26.2, ...

We see that the current and future climate SpatRaster objects have different layer names. This could cause problems in distribution modelling and we thus want make sure that they all have the same layer names.

# Inspect layer names
names(clim)
##  [1] "wc2.1_10m_bio_1"  "wc2.1_10m_bio_2"  "wc2.1_10m_bio_3"  "wc2.1_10m_bio_4" 
##  [5] "wc2.1_10m_bio_5"  "wc2.1_10m_bio_6"  "wc2.1_10m_bio_7"  "wc2.1_10m_bio_8" 
##  [9] "wc2.1_10m_bio_9"  "wc2.1_10m_bio_10" "wc2.1_10m_bio_11" "wc2.1_10m_bio_12"
## [13] "wc2.1_10m_bio_13" "wc2.1_10m_bio_14" "wc2.1_10m_bio_15" "wc2.1_10m_bio_16"
## [17] "wc2.1_10m_bio_17" "wc2.1_10m_bio_18" "wc2.1_10m_bio_19"
names(clim_fut)
##  [1] "bio01" "bio02" "bio03" "bio04" "bio05" "bio06" "bio07" "bio08" "bio09"
## [10] "bio10" "bio11" "bio12" "bio13" "bio14" "bio15" "bio16" "bio17" "bio18"
## [19] "bio19"
# In this case, let's keep the names of the future climate layers
names(clim) <- names(clim_fut)

You can also write SpatRaster objects to file:

terra::writeRaster(clim,filename='data/bioclim_global_res10.grd')
terra::writeRaster(clim_fut,filename='data/bioclim_fut_global_res10.grd')

*.grd was the native file format of the raster package, the predecessor of terra, which we will also use here. It consists of two files, a data file and a header file (*.gri).

2 Land cover data

The geodata package also offers access to other environmental data useful for species distribution modelling, for example soil (?geodata::soil_world) and land cover data (?geodata::landcover).

The land cover data are derived from the ESA WorldCover data set (https://esa-worldcover.org/en) that “provides a new baseline global land cover product at 10 m resolution for 2020 based on Sentinel-1 and 2 data”. The geodata package offers the fractional cover at 30-seconds spatial resolution (c. 1 km at the equator). For illustration, let`s download tree cover globally.

# Download fractional tree cover at 30-sec resolution:
# Please note that you have to set download=T if you haven't downloaded the data before:
trees_30sec <- geodata::landcover(var='trees', path='data', download=F)

# map the tree cover
plot(trees_30sec)

Above, we used climate data at 10-min spatial resolution. To obtain the same spatial resolution for the land cover, we have to aggregate the SpatRaster object.

# Aggregate tree cover to 10-min spatial resolution
trees_10min <- terra::aggregate(trees_30sec, fact=20, fun='mean')

# Map the 10-min tree cover
plot(trees_10min)

Now that our tree cover data and climate data are at the same spatial resolution, we can stack them into a multi-layer object. But caution, the SpatRaster objects also need to have the same spatial extent.

# This produces an error that spatial extents do not match:
env_cur <- c(clim, trees_10min)
## Error: [rast] extents do not match
# Which SpatRaster object has the larger extent?
terra::ext(clim)
## SpatExtent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
terra::ext(trees_10min)
## SpatExtent : -180, 179.99999999999, -59.999999999996, 84 (xmin, xmax, ymin, ymax)
# As the climate data have the larger extent, we now have to "extend" our land cover extent
terra::extend(trees_10min, clim)
## class       : SpatRaster 
## dimensions  : 1080, 2160, 1  (nrow, ncol, nlyr)
## resolution  : 0.1666667, 0.1666667  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source(s)   : memory
## name        : trees 
## min value   :     0 
## max value   :     1
# Produce the multi-layer environmental data object with matching extents:
env_cur <- c(clim, terra::extend(trees_10min, clim))

3 Joining species and environmental data

Last, we can join our species and environmental data. Such joined species-environment data later serve as input to our species distribution models.

# Load our previously saved species data:
load(file='data/gbif_shrew_cleaned.RData')

When we have coordinate data, as we have in the GBIF data, we can use these coordinates to “pierce” through SpatRaster layers. That’s one of the easiest ways to extract relevant environmental data for our species records. However, as a very first step we have to decide which GBIF information should be retained in our data set.

# The GBIF data contain a lot of columns that we probably don't need:
head(gbif_shrew_cleaned)
# I suggest to keep the following columns for now:
gbif_shrew2 <- gbif_shrew_cleaned[,
    c("key", "scientificName", "decimalLatitude", "decimalLongitude", "basisOfRecord", "speciesKey", "species", "year")]

# We can extract the environmental data for the GBIF coordinates.
# Coordinates are always provided as x/y format, in our case lon/lat.
# We also extract the cellnumbers as this allows checking for duplicates later.
head(terra::extract(x = env_cur, 
    y = data.frame(gbif_shrew2[,c('decimalLongitude','decimalLatitude')]), cells=T ))
##   ID    bio01    bio02    bio03    bio04    bio05    bio06    bio07      bio08
## 1  1 6.218458 7.440792 31.85918 605.8054 18.89700 -4.45825 23.35525 13.7589169
## 2  2 7.154459 8.365458 33.95176 614.9968 20.70625 -3.93300 24.63925  0.8864167
## 3  3 7.154459 8.365458 33.95176 614.9968 20.70625 -3.93300 24.63925  0.8864167
## 4  4 2.502896 6.938416 31.54722 566.5660 14.37500 -7.61875 21.99375  9.5671673
## 5  5 4.237437 7.772583 31.97574 627.9045 17.49875 -6.80900 24.30775 11.9804583
## 6  6 3.388250 8.198584 33.54644 619.1552 16.30225 -8.13725 24.43950 10.9958334
##        bio09     bio10     bio11 bio12 bio13 bio14    bio15 bio16 bio17 bio18
## 1 -0.1909583 13.758917 -0.922375  1636   193   100 23.04582   551   310   551
## 2 11.7621250 14.922333 -0.062500  1377   138    91 12.41136   385   304   353
## 3 11.7621250 14.922333 -0.062500  1377   138    91 12.41136   385   304   353
## 4 -3.6100416  9.567167 -3.906833  1813   200   110 18.72311   573   354   573
## 5 -3.1796250 11.980458 -3.179625  1009   137    43 38.40795   394   146   394
## 6 -3.3333752 10.995833 -3.979792  1490   195    76 30.29223   543   262   543
##   bio19     trees   cell
## 1   318 0.4710940 556255
## 2   369 0.7686102 543289
## 3   369 0.7686102 543289
## 4   361 0.3076328 558415
## 5   146 0.5738233 556269
## 6   265 0.5509280 551962
# Finally, we put species and environmental data into the same data frame:
gbif_shrew2 <- cbind(gbif_shrew2, terra::extract(x = env_cur, y = data.frame(gbif_shrew2[,c('decimalLongitude','decimalLatitude')]), cells=T ))

We now have to inspect the data again to see whether we have any missing values or any other issues.

summary(gbif_shrew2)

Because we superimposed an arbitrary resolution now when joining the GBIF and environmental data, we could potentially have multiple records in a single raster cell. As we have extracted the cell numbers from the SpatRaster object, checking for duplicates is very simple.

# Check for duplicates - how many duplicates?
sum(duplicated(gbif_shrew2$cell))
## [1] 177
# Only retain non-duplicated cells (will not work in this example as we don't have duplicates):
gbif_shrew_env <- gbif_shrew2[!duplicated(gbif_shrew2$cell),]

save(gbif_shrew2, gbif_shrew_cleaned,file='data/gbif_shrew_cleaned.RData')

Exercise:

  • Choose another climate scenario, download the data and create two new data sets merging the GBIF data for your own species (from practical b1) with each of the climate scenarios, respectively.
  • Choose another land cover class, download the data and aggregate to the appropriate spatial resolution. Merge the GBIF data for your own species (from practical b1) with the current climate and land cover data.

References

Bakx, T. R. M., Z. Koma, A. C. Seijmonsbergen, and W. D. Kissling. 2019. “Use and Categorization of Light Detection and Ranging Vegetation Metrics in Avian Diversity and Species Distribution Research.” Diversity and Distributions 25 (7): 1045–59. https://doi.org/10.1111/ddi.12915.
Hijmans, R. J., S. E. Cameron, J. L. Parra, P. G. Jones, and A. Jarvis. 2005. “Very High Resolution Interpolated Climate Surfaces for Global Land Areas.” International Journal of Climatology 25 (15): 1965–78. https://doi.org/10.1002/joc.1276.
Karger, D. N., O. Conrad, J. Boehner, T. Kawohl, H. Kreft, R. Wilber Soria-Auza, N. E. Zimmermann, H. P. Linder, and M. Kessler. 2017. “Climatologies at High Resolution for the Earth’s Land Surface Areas.” Scientific Data 4 (September): 170122.