RStudio project

Open the RStudio project that we created in the previous session. I recommend to use this RStudio project for the entire module and within the RStudio project create separate R scripts for each session.

  • Create a new empty R script by going to the tab “File”, select “New File” and then “R script”
  • In the new R script, type # Session 1: Spatial data in R and save the file in your folder “scripts” within your project folder, e.g. as “1_SpatialData.R”

Nowadays, R offers a lot of GIS functionalities in different packages.

We will use a number of such GIS packages in R. Remember that we can install new packages with the function install.packages(). Make sure also to install the dependencies.

install.packages('terra, dep=T)
# load required package
library(terra)

1 Introduction to spatial data

Spatial data are typically stored as either vector data or raster data (With 2019). Discrete objects with clear boundaries are usually represented by vector data (e.g. individual trees, roads, countries) while continuous phenomena without clear boundaries are usually represented by raster data (e.g. elevation, temperature).

Vector data can be mapped a points, lines, and polygons:

  • Points occur at discrete locations and are represented by a coordinate pair (x,y). Each point may carry additional informaton, the so-called attributes, e.g. the species name of the individual captured at the location along with date.
  • Lines describe linear features and are defined by at least two coordinate pairs (x,y), the end points of the line. A line can also consistent of several line segments.
  • Polygons describe two-dimensional features in the landscapes and define a bounded area, enclosed by lines. Thus, a polygon needs to consist of at least three coordinate pairs (x,y.

By contrast, Raster data represent the landscape as a regularly spaced grid and are used to represent data that vary continuously across space such as elevation, temperature or NDVI. Raster cells can contain numeric values (e.g. elevation) or categorical values (e.g. land cover type). The coordinate information is stored differently from vector data because storing the coordinates for each grid cell in the raster would use too much storage space. To define a raster, we only need to know the coordinates of one corner, the spatial extent and the spatial resolution to infer the coordinates of each cell.

2 Vector data in R

The package terra provides a set of classes and methods to carry out spatial data analysis with raster and vector data. For illustration, we will start by creating a few SpatVector objects.

For more detailed tutorials see https://rspatial.org/terra/.

2.1 Spatial points

For illustration, we will generate some random points data and convert them into Spat* objects.

# We set a seed for the random number generator, so that we all obtain the same results
set.seed(12345)

coords <- cbind(
  x <- rnorm(50, sd=2),
  y <- rnorm(50, sd=2)
)

# sort coords data along x coordinates
coords <- coords[order(coords[,1]),]

# look at data structure
str(coords)
##  num [1:50, 1:2] -4.76 -3.64 -3.32 -3.2 -3.11 ...
# plot the coordinates
plot(coords)

We now convert the random points into a SpatVector object and inspect it.

# Convert into SpatVector
sp <- terra::vect(coords)

# Check out the object
class(sp)
## [1] "SpatVector"
## attr(,"package")
## [1] "terra"
# And the entries:
sp 
##  class       : SpatVector 
##  geometry    : points 
##  dimensions  : 50, 0  (geometries, attributes)
##  extent      : -4.760716, 4.393667, -4.693888, 4.954222  (xmin, xmax, ymin, ymax)
##  coord. ref. :
geom(sp)
##       geom part          x          y hole
##  [1,]    1    1 -4.7607161 -1.7101650    0
##  [2,]    2    1 -3.6359119  0.5559074    0
##  [3,]    3    1 -3.3241005 -0.6159068    0
##  [4,]    4    1 -3.1954190  2.8054108    0
##  [5,]    5    1 -3.1062748  4.6610239    0
##  [6,]    6    1 -2.8261976 -1.0100870    0
##  [7,]    7    1 -2.6135977 -1.3890934    0
##  [8,]    8    1 -2.1205311  3.7738939    0
##  [9,]    9    1 -1.8386440 -4.6938880    0
## [10,]   10    1 -1.7727150  1.7762789    0
## [11,]   11    1 -1.5010640 -1.1737592    0
## [12,]   12    1 -1.2886569 -2.0987056    0
## [13,]   13    1 -0.9632947  1.6525166    0
## [14,]   14    1 -0.9069943  0.7033257    0
## [15,]   15    1 -0.6631552  3.1869769    0
## [16,]   16    1 -0.6481732  1.3440849    0
## [17,]   17    1 -0.5683195  4.2901300    0
## [18,]   18    1 -0.5523682  1.6475907    0
## [19,]   19    1 -0.3246220  2.0425168    0
## [20,]   20    1 -0.2324956  0.2991840    0
## [21,]   21    1 -0.2186066  0.1071805    0
## [22,]   22    1  0.0516021  1.6497401    0
## [23,]   23    1  0.5085424  1.9424413    0
## [24,]   24    1  0.5974474 -2.5913434    0
## [25,]   25    1  0.7412557  1.1066062    0
## [26,]   26    1  0.9823766  3.7341984    0
## [27,]   27    1  1.0404329  3.1799257    0
## [28,]   28    1  1.1348065  4.3154396    0
## [29,]   29    1  1.1663753 -1.1995951    0
## [30,]   30    1  1.1710576 -1.0807721    0
## [31,]   31    1  1.2117749 -1.3419531    0
## [32,]   32    1  1.2242470  0.9524966    0
## [33,]   33    1  1.2407596 -1.6230810    0
## [34,]   34    1  1.2601971  1.3823425    0
## [35,]   35    1  1.4189320  3.8953853    0
## [36,]   36    1  1.5592438  0.1092312    0
## [37,]   37    1  1.6237464  1.2907661    0
## [38,]   38    1  1.6337997 -3.6647546    0
## [39,]   39    1  1.7089034 -1.9612659    0
## [40,]   40    1  1.8742811 -0.7836387    0
## [41,]   41    1  2.2414253  1.0337093    0
## [42,]   42    1  2.2570217 -1.9278030    0
## [43,]   43    1  2.9115702 -1.5692987    0
## [44,]   44    1  2.9214588  1.3746642    0
## [45,]   45    1  3.2648913  4.9542218    0
## [46,]   46    1  3.5354677  1.0730474    0
## [47,]   47    1  3.6101950  1.8852017    0
## [48,]   48    1  3.6346241 -2.6850630    0
## [49,]   49    1  4.0983807 -0.6087382    0
## [50,]   50    1  4.3936671  2.0862871    0

We see that it is a vector data object with point geometries. An extent (the bounding box) was created automatically from the provided coordinates. The coord. ref. value is empty as we haven’t provided a coordinate reference system (“CRS”) yet. We can explicitly add a CRS when creating the SpatVector object:

sp <- terra::vect(coords, crs = '+proj=longlat +datum=WGS84')
sp
##  class       : SpatVector 
##  geometry    : points 
##  dimensions  : 50, 0  (geometries, attributes)
##  extent      : -4.760716, 4.393667, -4.693888, 4.954222  (xmin, xmax, ymin, ymax)
##  coord. ref. : +proj=longlat +datum=WGS84 +no_defs

At the moment the SpatVector object only contains the point coordinates without any additional thematic information (or attributes) for each point. Let’s assume the spatial points are trees, either of three species. The 15 western-most points are all birch trees, the eastern trees are a mix of beach and oak.

# Create attribute table
data <- data.frame(ID=1:50,species= NA)
# Define the 15 western trees as birch
data[1:15, 'species'] <- 'birch'
# Randomly assign the species beach or oak to the remaining 35 trees
data[-c(1:15), 'species'] <- sample(c('beech','oak'),35,replace=T)

Now, we combine the SpatVector object and the data frame and store it in a new SpatVector object.

# in terra, you can simply add the additional information to the existing Spatial Vector
sp_trees <- sp
terra:: values(sp_trees) <- data

# Inspect the new object
sp_trees
##  class       : SpatVector 
##  geometry    : points 
##  dimensions  : 50, 2  (geometries, attributes)
##  extent      : -4.760716, 4.393667, -4.693888, 4.954222  (xmin, xmax, ymin, ymax)
##  coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
##  names       :    ID species
##  type        : <int>   <chr>
##  values      :     1   birch
##                    2   birch
##                    3   birch

2.2 Spatial lines and Spatial polygons

Similarly, we can also create SpatVector of lines or polygons. First, we want to create a line vector object from the oaks - we could assume that it is an oak hedgerow.

# prepare a subset of point geometries for oaks
oak   <- terra::subset(sp_trees, sp_trees$species == 'oak')

# extract oak coordinates
oak_coords <- terra::crds(oak)

# Create SpatVector lines through all oak trees in the data
oak_hedge <- terra::vect(oak_coords, type="lines", crs='+proj=longlat +datum=WGS84')

# Inspect the object
oak_hedge
##  class       : SpatVector 
##  geometry    : lines 
##  dimensions  : 1, 0  (geometries, attributes)
##  extent      : -0.6481732, 4.393667, -3.664755, 2.086287  (xmin, xmax, ymin, ymax)
##  coord. ref. : +proj=longlat +datum=WGS84 +no_defs

Next, we create a spatial polygon vector from the birch data - we could assume that it is a birch woodland patch.

# prepare a subset of point geometries for birch
birch <- terra::subset( sp_trees, sp_trees$species == 'birch')

# extract birch coordinates
birch_coords <- terra::crds(birch)

# Create a spatial polygon around all birch trees in the data by using a convex hull:
birch_hull <- chull(birch_coords)

# Create SpatVector polygons 
birch_patch <- terra::vect(birch_coords[birch_hull,], type = "polygons", crs='+proj=longlat +datum=WGS84')

# inspect the object
birch_patch
##  class       : SpatVector 
##  geometry    : polygons 
##  dimensions  : 1, 0  (geometries, attributes)
##  extent      : -4.760716, -0.6631552, -4.693888, 4.661024  (xmin, xmax, ymin, ymax)
##  coord. ref. : +proj=longlat +datum=WGS84 +no_defs

Finally, let’s plot our spatial point, line and polygon data.

# We first plot the bird woodland patch but with the spatial extent of sp_trees (plus some buffer):
plot(birch_patch, border='blue',col='YellowGreen', ext=ext(sp_trees)*1.1, lwd=2)

# Add the oak hedge
plot(oak_hedge, col='ForestGreen', lwd=2, add=T)

# Add the trees
plot(sp_trees, cex=2, add=T)

2.3 Reading vector data from file

Most often, vector data are stored as shapefiles. Download the zip folder containg IUCN range data of the Alpine Shrew to your data folder and unzip it. You will see that several files are contained in this folder. All of these are necessary parts of the shapefile and contain the different information on geometry and attributes.

We use the terra package to read in the data:

(shrew <- terra::vect('data/IUCN_Sorex_alpinus.shp'))
##  class       : SpatVector 
##  geometry    : polygons 
##  dimensions  : 1, 27  (geometries, attributes)
##  extent      : 5.733728, 26.67935, 42.20601, 51.89984  (xmin, xmax, ymin, ymax)
##  source      : IUCN_Sorex_alpinus.shp
##  coord. ref. : lon/lat WGS 84 (EPSG:4326) 
##  names       : id_no      binomial presence origin seasonal compiler yrcompiled
##  type        : <chr>         <chr>    <int>  <int>    <int>    <chr>      <int>
##  values      : 29660 Sorex alpinus        1      1        1     IUCN       2008
##         citation source dist_comm (and 17 more)
##            <chr>  <chr>     <chr>              
##  IUCN (Internat~     NA        NA
# Plot Central Europe
library(maps)
map('world',xlim=c(5,30), ylim=c(40,55))

# Overlay the range of the Alpine Shrew
plot(shrew, col='red', add=T)

3 Raster data in R

We will use the terra package to represent and analyse raster data in R. The package contains raster data class SpatRaster, which can contain only a single layer of raster information or multiple layers (from separate files or from a single multi-layer file, respectively).

The function rast() can be used to create or read in SpatRaster objects.

# Create a SpatRaster object
(r1 <- terra::rast(ncol = 10, nrow = 10, xmax = -80, xmin = -150, ymax = 60, ymin = 20 ))
## class       : SpatRaster 
## dimensions  : 10, 10, 1  (nrow, ncol, nlyr)
## resolution  : 7, 4  (x, y)
## extent      : -150, -80, 20, 60  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84

We can access the attributes of each grid cell by using the function values(). Obviously, there are no values yet in the RasterLayer object and thus we assign some randomly.

# Inspect value in SpatRaster:
summary(terra::values(r1))
##      lyr.1    
##  Min.   : NA  
##  1st Qu.: NA  
##  Median : NA  
##  Mean   :NaN  
##  3rd Qu.: NA  
##  Max.   : NA  
##  NA's   :100
# Assign values (randomly drawn from normal distribution) to the SpatRaster object:
terra::values(r1) <- rnorm(ncell(r1))

# plot the raster
plot(r1)

We can easily create a multi-layer SpatRaster object using the c() (concatenate) function.

# Create another RasterLayer and assign values
r2 <- r1
terra::values(r2) <- 1:ncell(r2)

# Create multi-layer SpatRaster object
(r <- c(r1, r2))
## class       : SpatRaster 
## dimensions  : 10, 10, 2  (nrow, ncol, nlyr)
## resolution  : 7, 4  (x, y)
## extent      : -150, -80, 20, 60  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 
## source(s)   : memory
## names       :     lyr.1, lyr.1 
## min values  : -2.156362,     1 
## max values  :  2.593773,   100
names(r) <- c('Layer 1', 'Layer 2')
plot(r)

3.1 Read in raster data

In most cases, we will read in raster data from file. For this we can use the same commands as above, rast() for reading in both single-layer and multi-layer raster files.

First, download the temperature map for UK here to your data folder and read it in. The raster layer represents the mean annual in temperature [°C * 10].

# Read in SpatRaster object from file:
(temp <- terra::rast('data/UK_temp.tif'))
## class       : SpatRaster 
## dimensions  : 122, 66, 1  (nrow, ncol, nlyr)
## resolution  : 10000, 10000  (x, y)
## extent      : 0, 660000, 0, 1220000  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs 
## source      : UK_temp.tif 
## name        :   UK_temp 
## min value   :  35.71785 
## max value   : 112.00000
plot(temp)

Second, download the GeoTiff file containing multi-layer raster data for 19 bioclimatic variables of UK, and place it in your data folder. The bioclimatic variables are explained herehttps://www.worldclim.org/data/bioclim.html.

# Reading multi-layer SpatRaster data from file uses the same command as for single-layer data:
(bioclim <- terra::rast('data/UK_bioclim.tif'))
## class       : SpatRaster 
## dimensions  : 122, 66, 19  (nrow, ncol, nlyr)
## resolution  : 10000, 10000  (x, y)
## extent      : 0, 660000, 0, 1220000  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs 
## source      : UK_bioclim.tif 
## names       :      bio1,     bio2, bio3,     bio4,     bio5,      bio6, ... 
## min values  :  35.71785, 41.00000,   30, 3060.245, 134.0000, -55.48918, ... 
## max values  : 112.00000, 82.64219,   40, 5321.534, 231.5805,  43.00000, ...
plot(bioclim)

3.2 Download raster data

The geodata package is offering direct access to some standard repositories. Check them out in the help page ?geodata. The package provides access, for example, to elevation data, data on the global administrative boundaries (gadm) as well as current and future climates from WorldClim (Hijmans et al. 2005).

## class       : SpatRaster 
## dimensions  : 1080, 2160, 19  (nrow, ncol, nlyr)
## resolution  : 0.1666667, 0.1666667  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## sources     : wc2.1_10m_bio_1.tif  
##               wc2.1_10m_bio_2.tif  
##               wc2.1_10m_bio_3.tif  
##               ... and 16 more source(s)
## names       : wc2.1~bio_1, wc2.1~bio_2, wc2.1~bio_3, wc2.1~bio_4, wc2.1~bio_5, wc2.1~bio_6, ... 
## min values  :   -54.72435,     1.00000,    9.131122,       0.000,   -29.68600,   -72.50025, ... 
## max values  :    30.98764,    21.14754,  100.000000,    2363.846,    48.08275,    26.30000, ...
library(geodata)

# Download global bioclimatic data from worldclim (you may have to set argument 'download=T' for first download, if 'download=F' it will attempt to read from file):
(clim <- geodata::worldclim_global(var = 'bio', res = 10, download = F, path = 'data'))
plot(clim)

3.3 Manipulate raster data

The terra package offers functionality to manipulate the spatial data, for example aggregating the data to coarser resolutions (aggregate()), or cropping to a specific extent (crop()).

# Rough coordinates of European extent
ext_eur <- c(-15,45,35,72)

# Crop the temperature layer (bio1) to roughly European extent
temp_eur <- terra::crop(clim[[1]], ext_eur)

# Aggregate to one-degree and two-degree resolution (from currently 10-min resolution)
temp_eur_onedeg <- terra::aggregate(temp_eur, fact=6)  
temp_eur_twodeg <- terra::aggregate(temp_eur, fact=12)

# Plot the resultin maps
par(mfrow=c(1,3))
plot(temp_eur)
plot(temp_eur_onedeg)
plot(temp_eur_twodeg)

# Test other options like the na.rm argument:
# Aggregate to one-degree and two-degree resolution (from currently 10-min resolution)
temp_eur_onedeg <- terra::aggregate(temp_eur, fact=6, na.rm=T)  
temp_eur_twodeg <- terra::aggregate(temp_eur, fact=12, na.rm=T)

# Plot the resultin maps
par(mfrow=c(1,3))
plot(temp_eur)
plot(temp_eur_onedeg)
plot(temp_eur_twodeg)

3.4 Extract raster data

There are different ways for extracting information from raster layers. We have already worked with values(). If we have coordinate data, we can use these coordinates to “pierce” through raster layers. That’s one of the easiest ways to extract relevant environmental data for specific locations. Let’s first extract the climate data for Potsdam.

# Potsdam coordinates - we need these as two-column matrix (or data.frame or SpatVector)
lonlat_Potsdam <- matrix( c(13.063561, 52.391842), ncol=2)

# Extract temperature values for Potsdam:
terra::extract(temp_eur, lonlat_Potsdam)
##   wc2.1_10m_bio_1
## 1        9.229239
# Extract climate values for Potsdam:
terra::extract(clim, lonlat_Potsdam)
##   wc2.1_10m_bio_1 wc2.1_10m_bio_2 wc2.1_10m_bio_3 wc2.1_10m_bio_4
## 1        9.229239        8.012188        30.85883        687.4697
##   wc2.1_10m_bio_5 wc2.1_10m_bio_6 wc2.1_10m_bio_7 wc2.1_10m_bio_8
## 1         23.6355         -2.3285          25.964        17.79533
##   wc2.1_10m_bio_9 wc2.1_10m_bio_10 wc2.1_10m_bio_11 wc2.1_10m_bio_12
## 1        4.618125         17.79533          0.94125              546
##   wc2.1_10m_bio_13 wc2.1_10m_bio_14 wc2.1_10m_bio_15 wc2.1_10m_bio_16
## 1               65               34         20.14255              171
##   wc2.1_10m_bio_17 wc2.1_10m_bio_18 wc2.1_10m_bio_19
## 1              111              171              127

For further illustration, we want to extract raster data for several locations, specifically for different European capitals.

capitals <- data.frame(lon=c(2.333333, -3.683333, -0.083333, 12.583333, 10.75, 21, 23.316667, 24.1, 0), 
        lat=c(48.86666667, 40.4, 51.5, 55.66666667, 59.91666667, 52.25, 42.68333333, 56.95, 0),
        capital=c('Paris', 'Madrid', 'London', 'Copenhagen', 'Oslo', 'Warsaw', 'Sofia', 'Riga','Oops'))


# Map temperature and locations
plot(temp_eur)
points(capitals[,1:2], cex=2, pch=19)

# Extract temperature values at these locations
terra::extract(temp_eur, capitals[,1:2])
##   ID wc2.1_10m_bio_1
## 1  1       11.604875
## 2  2       14.344760
## 3  3       10.485146
## 4  4        8.415707
## 5  5        5.774378
## 6  6        8.199729
## 7  7        9.835594
## 8  8        6.548781
## 9  9              NA
# Combine data frames:
(capitals_clim <- cbind(capitals, terra::extract(temp_eur, capitals[,1:2])))
##         lon      lat    capital ID wc2.1_10m_bio_1
## 1  2.333333 48.86667      Paris  1       11.604875
## 2 -3.683333 40.40000     Madrid  2       14.344760
## 3 -0.083333 51.50000     London  3       10.485146
## 4 12.583333 55.66667 Copenhagen  4        8.415707
## 5 10.750000 59.91667       Oslo  5        5.774378
## 6 21.000000 52.25000     Warsaw  6        8.199729
## 7 23.316667 42.68333      Sofia  7        9.835594
## 8 24.100000 56.95000       Riga  8        6.548781
## 9  0.000000  0.00000       Oops  9              NA

Note: if no environmental data are available for a specific location, e.g. no temperature values are available in the ocean, then these locations will receive an NA value. Hence, you should always check your resulting data for NAs.

# Remove NAs in resulting data frame
na.exclude(capitals_clim)
##         lon      lat    capital ID wc2.1_10m_bio_1
## 1  2.333333 48.86667      Paris  1       11.604875
## 2 -3.683333 40.40000     Madrid  2       14.344760
## 3 -0.083333 51.50000     London  3       10.485146
## 4 12.583333 55.66667 Copenhagen  4        8.415707
## 5 10.750000 59.91667       Oslo  5        5.774378
## 6 21.000000 52.25000     Warsaw  6        8.199729
## 7 23.316667 42.68333      Sofia  7        9.835594
## 8 24.100000 56.95000       Riga  8        6.548781

Exercise:

  • Take the capital locations and extract also the temperature values at the aggregated scales (temp_eur_onedeg, temp_eur_twodeg). Compare the temperature values.

  • Take the capital locations and extract all 19 bioclimatic variables for these (from the SpatRaster object clim)

  • Find the coordinates of 10 other capitol cities and extract the climate values

  • Take a look at the description of Coordinate Reference Systems, CRS, by Robert Hijmans

  • What are CRS? How can I extract the CRS information from a SpatRaster object?

  • Which CRS is used in the temp_eur raster?

  • Reproject temp_eur to a different CRS following instructions on above-mentioned website

  • When is reprojection useful?

References

Hijmans, R. J., S. E. Cameron, J. L. Parra, P. G. Jones, and A. Jarvis. 2005. “Very High Resolution Interpolated Climate Surfaces for Global Land Areas.” International Journal of Climatology 25 (15): 1965–78. https://doi.org/10.1002/joc.1276.
With, Kimberly A. 2019. Essentials of Landscape Ecology. Oxford University Press. https://doi.org/10.1093/oso/9780198838388.001.0001.