ElevDistr

Process a single data point

Computing the distance to the treeline for a single point is the easiest task. Except for climatic layers and the elevation model, ElevDistr contains everything needed for this task. First of all, it is essential that two climatic RasterLayers are imported: one for growing season length (gsl) and one for growing season temperature (gst). I suggest to use the layers from CHELSA. The user is free to use different climatic layers, but it is highly recommended to customize the object pointsAboveTreeline when using layers other than the CHELSA rasters. The algorithm does not break when this adjustment is not done, but it is very likely that this has negative effects on the computation time.

gstURL <- paste0("https://os.zhdk.cloud.switch.ch/envicloud/chelsa/chelsa_V2/",
                 "GLOBAL/climatologies/1981-2010/bio/CHELSA_gst_1981-2010_V.2.1.tif")
gslURL <- paste0("https://os.zhdk.cloud.switch.ch/envicloud/chelsa/chelsa_V2/",
                 "GLOBAL/climatologies/1981-2010/bio/CHELSA_gsl_1981-2010_V.2.1.tif")

gst <- terra::rast(gstURL, vsi = TRUE)
gsl <- terra::rast(gslURL, vsi = TRUE)

Furthermore, a digital elevation model (DEM) needs to be imported. Therefore, I choose a part of the GMTED2010 courtesy of the U.S. Geological Survey. This works very similar as shown above.

gmted2010URL <- paste0("https://edcintl.cr.usgs.gov/downloads/sciweb1/shared/topo/downloads/GMTED/",
                      "Global_tiles_GMTED/300darcsec/med/E000/30N000E_20101117_gmted_med300.tif")

gmted2010Part <- terra::rast(gmted2010URL, vsi = TRUE)

The only thing that needs to be done now is to call the wrapper and specify a longitude, latitude and an elevation of the point of interest.
Note: All coordinates need to be in World Geodetic System 1984 (‘WGS-84’).

distance_to_treeline(lon = 8.65, lat = 46.87,  gstRaster = gst,  gslRaster = gsl, 
                     elevationRaster = gmted2010Part, elevation = 504, pointDf = pointsAboveTreeline, 
                     plot = FALSE, plotHist = FALSE, gstMin = 6.4, gslMin = 94)

If using only a few points, the user should plot a map and a histogram of the local treeline height. This can be achieved by changing the default parameters: plot = TRUE for the map and plotHist = TRUE for the histogram.
Please note that you need to register a Google API in order to be able to produce a map. A detailed explanation on how to do this can be found here: https://www.rdocumentation.org/packages/ggmap/versions/3.0.0. You can check with ggmap::has_google_key() if you have an already registered key and use the command ggmap::ggmap_show_api_key() to allow the wrapper downloading the map of interest.

#> Warning in plot_distr(point, temp$df, treeline, 12): Could not download a map.
#> Plot is not generated. Please enable the APIs. See:
#> https://www.rdocumentation.org/packages/ggmap/versions/3.0.0.

Example of treelinie distribution Furthermore, even though it is possible to change the treeline definition by changing the threshold for minimal growing season length and growing season temperature, this option should only be changed if you know what you are doing. Changed thresholds have a big impact on the computational output and are carefully chosen based on Paulsen and Körner (2014) doi:10.1007/s00035-014-0124-0.

Process GBIF data

After explaining the basics of the wrapper function distance_to_treeline, I will now show a example on how to calculate whole species distribution from GBIF data. This chapter gives the user a overview of how this R package could be used to process bigger data frames. Remember just because this tool is specially designed for dealing with the uncertainty of spatial data, it does not mean that this approach is a solution that fits for everything. Depending on your question you might want to change the approach.

Download data

To be able to show multiple data processing, the first thing needed is data. For this example, we have to download two species: Ranunculus pygmaeus and Ranunculus thora. Therefore, the data might be imported directly from an excel file or use the package rgbif. Here the fully automatically approach with rgbif is shown. However, it might be the case, if you are looking for many data points you hit the hard limit of 100,000 occurrence. In such a case downloading the csv file from the GBIF homepage cannot be avoided.

#install.packages("rgbif") #Remove hashtag if you have not installed this package
myspecies <- c("Ranunculus pygmaeus", "Ranunculus thora")
gbifData <- rgbif::occ_data(scientificName = myspecies, hasCoordinate = TRUE, limit = 20000)

To keep the example clear, I just keep the variables of interest: species name, longitude, latitude, elevation and the taxon rank.

temp1 <- gbifData$`Ranunculus pygmaeus`$data[, c("scientificName", "decimalLatitude", "decimalLongitude",
                                                 "elevation", "taxonRank")]
temp2 <- gbifData$`Ranunculus thora`$data[, c("scientificName", "decimalLatitude", "decimalLongitude",
                                              "elevation", "taxonRank")]

ranunculus <- rbind(temp1, temp2)

Filter data

Now we are ready to process the downloaded data frame. Here it is only presented how to exclude species with no meaningful elevation. However, depending on your data and your question this very basic filtering might not be sufficient. I recommend to check out the package CoordinateCleaner and think properly of what kind of potential bias you must exclude from your data set.
Here I exclude all data points that are below 0 meters or above 8,850 meters above the sea level or contain a NA. This is achieved using the Tidyverse package.

#install.packages("tidyverse") #Remove hashtag if you have not installed this package
library("tidyverse")
ranunculusFiltered <- ranunculus %>% filter(!is.na(elevation) & 0 < elevation & elevation < 8850)

Furthermore, I would like to keep only entries that have “species” as an assigned taxon rank. In this particular example, no excepted variations exist.

ranunculusFiltered <- ranunculusFiltered %>% filter(taxonRank == "SPECIES")

Down sampling

To speed up the computation, only 100 random samples are kept. Again, the user might choose a different approach dependent on the performed filtering and the wished output accuracy.

set.seed(42) #Set a seed for reproducible
ranunculusSampled <- ranunculusFiltered %>% group_by(scientificName) %>% slice_head(n = 100)

Compute distance to the treeline

The data is now almost ready for classification. The classification works in the same way as in the example with a single point. The only difference: now vectors for the longitude, latitude and the elevation are used as a input.

#Import climatic layers
gstURL <- paste0("https://os.zhdk.cloud.switch.ch/envicloud/chelsa/chelsa_V2/",
                 "GLOBAL/climatologies/1981-2010/bio/CHELSA_gst_1981-2010_V.2.1.tif")
gslURL <- paste0("https://os.zhdk.cloud.switch.ch/envicloud/chelsa/chelsa_V2/",
                 "GLOBAL/climatologies/1981-2010/bio/CHELSA_gsl_1981-2010_V.2.1.tif")

gst <- terra::rast(gstURL, vsi = TRUE)
gsl <- terra::rast(gslURL, vsi = TRUE)

#Import the DEM
gmted2010URL <- paste0("https://edcintl.cr.usgs.gov/downloads/sciweb1/shared/topo/downloads/GMTED/",
                      "Global_tiles_GMTED/300darcsec/med/E000/30N000E_20101117_gmted_med300.tif")

gmted2010Part <- terra::rast(gmted2010URL, vsi = TRUE)

#Run classification for the first five elements
elev <- distance_to_treeline(lon = ranunculusSampled$decimalLongitude[1:5], 
                             lat = ranunculusSampled$decimalLatitude[1:5], gstRaster = gst,
                             gslRaster = gsl, elevationRaster = gmted2010Part, pointDf = pointsAboveTreeline,
                             elevation = ranunculusSampled$elevation[1:5], plot = FALSE, 
                             plotHist = FALSE, gstMin = 6.4, gslMin = 94)
#> 
  |                                                                  
  |                                                            |   0%
  |                                                                  
  |============                                                |  20%
  |                                                                  
  |========================                                    |  40%
#> Warning in distance_to_treeline(lon = ranunculusSampled$decimalLongitude[1:5],
#> : The "gridSize" parameter was to small. The parameter was automatically
#> increased by 10 km.
#> 
  |                                                                  
  |====================================                        |  60%
  |                                                                  
  |================================================            |  80%
#> Warning in distance_to_treeline(lon = ranunculusSampled$decimalLongitude[1:5],
#> : The "gridSize" parameter was to small. The parameter was automatically
#> increased by 5 km.
#> 
  |                                                                  
  |============================================================| 100%

Be aware that the computation time for each point is roughly three seconds, the time increases linear with the amount of input points; \(O(n)\). Filtering and parallelization might be a suitable method of reducing the computation time.
The result stored in elev is now the computed distance to the closest local treeline.

elev
#> [1] NA NA NA NA NA

Increase the speed

The way the three raster layers are loaded in this tutorial, is not optimized for speed. If you would like to increase the speed it is recommended to download the layers from CHELSA:
-gst
-gsl
and the U.S. Geological Survey:
-GMTED2010
If a digital elevation model of the whole globe is needed the GTOPO30.tif can be down loaded from my Google Drive. GTOPO30 courtesy of the U.S. Geological Survey. All the layers can be importer by using:

temp <- terra::rast("your_raster_layer.tif")

How to customize “pointsAboveTreeline”

As stated in the first paragraph: If users would like to use different climatic data as presented in the tutorial, this is possible. However, to not increase the computation time drastically, it is recommended to recalculate the data frame pointsAboveTreeline.
After importing the raster layer of your choice into R, you can use the function generate_point_df.

#Import raster of your choice
gst <- terra::rast(file = "your_gst_layer.tif")
gsl <- terra::rast(file = "your_gsl_layer.tif")

#Compute new data frame
newPointsAboveTreeline <- generate_point_df(gstRaster = gst, gslRaster = gsl, stepSize = 0.0416666, 
                                            gstTreshold = 6.4, gslTreshold = 94)

#Save the output
save(newPointsAboveTreeline, file = "newPointsAboveTreeline.Rdata")

Please note that the computation take a while, but it needs to be performed only once.

Furthermore, depending on your task you might want to exclude all the areas that are that do fall outside of the mountain polygons GMBA, as we did it for pointsAboveTreeline. Therefore, download the shape file from the link above and proceed like the following:

#Load the polygons of all the mountains (from the GMBA project)
mountain_polygons <- terra::vect("GMBA mountain inventory V1.2(entire world)/
                                 GMBA Mountain Inventory_v1.2-World.shp")

#Keep only the points that are in a alpine polygon
#"keep" will be a vector containing all row numbers that contain coordinate, which lie in the Alps
keep <- terra::is.related(terra::vect(newPointsAboveTreeline, geom = c("longitude", "latitude")),
                          mountain_polygons, "intersects") |> which() 

newPointsAboveTreeline2 <- newPointsAboveTreeline [keep,] #Pick the lines of interest

How the wrapper works

In case that users are interested how the algorithm works, in this paragraph is explained what happens behind the scene. For simplicity, the process is explained using the same data as in the first paragraph.
First the wrapper searches for the closest point above the treeline in the data frame pointsAboveTreeline (or what ever the user feeds into the wrapper). The closest point is searched by using a k-dimensional tree and a nearest neighbor search. Therefore, the function get_nearest_point calls the package RANN. It is important to find the closest point above the treeline, because we need alpine points and non alpine points in the next steps.

pointAbove <- get_nearest_point(lon = 8.65, lat = 46.87, pointDf = pointsAboveTreeline)
pointAbove
#> $lon
#> [1] 8.728898
#> 
#> $lat
#> [1] 46.93756

After getting a point who lies above the treeline, a grid is generated. The grid center is the point we just extracted. The width (in km) and the distance between the points (in degrees) are defined by the arguments.

grid <- generate_grid(lon = pointAbove$lon, lat = pointAbove$lat, squareSize = 10, stepSize = 0.0025)
head(grid$df)
#>   longitude latitude
#> 1  8.663898 46.89251
#> 2  8.663898 46.89501
#> 3  8.663898 46.89751
#> 4  8.663898 46.90001
#> 5  8.663898 46.90251
#> 6  8.663898 46.90501

Now for each grid point the growing season length and growing season temperature is extracted. Next, based on the selected thresholds, it is evaluated if the point is above or below the treeline. The extracted information is added to the existing data frame from the generate_grid output.

grid$df <- classify_above_treeline(coords = grid$df, gstRaster = gst, gslRaster = gsl,
                                   gstTreshold = 6.4, gslTreshold = 94)
head(grid$df)
#>   longitude latitude growingSeasonTemperature growingSeasonLength aboveTreeline
#> 1  8.663898 46.89251                     8.95                 133         FALSE
#> 2  8.663898 46.89501                     8.95                 133         FALSE
#> 3  8.663898 46.89751                     8.95                 133         FALSE
#> 4  8.663898 46.90001                     8.95                 135         FALSE
#> 5  8.663898 46.90251                     8.95                 135         FALSE
#> 6  8.663898 46.90501                     8.95                 135         FALSE

Based on the classification if a point is above or below the treeline is drawn: two points with a different classification lay next to each other, a vertical or a horizontal line is drawn in the middle of both points. Points classified as NA are ignored. The lines are stored in a data frame: an identifier, a start latitude and longitude, an end latitude and longitude.

treelineDf <- sample_treeline(df = grid$df, lonLength = grid$lonLength, latLength = grid$latLength)
head(treelineDf)
#>              id     lat1     lon1     lat2     lon2
#> 2 horizontal195 46.91626 8.675148 46.91626 8.677648
#> 3 horizontal198 46.92376 8.675148 46.92376 8.677648
#> 4 horizontal232 46.91626 8.677648 46.91626 8.680148
#> 5 horizontal235 46.92376 8.677648 46.92376 8.680148
#> 6 horizontal269 46.91626 8.680148 46.91626 8.682648
#> 7 horizontal272 46.92376 8.680148 46.92376 8.682648

After this step, the wrapper contains a safety feature: whenever there are less than ten treeline elements, the last three functions are recalculated, but with a bigger grid (+5 km). This is implemented in case the user selects different climatic raster layers with the functions generate_point_df and classify_above_treeline. Mathematically it is possible that this leads to only a few or no points above the treeline (no points would cause multiple errors). After this safety loop the plot argument is evaluated. If it is = TRUE a map is generated by calling the function plot_distr.

plot_distr(nearestCorner = pointAbove, grid = grid$df, treelineDf = treelineDf, size = 12)

In the final step, the treeline is sampled with a step size defined in the input argument treelineSampling. From all the drawn points the elevation is extracted from the elevationRaster that contains a digital elevation model. From all the elevations the median is calculated and from the pointElevation the estimated treeline elevation is subtracted. This function (if desired) also plots a histogram of the sampled elevation.

calculate_distance(treeline = treelineDf, elevationRaster = gmted2010Part, pointElevation = 504,
                   treelineSampling = 10, plot = FALSE)
#> [1] -1491

All the functions called by the wrapper can, if desired, be called independently. In my opinion this does not make a lot of sense for the last two functions, but I leave this decisions to the users.