Chapter 3 Spatial Data
3.1 Overview
Previous research has identified a suite of biophysical predictors important in predicting the distribution and habitat use of sea ducks and coastal birds (Table 1). However, likely owing to the international border dividing the Salish Sea, many of these environmental/biophysical layers have yet to be compiled or used for trans-boundary habitat modeling. Furthermore, there are numerous available data sets that have yet to be tested in predicting the distribution of sea ducks that could be important (Table 2). The purpose of this chapter is to give an overview of spatial data that can support international sea duck habitat models in the Salish Sea. At the end of this chapter the user will know:
What spatial data is available for modeling in the Salish Sea
Where to get the data
The limitations of the data (e.g., spatial resolution and temporal scale)
How to work with different types of data
Table 1. Summary of biophysical predictors used in modeling sea duck distribution and habitat use.
Covariate | Type | Citations |
---|---|---|
Net Primary Productivity | Aquatic | Lamb et al. 2020; Rickbeil et al. 2014 |
Sea Surface Temperature | Aquatic | Lamb et al. 2020; Rickbeil et al. 2014; Zipkin et al. 2010 |
Salinity | Aquatic | Lamb et al. 2020 |
Sediments | Aquatic | Rickbeil et al. 2014 |
Distance to Nearest Shoreline | Spatial | Lamb et al. 2020 |
Bottom Depth (Bathymetry) | Spatial | Lamb et al. 2020; Zipkin et al. 2010 |
Slope/Ocean Floor Topography | Spatial | Lamb et al. 2020; Zipkin et al. 2010 |
Bay or Otherwise | Spatial | Zipkin et al. 2010 |
Climate During Migration | Climate | Zipkin et al. 2010 |
Monthly Winter Temperature | Climate | Rickbeil et al. 2014 |
Monthly Winter Precipitation | Climate | Rickbeil et al. 2014 |
Shore Zone | Terrestrial | Rickbeil et al. 2014 |
Latitude | Spatial | Lamb et al. 2020; Zipkin et al. 2010 |
Longitude | Spatial | Lamb et al. 2020 |
Spatial Autocorrelation | Other | Zipkin et al. 2010 |
Temporal Autocorrelation | Other | Zipkin et al. 2010 |
Offset or Covariate for Effort | Other | Zipkin et al. 2010; Michel et al. 2021 |
Table 2. Summary of potential biophysical predictors not previously tested in modeling sea duck distribution and habitat use.
Covariate | Type |
---|---|
Sea Vessel Traffic | Spatial |
Bottom Patches | Spatial |
Distribution of Prey | Spatial |
Aquatic Vegetation | Spatial |
3.2 Available Data
Below you will find a collection of available data sets that could be used in sea duck habitat suitability analysis. The data sets provided here have been guided by previous literature and may not be comprehensive. When available, newer or better data sets have been included to replace out-of-date sources. Some covariates in Table 1 have been excluded because they are specific to the research being conducted (latitude/longitude, autocorrelation, and effort). A suite of data sets that have not been previously tested has also been included. Not all of these data sets are trans-boundary in nature, but they have been included for special use cases and could be updated if and when a corresponding trans-boundary data set becomes available.
3.2.1 Net Primary Productivity
Net Primary Productivity (NPP), the measure of energy or biomass accumulation by primary producers, influences the distribution of consumers at higher trophic levels. With the launch of ocean-observing satellites, reasonable estimates of ocean primary production have become available. Numerous models exist for estimating NPP; however, the Vertically Generalized Production Model (VGPM) is a common standard and is a function of chlorophyll, available light, and photosynthetic efficiency (O’Malley 2012). Alternatively, the concentration of chlorophyll-a has been used as an index of NPP as it relates to phytoplankton biomass (Moses et al. 2009).
3.2.1.1 Data Sources
Dataset | Format | Unit | Resolution (km2) | Timescale | Source |
---|---|---|---|---|---|
Vertically Generalized Production Model | HDF | mgC m-2 day-1 | 100 | Monthly or 8-day: 2002-2021 | O’Malley 2012 |
Chlorophyll-a | netCDF (.nc) | mg m-3 | 1 | Daily: 2002-Present | MODIS |
3.2.1.2 Data Use Considerations
Both datasets are freely available to download and use from their respective links in the table above. User sign up might be required.
The VGPM has a few flaws that might limit its usability for habitat suitability modeling. First, its resolution of approximately 10 km by 10 km is coarse. Furthermore, the model is limited to calculations south of 49°N in December and 53°N in January. With the northernmost point of the Salish Sea Bioregion at 50.8°N, the VGPM would not cover the full extent of the study area.
Chlorophyll-a data from MODIS, on the other hand, comes at a better resolution and timescale; however, this comes at the cost of file size and a large number of files, which can be cumbersome to deal with. Furthermore, even though the data set has a daily temporal scale, cloud cover can lead to missing data. To overcome this issue one could use MODIS Level 3 data, which provides 8-day or monthly averages at a resolution of 4 km, though this may be problematic depending on the time scale of interest.
3.2.2 Sea Surface Temperature
Sea Surface Temperature (SST) can also influence the distribution of species based on their thermal needs (Lamb et al. 2020; Zipkin et al. 2010).
3.2.2.1 Data Sources
Dataset | Format | Unit | Resolution (km2) | Timescale | Source |
---|---|---|---|---|---|
MODIS Aqua 11 μm Day/Night Sea Surface Temp | netCDF (.nc) | °C | 1 | Daily: 2002-Present | MODIS |
AVHRR Pathfinder SST | netCDF (.nc) | °C | 16 | Daily: 1981-Present | NOAA/NASA AVHRR Pathfinder Program |
MUR-SST | netCDF (.nc) | °C | 1 | Daily: 2003-Present | PODAAC |
3.2.2.2 Data Use Considerations
MODIS, Pathfinder, and MUR-SST data are freely accessible to download at their respective links above. MODIS data comes at a higher resolution but can be more cumbersome to work with given that there are two satellite passes per day, resulting in large file downloads. Pathfinder data is ready to use immediately and provides day and night averages. As with chlorophyll-a data from MODIS, daily data can often be incomplete depending on cloud cover in the region of interest on a given day. An alternative would be to use MODIS Level 3 data, which averages over 8-day or monthly periods but at the cost of a lower 4 km resolution.
MUR-SST data from the Physical Oceanography Distributed Active Archive Center is possibly a good solution to the problems with the first two data sources. This data set combines data from multiple sensors that have been calibrated against each other to provide SST data that is consistent in time and space. This allows for high spatial resolution without data voids due to cloud contamination.
3.2.3 Salinity
Salinity has been shown to influence sea duck distribution. Lamb et al. (2020) found that during winter, four species of sea duck selected positively for salinity.
3.2.3.1 Data Sources
Dataset | Format | Unit | Resolution (km2) | Timescale | Source |
---|---|---|---|---|---|
SMAP Salinity V5.0 | netCDF (.nc) | g kg-1 | 1600/4900 | 8-Day: 2015-Present | Remote Sensing Systems |
3.2.3.2 Data Use Considerations
SMAP Salinity V5.0 provides global coverage of sea surface salinity at a temporal scale of 8 days and at spatial scales of 40 km or 70 km, resulting in very poor spatial resolution (1600 km2 and 4900 km2 respectively). Furthermore, the 70 km product is considered the official product and is based on smoothing of the noisier 40 km product. Given this, it would be very difficult to use this layer in a meaningful way to model sea duck habitat use in the Salish Sea.
3.2.4 Distance to Nearest Shoreline
Lamb et al. (2020) identified distance to nearest shore as a predictor of sea duck distribution during winter, although its effect differed among species. This covariate was included along with bottom slope and depth as a proxy for fine-scale variation in processes such as currents and eddies.
3.2.4.1 Data Sources
Dataset | Format | Resolution (km) | Timescale | Source |
---|---|---|---|---|
World Vector Shoreline | Shapefile (.shp) | NA | Static | Global Self-consistent Hierarchical High-Resolution Database |
Prototype Global Shoreline Data | Shapefile (.shp) | NA | Static | National Geospatial-Intelligence Agency |
3.2.4.2 Data Use Considerations
Both data sets are freely available for download at their respective links in the above table. World Vector Shoreline (WVS) data is in the form of land polygons, whereas the Prototype Global Shoreline (PGS) data comes in the form of lines. The PGS data is based on 2000 Landsat imagery (~30 m resolution), which is finer than the WVS, whose finest scale is about ~2.5 km. Both data sets are old, with newer imagery only being used to fill in gaps.
3.2.5 Bottom Depth (Bathymetry) and Ocean Floor Topography (Slope)
Bottom depth and topography have been shown to influence sea duck distribution, likely due to a number of factors. Because these species rely heavily on benthic organisms such as mollusks, sea ducks prefer shallower areas for foraging, where they can reach sea floor that supports prey populations (Lamb et al. 2020; Zipkin et al. 2010). Bottom depth and topography may also act as proxies for finer-scale processes such as currents and eddies (Lamb et al. 2020).
3.2.5.1 Data Sources
Dataset | Format | Unit | Resolution (km2) | Timescale | Geography | Source |
---|---|---|---|---|---|---|
2-minute Gridded Global Relief Data, (ETOPO2) v2 | geoTiff (.geotiff) or netCDF (.nc) | m | 0.2025 | Static | US/CAD | National Centers for Environmental Information |
Coastal Relief Model US | netCDF (.nc) | m | 0.0081 | Static | US | National Centers for Environmental Information |
Canadian Hydrographic Service Non-Navigational (NONNA) Bathymetric Data | geoTiff (.geotiff) and others | m | 0.1 or 1 | Static | CAD | Canadian Hydrographic Service |
Salish Sea Bathymetry Basemap | geoTiff (.geotiff) | m | 0.0081 | Static | US/CAD | Salish Sea Atlas (Western Washington University) |
3.2.5.2 Data Use Considerations
All data sets are freely available for download using their respective links above. Ocean floor topography can be calculated using bathymetry and guidelines on how to do this will be covered in the data use guide below.
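As a preview of that calculation, slope (and aspect) can be derived from a bathymetry raster with terra's terrain() function. The raster below is a made-up stand-in for any of the bathymetry sources above, not real Salish Sea data:

```r
library(terra)

### Toy bathymetry raster standing in for a downloaded data set
bathy <- rast(nrows = 50, ncols = 50,
              xmin = 0, xmax = 5000, ymin = 0, ymax = 5000,
              crs = "EPSG:26910")
### Depth deepens linearly from 0 to -100 m across the raster
values(bathy) <- rep(seq(0, -100, length.out = 50), each = 50)

### Derive slope in degrees from depth; aspect works the same way
slope <- terrain(bathy, v = "slope", unit = "degrees")
```

The same call works on any of the bathymetry rasters in the table above once read in with terra::rast().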
Both the US Coastal Relief Model and the Canadian Hydrographic Service data are included for completeness, however, these data sets are not trans-boundary in nature and are at different spatial scales which limits their use for trans-boundary sea duck modeling.
ETOPO2 bathymetry data is a global data set at a resolution of 15 arc-seconds (~450 m), which would avoid having to combine separate data sets from the US and Canada. Another alternative, specific to the Salish Sea Bioregion, is bathymetry data from Western Washington University’s Salish Sea Atlas. This data set was compiled from two different data sets from the US and Canada and is at a finer resolution than the ETOPO2 data (3 arc-seconds, or ~90 m). It is only suitable for work within the Salish Sea Bioregion, as the raster has been clipped to that boundary.
3.2.6 Climate
Climatic variables such as the North Atlantic Oscillation (NAO) index have been shown to influence the distribution of wintering sea ducks (Zipkin et al. 2010). The NAO index is related to climatic variability and has been shown to affect the marine environment and ultimately ecological processes in plants and animals; positive values are associated with more winter storms of greater intensity. More specific to the Pacific region is the Pacific Decadal Oscillation (PDO), which is included here as it pertains to the Salish Sea. Rickbeil et al. (2014) found only weak support for atmospheric variables (temperature and precipitation) predicting occurrence; however, these are included below for reference.
3.2.6.1 Data Sources
Dataset | Format | Unit | Resolution (km2) | Timescale | Source |
---|---|---|---|---|---|
Pacific Decadal Oscillation (PDO) | tabular | NA | NA | Monthly: 1854-present | National Centers for Environmental Information |
ClimateNA | Multiple | Multiple | 0.64 | Monthly, Seasonal, Annual | Wang et al. 2016 |
3.2.6.2 Data Use Considerations
All data sets are freely available using their respective links above. The PDO provides a single monthly index for the Pacific Ocean. ClimateNA, on the other hand, allows you to extract many monthly, seasonal, and annual climate variables (e.g., mean temperature and precipitation) for any point location, free of charge. This includes historical and future predicted values. The data can be accessed using the ClimateNA software, API, or R package. The major limitation of both of these data sources is that the finest temporal resolution is a given month.
3.2.7 Shore-zone
Rickbeil et al. (2014) found that including more detailed shore-zone data from the Physical shore-zone mapping system for British Columbia provided modest improvements over only using remotely sensed environmental data for modeling the distributions of coastal bird species in BC.
3.2.7.1 Data Sources
Dataset | Format | Unit | Resolution (km2) | Timescale | Geography | Source |
---|---|---|---|---|---|---|
Physical shore-zone mapping system for British Columbia | WMS/KML | Categorical | NA | Static | CAD | Howes et al. 1997 |
Washington State ShoreZone Inventory | geodatabase (.gdb) | Categorical | NA | Static | US | Washington State Department of Natural Resources |
3.2.7.2 Data Use Considerations
Shore-zone mapping of British Columbia and Washington was done during the late 1990s using a comparable methodology based on Howes (1994). At the time of writing, the British Columbia shore-zone data is difficult to access and work with: many download links are broken, leaving the data accessible only via WMS or .kml. WMS is image based, which makes extracting the data extremely difficult, and the .kml option has known issues. The Washington data, by contrast, is easy to access in the form of a geodatabase at the link in the above table. The two data sets should be usable together (when available), but note that given their age they may not be representative of today’s shore zone.
3.2.8 Sea Vessels
3.2.8.1 Data Sources
Dataset | Format | Timescale | Geography | Source |
---|---|---|---|---|
Vessel Traffic Routes | shapefile (.shp) | Static | CAD/US | Fisheries and Oceans Canada |
Shipping Fairways, Lanes, and Zones for US waters | shapefile (.shp) | Static | US | National Oceanic and Atmospheric Administration |
WSDOT - Ferry Routes | shapefile (.shp) | Static | US | Washington State Department of Transportation |
AIS Vessel Transit Counts | geoTiff (.geotiff) | 2015-2021 | US/CAD | U.S. Coast Guard |
3.2.8.2 Data Use Considerations
Both Canada and the US have data on sea vessel routes within the Salish Sea. The Canadian Vessel Traffic Routes data set contains information on ferry routes, mandatory direction of traffic flow, and separation lines and zones. To use this data in modeling sea duck distribution, one would have to do some research to understand what the different layers mean using the Canadian Chart 1 Symbols, Abbreviations and Terms. This is not as simple as using the Shipping Fairways, Lanes, and Zones for US waters, which clearly shows shipping lanes in Washington. Washington State ferry routes have been included because they are not shown in the Shipping Fairways data set.
The AIS Vessel Transit Counts data set is primarily a US product but also spans into the Canadian side of the Salish Sea. It summarizes each time a vessel track passes through a 100 m grid cell, ultimately showing areas of high or low traffic. Furthermore, the counts can be broken down by vessel type: All, Cargo, Fishing, Passenger, Pleasure/Sailing, Tanker, and Tug/Tow.
3.2.9 Bottom Patches
3.2.9.1 Data Sources
Dataset | Format | Unit | Timescale | Geography | Source |
---|---|---|---|---|---|
Nearshore Bottom Patches for Pacific Canada | shapefile (.shp) | Categorical | Static | CAD | Fisheries and Oceans Canada |
3.2.9.2 Data Use Considerations
The nearshore bottom patches data set for the Pacific coast of Canada provides a continuous substrate map to a depth of 50 m off the coast of BC. Substrate is a key indicator of habitat in this important ecosystem, where data collection is challenging and expensive. Horizontal accuracy of this data ranges from meters to tens of meters depending on the source data and data processing; areas with higher data density are likely to be more accurate. Unfortunately, at this time there does not appear to be an analogous product for the southern Salish Sea. Habitat mapping has been done there, but only in select locations that do not cover the entirety of the southern Salish Sea (see here).
3.2.10 Distribution of Prey
Dataset | Format | Unit | Timescale | Geography | Source |
---|---|---|---|---|---|
Commercial Shellfish Harvest Sites | shapefile (.shp) | NA | Current | US | Washington State Department of Health |
British Columbia Aquaculture | CSV | NA | Current | CAD | Fisheries and Oceans Canada |
Pacific Herring spawn index | CSV | NA | 1951-2022 | CAD | Fisheries and Oceans Canada |
Herring Spawning Locations | shapefile (.shp) | NA | NA | US | Washington Dept of Fish and Wildlife |
Southern Salish Sea Herring Biomass | Excel (.xlsx) | Metric Tons | 1973-2002 | US | Washington Dept of Fish and Wildlife |
3.2.10.1 Data Use Considerations
Aquaculture locations in the British Columbia and Washington portions of the Salish Sea are available from their respective links above. Data from Washington are provided as a spatial point layer, whereas the BC data is provided in a tabular format containing latitude and longitude coordinates, which can easily be converted to a spatial point layer as well. Both data sets simply provide the spatial locations of aquaculture sites, with the BC data set differing slightly in that it includes other types of aquaculture in addition to shellfish. Only current locations of license holders are shown, which do not appear to include historical locations.
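That tabular-to-spatial conversion is a one-liner with sf. The records and column names below are invented for illustration; check the downloaded CSV for the actual coordinate column names:

```r
library(sf)

### Hypothetical aquaculture records with coordinate columns
bc_aqua <- data.frame(
  site      = c("Site A", "Site B"),
  longitude = c(-123.10, -122.95),
  latitude  = c(49.05, 48.88)
)

### Convert to a spatial point layer in WGS84
bc_aqua_sf <- st_as_sf(bc_aqua,
                       coords = c("longitude", "latitude"),
                       crs = 4326)
```

The resulting object behaves like the Washington point layer and can be mapped or used for extraction in the same way.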
Fisheries and Oceans Canada provides a yearly Herring Spawn Index for approximately 300 sections along the coast of British Columbia. This is a relative index of herring spawn biomass. There is currently no analogous data set for the United States; however, actual biomass has been estimated by the Washington Department of Fish and Wildlife for Puget Sound. For the same region there is also a data set of spawning habitat, although survey dates go as far back as 1969, so it may not be representative of today, and there does not appear to be an equivalent data set for BC.
3.2.11 Aquatic Vegetation
Dataset | Format | Timescale | Geography | Source |
---|---|---|---|---|
British Columbia Marine Conservation Analysis | shapefile (.shp) | 2006-2013 | CAD | BC Conservation Foundation |
Eelgrass Extent - Coastal British Columbia | shapefile (.shp) | 2016 | CAD | Hakai Institute |
Washington State ShoreZone Inventory | shapefile (.shp) | 1994-2000 | US | Washington State Department of Natural Resources |
Washington Marine Vegetation Atlas | geoJSON | NA | US | Washington State Department of Natural Resources |
3.2.11.1 Data Use Considerations
The British Columbia Marine Conservation Analysis provides a suite of data on the aquatic vegetation in British Columbia’s coastal waters. Available data includes information on Kelp, Surfgrass, Ditch Grass, and Eelgrass. Data sets vary and contain multiple sources per species (e.g. Eelgrass polygons, Priority Eelgrass Habitat, and Eelgrass Biobands). Metadata for each of these different data sets can be found here.
A predicted data set representing polygons of Eelgrass for the BC coast is available from the Hakai Institute. Currently the data is not directly downloadable; however, a data request can be made to data@hakai.org. This data set uses ShoreZone coastline and bathymetry data to predict Eelgrass extent. More information can be found using the link in the above table.
The Washington State ShoreZone Inventory contains data on aquatic vegetation for Washington State. Data from this survey is old, coming from aerial video collected in 1994-2000. Information on Eelgrass, Dunegrass, Surfgrass, and Kelp is available in a linear format indicating areas of continuous or patchy presence along the shore.
The Washington Marine Vegetation Atlas provides spatial information on the vegetation that grows in nearshore areas of the southern Salish Sea. This data set consists of 2000 polygon areas that have been surveyed from 2 to 56 times, with the presence or absence of different vegetation types reported (Seagrass, Kelp, and Macroalgae). Because the data are presence/absence only, information on abundance is not provided. The data is not easily downloadable; however, it can be queried or downloaded using R or GIS software. Below you will find instructions on accessing the data using R.
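Since the Atlas is served as geoJSON, one hedged illustration of the access pattern is geojson_sf() from the geojsonsf package (the same function used later for the Salish Sea boundary), which accepts either a service URL or a literal GeoJSON string. The feature below is invented for demonstration, not real Atlas output:

```r
library(geojsonsf)
library(sf)

### A made-up GeoJSON feature standing in for an Atlas query result
gj <- '{"type":"FeatureCollection","features":[{"type":"Feature",
 "properties":{"veg":"Seagrass","present":"yes"},
 "geometry":{"type":"Point","coordinates":[-122.9,48.99]}}]}'

### Parse straight into an sf object
veg <- geojson_sf(gj)
```

In practice you would pass the Atlas service query URL to geojson_sf() instead of a string.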
3.2.12 Additional Data Sources
Below is a table with a list of additional data sources not covered above which may or may not be useful in modeling sea duck distribution in the Salish Sea. Each source contains a link where you can download and discover additional data and notes are provided to give an overview of what is available.
Additional Data Source | Notes |
---|---|
Salish Sea Atlas | The Salish Sea Atlas from Western Washington University is an open-access digital book that contains cultural and environmental geospatial data sets for the Salish Sea Bioregion. Some included data sets are sea depth (mentioned above), landcover, total annual precipitation, and impervious surfaces. |
Bio-ORACLE | Bio-ORACLE is a data source for GIS rasters that provide global geophysical, biotic, and environmental data for marine surface and benthic areas. Data is provided at a resolution of approximately 9.2 km squares. Example data sets include chlorophyll concentration, current velocity, sea surface temperature, and salinity. Furthermore, data can be downloaded using the R package ‘sdmpredictors.’ |
The British Columbia Marine Conservation Analysis (2006–2013) | In addition to information on aquatic vegetation, the BCMCA contains a suite of additional data sets for BC coastal waters including tidal currents, Shorezone exposure, algae, invertebrates, birds, and mammals to name a few. It is important to note that data is > 10 years old and is only available for BC. |
Washington Coastal Atlas | In addition to information on aquatic vegetation, the Washington Coastal Atlas contains many other useful data sources specific to Shoreline, Ocean, Wetland, Administrative, and Land Cover. The interactive map allows you to view data and links to downloading (when available) are provided in layer information. |
BC Coastal Resource Information Management System | The B.C. Coastal Resource Information Management System (CRIMS) is an interactive tool for viewing and downloading coastal and marine data for British Columbia. The development of this tool is ongoing meaning new layers can be added as they become available. |
Salish Sea Key Sites | A derived product based on the Puget Sound Ambient Modeling Project data, expert opinion, and other sources. This identifies areas within the Salish Sea that are most important to sea ducks, and can be used as a baseline to be compared with current occurrence data of sea ducks. Analyses can update the key site atlas and identify additional or shifting key sites. More information can be found here |
3.3 Working With Different Data Sets in R
Working with multiple types of geospatial data sets like the ones mentioned above can be quite challenging, especially if the user does not have access to, or a good understanding of, software like ArcGIS/QGIS. Furthermore, analyses done in GIS software can be difficult to reproduce. R as a GIS tool has advanced considerably over the last 10 years and has the added benefit of being flexible, reproducible, and open source. In addition, many ecologists rely on R for statistical analysis of their data, which makes a strong argument for using R for an entire analysis (data extraction, manipulation, and analysis). Here we will go over some useful ways to interact with and extract data from the various types of data sets mentioned above to provide the user with a starting point for their analysis.
3.3.1 Getting Started
To get started there are a number of useful packages worth mentioning that will be used in the following examples. Depending on the user’s skill set some of these packages may be familiar or completely new.
- The sf package provides simple feature access in R. This package works best for spatial data (point, line, polygon, multipolygon, etc.) associated with tabular attributes (e.g., shapefiles). You may be familiar with the sp package, which has similar functionality in a different format; however, sp is heading for retirement by the end of 2023 and does not support integration with the tidyverse, which is very popular among data scientists in R.
- The terra package is essentially a modern version of the raster package, but faster and more flexible. Furthermore, the raster package currently relies on packages that are being deprecated along with sp at the end of the year.
Download and then load the required packages:
### Install Packages
#devtools::install_github('rstudio/leaflet') ### Use development version
#install.packages("leaflet.extras")
#install.packages("leaflet.providers")
#install.packages("tidyverse")
#install.packages("sf")
#install.packages("terra")
#install.packages("geojsonsf")
#install.packages("exactextractr")
### Load required packages
require(leaflet)
require(leaflet.extras)
require(leaflet.providers)
require(tidyverse)
require(sf)
require(terra)
require(geojsonsf)
For the purpose of this tutorial we will focus specifically on data within the Salish Sea ecoregion. A copy of the shapefile can be downloaded from the Salish Sea Atlas or can simply be read into R using the geojsonsf package, which allows for reading spatial data directly from the web in geojson format. You can view an interactive map of the boundary using the leaflet and supporting packages. At the time of writing, the CRAN version of leaflet was not working with the terra package. If you encounter this problem, please download and use the development version: devtools::install_github('rstudio/leaflet').
### Read in Salish Sea Boundary from arcgis service
SalishSeaBoundary<-geojson_sf("https://services.arcgis.com/qboYD3ru0louQq4F/arcgis/rest/services/Salish_Sea_Bioregion_Boundary/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson") %>% st_transform(4326)
### View interactive map of boundary
leaflet(width = "100%") %>%
addTiles() %>%
addPolygons(data=SalishSeaBoundary,color = "black", weight = 2, smoothFactor = 1,opacity = 1.0, fillOpacity = 0.5, fillColor = "darkgreen") %>% addFullscreenControl() %>%
addLegend(colors = c("darkgreen"),labels = c("Salish Sea Bioregion"),position = "bottomright")
We are also going to need some pretend survey points that we will use to extract spatial covariates. Imagine this data set is the location and abundance of Black Scoters during a given winter survey. We can read in the data and have a look at it here:
### Read in example Black Scoter (BLSC) survey points
BLSC <- read_sf("Data\\Points\\POINT.shp")
### View interactive map of survey points
leaflet(width = "100%") %>%
addTiles() %>%
addMarkers(data = BLSC,popup = ~paste0("Abundance: ",as.character(Abundance))) %>% addFullscreenControl()
We might also want to summarize our survey area into a spatial grid rather than individual points for the purpose of modeling and predicting. This can easily be done using the sf package. To do this we need to make sure we are using a coordinate system in the units we want. To make a 1 km grid we can convert to UTM Zone 10 (EPSG:26910) before creating the grid. We convert back to WGS84 simply for viewing on the map.
### Transform to UTM 10, make a 1 km hexagonal grid, and convert to a spatial dataframe
grid <- BLSC %>% st_transform(26910) %>% st_make_grid(1000, what = "polygons", square = FALSE) %>% st_sf() %>% st_transform(4326)
### View newly created grid
leaflet(width = "100%") %>%
addTiles() %>%
addPolygons(data=grid,color = "black", weight = 1, smoothFactor = 1,opacity = 1.0, fillOpacity = 0) %>%
addMarkers(data = BLSC,popup = ~paste0("Abundance: ",as.character(Abundance))) %>% addFullscreenControl()
Now that we have some starting data to play with we can begin to extract spatial covariates for them.
3.3.2 Working with NetCDF (.nc) Covariates
Many of the data sets mentioned above come in netCDF format, a widely used format for storing and sharing scientific array data, including raster data. To download data from some of the sources above you may need a set of coordinates to refine your search. The sf package can calculate a bounding box, either for your study area (SalishSeaBoundary) or for the data you have (BLSC).
Here is a link as an example of downloading MUR SST within the Salish Sea Ecoregion boundary.
### Bounding boxes for the study area and the survey points
st_bbox(SalishSeaBoundary)
##       xmin       ymin       xmax       ymax
## -125.92876   46.59306 -120.65461   50.86204
st_bbox(BLSC)
##       xmin       ymin       xmax       ymax
## -123.04885   48.90451 -122.78620   49.04377
NetCDF data can be read using the terra package and easily visualized using leaflet. For exploration purposes the code below also includes a toggle to add the BLSC points and the grid we created.
### Read in Data NetCDF
SST <- terra::rast("Data\\jplMURSST41_862f_217f_f62b.nc")
### Set colour scale
pal <- colorNumeric(c("#0C2C84", "#41B6C4", "#FFFFCC"), values(SST),
na.color = "transparent")
### Plot the data to view
leaflet(width = "100%") %>%
addTiles() %>%
addRasterImage(SST, colors = pal, opacity = 0.8, group = "SST") %>% addLegend(pal = pal, values = values(SST), title = "Average Surface temp") %>% addPolygons(data=grid,color = "black", weight = 1, smoothFactor = 1,opacity = 1.0, fillOpacity = 0, group = "Grid") %>% addMarkers(data = BLSC,popup = ~paste0("Abundance: ",as.character(Abundance)), group = "BLSC") %>% addFullscreenControl() %>% addLayersControl(
overlayGroups = c("SST", "Grid","BLSC"),
options = layersControlOptions(collapsed = FALSE)
) %>% hideGroup("Grid")
Next we will want to extract these covariates and add them to our respective data sets. This can be done using the terra package and the extract() function. Points simply get the value of the raster cell they fall in, whereas polygons require some sort of summary function to account for the multiple raster values within them. Typically, for this scenario, using the mean is appropriate.
### Do extraction for survey data
BLSC <- extract(SST,BLSC,bind=T) %>% st_as_sf()
grid <- extract(SST,grid,fun="mean",bind=T) %>% st_as_sf()
### See added column with data
head(BLSC)
## Simple feature collection with 6 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -123.0275 ymin: 48.98027 xmax: -122.8367 ymax: 49.04377
## Geodetic CRS: WGS 84
## Abundance analysed_sst geometry
## 1 44 7.298 POINT (-122.9805 49.01849)
## 2 86 7.287 POINT (-122.885 49.00976)
## 3 36 7.266 POINT (-123.0275 49.04377)
## 4 38 7.306 POINT (-122.8367 48.99122)
## 5 131 7.449 POINT (-122.9432 48.98027)
## 6 63 7.315 POINT (-122.9317 49.01492)
head(grid)
## Simple feature collection with 6 features and 1 field
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -123.0625 ymin: 48.90728 xmax: -123.0488 ymax: 48.99558
## Geodetic CRS: WGS 84
## analysed_sst geometry
## 1 7.22900 POLYGON ((-123.0556 48.9072...
## 2 7.20900 POLYGON ((-123.0556 48.9228...
## 3 7.21125 POLYGON ((-123.0556 48.9384...
## 4 7.22850 POLYGON ((-123.0557 48.9540...
## 5 NaN POLYGON ((-123.0557 48.9696...
## 6 NaN POLYGON ((-123.0557 48.9851...
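The NaN rows above are grid cells that fall outside the raster's coverage. Note also that the exactextractr package (installed earlier but not yet used) offers an alternative for polygon summaries: its exact_extract() weights raster cells by the fraction of each cell a polygon actually covers. A minimal sketch on a toy raster and grid, both invented stand-ins for the SST layer and survey grid:

```r
library(terra)
library(sf)
library(exactextractr)

### Toy raster standing in for SST
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10,
          ymin = 0, ymax = 10, crs = "EPSG:4326")
values(r) <- runif(100, min = 5, max = 9)

### Toy 2-degree polygon grid covering the raster extent
b <- st_bbox(c(xmin = 0, ymin = 0, xmax = 10, ymax = 10),
             crs = st_crs(4326))
grd <- st_sf(geometry = st_make_grid(st_as_sfc(b), cellsize = 2,
                                     square = TRUE))

### Area-weighted mean of the raster cells overlapping each polygon
grd$mean_sst <- exact_extract(r, grd, "mean")
```

For coarse rasters and small polygons this area weighting can give noticeably different means than unweighted extraction.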
3.3.3 Working with Raster Covariates
Raster data sets are another popular spatial format encountered when working with spatial covariates. Typically these come in geoTiff format, but other formats are supported, such as Band Interleaved by Line (BIL), SAGA, ENVI, and others. GeoTiff rasters are read into R in the same way as NetCDF files. For the purpose of this tutorial we will use the bathymetry (bottom depth) raster compiled and supplied by the Salish Sea Atlas.
### Read in raster bathymetry and project to plot on leaflet map
Depth <- terra::rast("Data\\SS_Bathymetry\\SS_Bathymetry.tif")
You can see what this raster file looks like below, generated using the same leaflet code we used for SST above. When plotting spatial data with leaflet you may notice irregularities caused by differing projections, so you will sometimes have to project your data to WGS84 for visualization purposes, as done below. For analysis purposes this projection may not be the best choice.
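That reprojection step can be sketched as follows (a minimal sketch, assuming the Depth raster loaded above; addRasterImage support for terra SpatRaster objects requires a recent version of leaflet):

```r
library(terra)
library(leaflet)

### Project a copy of the raster to WGS84 (EPSG:4326) purely for display;
### keep the original projection for any analysis steps
Depth_wgs84 <- terra::project(Depth, "EPSG:4326")

### Overlay the reprojected raster on a base map
leaflet() %>%
  addTiles() %>%
  addRasterImage(Depth_wgs84, opacity = 0.7)
```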
Raster data can often be quite large and difficult to work with. The best way around this problem is to crop the raster to the extent of your study. This can easily be done using the crop() function from the terra package; we will use the grid we created to crop this raster. You can see in the map below that the data has been cropped to the study area.
### Crop raster to study area (both layers must share the same CRS)
Depth_Crop <- terra::crop(Depth,grid %>% st_transform(crs(Depth)))
Much like SST we can get a set of values for our survey points and grid. You will notice a few transformations happening below. This is to make sure that the projection of our points and grids match the projection of the raster layer before being converted back to WGS84.
### Do extraction for survey data
BLSC <- extract(Depth_Crop,BLSC %>% st_transform(crs(Depth_Crop)),bind=T) %>% st_as_sf() %>% st_transform(4326)
grid <- extract(Depth_Crop,grid %>% st_transform(crs(Depth_Crop)),fun="mean",bind=T) %>% st_as_sf() %>% st_transform(4326)
### See added column with data
head(BLSC)
## Simple feature collection with 6 features and 3 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -123.0275 ymin: 48.98027 xmax: -122.8367 ymax: 49.04377
## Geodetic CRS: WGS 84
## Abundance analysed_sst SS_Bathymetry geometry
## 1 44 7.298 -6.034532 POINT (-122.9805 49.01849)
## 2 86 7.287 -6.033193 POINT (-122.885 49.00976)
## 3 36 7.266 -1.037675 POINT (-123.0275 49.04377)
## 4 38 7.306 -16.794600 POINT (-122.8367 48.99122)
## 5 131 7.449 -29.292519 POINT (-122.9432 48.98027)
## 6 63 7.315 -6.834748 POINT (-122.9317 49.01492)
head(grid)
## Simple feature collection with 6 features and 2 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -123.0625 ymin: 48.90728 xmax: -123.0488 ymax: 48.99558
## Geodetic CRS: WGS 84
## analysed_sst SS_Bathymetry geometry
## 1 7.22900 -132.48201 POLYGON ((-123.0556 48.9072...
## 2 7.20900 -130.37502 POLYGON ((-123.0556 48.9228...
## 3 7.21125 -115.75745 POLYGON ((-123.0556 48.9384...
## 4 7.22850 -48.45209 POLYGON ((-123.0557 48.9540...
## 5 NaN NaN POLYGON ((-123.0557 48.9696...
## 6 NaN NaN POLYGON ((-123.0557 48.9851...
Bottom slope can also easily be calculated in R using the terrain() function from the terra package. You can read more about this function, and about the alternative metrics that can be calculated from terrain rasters, here.
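For reference, the Slope layer used in the extraction below could be created from the cropped bathymetry in one line (a minimal sketch, assuming Depth_Crop from above; note that computing slope from bathymetry treats depth as elevation):

```r
### Calculate slope in degrees from the cropped bathymetry raster
Slope <- terra::terrain(Depth_Crop, v = "slope", unit = "degrees")
```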
Slope values can then be extracted for our survey data just like we did for depth. One important thing to note: because slope is measured in degrees, the mean must be calculated as a circular (angular) mean rather than an arithmetic one. A function for doing so is provided below. Unfortunately, this function can't be used within the extract() function, so averages have to be calculated manually for each grid polygon by extracting all slope values and averaging them with the custom function.
### Circular (angular) mean function for slope values in degrees
mean_angle <- function(a, angle = c("degree", "radians")) {
  angle <- angle[1]
  deg2rad <- function(x) { x * pi/180 }
  rad2deg <- function(x) { x * 180/pi }
  ### Convert an angle to a unit vector (sin, cos)
  deg2vec <- function(x, ang = c("degree", "radians")) {
    if (ang == "degree") {
      a <- c(sin(deg2rad(x)), cos(deg2rad(x)))
    } else if (ang == "radians") {
      a <- c(sin(x), cos(x))
    }
    return(a)
  }
  ### Convert the averaged vector back to an angle in [0, 360)
  vec2deg <- function(x) {
    res <- rad2deg(atan2(x[1], x[2]))
    if (res < 0) { res <- 360 + res }
    return(res)
  }
  ### Element-wise average of the unit vectors
  mean_vec <- function(x) {
    y <- lapply(x, deg2vec, ang = angle)
    Reduce(`+`, y) / length(y)
  }
  return(vec2deg(mean_vec(a)))
}
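As a quick sanity check of the circular-mean logic (a standalone base R snippet, not from the original analysis): 350° and 10° straddle north, so their circular mean is 0°, whereas an ordinary arithmetic mean would misleadingly give 180°.

```r
a <- c(350, 10)
### Average the unit vectors, then convert back to an angle in [0, 360)
v <- c(mean(sin(a * pi/180)), mean(cos(a * pi/180)))
res <- round(atan2(v[1], v[2]) * 180/pi, 6) %% 360
res
## 0
```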
### Do extraction for survey data
BLSC <- extract(Slope,BLSC %>% st_transform(crs(Slope)),bind=T) %>% st_as_sf() %>% st_transform(4326)
### Averaging slope is more complicated, so it must be done manually
grid <- grid %>% mutate(AverageSlope = NA)
for (i in 1:nrow(grid)) {
gridvalues <- extract(Slope,vect(grid[i,] %>% st_transform(crs(Slope))))
if(!all(is.na(gridvalues$slope))){
grid$AverageSlope[i] <- mean_angle(na.omit(gridvalues$slope))} else {
grid$AverageSlope[i] <- NA
}
}
### See added column with data
head(BLSC)
## Simple feature collection with 6 features and 4 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -123.0275 ymin: 48.98027 xmax: -122.8367 ymax: 49.04377
## Geodetic CRS: WGS 84
## Abundance analysed_sst SS_Bathymetry slope geometry
## 1 44 7.298 -6.034532 0.023155068 POINT (-122.9805 49.01849)
## 2 86 7.287 -6.033193 0.261033239 POINT (-122.885 49.00976)
## 3 36 7.266 -1.037675 0.008198081 POINT (-123.0275 49.04377)
## 4 38 7.306 -16.794600 0.118249444 POINT (-122.8367 48.99122)
## 5 131 7.449 -29.292519 0.061369546 POINT (-122.9432 48.98027)
## 6 63 7.315 -6.834748 1.160786780 POINT (-122.9317 49.01492)
head(grid)
## Simple feature collection with 6 features and 3 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -123.0625 ymin: 48.90728 xmax: -123.0488 ymax: 48.99558
## Geodetic CRS: WGS 84
## analysed_sst SS_Bathymetry geometry AverageSlope
## 1 7.22900 -132.48201 POLYGON ((-123.0556 48.9072... 0.4780118
## 2 7.20900 -130.37502 POLYGON ((-123.0556 48.9228... 0.3352073
## 3 7.21125 -115.75745 POLYGON ((-123.0556 48.9384... 1.1366872
## 4 7.22850 -48.45209 POLYGON ((-123.0557 48.9540... 3.3713673
## 5 NaN NaN POLYGON ((-123.0557 48.9696... 1.3949719
## 6 NaN NaN POLYGON ((-123.0557 48.9851... NA
The raster tools outlined above will also work for other raster layers such as the AIS Vessel Transit Counts data.
3.3.4 Working with Linear Covariates
There are a number of linear covariates mentioned above that also
require a bit of data manipulation or calculation in order to be useful.
Specifically distance to nearest shoreline. Using the Prototype Global
Shoreline Data mentioned
above I will outline how a user would go about calculating this for both
our BLSC points and our grids. You will need to download data for the
appropriate region (region 26 in this case) and because it covers a
larger area than needed lets clip it to the extent of the Salish Sea
Bioregion. This can be done using the st_intersection()
function from
sf
. Note that columns from both are normally included but I have just
kept the columns from the original shoreline data.
### Read in shoreline data using sf
Shoreline <- read_sf("Data\\shapefile26\\cd26.shp") %>% st_transform(4326)
### Clip shoreline data to the Salish Sea to make it smaller, keeping only the original columns
Shoreline <- st_intersection(Shoreline,SalishSeaBoundary) %>% select(names(Shoreline))
Calculating the distance to nearest shoreline is probably more complicated than it needs to be, so I will explain the steps. Because this data set comes as many separate linear features rather than a single shoreline, we first need to subset the segments that are closest to each of our BLSC points. This can be done using the st_nearest_feature() function, which gives us the index of the nearest feature for each point (some indices may be duplicated, but that is okay). The next step is to get the nearest points between each pair of BLSC point and shoreline segment using the st_nearest_points() function, and at the same time calculate the distance between these two points using the st_length() function.
### Get nearest shoreline features for each point - should be 20
nearest <- Shoreline %>% slice(st_nearest_feature(BLSC,Shoreline))
### Get distance in meters and make sure it is numeric
distance <- as.numeric(st_length(st_nearest_points(BLSC, nearest, pairwise = T)))
### Add distance to BLSC data
BLSC <- BLSC %>% mutate(DistanceToShore=distance)
Click on the points below to see their calculated distance to nearest shore. Use the added measurement tool on the map to test out the calculations (some variation will be expected).
This process would also work for the nearest distance to a ferry lane or other linear feature. The same calculation can easily be done for the grid we created; it is recommended to measure from the centroid of each grid cell, obtained with the st_centroid() function.
nearest <- Shoreline %>% slice(st_nearest_feature(st_centroid(grid),Shoreline))
### Get distance in meters and make sure it is numeric
distance <- as.numeric(st_length(st_nearest_points(st_centroid(grid), nearest, pairwise = T)))
### Add distance to grid data
grid <- grid %>% mutate(DistanceToShore=distance)
Extracting values from other linear features works very much the same way. For example, with shoreline inventory data containing information on shoreline type, one could extract information from that layer based on the nearest feature. I will give a quick example of how to do this using the shoreline data: let's extract the ID of each nearest shoreline segment. This could just as easily be any other field, such as shoreline type.
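A minimal sketch of that attribute extraction (the ID column name is an assumption here; the actual field name depends on the shapefile, so adjust as needed):

```r
### Index of the nearest shoreline segment for each BLSC point
idx <- st_nearest_feature(BLSC, Shoreline)
### Attach the ID of the nearest segment (swap in any field, e.g. shoreline type)
### Note: "ID" is a placeholder column name, not confirmed from the data
BLSC <- BLSC %>% mutate(NearestShoreID = Shoreline$ID[idx])
```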
3.3.5 Climate Data
Various climate variables can be obtained using the ClimateNAr package. This package is not on CRAN, so you will have to register for ClimateBC/NA and then download and install the package as per their instructions. You can use this map to get the different types of inputs needed for retrieving climate variables for the specific period you want. Below is an example of how you could go about downloading seasonal climate variables for the BLSC data, assuming the data were collected in 2022.
### Create column to store average winter temperature. This can be one or more of the values you are interested in.
BLSC <- BLSC %>% mutate(Tave_wt=NA)
### Loop through each point and extract climate temp
for (i in 1:nrow(BLSC)) {
### extract coordinates for each point
coords <- st_coordinates(BLSC[i,]) %>% as.data.frame()
### Query climate NA for seasonal data for the year 2022
climate_vars <- ClimateNA_API(ClimateBC_NA='NA', c(coords$Y,coords$X,0), period='Year_2022.ann', MSY='S')
### Populate value for that point. Can also do more than one value at a time.
BLSC$Tave_wt[i] <- climate_vars$Tave_wt
}
3.3.6 Other Spatial Data
There are many other types of data sets one might come across beyond the ones listed here or covered in detail above; however, many of the tools mentioned above will still apply. For instance, instead of calculating the nearest distance to shore, the same approach could be applied to point data such as BC Aquaculture sites, or used to compute distance to eelgrass beds. Not all use cases have been explored here, as there are endless ways to work with this type of data. Below are some additional resources that can be helpful for understanding spatial analysis in R.
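As a quick sketch of the point-data case mentioned above (the Aquaculture layer, its source path, and the new column name are all hypothetical; substitute whatever point layer you are working with):

```r
### Hypothetical point layer, e.g. aquaculture sites (path is a placeholder)
Aquaculture <- read_sf("Data\\Aquaculture\\Aquaculture.shp") %>% st_transform(4326)
### Index of the nearest site for each BLSC point
idx <- st_nearest_feature(BLSC, Aquaculture)
### Pairwise distance in meters to that nearest site
BLSC <- BLSC %>%
  mutate(DistanceToAquaculture =
           as.numeric(st_distance(BLSC, Aquaculture[idx, ], by_element = TRUE)))
```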
3.4 References
Lamb JS, Paton PW, Osenkowski JE, Badzinski SS, Berlin AM, Bowman T, Dwyer C, Fara LJ, Gilliland SG, Kenow K. 2020. Assessing year‐round habitat use by migratory sea ducks in a multi‐species context reveals seasonal variation in habitat selection and partitioning. Ecography. 43(12):1842–1858.
Michel N. 2021. Avian Habitat Suitability Models for Puget Sound Estuary Birds. Prepared for the Puget Sound Ecosystem Monitoring Program, Puget Sound Partnership. Tacoma, WA.
Rickbeil GJ, Coops NC, Drever MC, Nelson TA. 2014. Assessing coastal species distribution models through the integration of terrestrial, oceanic and atmospheric data. Journal of biogeography. 41(8):1614–1625.
Zipkin EF, Gardner B, Gilbert AT, O’Connell AF, Royle JA, Silverman ED. 2010. Distribution patterns of wintering sea ducks in relation to the North Atlantic Oscillation and local environmental characteristics. Oecologia. 163:893–902.