Chapter 3 Spatial Data

3.1 Overview

Previous research has identified a suite of biophysical predictors of the distribution and habitat use of sea ducks and coastal birds (Table 1). However, likely owing to the international border dividing the Salish Sea, many of these environmental/biophysical layers have yet to be compiled or used for trans-boundary habitat modeling. Furthermore, there are numerous available data sets that could be important but have yet to be tested in predicting the distribution of sea ducks (Table 2). The purpose of this chapter is to give an overview of spatial data that can support international sea duck habitat models in the Salish Sea. At the end of this chapter the user will know:

  1. What spatial data is available for modeling in the Salish Sea

  2. Where to get the data

  3. The limitations of the data (e.g., spatial resolution and temporal scale)

  4. How to work with different types of data


Table 1. Summary of biophysical predictors used in modeling sea duck distribution and habitat use.

| Covariate | Type | Citations |
|---|---|---|
| Net Primary Productivity | Aquatic | Lamb et al. 2020; Rickbeil et al. 2014 |
| Sea Surface Temperature | Aquatic | Lamb et al. 2020; Rickbeil et al. 2014; Zipkin et al. 2010 |
| Salinity | Aquatic | Lamb et al. 2020 |
| Sediments | Aquatic | Rickbeil et al. 2014 |
| Distance to Nearest Shoreline | Spatial | Lamb et al. 2020 |
| Bottom Depth (Bathymetry) | Spatial | Lamb et al. 2020; Zipkin et al. 2010 |
| Slope/Ocean Floor Topography | Spatial | Lamb et al. 2020; Zipkin et al. 2010 |
| Bay or Otherwise | Spatial | Zipkin et al. 2010 |
| Climate During Migration | Climate | Zipkin et al. 2010 |
| Monthly Winter Temperature | Climate | Rickbeil et al. 2014 |
| Monthly Winter Precipitation | Climate | Rickbeil et al. 2014 |
| Shore Zone | Terrestrial | Rickbeil et al. 2014 |
| Latitude | Spatial | Lamb et al. 2020; Zipkin et al. 2010 |
| Longitude | Spatial | Lamb et al. 2020 |
| Spatial Autocorrelation | Other | Zipkin et al. 2010 |
| Temporal Autocorrelation | Other | Zipkin et al. 2010 |
| Offset or Covariate for Effort | Other | Zipkin et al. 2010; Michel 2021 |

Table 2. Summary of potential biophysical predictors not previously tested in modeling sea duck distribution and habitat use.

| Covariate | Type |
|---|---|
| Sea Vessel Traffic | Spatial |
| Bottom Patches | Spatial |
| Distribution of Prey | Spatial |
| Aquatic Vegetation | Spatial |

3.2 Available Data

Below you will find a collection of available data sets that could be used in sea duck habitat suitability analysis. The data sets listed here were guided by previous literature and may not be comprehensive. Where available, newer or better data sets have been included to replace out-of-date sources. Some covariates in Table 1 have been excluded because they are specific to the research being conducted (latitude/longitude, autocorrelation, and effort). A suite of data sets that have not been previously tested has also been included here. Not all of these data sets are trans-boundary in nature, but they have been included for special use cases and could be updated if and when a corresponding trans-boundary data set becomes available.

3.2.1 Net Primary Productivity

Net Primary Productivity (NPP), the measure of energy or biomass accumulation by primary producers, influences the distribution of consumers at higher trophic levels. With the launch of ocean-observing satellites, reasonable estimates of ocean primary production have become available. Numerous models exist for estimating NPP; however, the Vertically Generalized Production Model (VGPM) is a common standard and is a function of chlorophyll, available light, and photosynthetic efficiency (O’Malley 2012). Alternatively, the concentration of chlorophyll-a has been used as an index of NPP as it relates to phytoplankton biomass (Moses et al. 2009).

3.2.1.1 Data Sources

| Dataset | Format | Unit | Resolution (km2) | Timescale | Source |
|---|---|---|---|---|---|
| Vertically Generalized Production Model (VGPM) | HDF | mg C m-2 day-1 | 100 | Monthly or 8-day: 2002-2021 | O’Malley 2012 |
| Chlorophyll-a | netCDF (.nc) | mg m-3 | 1 | Daily: 2002-Present | MODIS |

3.2.1.2 Data Use Considerations

Both data sets are freely available to download and use from their respective links in the table above, although user sign-up may be required.

The VGPM has a few flaws that might limit its usability for habitat suitability modeling. First, its resolution of approximately 10 km by 10 km is coarse. Furthermore, the model is limited to calculations south of 49°N in December and 53°N in January. With the northernmost point of the Salish Sea Bioregion at 50.8°N, the VGPM would not cover the full extent of the study area of interest.

Chlorophyll-a data from MODIS, on the other hand, come with better resolution and timescale; however, this comes at the cost of file size and the number of files, which can be cumbersome to deal with. Furthermore, even though the data set has a daily temporal scale, cloud cover can lead to missing data. To overcome this issue one could use MODIS Level 3 data, which provides 8-day or monthly averages at a resolution of 4 km, although this could be problematic depending on the time scale you are interested in.
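If the pre-averaged Level 3 products do not match the time window you need, another option is to composite the daily scenes yourself using the terra package introduced in Section 3.3. Below is a minimal sketch, assuming a folder of daily chlorophyll-a netCDF files (the path and file pattern are placeholders), that averages the scenes cell-wise while ignoring cloud-masked cells.

### Hedged sketch: composite daily chlorophyll-a scenes to fill cloud gaps
### The folder and file pattern are placeholders for your downloaded files
require(terra)
chla_files <- list.files("Data\\Chla", pattern = "\\.nc$", full.names = TRUE)
chla_daily <- terra::rast(chla_files)

### Cell-wise mean across days, ignoring missing (cloud-masked) values
chla_composite <- terra::app(chla_daily, fun = mean, na.rm = TRUE)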

3.2.2 Sea Surface Temperature

Sea Surface Temperature (SST) can also influence the distribution of species based on their thermal needs (Lamb et al. 2020; Zipkin et al. 2010).

3.2.2.1 Data Sources

| Dataset | Format | Unit | Resolution (km2) | Timescale | Source |
|---|---|---|---|---|---|
| MODIS Aqua 11 µm Day/Night Sea Surface Temp | netCDF (.nc) | °C | 1 | Daily: 2002-Present | MODIS |
| AVHRR Pathfinder SST | netCDF (.nc) | °C | 16 | Daily: 1981-Present | NOAA/NASA AVHRR Pathfinder Program |
| MUR-SST | netCDF (.nc) | °C | 1 | Daily: 2003-Present | PODAAC |

3.2.2.2 Data Use Considerations

MODIS, Pathfinder, and MUR-SST data are freely accessible to download at their respective links above. MODIS data come at a higher resolution but can be more cumbersome to work with, given that two satellite passes per day result in large file downloads. Pathfinder data are ready to use immediately and include day and night averages. Like the chlorophyll-a data from MODIS, daily data can often be incomplete depending on cloud cover in the region of interest on a given day. An alternative is MODIS Level 3 data, which averages over 8-day or monthly periods, but at the cost of a lower 4 km resolution.

MUR-SST data from the Physical Oceanography Distributed Active Archive Center (PODAAC) are possibly a good solution to the problems with the first two data sources. This data set combines data from multiple sensors that have been calibrated against each other to provide SST data that are consistent in time and space, allowing for high spatial resolution without data voids due to cloud contamination.

3.2.3 Salinity

Salinity has been shown to influence sea duck distribution: Lamb et al. (2020) found that during winter, four species of sea duck selected positively for salinity.

3.2.3.1 Data Sources

| Dataset | Format | Unit | Resolution (km2) | Timescale | Source |
|---|---|---|---|---|---|
| SMAP Salinity V5.0 | netCDF (.nc) | g kg-1 | 1600/4900 | 8-Day: 2015-Present | Remote Sensing Systems |

3.2.3.2 Data Use Considerations

SMAP Salinity V5.0 provides global coverage of sea surface salinity at a temporal scale of 8 days and at spatial scales of 40 km or 70 km, resulting in very poor spatial resolution (1600 km2 and 4900 km2, respectively). Furthermore, the 70 km product is considered the official product and is based on smoothing of the 40 km product, which can be noisy. Given this, it would be very difficult to use this layer in a meaningful way to model sea duck habitat use in the Salish Sea.

3.2.4 Distance to Nearest Shoreline

Lamb et al. (2020) identified distance to nearest shore as a predictor of sea duck distribution during winter, although the effect differed among species. This covariate was included, along with bottom slope and depth, as a proxy for fine-scale variation in processes such as currents and eddies.

3.2.4.1 Data Sources

| Dataset | Format | Resolution (km) | Timescale | Source |
|---|---|---|---|---|
| World Vector Shoreline | Shapefile (.shp) | NA | Static | Global Self-consistent Hierarchical High-resolution Geography Database |
| Prototype Global Shoreline Data | Shapefile (.shp) | NA | Static | National Geospatial-Intelligence Agency |

3.2.4.2 Data Use Considerations

Both data sets are freely available for download at their respective links in the above table. World Vector Shoreline (WVS) data come in the form of land polygons, whereas the Prototype Global Shoreline (PGS) data come in the form of lines. The PGS data are based on 2000 Landsat imagery (~30 m resolution), which is finer than the WVS, whose finest scale is about 2.5 km. Both data sets are dated, with newer imagery used only to fill in gaps.

3.2.5 Bottom Depth (Bathymetry) and Ocean Floor Topography (Slope)

Bottom depth and topography have been shown to influence sea duck distribution, likely due to a number of factors. Because these species rely heavily on benthic organisms such as mollusks, sea ducks prefer shallower foraging areas where they can reach sea floor that supports prey populations (Lamb et al. 2020; Zipkin et al. 2010). Bottom depth and topography may also act as proxies for finer-scale processes such as currents and eddies (Lamb et al. 2020).

3.2.5.1 Data Sources

| Dataset | Format | Unit | Resolution (km2) | Timescale | Geography | Source |
|---|---|---|---|---|---|---|
| 2-minute Gridded Global Relief Data (ETOPO2) v2 | geoTiff (.geotiff) or netCDF (.nc) | m | 0.2025 | Static | US/CAD | National Centers for Environmental Information |
| Coastal Relief Model US | netCDF (.nc) | m | 0.0081 | Static | US | National Centers for Environmental Information |
| Canadian Hydrographic Service Non-Navigational (NONNA) Bathymetric Data | geoTiff (.geotiff) and others | m | 0.1 or 1 | Static | CAD | Canadian Hydrographic Service |
| Salish Sea Bathymetry Basemap | geoTiff (.geotiff) | m | 0.0081 | Static | US/CAD | Salish Sea Atlas (Western Washington University) |

3.2.5.2 Data Use Considerations

All data sets are freely available for download using their respective links above. Ocean floor topography can be calculated using bathymetry and guidelines on how to do this will be covered in the data use guide below.

Both the US Coastal Relief Model and the Canadian Hydrographic Service data are included for completeness; however, these data sets are not trans-boundary in nature and are at different spatial scales, which limits their use for trans-boundary sea duck modeling.

ETOPO2 bathymetry data is a global data set at a resolution of 15 arc-seconds (~450 m), which makes it a suitable alternative to using separate data sets from the US and Canada. Another alternative, specific to the Salish Sea Bioregion, is the bathymetry data from Western Washington University’s Salish Sea Atlas. This data set was compiled from two different US and Canadian data sets and is at a finer resolution than the ETOPO2 data (3 arc-seconds, or ~90 m). It is only suitable for work within the Salish Sea Bioregion, as the raster has been clipped to that boundary.

3.2.6 Climate

Climatic variables such as the North Atlantic Oscillation (NAO) index have been shown to influence the distribution of wintering sea ducks (Zipkin et al. 2010). The NAO index is related to climatic variability and has been shown to affect the marine environment and, ultimately, ecological processes in plants and animals; positive values are associated with more winter storms of greater intensity. More specific to the Pacific region is the Pacific Decadal Oscillation (PDO), which is included here as it pertains to the Salish Sea. Rickbeil et al. (2014) found only weak support for atmospheric variables (temperature and precipitation) predicting occurrence; however, these are included below for reference.

3.2.6.1 Data Sources

| Dataset | Format | Unit | Resolution (km2) | Timescale | Source |
|---|---|---|---|---|---|
| Pacific Decadal Oscillation (PDO) | tabular | NA | NA | Monthly: 1854-Present | National Centers for Environmental Information |
| ClimateNA | Multiple | Multiple | 0.64 | Monthly, Seasonal, Annual | Wang et al. 2016 |

3.2.6.2 Data Use Considerations

All data sets are freely available using their respective links above. The PDO provides a single monthly index for the Pacific Ocean. ClimateNA, on the other hand, allows you to extract many monthly, seasonal, and annual climate variables (e.g., mean temperature and precipitation) for user-supplied point locations, including historical and future predicted values. The data can be accessed using the ClimateNA software, API, or R package. The major limitation of both of these data sources is that the finest temporal resolution is a given month.

3.2.7 Shore-zone

Rickbeil et al. (2014) found that including more detailed shore-zone data from the Physical shore-zone mapping system for British Columbia provided modest improvements over only using remotely sensed environmental data for modeling the distributions of coastal bird species in BC.

3.2.7.1 Data Sources

| Dataset | Format | Unit | Resolution (km2) | Timescale | Geography | Source |
|---|---|---|---|---|---|---|
| Physical shore-zone mapping system for British Columbia | WMS/KML | Categorical | NA | Static | CAD | Howes et al. 1997 |
| Washington State ShoreZone Inventory | geodatabase (.gdb) | Categorical | NA | Static | US | Washington State Department of Natural Resources |

3.2.7.2 Data Use Considerations

Shore-zone mapping of both British Columbia and Washington was done during the late 1990s using a comparable methodology based on Howes (1994). At the time of writing, the British Columbia shore-zone data are difficult to access and work with: many download links are broken, leaving the data accessible only via WMS or .kml. WMS is image based, which makes extracting the data extremely difficult, and the .kml option has known issues. The Washington data, in contrast, are easy to access in the form of a geodatabase at the link in the above table. The two data sets should be usable together (when available), but given their age they may not be representative of today’s shore zone.

3.2.8 Sea Vessels

3.2.8.1 Data Sources

| Dataset | Format | Timescale | Geography | Source |
|---|---|---|---|---|
| Vessel Traffic Routes | shapefile (.shp) | Static | CAD/US | Fisheries and Oceans Canada |
| Shipping Fairways, Lanes, and Zones for US waters | shapefile (.shp) | Static | US | National Oceanic and Atmospheric Administration |
| WSDOT - Ferry Routes | shapefile (.shp) | Static | US | Washington State Department of Transportation |
| AIS Vessel Transit Counts | geoTiff (.geotiff) | 2015-2021 | US/CAD | U.S. Coast Guard |

3.2.8.2 Data Use Considerations

Both Canada and the US have data on sea vessel routes within the Salish Sea. The Canadian Vessel Traffic Routes data set contains information on ferry routes, mandatory direction of traffic flow, and separation lines and zones. To use this data set in modeling sea duck distribution, one would have to do some research to understand what the different layers mean, using the Canadian Chart 1 Symbols, Abbreviations and Terms. This is not as simple as using the Shipping Fairways, Lanes, and Zones for US waters, which clearly shows shipping lanes in Washington. Washington State ferry routes have been included because they are not shown in the Shipping Fairways data set.

The AIS Vessel Transit Counts data set is an interesting one that primarily covers US waters but also extends into the Canadian side of the Salish Sea. It is a summary of each time a vessel track passes through a 100 m grid cell, ultimately showing areas of high and low traffic. Furthermore, it can be broken down by vessel type: All, Cargo, Fishing, Passenger, Pleasure/Sailing, Tanker, and Tug/Tow.

3.2.9 Bottom Patches

3.2.9.1 Data Sources

| Dataset | Format | Unit | Timescale | Geography | Source |
|---|---|---|---|---|---|
| Nearshore Bottom Patches for Pacific Canada | shapefile (.shp) | Categorical | Static | CAD | Fisheries and Oceans Canada |

3.2.9.2 Data Use Considerations

The Nearshore Bottom Patches data set for the Pacific coast of Canada provides a continuous substrate map to a depth of 50 m off the coast of BC. Substrate is a key indicator of habitat in this important ecosystem, where data collection is challenging and expensive. The horizontal accuracy of this data set ranges from meters to tens of meters owing to the source data and data processing; areas with higher data density are likely to be more accurate. Unfortunately, at this time there does not appear to be an analogous product for the southern Salish Sea. Habitat mapping has been done there, but only in select locations that do not cover the entirety of the southern Salish Sea (see here).

3.2.10 Distribution of Prey

| Dataset | Format | Unit | Timescale | Geography | Source |
|---|---|---|---|---|---|
| Commercial Shellfish Harvest Sites | shapefile (.shp) | NA | Current | US | Washington State Department of Health |
| British Columbia Aquaculture | CSV | NA | Current | CAD | Fisheries and Oceans Canada |
| Pacific Herring spawn index | CSV | NA | 1951-2022 | CAD | Fisheries and Oceans Canada |
| Herring Spawning Locations | shapefile (.shp) | NA | NA | US | Washington Dept of Fish and Wildlife |
| Southern Salish Sea Herring Biomass | Excel (.xlsx) | Metric Tons | 1973-2002 | US | Washington Dept of Fish and Wildlife |

3.2.10.1 Data Use Considerations

Aquaculture locations in the British Columbia and Washington portions of the Salish Sea are available from their respective links above. Data from Washington are provided as a spatial point layer, whereas the BC data are provided in a tabular format containing latitude and longitude coordinates, which can easily be converted to a spatial point layer as well (see the sketch below). Both data sets simply provide the locations of aquaculture sites, although the BC data set differs slightly in that it includes other aquaculture types in addition to shellfish. Only current license-holder locations are shown; historical locations do not appear to be included.
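Converting the BC tabular data to points is straightforward with the sf package (loaded in Section 3.3.1). The sketch below assumes the CSV has been downloaded locally and that the coordinate columns are named Longitude and Latitude; confirm the actual file path and column names against the downloaded file.

### Hedged sketch: convert tabular aquaculture records to spatial points
### File path and coordinate column names are assumptions
BCAquaculture <- read.csv("Data\\BC_Aquaculture.csv") %>%
  st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326)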

Fisheries and Oceans Canada provides a yearly herring spawn index for approximately 300 sections along the coast of British Columbia; this is a relative index of herring spawn biomass. There is currently no analogous data set for the United States; however, actual biomass has been estimated by the Washington Department of Fish and Wildlife for Puget Sound. For the same region there is also a data set of spawning habitat, although survey dates go back as far as 1969 and may not be representative of today, and there does not appear to be an equivalent data set for BC.

3.2.11 Aquatic Vegetation

| Dataset | Format | Timescale | Geography | Source |
|---|---|---|---|---|
| British Columbia Marine Conservation Analysis | shapefile (.shp) | 2006-2013 | CAD | BC Conservation Foundation |
| Eelgrass Extent - Coastal British Columbia | shapefile (.shp) | 2016 | CAD | Hakai Institute |
| Washington State ShoreZone Inventory | shapefile (.shp) | 1994-2000 | US | Washington State Department of Natural Resources |
| Washington Marine Vegetation Atlas | geoJSON | NA | US | Washington State Department of Natural Resources |

3.2.11.1 Data Use Considerations

The British Columbia Marine Conservation Analysis provides a suite of data on aquatic vegetation in British Columbia’s coastal waters, including information on kelp, surfgrass, ditch grass, and eelgrass. Data sets vary and may contain multiple sources per species (e.g., Eelgrass polygons, Priority Eelgrass Habitat, and Eelgrass Biobands). Metadata for each of these different data sets can be found here.

A predicted data set representing polygons of eelgrass extent for the BC coast is available from the Hakai Institute. The data are not currently directly downloadable; however, a data request can be made to data@hakai.org. This data set uses ShoreZone coastline and bathymetry data to predict eelgrass extent. More information can be found using the link in the above table.

The Washington State ShoreZone Inventory contains data on aquatic vegetation for Washington State. Data from this survey are dated, coming from aerial video collected in 1994-2000. Information on eelgrass, dunegrass, surfgrass, and kelp is available in a linear format indicating areas of continuous or patchy presence along the shore.

The Washington Marine Vegetation Atlas provides spatial information on the vegetation that grows in nearshore areas of the southern Salish Sea. The data set consists of roughly 2,000 polygon areas, each surveyed from 2 to 56 times, for which the presence or absence of different vegetation types (seagrass, kelp, and macroalgae) is reported. Because the data are presence/absence only, information on abundance is not provided. The data are not easily downloadable; however, they can be queried or downloaded using R or GIS software, as sketched below.
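A minimal sketch of one way to query such a service in R, mirroring the geojsonsf approach used later in Section 3.3.1. The service URL below is a hypothetical placeholder and must be replaced with the atlas’s actual ArcGIS REST endpoint.

### Hedged sketch: query vegetation polygons from an ArcGIS REST service as GeoJSON
### The URL is a placeholder; substitute the atlas's real FeatureServer endpoint
require(geojsonsf)
VegAtlas <- geojson_sf(paste0(
  "https://example.com/arcgis/rest/services/MarineVegetationAtlas/",
  "FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson"
))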

3.2.12 Additional Data Sources

Below is a table listing additional data sources not covered above, which may or may not be useful in modeling sea duck distribution in the Salish Sea. Each source includes a link where you can download and discover additional data, and notes are provided to give an overview of what is available.

| Additional Data Source | Notes |
|---|---|
| Salish Sea Atlas | The Salish Sea Atlas from Western Washington University is an open-access digital book that contains cultural and environmental geospatial data sets for the Salish Sea Bioregion. Example data sets include sea depth (mentioned above), land cover, total annual precipitation, and impervious surfaces. |
| Bio-ORACLE | Bio-ORACLE is a data source for GIS rasters that provide global geophysical, biotic, and environmental data for marine surface and benthic areas. Data is provided at a resolution of approximately 9.2 km squares. Example data sets include chlorophyll concentration, current velocity, sea surface temperature, and salinity. Data can also be downloaded using the R package ‘sdmpredictors’ (see the sketch after this table). |
| The British Columbia Marine Conservation Analysis (2006-2013) | In addition to information on aquatic vegetation, the BCMCA contains a suite of additional data sets for BC coastal waters, including tidal currents, ShoreZone exposure, algae, invertebrates, birds, and mammals, to name a few. It is important to note that the data are > 10 years old and only available for BC. |
| Washington Coastal Atlas | In addition to information on aquatic vegetation, the Washington Coastal Atlas contains many other useful data sources specific to Shoreline, Ocean, Wetland, Administrative, and Land Cover. The interactive map allows you to view data, and links for downloading (when available) are provided in the layer information. |
| BC Coastal Resource Information Management System | The B.C. Coastal Resource Information Management System (CRIMS) is an interactive tool for viewing and downloading coastal and marine data for British Columbia. Development of this tool is ongoing, meaning new layers can be added as they become available. |
| Salish Sea Key Sites | A derived product based on Puget Sound Ambient Monitoring Program data, expert opinion, and other sources. This identifies areas within the Salish Sea that are most important to sea ducks and can be used as a baseline to compare with current occurrence data. Analyses can update the key site atlas and identify additional or shifting key sites. More information can be found here. |
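The sdmpredictors access to Bio-ORACLE mentioned in the table can be sketched as follows. This is a minimal example; the layer code "BO_sstmean" (mean sea surface temperature) is one published Bio-ORACLE code, but check the list_layers() output for the codes you actually need.

### Hedged sketch: download Bio-ORACLE layers via the sdmpredictors package
# install.packages("sdmpredictors")
require(sdmpredictors)

### Browse the layers published for the Bio-ORACLE data set
biooracle_layers <- list_layers(datasets = "Bio-ORACLE")

### Load a layer by its code ("BO_sstmean" assumed here)
sst_mean <- load_layers("BO_sstmean")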

3.3 Working With Different Data Sets in R

Working with multiple types of geospatial data sets like those mentioned above can be quite challenging, especially if the user does not have access to, or a strong understanding of, software like ArcGIS/QGIS. Furthermore, any analysis done in GIS software can be difficult to reproduce. R as a GIS tool has advanced considerably in the last 10 years and has the added benefit of being flexible, reproducible, and open source. In addition, many ecologists already rely on R for statistical analysis of their data, which makes a strong argument for using R for an entire analysis (data extraction, manipulation, and analysis). Here we will go over some useful ways to interact with and extract data from the various types of data sets mentioned above, providing the user with a starting point for their own analysis.

3.3.1 Getting Started

To get started there are a number of useful packages worth mentioning that will be used in the following examples. Depending on the user’s skill set some of these packages may be familiar or completely new.

  1. The sf package provides simple feature access in R. It works best for spatial data (point, line, polygon, multipolygon, etc.) associated with tabular attributes (e.g., shapefiles). You may be familiar with the sp package, which offers similar functionality in a different format; however, sp is heading for retirement by the end of 2023 and does not integrate with the tidyverse, which is very popular among data scientists in R.
  2. The terra package is essentially a modern version of the raster package, but faster and more flexible. Furthermore, the raster package currently relies on packages that are being deprecated along with sp at the end of the year.

Download and then load the required packages:

### Install Packages
#devtools::install_github('rstudio/leaflet') ### Use development version
#install.packages("leaflet.extras")
#install.packages("leaflet.providers")
#install.packages("tidyverse")
#install.packages("sf")
#install.packages("terra")
#install.packages("geojsonsf")
#install.packages("exactextractr")

### Load required packages
require(leaflet)
require(leaflet.extras)
require(leaflet.providers)
require(tidyverse)
require(sf)
require(terra)
require(geojsonsf)

For the purposes of this tutorial we will focus specifically on data within the Salish Sea ecoregion. A copy of the shapefile can be downloaded from the Salish Sea Atlas, or the boundary can simply be read into R using the geojsonsf package, which allows reading spatial data directly from the web in GeoJSON format. You can view an interactive map of the boundary using leaflet and its supporting packages. At the time of writing, the CRAN version of leaflet was not working with the terra package; if you encounter this problem, download and use the development version via devtools::install_github('rstudio/leaflet').

### Read in Salish Sea Boundary from arcgis service
SalishSeaBoundary <- geojson_sf(paste0(
  "https://services.arcgis.com/qboYD3ru0louQq4F/arcgis/rest/services/",
  "Salish_Sea_Bioregion_Boundary/FeatureServer/0/query",
  "?outFields=*&where=1%3D1&f=geojson"
)) %>%
  st_transform(4326)

### View interactive map of boundary
leaflet(width = "100%") %>%
  addTiles() %>%
  addPolygons(data = SalishSeaBoundary, color = "black", weight = 2,
              smoothFactor = 1, opacity = 1.0, fillOpacity = 0.5,
              fillColor = "darkgreen") %>%
  addFullscreenControl() %>%
  addLegend(colors = c("darkgreen"), labels = c("Salish Sea Bioregion"),
            position = "bottomright")

We are also going to need some pretend survey points, which we will use to extract spatial covariates. Imagine this data set is the location and abundance of Black Scoters during a given winter survey. We can read in the data and have a look at it here:

### Read in pretend Black Scoter survey points
BLSC <- read_sf("Data\\Points\\POINT.shp")

### View interactive map of survey points
leaflet(width = "100%") %>%
  addTiles() %>%
  addMarkers(data = BLSC, popup = ~paste0("Abundance: ", as.character(Abundance))) %>%
  addFullscreenControl()

We might also want to summarize our survey area into a spatial grid rather than individual points for the purposes of modeling and prediction. This can easily be done using the sf package. To do this we need to make sure we are using a coordinate system in the units we want. To make a 1 km grid we can convert to UTM Zone 10N (EPSG:26910) before creating the grid; we convert back to WGS84 simply for viewing on the map.

### Transform to UTM Zone 10N, make a 1 km hexagonal grid, and convert to a spatial data frame
grid <- BLSC %>%
  st_transform(26910) %>%
  st_make_grid(1000, what = "polygons", square = FALSE) %>%
  st_sf() %>%
  st_transform(4326)


### View newly created grid
leaflet(width = "100%") %>%
  addTiles() %>%
  addPolygons(data = grid, color = "black", weight = 1, smoothFactor = 1,
              opacity = 1.0, fillOpacity = 0) %>%
  addMarkers(data = BLSC, popup = ~paste0("Abundance: ", as.character(Abundance))) %>%
  addFullscreenControl()

Now that we have some starting data to play with, we can begin to extract spatial covariates for it.

3.3.2 Working with NetCDF (.nc) Covariates

Many of the data sets mentioned above come in netCDF format, a widely used format for storing and sharing scientific array data, including raster data. To download data from some of the sources above you may need a set of coordinates to refine your search. The sf package can calculate the bounding box of either your study area (SalishSeaBoundary) or your observation data (BLSC). Here is a link to an example of downloading MUR SST within the Salish Sea Ecoregion boundary.

st_bbox(SalishSeaBoundary)
##       xmin       ymin       xmax       ymax 
## -125.92876   46.59306 -120.65461   50.86204
st_bbox(BLSC)
##       xmin       ymin       xmax       ymax 
## -123.04885   48.90451 -122.78620   49.04377

NetCDF data can be read using the terra package and easily visualized using leaflet. For exploration purposes the code below also includes a toggle to add the BLSC points and the grid we created.

### Read in the netCDF data
SST <- terra::rast("Data\\jplMURSST41_862f_217f_f62b.nc")

### Set colour scale
pal <- colorNumeric(c("#0C2C84", "#41B6C4", "#FFFFCC"), values(SST),
  na.color = "transparent")

### Plot the data to view
leaflet(width = "100%") %>%
  addTiles() %>%
  addRasterImage(SST, colors = pal, opacity = 0.8, group = "SST") %>% addLegend(pal = pal, values = values(SST), title = "Average Surface temp") %>% addPolygons(data=grid,color = "black", weight = 1, smoothFactor = 1,opacity = 1.0, fillOpacity = 0, group = "Grid") %>% addMarkers(data = BLSC,popup = ~paste0("Abundance: ",as.character(Abundance)), group = "BLSC") %>% addFullscreenControl() %>% addLayersControl(
    overlayGroups = c("SST", "Grid","BLSC"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>% hideGroup("Grid")

Next we will want to extract these covariates and add them to our respective data sets. This can be done using the extract() function from the terra package. Points simply receive the value of the cell they fall in, whereas polygons require some function to summarize the multiple raster values within them; typically the mean is appropriate for this scenario.

### Do extraction for survey data
BLSC <- extract(SST,BLSC,bind=T) %>% st_as_sf()
grid <- extract(SST,grid,fun="mean",bind=T) %>% st_as_sf()

### See added column with data
head(BLSC)
## Simple feature collection with 6 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -123.0275 ymin: 48.98027 xmax: -122.8367 ymax: 49.04377
## Geodetic CRS:  WGS 84
##   Abundance analysed_sst                   geometry
## 1        44        7.298 POINT (-122.9805 49.01849)
## 2        86        7.287  POINT (-122.885 49.00976)
## 3        36        7.266 POINT (-123.0275 49.04377)
## 4        38        7.306 POINT (-122.8367 48.99122)
## 5       131        7.449 POINT (-122.9432 48.98027)
## 6        63        7.315 POINT (-122.9317 49.01492)
head(grid)
## Simple feature collection with 6 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -123.0625 ymin: 48.90728 xmax: -123.0488 ymax: 48.99558
## Geodetic CRS:  WGS 84
##   analysed_sst                       geometry
## 1      7.22900 POLYGON ((-123.0556 48.9072...
## 2      7.20900 POLYGON ((-123.0556 48.9228...
## 3      7.21125 POLYGON ((-123.0556 48.9384...
## 4      7.22850 POLYGON ((-123.0557 48.9540...
## 5          NaN POLYGON ((-123.0557 48.9696...
## 6          NaN POLYGON ((-123.0557 48.9851...

3.3.3 Working with Raster Covariates

Raster data sets are another popular spatial format encountered when working with spatial covariates. Typically these come in geoTiff format, but other formats are supported, such as Band Interleaved by Line (BIL), SAGA, ENVI, and others. GeoTiff rasters are read into R in the same way as netCDF files. For the purposes of this tutorial we will use the bathymetry (bottom depth) raster compiled and supplied by the Salish Sea Atlas.

### Read in the bathymetry raster
Depth <- terra::rast("Data\\SS_Bathymetry\\SS_Bathymetry.tif")

Below you can see what this raster file looks like, generated using the same leaflet code we used for SST above. When using leaflet to plot spatial data you may notice some irregularities due to differing projections, so you will sometimes have to project your data to WGS84 for visualization purposes, as done below. For analysis purposes this projection may not be the best choice.
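For example, a display-only copy of the raster can be reprojected with terra; a minimal sketch (keep the original projection for analysis):

### Hedged sketch: project a display-only copy of the raster to WGS84
Depth_WGS84 <- terra::project(Depth, "EPSG:4326")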

Raster data can often be quite large and difficult to work with. The best way around this problem is to crop the raster to the extent of your study. This can easily be done using the crop() function from the terra package; here we use the grid we created to crop the raster. You can see in the map below that the data have been cropped to the study area.

### Crop raster to the study area (both layers must share the same CRS)
Depth_Crop <- terra::crop(Depth, grid %>% st_transform(crs(Depth)))

Much like with SST, we can extract a set of values for our survey points and grid. You will notice a few transformations happening below; these ensure that the projection of our points and grid matches the projection of the raster layer before converting back to WGS84.

### Do extraction for survey data
BLSC <- extract(Depth_Crop,BLSC %>% st_transform(crs(Depth_Crop)),bind=T) %>% st_as_sf() %>% st_transform(4326)
grid <- extract(Depth_Crop,grid %>% st_transform(crs(Depth_Crop)),fun="mean",bind=T) %>% st_as_sf() %>% st_transform(4326)

### See added column with data
head(BLSC)
## Simple feature collection with 6 features and 3 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -123.0275 ymin: 48.98027 xmax: -122.8367 ymax: 49.04377
## Geodetic CRS:  WGS 84
##   Abundance analysed_sst SS_Bathymetry                   geometry
## 1        44        7.298     -6.034532 POINT (-122.9805 49.01849)
## 2        86        7.287     -6.033193  POINT (-122.885 49.00976)
## 3        36        7.266     -1.037675 POINT (-123.0275 49.04377)
## 4        38        7.306    -16.794600 POINT (-122.8367 48.99122)
## 5       131        7.449    -29.292519 POINT (-122.9432 48.98027)
## 6        63        7.315     -6.834748 POINT (-122.9317 49.01492)
head(grid)
## Simple feature collection with 6 features and 2 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -123.0625 ymin: 48.90728 xmax: -123.0488 ymax: 48.99558
## Geodetic CRS:  WGS 84
##   analysed_sst SS_Bathymetry                       geometry
## 1      7.22900    -132.48201 POLYGON ((-123.0556 48.9072...
## 2      7.20900    -130.37502 POLYGON ((-123.0556 48.9228...
## 3      7.21125    -115.75745 POLYGON ((-123.0556 48.9384...
## 4      7.22850     -48.45209 POLYGON ((-123.0557 48.9540...
## 5          NaN           NaN POLYGON ((-123.0557 48.9696...
## 6          NaN           NaN POLYGON ((-123.0557 48.9851...

Bottom slope can also easily be calculated in R using the terrain() function from the terra package. You can read more about this function and the other metrics that can be calculated from terrain rasters here.

Slope <- terrain(Depth_Crop, v="slope", neighbors=8, unit="degrees")

Slope values can then be extracted for our survey data just as we did for depth. One important thing to note is that since slope is measured in degrees, the mean must be calculated differently, as a circular (angular) mean. A function for doing so is provided below. Unfortunately, this function can’t be used within the extract() function, so averages have to be calculated manually for each grid polygon by extracting all slope values and averaging them using the custom function.

### Circular (angular) mean of a vector of angles, in degrees or radians
mean_angle <- function(a, angle=c("degree", "radians")) {
  angle=angle[1]
  deg2rad <- function(x) { x * pi/180} 
  rad2deg <- function(x) { x * 180/pi }
  deg2vec <- function(x, ang = c("degree", "radians")) { 
    if(ang == "degree") {
      a <- c(sin(deg2rad(x)), cos(deg2rad(x)))
    } else if(ang == "radians") {
      a <- c(sin(x), cos(x))
    }
    return(a)
  }
  vec2deg <- function(x) {
    res <- rad2deg(atan2(x[1], x[2]))
      if (res < 0) { res <- 360 + res }
    return(res)
  }
  mean_vec <- function(x) {
    y <- lapply(x, deg2vec, ang=angle)
    Reduce(`+`, y)/length(y)
  }
  return( vec2deg(mean_vec(a)) ) 
}
### Do extraction for survey data
BLSC <- extract(Slope,BLSC %>% st_transform(crs(Slope)),bind=T) %>% st_as_sf() %>% st_transform(4326)
### Averaging slope is more complicated and must be done manually
grid <- grid %>% mutate(AverageSlope = NA)
for (i in 1:nrow(grid)) {
  gridvalues <- extract(Slope, vect(grid[i, ] %>% st_transform(crs(Slope))))
  if (!all(is.na(gridvalues$slope))) {
    grid$AverageSlope[i] <- mean_angle(na.omit(gridvalues$slope))
  } else {
    grid$AverageSlope[i] <- NA
  }
}

### See added column with data
head(BLSC)
## Simple feature collection with 6 features and 4 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -123.0275 ymin: 48.98027 xmax: -122.8367 ymax: 49.04377
## Geodetic CRS:  WGS 84
##   Abundance analysed_sst SS_Bathymetry       slope                   geometry
## 1        44        7.298     -6.034532 0.023155068 POINT (-122.9805 49.01849)
## 2        86        7.287     -6.033193 0.261033239  POINT (-122.885 49.00976)
## 3        36        7.266     -1.037675 0.008198081 POINT (-123.0275 49.04377)
## 4        38        7.306    -16.794600 0.118249444 POINT (-122.8367 48.99122)
## 5       131        7.449    -29.292519 0.061369546 POINT (-122.9432 48.98027)
## 6        63        7.315     -6.834748 1.160786780 POINT (-122.9317 49.01492)
head(grid)
## Simple feature collection with 6 features and 3 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -123.0625 ymin: 48.90728 xmax: -123.0488 ymax: 48.99558
## Geodetic CRS:  WGS 84
##   analysed_sst SS_Bathymetry                       geometry AverageSlope
## 1      7.22900    -132.48201 POLYGON ((-123.0556 48.9072...    0.4780118
## 2      7.20900    -130.37502 POLYGON ((-123.0556 48.9228...    0.3352073
## 3      7.21125    -115.75745 POLYGON ((-123.0556 48.9384...    1.1366872
## 4      7.22850     -48.45209 POLYGON ((-123.0557 48.9540...    3.3713673
## 5          NaN           NaN POLYGON ((-123.0557 48.9696...    1.3949719
## 6          NaN           NaN POLYGON ((-123.0557 48.9851...           NA

The raster tools outlined above will also work for other raster layers such as the AIS Vessel Transit Counts data.

3.3.4 Working with Linear Covariates

There are a number of linear covariates mentioned above that also require a bit of data manipulation or calculation in order to be useful, specifically distance to nearest shoreline. Using the Prototype Global Shoreline Data mentioned above, I will outline how a user would go about calculating this for both our BLSC points and our grid. You will need to download data for the appropriate region (region 26 in this case), and because it covers a larger area than needed, let’s clip it to the extent of the Salish Sea Bioregion. This can be done using the st_intersection() function from sf. Note that columns from both layers are normally included, but here I have kept only the columns from the original shoreline data.

### Read in shoreline data using sf
Shoreline <- read_sf("Data\\shapefile26\\cd26.shp") %>% st_transform(4326)

### Clip shoreline data to just salish sea to make it smaller and only keep original columns
Shoreline <- st_intersection(Shoreline,SalishSeaBoundary) %>% select(names(Shoreline))

Calculating the distance to nearest shoreline is probably more complicated than it needs to be, but I will explain the steps. Because this data set comes as many separate linear features rather than a single shoreline line, we first need to subset the segments that are closest to each of our BLSC points. This can be done using the st_nearest_feature() function, which gives us the index of the nearest feature for each point (some indices might be duplicated, but that is okay). The next step is to get the nearest points between each pair of BLSC point and shoreline segment using the st_nearest_points() function, and at the same time calculate the distance between these two points using the st_length() function.

### Get nearest shoreline features for each point - should be 20
nearest <- Shoreline %>% slice(st_nearest_feature(BLSC,Shoreline))

### Get distance in meters and make sure it is numeric
distance <- as.numeric(st_length(st_nearest_points(BLSC, nearest, pairwise = T)))

### Add distance to BLSC data
BLSC <- BLSC %>% mutate(DistanceToShore=distance)

Click on the points below to see their calculated distance to the nearest shore. Use the added measurement tool on the map to test the calculations (some variation is expected).
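The map referenced above follows the same leaflet pattern used earlier; a minimal sketch is shown below, with leaflet’s addMeasure() supplying the measurement tool.

### Hedged sketch: map the points with their computed distances and a measure tool
leaflet(width = "100%") %>%
  addTiles() %>%
  addMarkers(data = BLSC,
             popup = ~paste0("Distance to shore (m): ", round(DistanceToShore))) %>%
  addMeasure(primaryLengthUnit = "meters") %>%
  addFullscreenControl()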

This process would also work for the nearest distance to a ferry lane or any other linear feature. The same calculation can also easily be done for the grid we created; it is recommended to measure from the centroid of each grid cell, obtained with the st_centroid() function.

nearest <- Shoreline %>% slice(st_nearest_feature(st_centroid(grid),Shoreline))

### Get distance in meters and make sure it is numeric
distance <- as.numeric(st_length(st_nearest_points(st_centroid(grid), nearest, pairwise = T)))

### Add distance to grid data
grid <- grid %>% mutate(DistanceToShore=distance)

Extracting values from other linear features works in much the same way. For example, with shoreline inventory data containing information on shoreline type, one could extract attributes from that layer based on the nearest feature. Here is a quick example using the shoreline data: let’s extract the ID of each nearest shoreline segment, which could just as easily be any other field, such as shoreline type.

### Get nearest segment for each BLSC point
nearest <- Shoreline %>% slice(st_nearest_feature(BLSC,Shoreline))

### Get just the ID
ShorelineID <- nearest %>% pull(ID)

### Add the ID to the BLSC data
BLSC <- BLSC %>% mutate(ShorelineID=ShorelineID)

3.3.5 Climate Data

Various climate variables can be obtained using the ClimateNAr package. This package is not on CRAN, so you will have to register for ClimateBC/NA and download and install the package as per their instructions. You can use this map to get the different types of inputs needed to request climate variables for the specific period you want. Below is an example of how you could download seasonal climate variables for the BLSC data, assuming the data are from 2022.

### Create column to store average winter temperature. This can be one or more of the values you are interested in. 
BLSC <- BLSC %>% mutate(Tave_wt=NA)

### Loop through each point and extract climate temp

for (i in 1:nrow(BLSC)) {
  ### extract coordinates for each point
  coords <- st_coordinates(BLSC[i,]) %>% as.data.frame() 
  ### Query climate NA for seasonal data for the year 2022
  climate_vars <- ClimateNA_API(ClimateBC_NA='NA', c(coords$Y,coords$X,0), period='Year_2022.ann', MSY='S')
  ### Populate value for that point. Can also do more than one value at a time.
  BLSC$Tave_wt[i] <- climate_vars$Tave_wt
}

3.3.6 Other Spatial Data

There are many different types of data sets one might come across, even beyond those listed or detailed above; however, many of the tools covered here will still apply. For instance, instead of calculating the nearest distance to shore, the same approach could be applied to point data such as BC Aquaculture sites, or to distance to eelgrass, and so on (a brief sketch follows below). Not all use cases have been explored here, as there are endless ways to work with this type of data, and many additional resources on spatial analysis in R are available.
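As a final illustration, here is how the nearest-feature pattern from Section 3.3.4 carries over to point layers. The Aquaculture object below is a hypothetical sf point layer (e.g., the BC Aquaculture sites converted to points as in Section 3.2.10) in the same CRS as BLSC.

### Hedged sketch: distance from each BLSC point to the nearest aquaculture site
### 'Aquaculture' is a hypothetical sf point layer sharing the CRS of BLSC
nearest_site <- Aquaculture %>% slice(st_nearest_feature(BLSC, Aquaculture))
BLSC <- BLSC %>%
  mutate(DistanceToAquaculture = as.numeric(
    st_length(st_nearest_points(BLSC, nearest_site, pairwise = TRUE))))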


3.4 References

Lamb JS, Paton PW, Osenkowski JE, Badzinski SS, Berlin AM, Bowman T, Dwyer C, Fara LJ, Gilliland SG, Kenow K. 2020. Assessing year‐round habitat use by migratory sea ducks in a multi‐species context reveals seasonal variation in habitat selection and partitioning. Ecography. 43(12):1842–1858.

Michel N. 2021. Avian Habitat Suitability Models for Puget Sound Estuary Birds. Prepared for the Puget Sound Ecosystem Monitoring Program, Puget Sound Partnership. Tacoma, WA.

Rickbeil GJ, Coops NC, Drever MC, Nelson TA. 2014. Assessing coastal species distribution models through the integration of terrestrial, oceanic and atmospheric data. Journal of biogeography. 41(8):1614–1625.

Zipkin EF, Gardner B, Gilbert AT, O’Connell AF, Royle JA, Silverman ED. 2010. Distribution patterns of wintering sea ducks in relation to the North Atlantic Oscillation and local environmental characteristics. Oecologia. 163:893–902.