vignettes/articles/mapping-observations.Rmd
mapping-observations.Rmd
In this article we’ll walk through how to create various types of
maps of the observations downloaded with naturecounts
to
get a sense of the spatial distribution.
The following examples use the “testuser” user which is not available to you. You can quickly sign up for a free account of your own to access and play around with these examples. Simply replace
testuser
with your own username.
To do so we’re going to use the following packages:
library(naturecounts)
library(sf)
library(rnaturalearth)
library(ggplot2)
library(ggspatial)
library(dplyr)
library(tidyr)
library(mapview)
First we’ll use download some data:
house_finches <- nc_data_dl(species = 20350, region = list(statprov = "AB"),
username = "testuser", info = "nc_tutorial")
## Using filters: species (20350); fields_set (BMDE2.00-min); statprov (AB)
## Collecting available records...
## collection nrecords
## 1 ABATLAS1 10
## 2 ABATLAS2 202
## 3 ABBIRDRECS 4
## 4 BBS 14
## 5 BBS50-CAN 41
## 6 GBIF_50C9509D 799
## ...
## Total records: 7,211
##
## Downloading records for each collection:
## ABATLAS1
## Records 1 to 10 / 10
## ABATLAS2
## Records 1 to 202 / 202
## ABBIRDRECS
## Records 1 to 4 / 4
## BBS
## Records 1 to 14 / 14
## BBS50-CAN
## Records 1 to 41 / 41
## GBIF_50C9509D
## Records 1 to 799 / 799
## GBIF_6AC3F774
## Records 1 to 1 / 1
## GBIF_8A863029
## Records 1 to 19 / 19
## GBIF_B1047888
## Records 1 to 2 / 2
## PFW
## Records 1 to 5000 / 6110
## Records 5001 to 6110 / 6110
## WILDTRAX1
## Records 1 to 7 / 7
## WILDTRAX19
## Records 1 to 1 / 1
## WILDTRAX41
## Records 1 to 1 / 1
head(house_finches)
## record_id collection project_id protocol_id protocol_type species_id
## 1 225617858 ABATLAS1 1048 NA NA 20350
## 2 225620472 ABATLAS1 1048 NA NA 20350
## 3 225630776 ABATLAS1 1048 NA NA 20350
## 4 225655455 ABATLAS1 1048 NA NA 20350
## 5 225665275 ABATLAS1 1048 NA NA 20350
## 6 225671997 ABATLAS1 1048 NA NA 20350
## statprov_code country_code SiteCode latitude longitude bcr subnational2_code
## 1 AB CA 1547 56.17056 -117.5639 6 CA.AB.19
## 2 AB CA 1548 56.26028 -117.5650 6 CA.AB.17
## 3 AB CA 2991 51.29972 -114.9200 6 CA.AB.15
## 4 AB CA 4156 50.03028 -113.5828 11 CA.AB.03
## 5 AB CA 5307 49.87333 -112.1828 11 CA.AB.02
## 6 AB CA 1536 56.07972 -117.7228 6 CA.AB.19
## iba_site utm_square survey_year survey_month survey_week survey_day
## 1 N/A 11UMC62 1990 5 2 12
## 2 N/A 11UMC63 1991 5 2 12
## 3 N/A 11UPS48 1988 6 2 10
## 4 N/A 12UUA14 1990 8 2 9
## 5 N/A 12UVA12 1989 1 1 1
## 6 N/A 11UMC51 1990 NA NA NA
## breeding_rank GlobalUniqueIdentifier CatalogNumber Locality
## 1 10 URN:NatureAlberta:ABATLAS1:I4063-HOFI I4063-HOFI 11UMN62
## 2 10 URN:NatureAlberta:ABATLAS1:I5194-HOFI I5194-HOFI 11UMN63
## 3 10 URN:NatureAlberta:ABATLAS1:C2013-HOFI C2013-HOFI 11UPG48
## 4 0 URN:NatureAlberta:ABATLAS1:A4101-HOFI A4101-HOFI 12UUL14
## 5 0 URN:NatureAlberta:ABATLAS1:B3021-HOFI B3021-HOFI 12UVL12
## 6 40 URN:NatureAlberta:ABATLAS1:I4060-HOFI I4060-HOFI 11UMN51
## TimeCollected CollectorNumber FieldNumber Remarks ProjectCode
## 1 <NA> 1960 NA <NA> ABATLAS1
## 2 <NA> 1960 NA <NA> ABATLAS1
## 3 <NA> 1317 NA <NA> ABATLAS1
## 4 <NA> 1128 NA <NA> ABATLAS1
## 5 <NA> 1250 NA <NA> ABATLAS1
## 6 <NA> 1941 NA <NA> ABATLAS1
## ProtocolType ProtocolCode ProtocolURL SurveyAreaIdentifier
## 1 Breeding Bird Atlas <NA> <NA> 1547
## 2 Breeding Bird Atlas <NA> <NA> 1548
## 3 Breeding Bird Atlas <NA> <NA> 2991
## 4 Breeding Bird Atlas <NA> <NA> 4156
## 5 Breeding Bird Atlas <NA> <NA> 5307
## 6 Breeding Bird Atlas <NA> <NA> 1536
## SamplingEventIdentifier SamplingEventStructure RouteIdentifier
## 1 I4063 <NA> <NA>
## 2 I5194 <NA> <NA>
## 3 C2013 <NA> <NA>
## 4 A4101 <NA> <NA>
## 5 B3021 <NA> <NA>
## 6 I4060 <NA> <NA>
## TimeObservationsStarted TimeObservationsEnded DurationInHours
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 <NA> <NA> <NA>
## TimeIntervalStarted TimeIntervalEnded TimeIntervalsAdditive NumberOfObservers
## 1 <NA> <NA> NA 0
## 2 <NA> <NA> NA 0
## 3 <NA> <NA> NA 1
## 4 <NA> <NA> NA 0
## 5 <NA> <NA> NA 0
## 6 <NA> <NA> NA 0
## NoObservations ObservationCount ObservationDescriptor ObservationCount2
## 1 <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA>
## ObservationDescriptor2 ObservationCount3 ObservationDescriptor3
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 <NA> <NA> <NA>
## ObservationCount4 ObservationDescriptor4 ObservationCount5
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 <NA> <NA> <NA>
## ObservationDescriptor5 ObservationCount6 ObservationDescriptor6
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 <NA> <NA> <NA>
## AllIndividualsReported AllSpeciesReported
## 1 <NA> Unknown
## 2 <NA> Unknown
## 3 <NA> Unknown
## 4 <NA> Unknown
## 5 <NA> Unknown
## 6 <NA> Unknown
Here we’re going to take a quick look at the spatial distribution
using ggplot2
and ggspatial
to get some
basemaps.
First let’s get an idea of how many distinct points there are (often multiple observations are recorded for the same location).
nrow(house_finches)
## [1] 7211
## [1] 1263
So we have 1263 sites for 7211 observations.
Next let’s convert our data to spatial data so we can plot it
spatially (i.e. make a map!). Note that we’re using CRS EPSG code of
4326 because that’s reflects unprojected, GPS data in lat/lon. First we
omit NA
s because sf
data frames cannot have
missing locations.
house_finches <- drop_na(house_finches, "longitude", "latitude")
house_finches_sf <- st_as_sf(house_finches,
coords = c("longitude", "latitude"), crs = 4326)
Now we’re ready to make a map of the distribution of observations.
We’ll use a baselayer from OpenStreetMap and then add our observations.
ggplot() +
annotation_map_tile(type = "osm", zoomin = 0) +
geom_sf(data = house_finches_sf) +
labs(caption = "Copyright OpenStreetMap contributors")
## Please note that rgdal will be retired during October 2023,
## plan transition to sf/stars/terra functions using GDAL and PROJ
## at your earliest convenience.
## See https://r-spatial.org/r/2023/05/15/evolution4.html and https://github.com/r-spatial/evolution
## rgdal: version: 1.6-7, (SVN revision 1203)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.4.1, released 2021/12/27
## Path to GDAL shared files: /usr/share/gdal
## GDAL binary built with GEOS: TRUE
## Loaded PROJ runtime: Rel. 8.2.1, January 1st, 2022, [PJ_VERSION: 821]
## Path to PROJ shared files: /home/runner/.local/share/proj:/usr/share/proj
## PROJ CDN enabled: TRUE
## Linking to sp version:2.0-0
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
## Zoom: 6
## Fetching 12 missing tiles
##
|
| | 0%
|
|====== | 8%
|
|============ | 17%
|
|================== | 25%
|
|======================= | 33%
|
|============================= | 42%
|
|=================================== | 50%
|
|========================================= | 58%
|
|=============================================== | 67%
|
|==================================================== | 75%
|
|========================================================== | 83%
|
|================================================================ | 92%
|
|======================================================================| 100%
## ...complete!
Let’s count our observations for each site.
cnt <- house_finches_sf %>%
count(geometry)
ggplot() +
annotation_map_tile(type = "osm", zoomin = 0) +
geom_sf(data = cnt, aes(size = n)) +
labs(caption = "Copyright OpenStreetMap contributors")
## Zoom: 6
If we want to get fancy we can also create interactive maps using the
mapview
packages (see also the leaflet
for R
package).
For more complex, or detailed maps, we can use a variety of spatial data files to layer our data over maps of the area.
For this we’ll get some outlines of Canada and it’s Provinces and
Territories from rnaturalearth
.
canada <- ne_states(country = "canada", returnclass = "sf") %>%
st_transform(3347)
ggplot() +
theme_bw() +
geom_sf(data = canada)
Let’s add our observations (note that the data are transformed to
match the projection of the first layer, here the canada
data).
We can also focus on Alberta
ab <- filter(canada, name == "Alberta")
ggplot() +
theme_bw() +
geom_sf(data = ab) +
geom_sf(data = house_finches_sf, size = 0.5)
Perhaps we should see how many of these observations were made in parks.
First we’ll download and extract the Park shapefiles available from the Alberta Parks website.
dir.create("alberta_parks") # Create a folder for the data
download.file(
url = "https://www.albertaparks.ca/media/2941843/parks_and_protected_areas_alberta.zip",
destfile = "alberta_parks/parks_protected_areas_alberta.zip")
unzip("alberta_parks/parks_protected_areas_alberta.zip", exdir = "alberta_parks")
parks <- st_read("alberta_parks/Parks_Protected_Areas_Alberta.shp")
## Reading layer `Parks_Protected_Areas_Alberta' from data source
## `/home/runner/work/naturecounts/naturecounts/vignettes/articles/alberta_parks/Parks_Protected_Areas_Alberta.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 477 features and 18 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 170844.3 ymin: 5425575 xmax: 860845.9 ymax: 6659216
## Projected CRS: NAD83 / Alberta 10-TM (Forest)
Add this layer to our plot.
ggplot() +
theme_bw() +
geom_sf(data = ab) +
geom_sf(data = parks, colour = "darkgreen", fill = "forestgreen") +
geom_sf(data = house_finches_sf, size = 0.5)
Well it’s actually a bit difficult to tell, there are lots of small parks!
To solve this problem, we can merge our observations with the parks and plot those inside parks separately from those outside parks.
First we’ll transform our observation data to match the CRS of
parks
, then we’ll join the park information to our
observations, based on whether the observations overlap a park polygon
(by default this is a left join), and finally we’ll create a new column
outside_park
that is a category for out or in the park,
based on whether the observation was joined to a park name
(OC_NAME
).
house_finches_sf <- house_finches_sf %>%
st_transform(st_crs(parks)) %>%
st_join(parks) %>%
mutate(outside_park = if_else(is.na(OC_NAME), "Outside Park", "Inside Park"))
And now we can see that there are quite a few, if not more, observations outside of parks than in.
ggplot() +
theme_bw() +
geom_sf(data = ab) +
geom_sf(data = parks, colour = "darkgreen", fill = "forestgreen") +
geom_sf(data = house_finches_sf, size = 1) +
facet_wrap(~outside_park)
We might also be interested in observations over time.
First we’ll bin our yearly observations
house_finches_sf <- mutate(house_finches_sf,
years = cut(survey_year,
breaks = seq(1960, 2010, 10),
labels = seq(1960, 2000, 10), right = FALSE))
We’ll also want to see how many sample years there are per decade.
years <- house_finches_sf %>%
group_by(years) %>%
summarize(n = length(unique(survey_year)), .groups = "drop")
Now we can see how House Finch observations change over the years
ggplot() +
theme_bw() +
geom_sf(data = ab) +
geom_sf(data = parks, colour = "darkgreen", fill = "forestgreen") +
geom_sf(data = house_finches_sf, size = 1.5) +
geom_sf_text(data = years, x = 4427134, y = 2965275, hjust = 0, vjust = 1,
aes(label = paste0("n = ", n))) +
facet_wrap(~years)
We can also use some of the naturecounts
helper
functions to create presence/absence maps.
Here we download data from the RCBIOTABASE
collection,
make sure to keep only observations where all species and the location
were reported, create a new presence
column which is either
TRUE, FALSE, or NA for each sampling event. Finally we use the
format_zero_fill()
function to fill in sampling events
where cardinals (species_id
19360) were not detected
(presence would then be 0).
cardinals <- nc_data_dl(collection = "RCBIOTABASE", username = "testuser",
info = "nc_tutorial")
## Using filters: collections (RCBIOTABASE); fields_set (BMDE2.00-min)
## Collecting available records...
## collection nrecords
## 1 RCBIOTABASE 12838
## Total records: 12,838
##
## Downloading records for each collection:
## RCBIOTABASE
## Records 1 to 5000 / 12838
## Records 5001 to 10000 / 12838
## Records 10001 to 12838 / 12838
cardinals_zf <- cardinals %>%
filter(AllSpeciesReported == "Yes", !is.na(latitude), !is.na(longitude)) %>%
group_by(species_id, AllSpeciesReported, SamplingEventIdentifier) %>%
summarize(presence = sum(as.numeric(ObservationCount)) > 0, .groups = "drop") %>%
format_zero_fill(species = 19360, by = "SamplingEventIdentifier",
fill = "presence")
## - Converted 'fill' column (presence) from logical to numeric
A bit convoluted, but here we’ll grab coordinates for each Sampling Event. This is only necessary if there are errors (worth reporting!) where there are more than one lat/lon combo for each sampling event.
coords <- cardinals %>%
select("SamplingEventIdentifier", "latitude", "longitude") %>%
group_by(SamplingEventIdentifier, latitude, longitude) %>%
mutate(n = n()) %>% # Count number of unique coordinates
group_by(SamplingEventIdentifier) %>%
slice_max(n, with_ties = FALSE) %>% # Grab the most common coordinates for each event
select(-"n")
cardinals_zf <- left_join(cardinals_zf, coords, by = "SamplingEventIdentifier")
head(cardinals_zf)
## SamplingEventIdentifier species_id presence latitude longitude
## 1 RCBIOTABASE-10000-1 19360 0 45.58604 -77.48721
## 2 RCBIOTABASE-10001-1 19360 0 45.51110 -77.50533
## 3 RCBIOTABASE-10002-1 19360 0 45.50803 -77.50786
## 4 RCBIOTABASE-10003-1 19360 0 45.51110 -77.50533
## 5 RCBIOTABASE-10004-1 19360 0 45.51110 -77.50533
## 6 RCBIOTABASE-10005-1 19360 0 45.51110 -77.50533
Now that we have our presence/absence data for cardinals, we can create a map.
cnt <- st_as_sf(cardinals_zf, coords = c("longitude", "latitude"), crs = 4326) %>%
group_by(presence) %>%
count(geometry)
ggplot() +
annotation_map_tile(type = "osm", zoomin = 0) +
geom_sf(data = cnt, aes(size = n, colour = factor(presence)), alpha = 0.5) +
scale_colour_manual(name = "Presence/Absence", values = c("#31688E", "#440154"),
labels = c("1" = "Present", "0" = "Absent")) +
scale_size_continuous(name = "Number of Sampling Events", range = c(1, 20)) +
labs(caption = "Copyright OpenStreetMap contributors",
title = "Presence/Absence of Cardinals in the RCBIOTABASE collection")
## Zoom: 9
## Fetching 16 missing tiles
##
|
| | 0%
|
|==== | 6%
|
|========= | 12%
|
|============= | 19%
|
|================== | 25%
|
|====================== | 31%
|
|========================== | 38%
|
|=============================== | 44%
|
|=================================== | 50%
|
|======================================= | 56%
|
|============================================ | 62%
|
|================================================ | 69%
|
|==================================================== | 75%
|
|========================================================= | 81%
|
|============================================================= | 88%
|
|================================================================== | 94%
|
|======================================================================| 100%
## ...complete!
unlink("alberta_parks/", recursive = TRUE)